I noticed that mapreduce.Export.createSubmittableJob() doesn't call setCaching() in 0.20.3.
Should a call to setCaching() be added?

Thanks

On Sun, Apr 11, 2010 at 2:14 AM, Jean-Daniel Cryans <[email protected]> wrote:

> A map against an HBase table by default cannot have more tasks than the
> number of regions in that table.
>
> Also you want to enable scanner caching. Pass a Scan object to the
> TableMapReduceUtil.initTableMapperJob that is configured with
> scan.setCaching(some_value) where the value should be the number of
> rows to fetch every time we hit a region server with next(). On rows
> of 100-200 bytes, our jobs usually are configured with 1000 up to
> 10000.
>
> Finally, is your job running in local mode or on a job tracker? Even
> if HBase uses HDFS, it usually doesn't know of the job tracker unless
> you configure HBase's classpath with Hadoop's conf.
>
> J-D
>
> On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
> <[email protected]> wrote:
> > Hi,
> >
> > thanks for the quick response. I tried the following in the code:
> >
> > job.getConfiguration().setInt("mapred.map.tasks", 10000);
> >
> > but unfortunately I get the same result.
> >
> > Any other ideas?
> >
> > --- [email protected] wrote:
> >
> > From: Amandeep Khurana <[email protected]>
> > To: [email protected], [email protected]
> > Subject: Re: set number of map tasks for HBase MR
> > Date: Sat, 10 Apr 2010 20:04:18 -0700
> >
> > You can set the number of map tasks in your job config to a big number
> > (eg: 100000), and the library will automatically spawn one map task per
> > region.
> >
> > -ak
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> > On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko
> > <[email protected]> wrote:
> >
> >> Hi guys,
> >>
> >> I have an HBase table of about 8G and I want to run an MR job against
> >> it. It works extremely slowly in my case. One thing I noticed is that
> >> the job runs only 2 map tasks. Is there any way to set up a bigger
> >> number of map tasks? I saw some method in the mapred package, but
> >> can't find anything like this in the new mapreduce package.
> >>
> >> I run my MR job on a single machine in cluster mode.
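To make J-D's point about scanner caching concrete, here is a rough back-of-the-envelope sketch in plain Java (no HBase dependency). It assumes each next() round trip to a region server returns up to `caching` rows, and it ignores region boundaries and any final empty fetch. The class name ScanCachingEstimate, the method roundTrips, and the row count of 1,000,000 are hypothetical, chosen only for illustration.

```java
/**
 * Rough estimate of scanner round trips; illustration only, not HBase code.
 * Assumption: every next() round trip returns up to `caching` rows.
 */
final class ScanCachingEstimate {

    /** Ceiling of rows / caching: the number of fetches needed to read all rows. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size, not taken from the thread
        int[] cachingValues = {1, 1000, 10000};
        for (int caching : cachingValues) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

In a real job the chosen value would go on the Scan passed to TableMapReduceUtil.initTableMapperJob via scan.setCaching(...), as J-D describes. Larger values mean fewer round trips but more memory held per fetch on both the client and the region server.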
