Also, I updated the configuration and things seem to be working a bit better.
What's a good heap size to set?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of stack
Sent: Wednesday, October 21, 2009 12:46 PM
To: [email protected]
Subject: Re: Table Upload Optimization

On Wed, Oct 21, 2009 at 8:53 AM, Mark Vigeant <[email protected]> wrote:

> >I saw this in your first posting: 10/21/09 10:22:52 INFO mapred.JobClient:
> >map 100% reduce 0%.
>
> >Is your job writing hbase in the map task or in reducer? Are you using
> >TableOutputFormat?
>
> I am using table output format and only a mapper. There is no reducer.
> Would a reducer make things more efficient?
>
>
No. Unless you need the reduce step for some reason avoid it.

>
> >> I'm using Hadoop 0.20.1 and HBase 0.20.0
> >>
> >> Each node is a virtual machine with 2 CPU, 4 GB host memory and 100 GB
> >> storage.
> >>
> >>
> >You are running DN, TT, HBase, and ZK on above? One disk shared by all?
>
> I'm only running zookeeper on 2 of the above nodes, and then a TT DN and
> regionserver on all.
>
>
zk cluster should be an odd number. One disk shared by all?

> >Children running at any one time on a TaskTracker. You should start with
> >one only since you have such an anemic platform.
>
> Ah, and I can set that in the hadoop config?
>
>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

St.Ack

> >You've upped filedescriptors and xceivers, all the stuff in 'Getting
> >Started'?
>
> And no it appears as though I accidentally overlooked that beginning stuff.
> Yikes. Ok.
>
> I will take care of those and get back to you.
>
> >
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> > Sent: Wednesday, October 21, 2009 11:04 AM
> > To: [email protected]
> > Subject: Re: Table Upload Optimization
> >
> > Well the XMLStreamingInputFormat lets you map XML files which is neat
> > but it has a problem and always needs to be patched. I wondered if
> > that was missing but in your case it's not the problem.
> >
> > Did you check the logs of the master and region servers? Also I'd like to
> > know
> >
> > - Version of Hadoop and HBase
> > - Nodes's hardware
> > - How many map slots per TT
> > - HBASE_HEAPSIZE from conf/hbase-env.sh
> > - Special configuration you use
> >
> > Thx,
> >
> > J-D
> >
> > On Wed, Oct 21, 2009 at 7:57 AM, Mark Vigeant
> > <[email protected]> wrote:
> > > No. Should I?
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> > > Sent: Wednesday, October 21, 2009 10:55 AM
> > > To: [email protected]
> > > Subject: Re: Table Upload Optimization
> > >
> > > Are you using the Hadoop Streaming API?
> > >
> > > J-D
> > >
> > > On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant
> > > <[email protected]> wrote:
> > >> Hey
> > >>
> > >> So I want to upload a lot of XML data into an HTable. I have a class
> > >> that successfully maps up to about 500 MB of data or so (on one
> > >> regionserver) into a table, but if I go for much bigger than that it
> > >> takes forever and eventually just stops. I tried uploading a big XML
> > >> file into my 4 regionserver cluster (about 7 GB) and it's been a day
> > >> and it's still going at it.
> > >>
> > >> What I get when I run the job on the 4 node cluster is:
> > >> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
> > >> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
> > >> (then it does that for a while until...)
> > >> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task attempt_local_0001_m_000117_0 is done. And is in the process of committing
> > >> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
> > >> 10/21/09 10:22:52 mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0' is done.
> > >> 10/21/09 10:22:52 INFO mapred.JobClient: map 100% reduce 0%
> > >> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
> > >> 10/21/09 10:22:59 INFO mapred.JobClient: map 99% reduce 0%
> > >>
> > >>
> > >> I'm convinced I'm not configuring hbase or hadoop correctly. Any
> > >> suggestions?
> > >>
> > >> Mark Vigeant
> > >> RiskMetrics Group, Inc.
> > >>
> > >
> > >
