Also, I updated the configuration and things seem to be working a bit better.

What's a good heap size to set?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of stack
Sent: Wednesday, October 21, 2009 12:46 PM
To: [email protected]
Subject: Re: Table Upload Optimization

On Wed, Oct 21, 2009 at 8:53 AM, Mark Vigeant
<[email protected]>wrote:

> >I saw this in your first posting: 10/21/09 10:22:52 INFO mapred.JobClient:
> >map 100% reduce 0%.
>
> >Is your job writing hbase in the map task or in reducer?  Are you using
> >TableOutputFormat?
>
> I am using TableOutputFormat and only a mapper. There is no reducer.
> Would a reducer make things more efficient?
>
>
No. Unless you need the reduce step for some reason, avoid it.
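
For reference, a map-only job (what's being recommended here) can also be expressed in the job configuration with the Hadoop 0.20-era property below, or equivalently by calling setNumReduceTasks(0) on the job; this is a minimal sketch, not something quoted from this thread:

<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
  <description>With zero reduce tasks the map output goes straight to
  the job's OutputFormat (here TableOutputFormat), so no shuffle or
  reduce phase runs.
  </description>
</property>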




>
> >> I'm using Hadoop 0.20.1 and HBase 0.20.0
> >>
> >> Each node is a virtual machine with 2 CPU, 4 GB host memory and 100 GB
> >> storage.
> >>
> >>
> >You are running DN, TT, HBase, and ZK on above?  One disk shared by all?
>
> I'm only running ZooKeeper on 2 of the above nodes, and then a TT, DN, and
> regionserver on all.
>
>
The ZK cluster should have an odd number of members.
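
For illustration, a three-member ensemble would be listed in conf/hbase-site.xml roughly as below; the hostnames are made up for the example, not taken from this thread:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  <description>Comma-separated hosts in the ZooKeeper ensemble. An odd
  count (3 or 5) lets the quorum tolerate the loss of a member.
  </description>
</property>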

One disk shared by all?



> >Children running at any one time on a TaskTracker.  You should start with
> >one only since you have such an anemic platform.
>
> Ah, and I can set that in the Hadoop config?
>
>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
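
Per the "start with one" advice above, on these 2-CPU VMs the site-level override (placed in conf/mapred-site.xml on each TaskTracker, followed by a TT restart) would look something like this sketch:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
  <description>Run at most one map task at a time on this TaskTracker,
  leaving headroom for the DataNode and RegionServer sharing the VM.
  </description>
</property>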



St.Ack



> >You've upped filedescriptors and xceivers, all the stuff in 'Getting
> >Started'?
>
> And no, it appears I accidentally overlooked that beginning stuff.
> Yikes. Ok.
>
> I will take care of those and get back to you.
>
>
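
For reference, the xceivers setting mentioned above lives in conf/hdfs-site.xml on the DataNodes (the property name really is spelled "xcievers"); the value shown is the figure commonly suggested in the HBase docs of that era, not one taken from this thread. The file-descriptor limit is an OS-level setting for the user running Hadoop/HBase, not a Hadoop property.

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
  <description>Upper bound on the number of files a DataNode serves at
  any one time. HBase keeps many files open, so the small default is
  easily exhausted.
  </description>
</property>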


>
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Jean-Daniel Cryans
> > Sent: Wednesday, October 21, 2009 11:04 AM
> > To: [email protected]
> > Subject: Re: Table Upload Optimization
> >
> > Well, the XMLStreamingInputFormat lets you map XML files, which is neat,
> > but it has a problem and always needs to be patched. I wondered if
> > that was missing, but in your case it's not the problem.
> >
> > Did you check the logs of the master and region servers? Also I'd like to
> > know
> >
> > - Version of Hadoop and HBase
> > - Nodes' hardware
> > - How many map slots per TT
> > - HBASE_HEAPSIZE from conf/hbase-env.sh
> > - Special configuration you use
> >
> > Thx,
> >
> > J-D
> >
> > On Wed, Oct 21, 2009 at 7:57 AM, Mark Vigeant
> > <[email protected]> wrote:
> > > No. Should I?
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Jean-Daniel Cryans
> > > Sent: Wednesday, October 21, 2009 10:55 AM
> > > To: [email protected]
> > > Subject: Re: Table Upload Optimization
> > >
> > > Are you using the Hadoop Streaming API?
> > >
> > > J-D
> > >
> > > On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant
> > > <[email protected]> wrote:
> > >> Hey
> > >>
> > >> So I want to upload a lot of XML data into an HTable. I have a class
> > >> that successfully maps up to about 500 MB of data or so (on one
> > >> regionserver) into a table, but if I go for much bigger than that it takes
> > >> forever and eventually just stops. I tried uploading a big XML file into my
> > >> 4 regionserver cluster (about 7 GB) and it's been a day and it's still going
> > >> at it.
> > >>
> > >> What I get when I run the job on the 4 node cluster is:
> > >> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
> > >> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
> > >> (then it does that for a while until...)
> > >> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task
> > >> attempt_local_0001_m_000117_0 is done. And is in the process of committing
> > >> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
> > >> 10/21/09 10:22:52 mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0' is done.
> > >> 10/21/09 10:22:52 INFO mapred.JobClient:   map 100% reduce 0%
> > >> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
> > >> 10/21/09 10:22:59 INFO mapred.JobClient: map 99% reduce 0%
> > >>
> > >>
> > >> I'm convinced I'm not configuring HBase or Hadoop correctly. Any
> > >> suggestions?
> > >>
> > >> Mark Vigeant
> > >> RiskMetrics Group, Inc.
> > >>
> > >
> >
>
