Thanks for your suggestions, Ryan, Robert, Andy.
I was able to get it to run all the way through now, and loaded up 300
million rows, which is the volume I wanted in order to do some performance
testing for my application.
I'm not entirely sure what made the difference. At first I was trying one
thing at a time and getting inconsistent results (but always an eventual
failure). Then I just reworked the whole thing and got it to work. Here are
the main changes I made:
- Large instances in EC2 -- meaning more memory and more cores.
- Increased the xcievers setting for HDFS (dfs.datanode.max.xcievers) and
  raised ulimit -n
- Moved the processes around:
    - Originally I had both the Hadoop master and the HBase master on the
      same instance. Now they are separate.
    - I also killed the TaskTrackers on the Hadoop master and the HBase
      master.
- Reduced the number of map tasks and reduce tasks to "2"
- Provided more free disk space for the HDFS nodes
- Started with a fresh, empty HDFS filesystem
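
For anyone who wants to sanity-check the xcievers and ulimit item above on
their own nodes, here is a minimal sketch. The two thresholds are assumptions
based on commonly cited HBase guidance, not the values I used, and the helper
names (nofile_soft_limit, read_property, check) are made up for illustration.
It expects the text of an hdfs-site.xml file.

```python
import resource
import xml.etree.ElementTree as ET

# Assumed minimums (commonly cited for HBase); not values from this thread.
MIN_NOFILE = 32768
MIN_XCEIVERS = 4096


def nofile_soft_limit():
    """Soft open-file limit of this process (what `ulimit -n` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def read_property(site_xml_text, name):
    """Value of a Hadoop-style <property> by name, or None if it is absent."""
    root = ET.fromstring(site_xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def check(site_xml_text):
    """Return a list of warnings; the list is empty when both checks pass."""
    warnings = []

    soft = nofile_soft_limit()
    # RLIM_INFINITY means "unlimited", so only compare real numeric limits.
    if soft != resource.RLIM_INFINITY and soft < MIN_NOFILE:
        warnings.append("open-file limit is %d, below %d" % (soft, MIN_NOFILE))

    raw = read_property(site_xml_text, "dfs.datanode.max.xcievers")
    if raw is None:
        warnings.append("dfs.datanode.max.xcievers is not set")
    elif int(raw) < MIN_XCEIVERS:
        warnings.append(
            "dfs.datanode.max.xcievers is %s, below %d" % (raw, MIN_XCEIVERS)
        )
    return warnings
```

Run it as the same user that starts the DataNode and RegionServer, since the
open-file limit is per user and session. For example, read hdfs-site.xml into
a string and print whatever check() returns.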
Marc
On Thu, Jan 7, 2010 at 2:54 PM, Andrew Purtell <[email protected]> wrote:
> Thanks.
>
> Not that you might be missing something, but note that we do set up a
> separate ZooKeeper quorum ensemble using c1.medium instances, so ZK can be
> resource independent.
>
> - Andy
>
>
>
> ----- Original Message ----
> > From: Marc Limotte <[email protected]>
> > To: [email protected]
> > Sent: Thu, January 7, 2010 1:42:29 PM
> > Subject: Re: Seeing errors after loading a fair amount of data. KeeperException$NoNodeException, IOException
> >
> > I'm using my own scripts and methods to construct the EC2 cluster. I'll
> > take a look through the src/contrib scripts, though; maybe there's a clue
> > there about something I'm missing.
> >
> > Marc
> >
> > On Thu, Jan 7, 2010 at 12:49 PM, Andrew Purtell wrote:
> >
> > > Marc,
> > >
> > > Are you using the HBase EC2 scripts (in src/contrib/ec2/), or is this a
> > > set of instances you are setting up using your own methods?
> > >
> > > Best regards,
> > >
> > > - Andy
> > >
> > >
> > >
> > > ----- Original Message ----
> > > > From: Marc Limotte
> > > > To: [email protected]
> > > > Sent: Thu, January 7, 2010 12:39:48 AM
> > > > Subject: Re: Seeing errors after loading a fair amount of data. KeeperException$NoNodeException, IOException
> > > > I increased ulimit and xcievers. My load job still dies, but the
> > > > RegionServers stay up and running, and I can use the hbase shell to
> > > > retrieve a row, so I guess HBase is still running.
> > > [...]
> > > > I also should have mentioned that this is running in Amazon EC2. I
> > > > checked out the recommendations for EC2 on the wiki, and hence this
> > > > particular run is on a 'c1.xlarge' instance, although most of my prior
> > > > testing has been on m1.large.
> > > [...]