St.Ack,

I ran a test yesterday that limited concurrent MR tasks to one per node,
which didn't appear to resolve the issue.

Essentially, our bulk import job inserts into two tables simultaneously. The
first, "content_bridge", is a simple lookup table with a single column family
that we use for key translation. The second, "content", holds the majority of
the data and has 12 column families.

We wrote a couple of map/reduce jobs to help with validation and noticed
that we're not only missing rows from the "content" table, but we also appear
to be missing data in column families for rows that are actually in the
table. Our import code validates each piece of content prior to insertion, so
we're confident that each row should have data for particular columns. Rows
that were missing values for those columns would have been thrown out
prior to insertion into HBase.
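The per-row check described above can be sketched roughly as follows. This is
a hypothetical illustration, not our actual validation job: it models a row as
a plain Java map of family -> (qualifier -> value) instead of using the HBase
client API, and the family names in main() are made up.

```java
import java.util.*;

public class RowValidator {
    // Hypothetical helper: returns the expected column families that are
    // absent from the row or present but holding no cells.
    static List<String> missingFamilies(Map<String, Map<String, byte[]>> row,
                                        List<String> expectedFamilies) {
        List<String> missing = new ArrayList<>();
        for (String family : expectedFamilies) {
            Map<String, byte[]> cells = row.get(family);
            if (cells == null || cells.isEmpty()) {
                missing.add(family);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up family names, for illustration only.
        List<String> expected = Arrays.asList("meta", "body");
        Map<String, Map<String, byte[]>> row = new HashMap<>();
        row.put("meta", Collections.singletonMap("title", "x".getBytes()));
        // The "body" family was never written, so it is reported as missing.
        System.out.println(missingFamilies(row, expected)); // prints [body]
    }
}
```

In a real job the same comparison would run once per scanned row, with any
non-empty result counted or logged.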

I was able to extract the following context around the compaction/split
failure from the regionserver logs.
http://pastebin.com/yrJqxbv5

I tried grepping the datanode logs for the same file
(/hbase/content/compaction.dir/390541851/6685068727329269560) identified in
the regionserver log, but found nothing matching.

Thanks for your help,

Nathan

On Mon, Mar 22, 2010 at 1:46 PM, Stack <st...@duboce.net> wrote:

> On Sun, Mar 21, 2010 at 3:50 PM, Nathan Harkenrider
> <nathan.harkenri...@gmail.com> wrote:
> > I managed to locate the following errors in the regionserver logs related
> > to failed compactions and/or splits.
> > http://pastebin.com/5WjDpS9F
> >
> Is there anything else earlier in the logs on why the failure happened?
> You might try running one MR task per node rather than 3.  You only
> have 8G of RAM so three concurrent children are taking resources from
> running datanodes and regionservers.
> St.Ack
>
