St.Ack, I ran a test yesterday that limited concurrent MR tasks to one per node, which didn't appear to resolve the issue.
Essentially, our bulk import job inserts into two tables simultaneously. The first, "content_bridge", is a simple lookup table with a single column family that we use for key translation. The second, "content", holds the majority of the data and consists of 12 column families.

We wrote a couple of map/reduce jobs to help with validation and noticed that we're not only missing rows from the "content" table, but we also appear to be missing data in column families for rows that are actually in the table. Our import code validates each piece of content prior to insertion, so we're confident that each row should have data for particular columns. Rows missing values for these columns would have been thrown out prior to insertion into HBase.

I was able to extract the following context around the compaction/split failure from the regionserver logs.

http://pastebin.com/yrJqxbv5

I tried grepping the datanode logs for the same file (/hbase/content/compaction.dir/390541851/6685068727329269560) identified in the regionserver log, but found nothing matching.

Thanks for your help,
Nathan

On Mon, Mar 22, 2010 at 1:46 PM, Stack <st...@duboce.net> wrote:

> On Sun, Mar 21, 2010 at 3:50 PM, Nathan Harkenrider
> <nathan.harkenri...@gmail.com> wrote:
> > I managed to locate the following errors in the regionserver logs related to
> > failed compactions and/or splits.
> > http://pastebin.com/5WjDpS9F
> >
> Is there anything else earlier in the logs on why the fail happened?
> You might try running one MR task per node rather than 3. You only
> have 8G of RAM so three concurrent children are taking resources from
> running datanodes and regionservers.
> St.Ack
>