[ 
https://issues.apache.org/jira/browse/HBASE-12782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-12782:
--------------------------
    Attachment: 12782.unit.test.writing.txt

Focusing on write side first.

For debugging, the output emitted at the end of the verify step is of no use. 
I have to go into the reducer logs to find these log lines from ITBLL:

          LOG.error("Linked List error: Key = " + keyString + " References = " + refsSb.toString());

I then take the 'References' record and do a get on it.  It is the row named 
by its 'meta:previous' that is 'missing'.  This missing record will have been 
'written' as part of the previous 1M writes, at 'count' - 1M.  The timestamp on 
the referencing record will be '1M' writes ahead of when the 'missing' record 
would have been written (usually about 15 seconds per 1M, but if a server is 
down, it can take minutes to write the 1M).
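
A rough sketch of that arithmetic, for narrowing which stretch of logs to 
read: given the timestamp on the referencing record, step back by the fastest 
and slowest plausible time to write 1M rows. The class name, the example 
timestamp and the 5 minute upper bound are illustrative assumptions, not 
values from the cluster.

```java
import java.time.Duration;
import java.time.Instant;

final class MissingWriteWindow {
  /**
   * Returns {earliest, latest} instants at which the missing record was
   * probably written, assuming it went out one 1M batch before the
   * record that references it.
   */
  static Instant[] window(long referrerTsMillis, Duration minBatch, Duration maxBatch) {
    Instant referrer = Instant.ofEpochMilli(referrerTsMillis);
    return new Instant[] { referrer.minus(maxBatch), referrer.minus(minBatch) };
  }

  public static void main(String[] args) {
    // Hypothetical cell timestamp taken from a get on the referencing row.
    long referrerTs = 1_420_000_000_000L;
    // 15 seconds per 1M when healthy; minutes when a server is down (assumed 5).
    Instant[] w = window(referrerTs, Duration.ofSeconds(15), Duration.ofMinutes(5));
    System.out.println("look at logs between " + w[0] + " and " + w[1]);
  }
}
```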

The ITBLL rows have too many unprintable characters -- quotes, single ticks, 
left braces, etc. -- to make for easy scripting.  I tried, but it's kinda tough 
bridging 'text' output -- escaped bytes -- between jruby and java.  Spent some 
time trying to write rows with printable records but that seems to make for 
more failures; need to spend time on this... as is, it's hard to script ITBLL 
failures so as to get a 'bigger picture' of the failure profile.  Another issue.
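
One possible way around the quoting problem, as a sketch only: turn the 
escaped text back into bytes and re-emit it as plain hex, which survives shell 
and jruby quoting. This assumes the keys in the log are printed in the \xNN 
style that Bytes.toStringBinary uses; HBase's own Bytes.toBytesBinary should 
do the same decode, and the class below (a made-up name) only exists to avoid 
the dependency.

```java
import java.io.ByteArrayOutputStream;

final class EscapedRowKey {
  /** Decodes \xNN escapes to single bytes; any other char is taken as one ASCII byte. */
  static byte[] decode(String escaped) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    while (i < escaped.length()) {
      char c = escaped.charAt(i);
      boolean isEscape = c == '\\' && i + 3 < escaped.length()
          && escaped.charAt(i + 1) == 'x'
          && Character.digit(escaped.charAt(i + 2), 16) >= 0
          && Character.digit(escaped.charAt(i + 3), 16) >= 0;
      if (isEscape) {
        out.write(Integer.parseInt(escaped.substring(i + 2, i + 4), 16));
        i += 4;
      } else {
        out.write((byte) c);
        i++;
      }
    }
    return out.toByteArray();
  }

  /** Renders bytes as lowercase hex, two digits per byte. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Made-up key mixing a printable char, a quote, a brace and escaped bytes.
    String fromLog = "a\\x00'{\\xFF";
    System.out.println(toHex(decode(fromLog)));
  }
}
```

With keys in hex, a script can grep, sort and count failures without caring 
what the raw bytes look like.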

I've disabled killing master and splits to make things easier for myself. We 
still fail reliably.

I can triangulate a little by looking at a few failed records, and have 
identified suspicious-looking write periods as asyncprocess tries to cross over 
a failed regionserver.  The attached test reproduces in a unit test the same 
logging sequence that I see up on the cluster (I was trying to narrow the moving 
parts around a failure), but it looks like asyncprocess is not the issue; its 
accounting doesn't seem to be hiccuping.

Let me redo this test as an integration test to run against the cluster to be 
sure -- perhaps it's a timing thing that is hard to repro in the one JVM -- but 
it doesn't look like the write side is the issue.  Dang.

> ITBLL fails for me if generator does anything but 5M per maptask
> ----------------------------------------------------------------
>
>                 Key: HBASE-12782
>                 URL: https://issues.apache.org/jira/browse/HBASE-12782
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: 12782.unit.test.writing.txt
>
>
> Anyone else seeing this?  If I do an ITBLL with generator doing 5M rows per 
> maptask, all is good -- verify passes. I've been running 5 servers and had 
> one splot per server.  So below works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase 
> classpath`" ./hadoop/bin/hadoop --config ~/conf_hadoop 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey 
> serverKilling Generator 5 5000000 g1.tmp
> or if I double the map tasks, it works:
> HADOOP_CLASSPATH="/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase 
> classpath`" ./hadoop/bin/hadoop --config ~/conf_hadoop 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey 
> serverKilling Generator 10 5000000 g2.tmp
> ...but if I change the 5M to 50M or 25M, Verify fails.
> Looking into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
