[ https://issues.apache.org/jira/browse/HBASE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623201#comment-13623201 ]
Hudson commented on HBASE-8031:
-------------------------------

Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/])
HBASE-8031 Adopt goraci as an Integration test (Revision 1456284)

     Result = FAILURE
enis :
Files :
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java


> Adopt goraci as an Integration test
> -----------------------------------
>
>                 Key: HBASE-8031
>                 URL: https://issues.apache.org/jira/browse/HBASE-8031
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.95.0, 0.98.0, 0.94.6
>
>         Attachments: hbase-8031_v1-0.94.patch, hbase-8031_v1.patch
>
>
> As you might know, I am a big fan of the goraci test that Keith Turner has
> developed, which in turn is inspired by the Accumulo test called Continuous
> Ingest.
> As much as I hate to say it, having to rely on gora and an external GitHub
> library makes using this lib cumbersome. Lately we also had to use it for
> testing against secure clusters and with Hadoop2, which gora does not support
> for now.
> So, I am proposing we add this test as an IT in the HBase code base so that
> all HBase devs can benefit from it.
> The original source code can be found here:
>  * https://github.com/keith-turner/goraci
>  * https://github.com/enis/goraci/
> From the javadoc:
> {code}
>  * Apache Accumulo [0] has a simple test suite that verifies that data is not
>  * lost at scale. This test suite is called continuous ingest. This test runs
>  * many ingest clients that continually create linked lists containing 25
>  * million nodes. At some point the clients are stopped and a map reduce job
>  * is run to ensure no linked list has a hole. A hole indicates data was lost.
>  *
>  * The nodes in the linked list are random. This causes each linked list to
>  * spread across the table. Therefore if one part of a table loses data, then
>  * it will be detected by references in another part of the table.
>  *
>  * Below is a rough sketch of how data is written. For specific details look
>  * at the Generator code.
>  *
>  * 1. Write out 1 million nodes
>  * 2. Flush the client
>  * 3. Write out 1 million nodes that reference the previous million
>  * 4. If this is the 25th set of 1 million nodes, then update the 1st set of
>  *    million to point to the last
>  * 5. goto 1
>  *
>  * The key is that nodes only reference flushed nodes. Therefore a node should
>  * never reference a missing node, even if the ingest client is killed at any
>  * point in time.
>  *
>  * Some ASCII art time:
>  * [ . . . ] represents one batch of random longs of length WIDTH
>  *
>  *                _________________________
>  *               |                  ______ |
>  *               |                 |      ||
>  *             __+_________________+_____ ||
>  *             v v v                     |||
>  * first   = [ . . . . . . . . . . . ]   |||
>  *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
>  *             | | | | | | | | | | |     |||
>  * prev    = [ . . . . . . . . . . . ]   |||
>  *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
>  *             | | | | | | | | | | |     |||
>  * current = [ . . . . . . . . . . . ]   |||
>  *                                       |||
>  * ...                                   |||
>  *                                       |||
>  * last    = [ . . . . . . . . . . . ]   |||
>  *             | | | | | | | | | | |-----|||
>  *             |                 |--------||
>  *             |___________________________|
> {code}
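
The write-then-verify scheme described in the quoted javadoc can be illustrated with a small in-memory sketch. This is not the code from IntegrationTestBigLinkedList or goraci. The class name `ContinuousIngestSketch`, the methods `generate` and `countHoles`, the `NO_PREV` marker, and the use of a `HashMap` in place of an HBase table are all assumptions made for illustration. The batch size is scaled down from 1 million, and each first-batch node is simply pointed at the last-batch node in the same position.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical in-memory sketch of continuous ingest; not the HBase implementation. */
class ContinuousIngestSketch {
    /** Placeholder reference for first-batch nodes until the loop is closed. */
    static final long NO_PREV = -1L;

    /** Writes `wrap` batches of `width` random keys; each node references the previous batch. */
    static Map<Long, Long> generate(int width, int wrap, long seed) {
        Map<Long, Long> table = new HashMap<>(); // key -> key of the node it references
        Random rnd = new Random(seed);
        long[] first = null;
        long[] prev = null;
        for (int batch = 0; batch < wrap; batch++) {
            long[] current = new long[width];
            for (int i = 0; i < width; i++) {
                long key;
                do {
                    key = rnd.nextLong() & Long.MAX_VALUE; // random non-negative key
                } while (table.containsKey(key));
                current[i] = key;
                table.put(key, prev == null ? NO_PREV : prev[i]);
            }
            // A real client would flush here, so later batches only reference flushed nodes.
            if (first == null) {
                first = current;
            }
            prev = current;
        }
        // After the last batch, update the first batch to point to the last one.
        for (int i = 0; i < width; i++) {
            table.put(first[i], prev[i]);
        }
        return table;
    }

    /** Counts references to keys that are missing from the table (holes). */
    static long countHoles(Map<Long, Long> table) {
        long holes = 0;
        for (long ref : table.values()) {
            if (ref != NO_PREV && !table.containsKey(ref)) {
                holes++;
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        Map<Long, Long> table = generate(1000, 25, 42L);
        System.out.println("holes after ingest: " + countHoles(table));
        Long lost = table.keySet().iterator().next();
        table.remove(lost); // simulate losing one row
        System.out.println("holes after losing one row: " + countHoles(table));
    }
}
```

Because the keys are random, the node that references a lost row usually lives far away from it in key order, which is why data lost in one part of the table shows up as a dangling reference in another part.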