Thanks to Jean-Marc's assistance, the cause of the TestVisibilityLabelsWithDistributedLogReplay#testAddVisibilityLabelsOnRSRestart failure has been found.
I logged HBASE-11878 and attached a preliminary patch.

Cheers

On Mon, Sep 1, 2014 at 10:41 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Searching for 'hbase.ResourceChecker(147): before' in
> http://server.distparser.com:81/hbase/with_teds_patch/node8/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay-output.txt
> showed that testAddVisibilityLabelsOnRSRestart was the first sub-test to
> be run:
>
> 2014-09-01 08:41:12,266 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: security.visibility.TestVisibilityLabelsWithDistributedLogReplay#testAddVisibilityLabelsOnRSRestart Thread=295, OpenFileDescriptor=380, MaxFileDescriptor=65536,
>
> What Ram described was different from the problem JMS reported - since "ABC"
> and "XYZ" were not written to hbase:labels table, the test would surely fail.
>
> Cheers
>
> On Mon, Sep 1, 2014 at 9:53 AM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
>
>> To reproduce, I just ensured that in TestVisibilityLabelsWithDefaultVisLabelService
>> testAddLabels runs and then testAddVisibilityLabelsOnRSRestart runs.
>> This ensures that the total number of labels is 17 and not 13.
>>
>> So maybe when it runs in Jenkins the order of execution determines the way
>> the labels are added or deleted.
>>
>> String[] labels = { SECRET, TOPSECRET, CONFIDENTIAL, PUBLIC, PRIVATE, COPYRIGHT,
>>     ACCENT, UNICODE_VIS_TAG, UC1, UC2 };
>> static String[] labels1 = { "L1", SECRET, "L2", "invalid~", "L3" };
>> static String[] labels2 = { SECRET, CONFIDENTIAL, PRIVATE, "ABC", "XYZ" };
>>
>> Counting these, excluding the repeated confidential, secret and private, gives
>> 17. So maybe we can separate the test cases?
>>
>> Regards
>> Ram
>>
>> On Mon, Sep 1, 2014 at 9:55 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>
>> > Cool! Glad to see I'm not the only one ;)
>> >
>> > Since it passed on Jenkins does it mean we need to add some additional
>> > tests?
>> >
>> > On 2014-09-01 12:20, "ramkrishna vasudevan" <ramkrishna.s.vasude...@gmail.com> wrote:
>> >
>> > > Am able to reproduce this locally. Will try to come up with a patch.
>> > >
>> > > On Mon, Sep 1, 2014 at 9:43 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>> > >
>> > > > So I guess that's why we don't see the log you have added? It loops and
>> > > > times out and never reaches that point?
>> > > >
>> > > > What's next? Anything else I can run/do/test/check? I can even install
>> > > > Jenkins on those 4 servers....
>> > > >
>> > > > 2014-09-01 11:59 GMT-04:00 Ted Yu <yuzhih...@gmail.com>:
>> > > >
>> > > > > The patch I sent to Jean-Marc added a loop when retrieving labels.
>> > > > >
>> > > > > Looking at
>> > > > > http://server.distparser.com:81/hbase/with_teds_patch/node8/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay-output.txt
>> > > > > , there was no log from DefaultVisibilityLabelServiceImpl w.r.t. labels ABC
>> > > > > and XYZ.
>> > > > >
>> > > > > The timeout of the test was due to these two labels not being written to the
>> > > > > hbase:labels table.
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On Mon, Sep 1, 2014 at 8:26 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>> > > > >
>> > > > > > Ted sent me another patch to test.
>> > > > > >
>> > > > > > Everything is here: http://server.distparser.com:81/hbase/with_teds_patch/
>> > > > > >
>> > > > > > This specific test did not fail on hbasetest1
>> > > > > >
>> > > > > > Only this one failed on node8:
>> > > > > > Tests in error:
>> > > > > > testAddVisibilityLabelsOnRSRestart(org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay):
>> > > > > > test timed out after 60000 milliseconds
>> > > > > >
>> > > > > > Tests run: 1284, Failures: 0, Errors: 1, Skipped: 8
>> > > > > >
>> > > > > > And many tests failed on node1 and t430s. But all
>> > > > > > TestVisibilityLabelsWithDistributedLogReplay failed with timeout.
>> > > > > >
>> > > > > > Again, I have copied everything on the server so you can look at whatever can
>> > > > > > be interesting.
>> > > > > >
>> > > > > > <http://server.distparser.com:81/hbase/node1/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay-output.txt>
>> > > > > >
>> > > > > > Feel free to send me anything you want me to test. Also, I can share all
>> > > > > > required information, like CPU, etc. if that helps.
>> > > > > >
>> > > > > > JM
>> > > > > >
>> > > > > > 2014-09-01 0:06 GMT-04:00 Anoop John <anoop.hb...@gmail.com>:
>> > > > > >
>> > > > > > > The addition of the 2 new labels would have failed (?) Maybe we log the
>> > > > > > > return value of this addition in the test and see the trace?
>> > > > > > >
>> > > > > > > -Anoop-
>> > > > > > >
>> > > > > > > On Mon, Sep 1, 2014 at 5:59 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > In the output of successful test run, I saw:
>> > > > > > > >
>> > > > > > > > 2014-08-30 11:24:12,139 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=56716] visibility.DefaultVisibilityLabelServiceImpl(252): Adding the label ABC
>> > > > > > > >
>> > > > > > > > 2014-08-30 11:24:12,139 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=56716] visibility.DefaultVisibilityLabelServiceImpl(252): Adding the label XYZ
>> > > > > > > >
>> > > > > > > > From
>> > > > > > > > http://server.distparser.com:81/hbase/node1/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay-output.txt
>> > > > > > > > , I don't see the above.
>> > > > > > > >
>> > > > > > > > The additional debug log confirmed that the two new labels were not read
>> > > > > > > > back.
>> > > > > > > >
>> > > > > > > > Cheers
>> > > > > > > >
>> > > > > > > > On Sun, Aug 31, 2014 at 3:36 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>> > > > > > > >
>> > > > > > > > > Here are the results.
>> > > > > > > > >
>> > > > > > > > > 4 builds, 4 computers, 4 failed.
>> > > > > > > > >
>> > > > > > > > > HTH
>> > > > > > > > >
>> > > > > > > > > http://server.distparser.com:81/hbase/
>> > > > > > > > >
>> > > > > > > > > Let me know if you want me to run anything else.
>> > > > > > > > >
>> > > > > > > > > JM
>> > > > > > > > >
>> > > > > > > > > 2014-08-30 14:29 GMT-04:00 Ted Yu <yuzhih...@gmail.com>:
>> > > > > > > > >
>> > > > > > > > > > Jean-Marc:
>> > > > > > > > > > I couldn't reproduce the test failure - on Mac or Linux.
>> > > > > > > > > >
>> > > > > > > > > > Can you apply the following and run test again ?
>> > > > > > > > > > http://pastebin.com/Z1czdBes
>> > > > > > > > > >
>> > > > > > > > > > It would reveal whether log replay didn't bring back the labels written
>> > > > > > > > > > prior to RS restart, or the new labels were not written successfully.
>> > > > > > > > > >
>> > > > > > > > > > Thanks
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Aug 29, 2014 at 6:08 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > I will see if I can build something where the logs are automatically
>> > > > > > > > > > > uploaded so that it will be easier to look at them.
>> > > > > > > > > > >
>> > > > > > > > > > > I just pushed the files related to this test.
>> > > > > > > > > > >
>> > > > > > > > > > > http://www.spaggiari.org/org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay.txt
>> > > > > > > > > > >
>> > > > > > > > > > > http://www.spaggiari.org/org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay-output.txt
>> > > > > > > > > > >
>> > > > > > > > > > > HTH
>> > > > > > > > > > >
>> > > > > > > > > > > JM
>> > > > > > > > > > >
>> > > > > > > > > > > 2014-08-29 20:05 GMT-04:00 Andrew Purtell <apurt...@apache.org>:
>> > > > > > > > > > >
>> > > > > > > > > > > > On Fri, Aug 29, 2014 at 5:01 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Here are the logs for
>> > > > > > > > > > > > > testAddVisibilityLabelsOnRSRestart(org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay)
>> > > > > > > > > > > > > : http://www.spaggiari.org/hbase-0.98.6.logs
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Let me know if you need any other file. This is the standard output
>> > > > > > > > > > > > > with -X.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I was looking for the
>> > > > > > > > > > > > hbase-server/target/surefire-reports/<test-name>-output.txt file, but you
>> > > > > > > > > > > > could run the test with -Dtest.output.tofile=false and capture standard
>> > > > > > > > > > > > output too.
>> > > > > > > > > > > >
>> > > > > > > > > > > > --
>> > > > > > > > > > > > Best regards,
>> > > > > > > > > > > >
>> > > > > > > > > > > >    - Andy
>> > > > > > > > > > > >
>> > > > > > > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > > > > > > > > > > > (via Tom White)
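
For anyone re-checking Ram's 17-versus-13 label count quoted above, here is a rough sketch of the arithmetic. It is not code from the HBase tests: the class name LabelCount, the constant values and the SYSTEM_ROWS figure are all placeholders assumed for illustration. In particular, the one extra built-in row in the labels table is an assumption that would reconcile the distinct label names with the 13 and 17 mentioned in the thread, and the sketch does not model whether "invalid~" is actually accepted by the server.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class LabelCount {
    // Placeholder values (assumption): the real constants are defined in the
    // HBase test classes; only their distinctness matters for this count.
    public static final String SECRET = "secret";
    public static final String TOPSECRET = "topsecret";
    public static final String CONFIDENTIAL = "confidential";
    public static final String PUBLIC = "public";
    public static final String PRIVATE = "private";
    public static final String COPYRIGHT = "copyright";
    public static final String ACCENT = "accent";
    public static final String UNICODE_VIS_TAG = "unicode_vis_tag";
    public static final String UC1 = "uc1";
    public static final String UC2 = "uc2";

    // The three arrays as quoted in Ram's mail.
    public static final String[] LABELS = { SECRET, TOPSECRET, CONFIDENTIAL, PUBLIC,
        PRIVATE, COPYRIGHT, ACCENT, UNICODE_VIS_TAG, UC1, UC2 };
    public static final String[] LABELS1 = { "L1", SECRET, "L2", "invalid~", "L3" };
    public static final String[] LABELS2 = { SECRET, CONFIDENTIAL, PRIVATE, "ABC", "XYZ" };

    // Assumption: one built-in row already present in the labels table.
    public static final int SYSTEM_ROWS = 1;

    // Union of all groups, so repeated labels are counted once.
    public static Set<String> distinct(String[]... groups) {
        Set<String> seen = new LinkedHashSet<>();
        for (String[] group : groups) {
            seen.addAll(Arrays.asList(group));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Only the restart test ran: labels + labels2.
        int restartOnly = distinct(LABELS, LABELS2).size() + SYSTEM_ROWS;
        // testAddLabels ran first: labels + labels1 + labels2.
        int afterAddLabels = distinct(LABELS, LABELS1, LABELS2).size() + SYSTEM_ROWS;
        System.out.println("restart test alone: " + restartOnly);          // 13
        System.out.println("after testAddLabels first: " + afterAddLabels); // 17
    }
}
```

Under those assumptions the two orderings give 13 and 17, which is consistent with the test passing or failing depending on execution order.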