Fine, let's focus on verifying whether it's a real problem rather than arguing about wording; after all, that's not my intention...
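To be concrete about the setup, this is roughly how I run the suite on each env (shown here against the 1.4.7 source I'm re-checking; I used the same invocation for the RC source, and the artifact name just assumes the usual hbase-<version>-src.tar.gz layout):

    # Unpack the source release and run the full unit test suite,
    # retrying each failing test up to 2 extra times so one-off flakes filter out.
    tar xzf hbase-1.4.7-src.tar.gz
    cd hbase-1.4.7
    mvn clean test -Dsurefire.rerunFailingTestsCount=2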
As mentioned, I participated in the 1.4.7 release vote[1], and IIRC I was using the same env and all tests passed without issue; that's where my concern lies and the main reason I gave a -1 vote. I'm running against the 1.4.7 source on the same env now, so let's see the result.

[1] https://www.mail-archive.com/[email protected]/msg51380.html

Best Regards,
Yu

On Fri, 12 Apr 2019 at 12:05, Andrew Purtell <[email protected]> wrote:

> I believe the test execution order matters. We run some tests in parallel. The ordering of tests is determined by readdir() results and this differs from host to host and checkout to checkout. So when you see a repeatable group of failures, that's great. And when someone else doesn't see those same tests fail, or they cannot be reproduced when running by themselves, the commonly accepted term of art for this is "flaky".
>
> > On Apr 11, 2019, at 8:52 PM, Yu Li <[email protected]> wrote:
> >
> > Sorry, but I'd call it a "possible environment-related problem" or "some feature may not work well in a specific environment", rather than a flake.
> >
> > Will check against the 1.4.7 released source package before opening any JIRA.
> >
> > Best Regards,
> > Yu
> >
> > On Fri, 12 Apr 2019 at 11:37, Andrew Purtell <[email protected]> wrote:
> >
> >> And if they pass in my environment, then what should we call it? I have no doubt you are seeing failures. Therefore can you please file JIRAs and attach information that can help identify a fix. Thanks.
> >>
> >>> On Apr 11, 2019, at 8:35 PM, Yu Li <[email protected]> wrote:
> >>>
> >>> I ran the test suite with the -Dsurefire.rerunFailingTestsCount=2 option, on two different envs separately, so each case failed consistently across 6 runs in total, and from my perspective this is not flaky.
> >>>
> >>> IIRC no such issue was observed last time when verifying 1.4.7 on the same env; will double check.
> >>>
> >>> Best Regards,
> >>> Yu
> >>>
> >>> On Fri, 12 Apr 2019 at 00:07, Andrew Purtell <[email protected]> wrote:
> >>>
> >>>> There are two failure cases, it looks like, and this looks like flakes.
> >>>>
> >>>> The wrong FS assertions are not something I see when I run these tests myself. I am not able to investigate something I can't reproduce. What I suggest is, since you can reproduce, do a git bisect to find the commit that introduced the problem. Then we can revert it. As an alternative we can open a JIRA, report the problem, temporarily @Ignore the test, and continue. This latter option should only be taken if we are fairly confident it is a test-only problem.
> >>>>
> >>>> The connect exceptions are interesting. I see these sometimes when the suite is executed, not this particular case, but when the failed test is executed by itself it always passes. It is possible some change to classes related to the minicluster, or to startup or shutdown timing, is the cause, but it is test-time flaky behavior. I'm not happy about this, but it doesn't actually fail the release because the failure is never repeatable when the test is run standalone.
> >>>>
> >>>> In general it would be great if some attention was paid to test cleanliness on branch-1. As RM I'm not in a position to insist that everything is perfect or there will never be another 1.x release, certainly not from branch-1. So, tests which fail repeatedly block a release IMHO, but flakes do not.
> >>>>
> >>>>> On Apr 10, 2019, at 11:20 PM, Yu Li <[email protected]> wrote:
> >>>>>
> >>>>> -1
> >>>>>
> >>>>> Observed many UT failures when checking the source package (tried multiple rounds on two different environments, macOS and Linux, with the same result), including but not limited to:
> >>>>>
> >>>>> TestBulkLoad:
> >>>>> shouldBulkLoadSingleFamilyHLog(org.apache.hadoop.hbase.regionserver.TestBulkLoad)  Time elapsed: 0.083 s  <<< ERROR!
> >>>>> java.lang.IllegalArgumentException: Wrong FS: file:/var/folders/t6/vch4nh357f98y1wlq09lbm7h0000gn/T/junit1805329913454564189/junit8020757893576011944/data/default/shouldBulkLoadSingleFamilyHLog/8f4a6b584533de2fd1bf3c398dfaac29, expected: hdfs://localhost:55938
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:246)
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:256)
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldBulkLoadSingleFamilyHLog(TestBulkLoad.java:150)
> >>>>>
> >>>>> TestStoreFile:
> >>>>> testCacheOnWriteEvictOnClose(org.apache.hadoop.hbase.regionserver.TestStoreFile)  Time elapsed: 0.083 s  <<< ERROR!
> >>>>> java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:55938 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestStoreFile.writeStoreFile(TestStoreFile.java:1047)
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestStoreFile.testCacheOnWriteEvictOnClose(TestStoreFile.java:908)
> >>>>>
> >>>>> TestHFile:
> >>>>> testEmptyHFile(org.apache.hadoop.hbase.io.hfile.TestHFile)  Time elapsed: 0.08 s  <<< ERROR!
> >>>>> java.net.ConnectException: Call From z05f06378.sqa.zth.tbsite.net/11.163.183.195 to localhost:35529 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >>>>>   at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
> >>>>> Caused by: java.net.ConnectException: Connection refused
> >>>>>   at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
> >>>>>
> >>>>> TestBlocksScanned:
> >>>>> testBlocksScannedWithEncoding(org.apache.hadoop.hbase.regionserver.TestBlocksScanned)  Time elapsed: 0.069 s  <<< ERROR!
> >>>>> java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:35529/tmp/hbase-jueding.ly/hbase/data/default/TestBlocksScannedWithEncoding/a4a416cc3060d9820a621c294af0aa08, expected: file:///
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestBlocksScanned._testBlocksScanned(TestBlocksScanned.java:90)
> >>>>>   at org.apache.hadoop.hbase.regionserver.TestBlocksScanned.testBlocksScannedWithEncoding(TestBlocksScanned.java:86)
> >>>>>
> >>>>> And please let me know if there is any known issue I'm not aware of. Thanks.
> >>>>>
> >>>>> Best Regards,
> >>>>> Yu
> >>>>>
> >>>>>> On Mon, 8 Apr 2019 at 11:38, Yu Li <[email protected]> wrote:
> >>>>>>
> >>>>>> The performance report LGTM, thanks! (And sorry for the lag due to the Qingming Festival holiday here in China.)
> >>>>>>
> >>>>>> Still verifying the release; just some quick feedback: observed some incompatible changes in the compatibility report, including HBASE-21492/HBASE-21684, which are worth a reminder in the release note.
> >>>>>>
> >>>>>> Unrelated but noticeable: the 1.4.9 release note URL is invalid on https://hbase.apache.org/downloads.html
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Yu
> >>>>>>
> >>>>>>> On Fri, 5 Apr 2019 at 08:45, Andrew Purtell <[email protected]> wrote:
> >>>>>>>
> >>>>>>> The difference is basically noise per the usual YCSB evaluation. Small differences in workloads D and F (slightly worse) and workload E (slightly better) that do not indicate serious regression.
> >>>>>>>
> >>>>>>> Linux version 4.14.55-62.37.amzn1.x86_64
> >>>>>>> c3.8xlarge x 5
> >>>>>>> OpenJDK Runtime Environment (build 1.8.0_181-shenandoah-b13)
> >>>>>>> -Xms20g -Xmx20g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+UseNUMA -XX:-UseBiasedLocking -XX:+ParallelRefProcEnabled
> >>>>>>> Hadoop 2.9.2
> >>>>>>> Init: Load 100 M rows and snapshot
> >>>>>>> Run: Delete table, clone and redeploy from snapshot, run 10 M operations
> >>>>>>> Args: -threads 100 -target 50000
> >>>>>>> Test table: {NAME => 'u', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> >>>>>>>
> >>>>>>> YCSB Workload A
> >>>>>>> target 50k/op/s                                   1.4.9     1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                           200592    200583
> >>>>>>> [OVERALL], Throughput(ops/sec)                    49852     49855
> >>>>>>> [READ], AverageLatency(us)                          544       559
> >>>>>>> [READ], MinLatency(us)                              267       292
> >>>>>>> [READ], MaxLatency(us)                           165631    185087
> >>>>>>> [READ], 95thPercentileLatency(us)                   738       742
> >>>>>>> [READ], 99thPercentileLatency(us)                  1877      1961
> >>>>>>> [UPDATE], AverageLatency(us)                       1370      1181
> >>>>>>> [UPDATE], MinLatency(us)                            702       646
> >>>>>>> [UPDATE], MaxLatency(us)                         180735    177279
> >>>>>>> [UPDATE], 95thPercentileLatency(us)                1943      1652
> >>>>>>> [UPDATE], 99thPercentileLatency(us)                3257      3085
> >>>>>>>
> >>>>>>> YCSB Workload B
> >>>>>>> target 50k/op/s                                   1.4.9     1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                           200599    200581
> >>>>>>> [OVERALL], Throughput(ops/sec)                    49850     49855
> >>>>>>> [READ], AverageLatency(us)                          454       471
> >>>>>>> [READ], MinLatency(us)                              203       213
> >>>>>>> [READ], MaxLatency(us)                           183423    174207
> >>>>>>> [READ], 95thPercentileLatency(us)                   563       599
> >>>>>>> [READ], 99thPercentileLatency(us)                  1360      1172
> >>>>>>> [UPDATE], AverageLatency(us)                       1064      1029
> >>>>>>> [UPDATE], MinLatency(us)                            746       726
> >>>>>>> [UPDATE], MaxLatency(us)                         163455    101631
> >>>>>>> [UPDATE], 95thPercentileLatency(us)                1327      1157
> >>>>>>> [UPDATE], 99thPercentileLatency(us)                2241      1898
> >>>>>>>
> >>>>>>> YCSB Workload C
> >>>>>>> target 50k/op/s                                   1.4.9     1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                           200541    200538
> >>>>>>> [OVERALL], Throughput(ops/sec)                    49865     49865
> >>>>>>> [READ], AverageLatency(us)                          332       327
> >>>>>>> [READ], MinLatency(us)                              175       179
> >>>>>>> [READ], MaxLatency(us)                           210559    170367
> >>>>>>> [READ], 95thPercentileLatency(us)                   410       396
> >>>>>>> [READ], 99thPercentileLatency(us)                   871       892
> >>>>>>>
> >>>>>>> YCSB Workload D
> >>>>>>> target 50k/op/s                                   1.4.9     1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                           200579    200562
> >>>>>>> [OVERALL], Throughput(ops/sec)                    49855     49859
> >>>>>>> [READ], AverageLatency(us)                          487       547
> >>>>>>> [READ], MinLatency(us)                              210       214
> >>>>>>> [READ], MaxLatency(us)                           192255    177535
> >>>>>>> [READ], 95thPercentileLatency(us)                   973      1529
> >>>>>>> [READ], 99thPercentileLatency(us)                  1836      2683
> >>>>>>> [INSERT], AverageLatency(us)                       1239      1152
> >>>>>>> [INSERT], MinLatency(us)                            807       788
> >>>>>>> [INSERT], MaxLatency(us)                         184575    148735
> >>>>>>> [INSERT], 95thPercentileLatency(us)                1496      1243
> >>>>>>> [INSERT], 99thPercentileLatency(us)                2965      2495
> >>>>>>>
> >>>>>>> YCSB Workload E
> >>>>>>> target 10k/op/s                                   1.4.9     1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                           100605    100568
> >>>>>>> [OVERALL], Throughput(ops/sec)                     9939      9943
> >>>>>>> [SCAN], AverageLatency(us)                         3548      2687
> >>>>>>> [SCAN], MinLatency(us)                              696       678
> >>>>>>> [SCAN], MaxLatency(us)                          1059839    238463
> >>>>>>> [SCAN], 95thPercentileLatency(us)                  8327      6791
> >>>>>>> [SCAN], 99thPercentileLatency(us)                 17647     14415
> >>>>>>> [INSERT], AverageLatency(us)                       2688      1555
> >>>>>>> [INSERT], MinLatency(us)                            887       815
> >>>>>>> [INSERT], MaxLatency(us)                         173311    154623
> >>>>>>> [INSERT], 95thPercentileLatency(us)                4455      2571
> >>>>>>> [INSERT], 99thPercentileLatency(us)                9303      5375
> >>>>>>>
> >>>>>>> YCSB Workload F
> >>>>>>> target 50k/op/s                                   1.4.9     1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                           200562    204178
> >>>>>>> [OVERALL], Throughput(ops/sec)                    49859     48976
> >>>>>>> [READ], AverageLatency(us)                          856      1137
> >>>>>>> [READ], MinLatency(us)                              262       257
> >>>>>>> [READ], MaxLatency(us)                           205567    222335
> >>>>>>> [READ], 95thPercentileLatency(us)                  2365      3475
> >>>>>>> [READ], 99thPercentileLatency(us)                  3099      4143
> >>>>>>> [READ-MODIFY-WRITE], AverageLatency(us)            2559      2917
> >>>>>>> [READ-MODIFY-WRITE], MinLatency(us)                1100      1034
> >>>>>>> [READ-MODIFY-WRITE], MaxLatency(us)              208767    204799
> >>>>>>> [READ-MODIFY-WRITE], 95thPercentileLatency(us)     5747      7627
> >>>>>>> [READ-MODIFY-WRITE], 99thPercentileLatency(us)     7203      8919
> >>>>>>> [UPDATE], AverageLatency(us)                       1700      1777
> >>>>>>> [UPDATE], MinLatency(us)                            737       687
> >>>>>>> [UPDATE], MaxLatency(us)                          97983     94271
> >>>>>>> [UPDATE], 95thPercentileLatency(us)                3377      4147
> >>>>>>> [UPDATE], 99thPercentileLatency(us)                4147      4831
> >>>>>>>
> >>>>>>>> On Thu, Apr 4, 2019 at 1:14 AM Yu Li <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for the efforts, boss.
> >>>>>>>>
> >>>>>>>> Since it's a new minor release, do we have a performance comparison report against 1.4.9, as we did when releasing 1.4.0? If so, any reference? Many thanks!
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>> Yu
> >>>>>>>>
> >>>>>>>> On Thu, 4 Apr 2019 at 07:44, Andrew Purtell <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> The fourth HBase 1.5.0 release candidate (RC3) is available for download at https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/ and Maven artifacts are available in the temporary repository https://repository.apache.org/content/repositories/orgapachehbase-1292/
> >>>>>>>>>
> >>>>>>>>> The git tag corresponding to the candidate is '1.5.0RC3' (b0bc7225c5).
> >>>>>>>>>
> >>>>>>>>> A detailed source and binary compatibility report for this release is available for your review at https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/compat-check-report.html .
> >>>>>>>>>
> >>>>>>>>> A list of the 115 issues resolved in this release can be found at https://s.apache.org/K4Wk . The 1.5.0 changelog is derived from the changelog of the last branch-1.4 release, 1.4.9.
> >>>>>>>>>
> >>>>>>>>> Please try out the candidate and vote +1/0/-1.
> >>>>>>>>>
> >>>>>>>>> The vote will be open for at least 72 hours. Unless there is an objection I will try to close it Friday, April 12, 2019 if we have sufficient votes.
> >>>>>>>>>
> >>>>>>>>> Prior to making this announcement I made the following preflight checks:
> >>>>>>>>>
> >>>>>>>>> RAT check passes (7u80)
> >>>>>>>>> Unit test suite passes (7u80, 8u181)*
> >>>>>>>>> Opened the UI in a browser, poked around
> >>>>>>>>> LTT load 100M rows with 100% verification and 20% updates (8u181)
> >>>>>>>>> ITBLL 1B rows with slowDeterministic monkey (8u181)
> >>>>>>>>> ITBLL 1B rows with serverKilling monkey (8u181)
> >>>>>>>>>
> >>>>>>>>> There are known flaky tests. See HBASE-21904 and HBASE-21905. These flaky tests do not represent serious test failures that would prevent a release.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best regards,
> >>>>>>>>> Andrew
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards,
> >>>>>>> Andrew
> >>>>>>>
> >>>>>>> Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
> >>>>>>>    - A23, Crosstalk
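P.S. For reference, the bisect workflow Andrew suggests would look roughly like the sketch below. The tag names, module, and test choice here are my own assumptions (rel/1.4.7 as the known-good point, or whichever branch-1 commit last passed for me, and one of the repeatably failing tests as the probe), not anything agreed on the thread:

    # Bisect between the candidate tag and a known-good point, letting mvn's exit
    # code mark each step good or bad. A commit that fails to build would also be
    # marked bad, so the result needs a sanity check afterwards.
    git bisect start
    git bisect bad 1.5.0RC3
    git bisect good rel/1.4.7
    git bisect run mvn -q -pl hbase-server -am test -Dtest=TestBulkLoad -DfailIfNoTests=false
    git bisect reset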
