[jira] [Resolved] (HBASE-23741) Data loss when WAL split to HFile enabled
[ https://issues.apache.org/jira/browse/HBASE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-23741.
    Resolution: Fixed

Pushed to branch-2.3+. Thanks [~zhangduo] for reviewing.

> Data loss when WAL split to HFile enabled
> -
>
> Key: HBASE-23741
> URL: https://issues.apache.org/jira/browse/HBASE-23741
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Pankaj Kumar
> Assignee: Guanghao Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.3.0, 2.4.0
>
>
> Very simple steps as below,
> 1. Create table with 1 region
> 2. Insert 1 record
> 3. Flush the table
> 4. Scan table and observe timestamp of the inserted row
> 5. Insert same row key with same timestamp as previously inserted but with different value
> 6. Kill -9 RS where table region is online
> 7. Start RS
> Scan the table and check the result, latest cell must be returned.
> Thanks [~sreenivasulureddy] for finding this issue.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24034) [Flakey Tests] A couple of fixes and cleanups
Michael Stack created HBASE-24034:
-
    Summary: [Flakey Tests] A couple of fixes and cleanups
    Key: HBASE-24034
    URL: https://issues.apache.org/jira/browse/HBASE-24034
    Project: HBase
    Issue Type: Bug
    Components: flakies
    Reporter: Michael Stack
    Assignee: Michael Stack

Here are a few cleanups and flakey fixes accumulated in the last few days:

{code}
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupMajorCompactionTTL.java
  Remove spurious assert. Just before this it waits an arbitrary 10 seconds.
  Compactions could have completed inside this time. The spirit of the test remains.

hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
  Get log cleaner to go down promptly; it's sticking around. See if this helps
  with TestMasterShutdown.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
  We get a rare NPE trying to sync. Make local copy of SyncFuture and see if that helps.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
  Compaction may have completed when not expected; allow for it.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java
  Add wait before testing. Compaction may not have completed. Let compaction
  complete before progressing and then test for empty cache.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
  Less resources.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestDefaultLoadBalancer.java
  Less resources.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
  Wait till online before we try and do compaction (else request is ignored).

hbase-server/src/test/java/org/apache/hadoop/hbase/tool/TestCanaryTool.java
  Disable test that tests for timeout that fails randomly w/ mockito complaint
  on some mac os x's.
{code}
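The FSHLog entry above ("Make local copy of SyncFuture") describes a common defensive pattern: read a shared reference once into a local variable, then work only with the local. The sketch below illustrates that general idea under stated assumptions. `LocalCopySketch` and `PendingSync` are hypothetical names invented for this example; they are not the HBase classes, and this is not the actual patch.

```java
/** Illustrative only: shows the "local copy of a shared field" pattern. */
final class LocalCopySketch {

  /** Hypothetical stand-in for a future that a sync call completes. */
  static final class PendingSync {
    private volatile boolean done;

    void markDone() {
      done = true;
    }

    boolean isDone() {
      return done;
    }
  }

  // Shared slot that another thread may reset to null at any moment.
  private volatile PendingSync current = new PendingSync();

  /** Racy shape: the field is read twice, so the second read can observe null. */
  boolean completeRacy() {
    if (current != null) {
      current.markDone(); // can throw NullPointerException if cleared in between
      return true;
    }
    return false;
  }

  /** Safer shape: read the field once and use only the local reference. */
  boolean completeWithLocalCopy() {
    PendingSync local = current; // single read of the shared field
    if (local == null) {
      return false;
    }
    local.markDone();
    return true;
  }

  /** Simulates another thread clearing the slot. */
  void clear() {
    current = null;
  }

  public static void main(String[] args) {
    LocalCopySketch sketch = new LocalCopySketch();
    System.out.println("slot set: " + sketch.completeWithLocalCopy());
    sketch.clear();
    System.out.println("slot cleared: " + sketch.completeWithLocalCopy());
  }
}
```

The local copy does not make the surrounding logic atomic; it only guarantees that the null check and the method call see the same reference.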
[jira] [Created] (HBASE-24035) [Flakey Tests] Disable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
Michael Stack created HBASE-24035:
-
    Summary: [Flakey Tests] Disable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
    Key: HBASE-24035
    URL: https://issues.apache.org/jira/browse/HBASE-24035
    Project: HBase
    Issue Type: Bug
    Reporter: Michael Stack

testUserNamespaceClusterScopeQuota in TestClusterScopeQuotaThrottle is just spewing:

{code}
2020-03-23 13:33:23,996 INFO [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
2020-03-23 13:33:25,000 INFO [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
2020-03-23 13:33:26,002 INFO [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
2020-03-23 13:33:27,003 INFO [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
{code}

If I run it standalone it passes. Fails in nightly currently.
Rather than spend time on this, disabling for now. It came in a good while ago as part of HBASE-21820.
[jira] [Created] (HBASE-24036) [Flakey Tests] Re-enable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
Michael Stack created HBASE-24036:
-
    Summary: [Flakey Tests] Re-enable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
    Key: HBASE-24036
    URL: https://issues.apache.org/jira/browse/HBASE-24036
    Project: HBase
    Issue Type: Sub-task
    Reporter: Michael Stack

Disabled in parent because flakey. Re-enable after fix.
[jira] [Resolved] (HBASE-24035) [Flakey Tests] Disable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
[ https://issues.apache.org/jira/browse/HBASE-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24035. --- Fix Version/s: 2.3.0 3.0.0 Resolution: Fixed Pushed on branch-2.3, branch-2, and master. > [Flakey Tests] Disable > TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota > --- > > Key: HBASE-24035 > URL: https://issues.apache.org/jira/browse/HBASE-24035 > Project: HBase > Issue Type: Bug >Reporter: Michael Stack >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > testUserNamespaceClusterScopeQuota in TestClusterScopeQuotaThrottle is just > spewing: > {code} > 2020-03-23 13:33:23,996 INFO [Listener at localhost/50281] > hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, > period=180, unit=MILLISECONDS is enabled. > 2020-03-23 13:33:25,000 INFO [Listener at localhost/50281] > hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, > period=180, unit=MILLISECONDS is enabled. > 2020-03-23 13:33:26,002 INFO [Listener at localhost/50281] > hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, > period=180, unit=MILLISECONDS is enabled. > 2020-03-23 13:33:27,003 INFO [Listener at localhost/50281] > hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, > period=180, unit=MILLISECONDS is enabled. > {code} > If I run it standalone it passes. Fails in nightly currently. > Rather than spend time on this, disabling for now. It came in a good while > ago as part of HBASE-21820 -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Request for Apache Hbase slack channel -- apache-hbase.slack.com
Invitation sent. On Sat, Mar 21, 2020 at 11:07 PM mallik.v.ar...@gmail.com < mallik.v.ar...@gmail.com> wrote: > Invite* > --- > Mallikarjun > > > On Sun, Mar 22, 2020 at 11:36 AM mallik.v.ar...@gmail.com < > mallik.v.ar...@gmail.com> wrote: > > > > > --- > > Mallikarjun > > >
[jira] [Resolved] (HBASE-23885) [Flakey Test] TestReplicationStatus
[ https://issues.apache.org/jira/browse/HBASE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-23885.
---
    Fix Version/s: (was: 2.3.0)
                   (was: 3.0.0)
    Resolution: Not A Problem

Seems to be cleared up. Resolving as no longer an issue.

> [Flakey Test] TestReplicationStatus
> ---
>
> Key: HBASE-23885
> URL: https://issues.apache.org/jira/browse/HBASE-23885
> Project: HBase
> Issue Type: Bug
> Components: flakies
> Reporter: Michael Stack
> Priority: Major
>
> Fails about 20% of the time even when I run locally. Spent time trying to
> untangle but this is an awkward one. It is old. Subclasses
> TestReplicationBase which itself needs cleanup. Tried adding barriers to wait
> on events but holding up main thread messes up the test. Needs deeper dive.
[jira] [Resolved] (HBASE-24034) [Flakey Tests] A couple of fixes and cleanups
[ https://issues.apache.org/jira/browse/HBASE-24034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24034. --- Fix Version/s: 2.3.0 3.0.0 Resolution: Fixed I merged this to branch-2, 2.3, and master. > [Flakey Tests] A couple of fixes and cleanups > - > > Key: HBASE-24034 > URL: https://issues.apache.org/jira/browse/HBASE-24034 > Project: HBase > Issue Type: Bug > Components: flakies >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > Here's a few cleanups and flakey fixes accumulated in last few days: > {code} > > hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupMajorCompactionTTL.java > Remove spurious assert. Just before this it waits an arbitrary 10 > seconds. Compactions could have completed inside this time. The spirit > of the test remains. > > hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java > Get log cleaner to go down promptly; its sticking around. See if this > helps with TestMasterShutdown > > hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java > We get a rare NPE trying to sync. Make local copy of SyncFuture and see > if that helps. > > hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java > Compaction may have completed when not expected; allow for it. > > hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java > Add wait before testing. Compaction may not have completed. Let > compaction complete before progressing and then test for empty cache. > > hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java > Less resources. > > hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestDefaultLoadBalancer.java > Less resources. 
> > hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java > Wait till online before we try and do compaction (else request is > ignored) > > hbase-server/src/test/java/org/apache/hadoop/hbase/tool/TestCanaryTool.java > Disable test that tests for timeout that fails randomly w/ mockito > complaint on some mac os > x's. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
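Several of the fixes quoted above replace an arbitrary sleep, or an assert that raced with compaction, with "wait until the condition holds, then test". A minimal sketch of such a polling wait follows. `WaitForSketch` and its `waitFor` method are hypothetical helpers written for this illustration; they are not the helper the HBase tests actually use, and the "compaction finished" condition is simulated with a timer.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: poll a condition with a deadline instead of a fixed sleep. */
final class WaitForSketch {

  /**
   * Polls the condition every intervalMs until it returns true or timeoutMs
   * has elapsed. Returns true if the condition was met, false on timeout or
   * if the waiting thread was interrupted.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Hypothetical stand-in for "compaction finished": true after about 50 ms.
    boolean met = waitFor(2_000, 10, () -> System.currentTimeMillis() - start >= 50);
    System.out.println("condition met: " + met);
  }
}
```

A test written this way asserts only after `waitFor` returns true, so it tolerates a compaction that finishes early or late within the timeout.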
[jira] [Created] (HBASE-24037) Add ut for root dir and wal root dir are different
Guanghao Zhang created HBASE-24037: -- Summary: Add ut for root dir and wal root dir are different Key: HBASE-24037 URL: https://issues.apache.org/jira/browse/HBASE-24037 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-23829) Get `-PrunSmallTests` passing on JDK11
[ https://issues.apache.org/jira/browse/HBASE-23829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reopened HBASE-23829: - Reopen this. I am getting consistent test failure as reported in HBASE-23976, and I traced it to this commit. I am on JDK 8. {noformat} 2020-03-23 20:57:04,934 ERROR [Time-limited test] bucket.BucketCache(312): Can't restore from file[/Users/weichiu/sandbox/hbase/hbase-server/target/test-data/c9d48c67-87ed-70cb-e19c-4dc6c14c29c6/bucket.persistence] because of java.io.IOException: Mismatch of checksum! The persistent checksum is `9"�0����X!ɍ=, but the calculate checksum is 1��h&�B���D(� at org.apache.hadoop.hbase.io.hfile.bucket.PersistentIOEngine.verifyFileIntegrity(PersistentIOEngine.java:55) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.parsePB(BucketCache.java:1158) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.retrieveFromFile(BucketCache.java:1106) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.(BucketCache.java:310) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.(BucketCache.java:258) at org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile(TestVerifyBucketCacheFile.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) at java.util.concurrent.FutureTask.run(FutureTask.java) at java.lang.Thread.run(Thread.java:748) {noformat} > Get `-PrunSmallTests` passing on JDK11 > -- > > Key: HBASE-23829 > URL: https://issues.apache.org/jira/browse/HBASE-23829 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > Start with the small tests, shaking out issues identified by the harness. 
> So far it seems like {{-Dhadoop.profile=3.0}} and
> {{-Dhadoop-three.version=3.3.0-SNAPSHOT}} may be required.
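The "Mismatch of checksum!" error in the reopen comment above reports a persisted checksum that differs from a recalculated one, and the two values appear in the log as unreadable raw bytes. The sketch below shows the general shape of such an integrity check and renders both digests as hex so a mismatch stays legible. Everything here is an assumption made for illustration: `ChecksumSketch` is a hypothetical class, MD5 was picked only as an example algorithm, and none of this is taken from the BucketCache or PersistentIOEngine code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: compare a stored digest with a recomputed one. */
final class ChecksumSketch {

  /** Computes a digest of the data (MD5 is an assumption for this example). */
  static byte[] digest(byte[] data) {
    try {
      return MessageDigest.getInstance("MD5").digest(data);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Renders bytes as lowercase hex so they can be logged safely. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  /** Returns null when the checksums match, otherwise a readable message. */
  static String describeMismatch(byte[] persisted, byte[] data) {
    byte[] calculated = digest(data);
    if (MessageDigest.isEqual(persisted, calculated)) {
      return null;
    }
    return "Mismatch of checksum! persisted=" + toHex(persisted)
        + " calculated=" + toHex(calculated);
  }

  public static void main(String[] args) {
    byte[] original = "cache file contents".getBytes(StandardCharsets.UTF_8);
    byte[] changed = "cache file contentz".getBytes(StandardCharsets.UTF_8);
    byte[] persisted = digest(original);
    System.out.println(describeMismatch(persisted, original)); // matching case
    System.out.println(describeMismatch(persisted, changed));  // mismatching case
  }
}
```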
[jira] [Resolved] (HBASE-23919) [Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't come up)
[ https://issues.apache.org/jira/browse/HBASE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-23919.
---
    Resolution: Not A Problem

HBASE-23993 fixed this condition.

> [Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't come up)
> ---
>
> Key: HBASE-23919
> URL: https://issues.apache.org/jira/browse/HBASE-23919
> Project: HBase
> Issue Type: Bug
> Components: flakies
> Reporter: Michael Stack
> Priority: Major
>
> I've seen this on occasion across different hardware; the standalone
> zookeeper won't come up and then a random unit test fails in its junit startup
> phase as part of launching the mini cluster.
> I've been trying to track this for a while now, adding in logging and using more
> of the zk client instead of the copy/paste we've had for a long while
> (I've been running locally w/ zk logging set to INFO).
> It looks like this currently, where the last thing out of the server launch is
> this:
> {code:java}
> 2020-03-02 07:57:46,129 INFO [Time-limited test] server.ZooKeeperServer(854): maxSessionTimeout set to -1
> 2020-03-02 07:57:46,139 INFO [Time-limited test] server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316
> 2020-03-02 07:57:46,181 INFO [Time-limited test] zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=3, dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, clientPort=49316, dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, tickTime=2000, maxClientCnxns=300, minSessionTimeout=4000, maxSessionTimeout=4, serverId=0{code}
> ... then the client just does this over and over:
> {code:java}
> 2020-03-02 07:57:46,182 INFO [Time-limited test] client.FourLetterWordMain(65): connecting to localhost 49316
> 2020-03-02 07:57:46,213 INFO [Time-limited test] zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up
> java.net.SocketException: Connection reset
>   at java.net.SocketInputStream.read(SocketInputStream.java:209)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.Bu
[jira] [Resolved] (HBASE-22740) [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x
[ https://issues.apache.org/jira/browse/HBASE-22740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan resolved HBASE-22740. --- Hadoop Flags: Reviewed Resolution: Fixed > [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x > -- > > Key: HBASE-22740 > URL: https://issues.apache.org/jira/browse/HBASE-22740 > Project: HBase > Issue Type: Bug > Components: rsgroup >Reporter: Reid Chan >Assignee: Reid Chan >Priority: Major > Fix For: 2.3.0, master, 2.2.5 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-22740) [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x
[ https://issues.apache.org/jira/browse/HBASE-22740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reid Chan reopened HBASE-22740:
---

There were some conflicts in branch-2.x; all of the commits have been reverted and more work is needed.

> [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x
> --
>
> Key: HBASE-22740
> URL: https://issues.apache.org/jira/browse/HBASE-22740
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Reporter: Reid Chan
> Assignee: Reid Chan
> Priority: Major
> Fix For: 2.3.0, master, 2.2.5