[jira] [Resolved] (HBASE-23741) Data loss when WAL split to HFile enabled

2020-03-23 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-23741.

Resolution: Fixed

Pushed to branch-2.3+. Thanks [~zhangduo] for reviewing.

> Data loss when WAL split to HFile enabled
> -
>
> Key: HBASE-23741
> URL: https://issues.apache.org/jira/browse/HBASE-23741
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Pankaj Kumar
>Assignee: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.3.0, 2.4.0
>
>
> Very simple steps to reproduce:
> 1. Create a table with 1 region
> 2. Insert 1 record
> 3. Flush the table
> 4. Scan the table and note the timestamp of the inserted row
> 5. Insert the same row key with the same timestamp as the previous insert but 
> with a different value
> 6. Kill -9 the RS where the table's region is online
> 7. Start the RS
> Scan the table and check the result; the latest cell must be returned.
> Thanks [~sreenivasulureddy] for finding this issue.
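
The expectation in the last step can be sketched in plain Java. This is a simplified model, not the HBase client or Cell API: the class names, the fields, and the use of a per-write sequence id as the tie-breaker between two cells with the same row and timestamp are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified cell; not org.apache.hadoop.hbase.Cell.
final class SketchCell {
    final String row;
    final long timestamp;
    final long sequenceId; // assumed to grow with every write
    final String value;

    SketchCell(String row, long timestamp, long sequenceId, String value) {
        this.row = row;
        this.timestamp = timestamp;
        this.sequenceId = sequenceId;
        this.value = value;
    }
}

final class LatestCellSketch {
    // Newest timestamp first; on a timestamp tie, the later write first.
    static final Comparator<SketchCell> NEWEST_FIRST =
        Comparator.comparingLong((SketchCell c) -> c.timestamp).reversed()
            .thenComparing(
                Comparator.comparingLong((SketchCell c) -> c.sequenceId).reversed());

    // Returns the cell a reader would be expected to see for one row/column.
    static SketchCell latest(List<SketchCell> cells) {
        return cells.stream()
            .min(NEWEST_FIRST)
            .orElseThrow(() -> new IllegalArgumentException("no cells"));
    }

    public static void main(String[] args) {
        List<SketchCell> cells = new ArrayList<>();
        cells.add(new SketchCell("r1", 1000L, 1L, "v1")); // step 2, flushed in step 3
        cells.add(new SketchCell("r1", 1000L, 2L, "v2")); // step 5, same timestamp
        System.out.println(latest(cells).value);
    }
}
```

In this model the second write wins even though both cells carry the same timestamp, which is the outcome the final scan in the report is checking for.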



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24034) [Flakey Tests] A couple of fixes and cleanups

2020-03-23 Thread Michael Stack (Jira)
Michael Stack created HBASE-24034:
-

 Summary: [Flakey Tests] A couple of fixes and cleanups
 Key: HBASE-24034
 URL: https://issues.apache.org/jira/browse/HBASE-24034
 Project: HBase
  Issue Type: Bug
  Components: flakies
Reporter: Michael Stack
Assignee: Michael Stack


Here are a few cleanups and flakey fixes accumulated over the last few days:

{code}

hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupMajorCompactionTTL.java
 Remove spurious assert. Just before this it waits an arbitrary 10
 seconds. Compactions could have completed inside this time. The spirit
 of the test remains.


hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
 Get log cleaner to go down promptly; it's sticking around. See if this
 helps with TestMasterShutdown


hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
 We get a rare NPE trying to sync. Make local copy of SyncFuture and see
 if that helps.


hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
 Compaction may have completed when not expected; allow for it.


hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java
 Add wait before testing. Compaction may not have completed. Let
 compaction complete before progressing and then test for empty cache.


hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
 Use fewer resources.


hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestDefaultLoadBalancer.java
 Use fewer resources.


hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
 Wait till online before we try and do compaction (else request is
 ignored)

hbase-server/src/test/java/org/apache/hadoop/hbase/tool/TestCanaryTool.java
 Disable the timeout test, which fails randomly w/ a mockito complaint
 on some Mac OS X machines.
{code}
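
The FSHLog item above (take a local copy of the SyncFuture before using it) follows a common pattern for fields that another thread can null out. A minimal sketch of that pattern, assuming a made-up class and a CompletableFuture as a stand-in; this is not the FSHLog code and the names are hypothetical:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a holder whose field another thread may clear.
final class LocalCopySketch {
    // Another thread may set this back to null at any time.
    private volatile CompletableFuture<Long> pendingSync;

    void offer(CompletableFuture<Long> future) {
        pendingSync = future;
    }

    void clear() {
        pendingSync = null;
    }

    // Racy: the field is read twice, so it can become null between
    // the null check and the call, giving a rare NullPointerException.
    boolean completeRacy(long txid) {
        if (pendingSync != null) {
            return pendingSync.complete(txid);
        }
        return false;
    }

    // Safer: read the field once into a local and only use the local.
    boolean completeWithLocalCopy(long txid) {
        CompletableFuture<Long> local = pendingSync;
        if (local == null) {
            return false;
        }
        return local.complete(txid);
    }

    public static void main(String[] args) {
        LocalCopySketch sketch = new LocalCopySketch();
        CompletableFuture<Long> future = new CompletableFuture<>();
        sketch.offer(future);
        System.out.println(sketch.completeWithLocalCopy(42L));
        sketch.clear();
        System.out.println(sketch.completeWithLocalCopy(43L));
    }
}
```

The local copy does not stop the field from being cleared; it only guarantees that the null check and the call see the same reference.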





[jira] [Created] (HBASE-24035) [Flakey Tests] Disable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota

2020-03-23 Thread Michael Stack (Jira)
Michael Stack created HBASE-24035:
-

 Summary: [Flakey Tests] Disable 
TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
 Key: HBASE-24035
 URL: https://issues.apache.org/jira/browse/HBASE-24035
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


testUserNamespaceClusterScopeQuota in TestClusterScopeQuotaThrottle is just 
spewing:

{code}
2020-03-23 13:33:23,996 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
2020-03-23 13:33:25,000 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
2020-03-23 13:33:26,002 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
2020-03-23 13:33:27,003 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
{code}

If I run it standalone it passes. Fails in nightly currently.

Rather than spend time on this, disabling for now. It came in a good while ago 
as part of HBASE-21820





[jira] [Created] (HBASE-24036) [Flakey Tests] Re-enable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota

2020-03-23 Thread Michael Stack (Jira)
Michael Stack created HBASE-24036:
-

 Summary: [Flakey Tests] Re-enable 
TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
 Key: HBASE-24036
 URL: https://issues.apache.org/jira/browse/HBASE-24036
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


Disabled in parent because it is flakey. Re-enable after it is fixed.





[jira] [Resolved] (HBASE-24035) [Flakey Tests] Disable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota

2020-03-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24035.
---
Fix Version/s: 2.3.0
   3.0.0
   Resolution: Fixed

Pushed on branch-2.3, branch-2, and master.

> [Flakey Tests] Disable 
> TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota
> ---
>
> Key: HBASE-24035
> URL: https://issues.apache.org/jira/browse/HBASE-24035
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> testUserNamespaceClusterScopeQuota in TestClusterScopeQuotaThrottle is just 
> spewing:
> {code}
> 2020-03-23 13:33:23,996 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
> 2020-03-23 13:33:25,000 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
> 2020-03-23 13:33:26,002 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
> 2020-03-23 13:33:27,003 INFO  [Listener at localhost/50281] hbase.ChoreService(157): Chore ScheduledChore name=QuotaRefresherChore, period=180, unit=MILLISECONDS is enabled.
> {code}
> If I run it standalone it passes. Fails in nightly currently.
> Rather than spend time on this, disabling for now. It came in a good while 
> ago as part of HBASE-21820





Re: Request for Apache Hbase slack channel -- apache-hbase.slack.com

2020-03-23 Thread Nick Dimiduk
Invitation sent.

On Sat, Mar 21, 2020 at 11:07 PM mallik.v.ar...@gmail.com <
mallik.v.ar...@gmail.com> wrote:

> Invite*
> ---
> Mallikarjun
>
>
> On Sun, Mar 22, 2020 at 11:36 AM mallik.v.ar...@gmail.com <
> mallik.v.ar...@gmail.com> wrote:
>
> >
> > ---
> > Mallikarjun
> >
>


[jira] [Resolved] (HBASE-23885) [Flakey Test] TestReplicationStatus

2020-03-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23885.
---
Fix Version/s: (was: 2.3.0)
   (was: 3.0.0)
   Resolution: Not A Problem

Seems to be cleared up. Resolving as no longer an issue.

> [Flakey Test] TestReplicationStatus
> ---
>
> Key: HBASE-23885
> URL: https://issues.apache.org/jira/browse/HBASE-23885
> Project: HBase
>  Issue Type: Bug
>  Components: flakies
>Reporter: Michael Stack
>Priority: Major
>
> Fails about 20% of the time even when I run it locally. Spent time trying to 
> untangle it but this is an awkward one. It is old. It subclasses 
> TestReplicationBase, which itself needs cleanup. Tried adding barriers to wait 
> on events but holding up the main thread messes up the test. Needs a deeper dive.





[jira] [Resolved] (HBASE-24034) [Flakey Tests] A couple of fixes and cleanups

2020-03-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24034.
---
Fix Version/s: 2.3.0
   3.0.0
   Resolution: Fixed

I merged this to branch-2, 2.3, and master.

> [Flakey Tests] A couple of fixes and cleanups
> -
>
> Key: HBASE-24034
> URL: https://issues.apache.org/jira/browse/HBASE-24034
> Project: HBase
>  Issue Type: Bug
>  Components: flakies
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Here are a few cleanups and flakey fixes accumulated over the last few days:
> {code}
> 
> hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupMajorCompactionTTL.java
>  Remove spurious assert. Just before this it waits an arbitrary 10
>  seconds. Compactions could have completed inside this time. The spirit
>  of the test remains.
> 
> hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
>  Get log cleaner to go down promptly; it's sticking around. See if this
>  helps with TestMasterShutdown
> 
> hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
>  We get a rare NPE trying to sync. Make local copy of SyncFuture and see
>  if that helps.
> 
> hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
>  Compaction may have completed when not expected; allow for it.
> 
> hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java
>  Add wait before testing. Compaction may not have completed. Let
>  compaction complete before progressing and then test for empty cache.
> 
> hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
>  Use fewer resources.
> 
> hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestDefaultLoadBalancer.java
>  Use fewer resources.
> 
> hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
>  Wait till online before we try and do compaction (else request is
>  ignored)
> 
> hbase-server/src/test/java/org/apache/hadoop/hbase/tool/TestCanaryTool.java
>  Disable the timeout test, which fails randomly w/ a mockito complaint
>  on some Mac OS X machines.
> {code}
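
Several of the fixes quoted above come down to waiting for a condition (compaction finished, region online) instead of checking immediately or sleeping for a fixed time. A minimal polling sketch of that idea in plain Java; the helper name `waitFor` and its signature are assumptions for illustration and are not the HBase test utilities' API:

```java
import java.util.function.BooleanSupplier;

final class WaitForSketch {
    // Polls the condition every intervalMs until it holds or timeoutMs elapses.
    // Returns true if the condition held, false on timeout or interruption.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "compaction has completed": becomes true after about 50 ms.
        boolean done = waitFor(5_000, 10, () -> System.currentTimeMillis() - start >= 50);
        System.out.println(done);
    }
}
```

A test would call the helper and then assert on the state it waited for, so a slow compaction lengthens the test instead of failing it.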





[jira] [Created] (HBASE-24037) Add ut for root dir and wal root dir are different

2020-03-23 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24037:
--

 Summary: Add ut for root dir and wal root dir are different
 Key: HBASE-24037
 URL: https://issues.apache.org/jira/browse/HBASE-24037
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang








[jira] [Reopened] (HBASE-23829) Get `-PrunSmallTests` passing on JDK11

2020-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HBASE-23829:
-

Reopen this.

I am getting a consistent test failure, as reported in HBASE-23976, and I traced 
it to this commit.
I am on JDK 8.

{noformat}
2020-03-23 20:57:04,934 ERROR [Time-limited test] bucket.BucketCache(312): Can't restore from file[/Users/weichiu/sandbox/hbase/hbase-server/target/test-data/c9d48c67-87ed-70cb-e19c-4dc6c14c29c6/bucket.persistence] because of
java.io.IOException: Mismatch of checksum! The persistent checksum is 
`9"�0����X!ɍ=, but the calculate checksum is 1��h&�B���D(�
at org.apache.hadoop.hbase.io.hfile.bucket.PersistentIOEngine.verifyFileIntegrity(PersistentIOEngine.java:55)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.parsePB(BucketCache.java:1158)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.retrieveFromFile(BucketCache.java:1106)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:310)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:258)
at org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile(TestVerifyBucketCacheFile.java:116)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java)
at java.lang.Thread.run(Thread.java:748)
{noformat}
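
The two checksums in the message above appear to have been logged as raw bytes, which would explain why they are unreadable. A minimal sketch of comparing two digests and rendering them as hex so a mismatch can be read in a log; the class and method names are hypothetical, the choice of MD5 is an assumption, and none of this is the PersistentIOEngine code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumHexSketch {
    // Computes a digest of the data with the named algorithm.
    static byte[] digest(String algorithm, byte[] data) {
        try {
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("unknown algorithm: " + algorithm, e);
        }
    }

    // Renders bytes as lowercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Compares the persisted and the recalculated checksum.
    static boolean matches(byte[] persisted, byte[] calculated) {
        return MessageDigest.isEqual(persisted, calculated);
    }

    public static void main(String[] args) {
        byte[] persisted = digest("MD5", "cache-v1".getBytes(StandardCharsets.UTF_8));
        byte[] calculated = digest("MD5", "cache-v2".getBytes(StandardCharsets.UTF_8));
        if (!matches(persisted, calculated)) {
            System.out.println("Mismatch of checksum! persisted=" + toHex(persisted)
                + ", calculated=" + toHex(calculated));
        }
    }
}
```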

> Get `-PrunSmallTests` passing on JDK11
> --
>
> Key: HBASE-23829
> URL: https://issues.apache.org/jira/browse/HBASE-23829
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Start with the small tests, shaking out issues identified by the harness. So 
> far it seems like {{-Dhadoop.profile=3.0}} and 
> {{-Dhadoop-three.version=3.3.0-SNAPSHOT}} may be required.





[jira] [Resolved] (HBASE-23919) [Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't come up)

2020-03-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-23919.
---
Resolution: Not A Problem

HBASE-23993 fixed this condition

> [Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't 
> come up)
> ---
>
> Key: HBASE-23919
> URL: https://issues.apache.org/jira/browse/HBASE-23919
> Project: HBase
>  Issue Type: Bug
>  Components: flakies
>Reporter: Michael Stack
>Priority: Major
>
> I've seen this on occasion across different hardware; the standalone 
> zookeeper won't come up and then a random unit test fails in its junit startup 
> phase as part of launching the mini cluster.
> I've been trying to track this for a while now, adding logging and using more 
> of the zk client instead of the copy/paste we've had for a long while (I've 
> been running locally w/ zk logging set to INFO).
> It currently looks like this, where the last thing out of the server launch is 
> this:
> {code:java}
> 2020-03-02 07:57:46,129 INFO  [Time-limited test] server.ZooKeeperServer(854): maxSessionTimeout set to -1
> 2020-03-02 07:57:46,139 INFO  [Time-limited test] server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316
> 2020-03-02 07:57:46,181 INFO  [Time-limited test] zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=3, dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, clientPort=49316, dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, tickTime=2000, maxClientCnxns=300, minSessionTimeout=4000, maxSessionTimeout=4, serverId=0
> {code}
> ... then the client just does this over and over:
> {code:java}
> 2020-03-02 07:57:46,182 INFO  [Time-limited test] client.FourLetterWordMain(65): connecting to localhost 49316
> 2020-03-02 07:57:46,213 INFO  [Time-limited test] zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up
> java.net.SocketException: Connection reset
>   at java.net.SocketInputStream.read(SocketInputStream.java:209)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.Bu

[jira] [Resolved] (HBASE-22740) [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x

2020-03-23 Thread Reid Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan resolved HBASE-22740.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

> [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x
> --
>
> Key: HBASE-22740
> URL: https://issues.apache.org/jira/browse/HBASE-22740
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.3.0, master, 2.2.5
>
>






[jira] [Reopened] (HBASE-22740) [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x

2020-03-23 Thread Reid Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan reopened HBASE-22740:
---

There were some conflicts in branch-2.x; all the commits are reverted and more work is needed.

> [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x
> --
>
> Key: HBASE-22740
> URL: https://issues.apache.org/jira/browse/HBASE-22740
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.3.0, master, 2.2.5
>
>



