[jira] [Commented] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock
[ https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074116#comment-17074116 ]

Xu Cang commented on HBASE-24099:
---------------------------------

>LTT results look like any difference is within normal variance. Tested with
>branch-1 HEAD (1.7.0-SNAPSHOT). Server configured with 100 handlers.

Great perf data! +1

> Use a fair ReentrantReadWriteLock for the region close lock
> -----------------------------------------------------------
>
>                 Key: HBASE-24099
>                 URL: https://issues.apache.org/jira/browse/HBASE-24099
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5
>
>         Attachments: ltt_results.pdf
>
>
> Consider creating the region's ReentrantReadWriteLock with the fair locking
> policy. We have had a couple of production incidents where a regionserver
> stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE).
> The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that
> time was spent waiting to acquire the write lock on the region in order to
> finish closing it.
> {quote}
> ...
> Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region
> . in 927ms, sequenceid=6091133815, compaction requested=false at
> 1585175635349 (+60 ms)
> Disabling writes for close at 1585178100629 (+2465280 ms)
> {quote}
> This time was spent in between the memstore flush and the task status change
> "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6:
> {code}
> 1480: // block waiting for the lock for closing
> 1481: lock.writeLock().lock(); // FindBugs: Complains UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine
> {code}
>
> The close lock is operating in unfair mode. The table in question is under
> constant high query load. When the close request was received, there were
> active readers. After the close request there were more active readers,
> near-continuous contention. Although the clients would receive
> RegionServerStoppingException and other error notifications, because the
> region could not be reassigned, they kept coming, region (re-)location would
> find the region still hosted on the stuck server. Finally the closing thread
> waiting for the write lock became no longer starved (by chance) after 40
> minutes.
> The ReentrantReadWriteLock javadoc is clear about the possibility of
> starvation when continuously contended: "_When constructed as non-fair (the
> default), the order of entry to the read and write lock is unspecified,
> subject to reentrancy constraints. A nonfair lock that is continuously
> contended may indefinitely postpone one or more reader or writer threads, but
> will normally have higher throughput than a fair lock._"
> We could try changing the acquisition semantics of this lock to fair. This is
> a one line change, where we call the RW lock constructor. Then:
> "_When constructed as fair, threads contend for entry using an approximately
> arrival-order policy. When the currently held lock is released, either the
> longest-waiting single writer thread will be assigned the write lock, or if
> there is a group of reader threads waiting longer than all waiting writer
> threads, that group will be assigned the read lock._"
> This could be better. The close process will have to wait until all readers
> and writers already waiting for acquisition either acquire and release or go
> away but won't be starved by future/incoming requests.
> There could be a throughput loss in request handling, though, because this is
> the global reentrant RW lock for the region.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
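The "one line change" described in HBASE-24099 can be illustrated with a minimal sketch. It assumes nothing about HRegion itself: the class `FairCloseLockSketch` and its helpers `newCloseLock` and `closeUnder` are hypothetical names used only for illustration. The only real API shown is the `ReentrantReadWriteLock(boolean fair)` constructor and its `isFair()` accessor from `java.util.concurrent.locks`.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the region close lock; this is not the HRegion code.
class FairCloseLockSketch {
    // Passing true selects the fair (approximately arrival-order) policy.
    static ReentrantReadWriteLock newCloseLock(boolean fair) {
        return new ReentrantReadWriteLock(fair);
    }

    // Models the close path: take the write lock, do the work, always release.
    static boolean closeUnder(ReentrantReadWriteLock lock) {
        lock.writeLock().lock();
        try {
            // Placeholder for the real close work; just report that we hold the lock.
            return lock.isWriteLockedByCurrentThread();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        // The no-argument constructor creates a non-fair lock (the default).
        ReentrantReadWriteLock unfair = new ReentrantReadWriteLock();
        ReentrantReadWriteLock fair = newCloseLock(true);
        System.out.println("default lock fair? " + unfair.isFair());
        System.out.println("fair lock fair? " + fair.isFair());
        System.out.println("closed under write lock? " + closeUnder(fair));
    }
}
```

With the fair policy a waiting writer is not overtaken indefinitely by readers that arrive later, which is the starvation described in the issue. The javadoc quoted there also notes the likely cost: lower throughput than a non-fair lock.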
[jira] [Commented] (HBASE-23886) FileSystemUtilizationChore reduces unnecessary regions
[ https://issues.apache.org/jira/browse/HBASE-23886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068959#comment-17068959 ]

Xu Cang commented on HBASE-23886:
---------------------------------

[~Bo Cui] I want to look into this Jira you reported. Could you please explain a bit more? I don't fully get your points. Thanks.

> FileSystemUtilizationChore reduces unnecessary regions
> ------------------------------------------------------
>
>                 Key: HBASE-23886
>                 URL: https://issues.apache.org/jira/browse/HBASE-23886
>             Project: HBase
>          Issue Type: Improvement
>          Components: Quotas
>    Affects Versions: 2.2.3
>            Reporter: Bo Cui
>            Priority: Major
>         Attachments: image-2020-02-23-13-59-32-894.png
>
>
> If the RS has 2000 regions, and only one region (region1) has a disk space quota,
> FileSystemUtilizationChore can reduce unnecessary regions.
> The RS only needs to collect the region1 size.
> !image-2020-02-23-13-59-32-894.png|width=608,height=138!
[jira] [Assigned] (HBASE-23906) remove_servers_rsgroup did not get the correct result
[ https://issues.apache.org/jira/browse/HBASE-23906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang reassigned HBASE-23906: --- Assignee: Xu Cang > remove_servers_rsgroup did not get the correct result > - > > Key: HBASE-23906 > URL: https://issues.apache.org/jira/browse/HBASE-23906 > Project: HBase > Issue Type: Bug > Components: rsgroup, shell >Affects Versions: 2.2.3 >Reporter: Saurav Mehta >Assignee: Xu Cang >Priority: Major > > While trying to remove a server from a rs group, if the rs group is not > valid/present in the rs group, it still shows *0 row(s)*, instead of an > *exception "Server Invalid/Not found"*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-23960) Refactor TestFromClientSide; takes too long since got parameterized
[ https://issues.apache.org/jira/browse/HBASE-23960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang reassigned HBASE-23960: --- Assignee: Xu Cang > Refactor TestFromClientSide; takes too long since got parameterized > --- > > Key: HBASE-23960 > URL: https://issues.apache.org/jira/browse/HBASE-23960 > Project: HBase > Issue Type: Improvement > Components: test > Environment: Test is showing as timed out in our flakies list. It got > refactored recently and runs with three parameters where each run takes > 4minutes plus bringing us up close to our 13mins max for a test. Let me break > it up. >Reporter: Michael Stack >Assignee: Xu Cang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23960) Refactor TestFromClientSide; takes too long since got parameterized
[ https://issues.apache.org/jira/browse/HBASE-23960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068955#comment-17068955 ] Xu Cang commented on HBASE-23960: - I can take up this one if no one is planning working on this. > Refactor TestFromClientSide; takes too long since got parameterized > --- > > Key: HBASE-23960 > URL: https://issues.apache.org/jira/browse/HBASE-23960 > Project: HBase > Issue Type: Improvement > Components: test > Environment: Test is showing as timed out in our flakies list. It got > refactored recently and runs with three parameters where each run takes > 4minutes plus bringing us up close to our 13mins max for a test. Let me break > it up. >Reporter: Michael Stack >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24028) MapReduce on snapshot restores and opens all regions in each mapper
[ https://issues.apache.org/jira/browse/HBASE-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-24028: Description: Given this scenario: one MR job scans a table (with many regions). I will use 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper. In the code [https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183] Seems there is no way to only restore relevant regions from snapshot to region. This leads to extreme slowness and waste of resource. One quick example I can show as below, in my test, there are 2 regions in a testing table, each mapper opens and iterates 2 regions. (which is wrong IMO, each mapper should only touch 1 region based on the splits) I have checked the splits are correct, which have correct startKey and endKey 2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50* 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *69dd3fdba3698f827f8883ed911161ef* 2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50 Please correct me if I am wrong or miss anything. thanks. So if I misunderstood anything, can anyone point to me where in this class, can distinguish which region to go through for different mappers? btw the original implementation for MR on Snapshot is here, there weren't too many big changes after that HBASE-8369 was: Given this scenario: one MR job scans a table (with many regions). I will use 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper. 
In the code [https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183] Seems there is no way to only restore relevant regions from snapshot to region. This leads to extreme slowness and waste of resource. Please correct me if I am wrong or miss anything. thanks. One quick example I san show as below, in my test, there are 2 regions in a testing table. and each mapper opens and iterates 2 regions. 2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50* 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *69dd3fdba3698f827f8883ed911161ef* 2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50 So if I misunderstood anything, can anyone point to me where in this class, can distinguish which region to go through for different mappers? btw the original implementation for MR on Snapshot is here, there weren't too many big changes after that HBASE-8369 > MapReduce on snapshot restores and opens all regions in each mapper > --- > > Key: HBASE-24028 > URL: https://issues.apache.org/jira/browse/HBASE-24028 > Project: HBase > Issue Type: Bug >Affects Versions: 2.3.0, 1.6.0 >Reporter: Xu Cang >Priority: Major > > Given this scenario: one MR job scans a table (with many regions). I will use > 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper. > In the code > [https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183] > Seems there is no way to only restore relevant regions from snapshot to > region. > This leads to extreme slowness and waste of resource. 
> > > One quick example I can show as below, in my test, there are 2 regions in a > testing table, each mapper opens and iterates 2 regions. (which is wrong > IMO, each mapper should only touch 1 region based on the splits) I have > checked the splits are correct, which have correct startKey and endKey > 2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector > class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50* > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *69dd3fdba3698f827f8883ed911161ef* > 2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone > region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda
[jira] [Updated] (HBASE-24028) MapReduce on snapshot restores and opens all regions in each mapper
[ https://issues.apache.org/jira/browse/HBASE-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-24028: External issue URL: https://issues.apache.org/jira/browse/PHOENIX-5774 > MapReduce on snapshot restores and opens all regions in each mapper > --- > > Key: HBASE-24028 > URL: https://issues.apache.org/jira/browse/HBASE-24028 > Project: HBase > Issue Type: Bug >Affects Versions: 2.3.0, 1.6.0 >Reporter: Xu Cang >Priority: Major > > Given this scenario: one MR job scans a table (with many regions). I will use > 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper. > In the code > [https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183] > Seems there is no way to only restore relevant regions from snapshot to > region. > This leads to extreme slowness and waste of resource. > Please correct me if I am wrong or miss anything. thanks. > > One quick example I san show as below, in my test, there are 2 regions in a > testing table. and each mapper opens and iterates 2 regions. > 2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector > class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50* > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *69dd3fdba3698f827f8883ed911161ef* > 2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone > region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50 > > So if I misunderstood anything, can anyone point to me where in this class, > can distinguish which region to go through for different mappers? 
> > btw the original implementation for MR on Snapshot is here, there weren't too > many big changes after that HBASE-8369 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24028) MapReduce on snapshot restores and opens all regions in each mapper
[ https://issues.apache.org/jira/browse/HBASE-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063651#comment-17063651 ] Xu Cang edited comment on HBASE-24028 at 3/20/20, 10:39 PM: Fyi [~apurtell] [~rushabh.shah] [~akshita.malhotra] [~abhishek.chouhan] [~gjacoby] [~vjasani] was (Author: xucang): Fyi [~apurtell] [~akshita.malhotra] [~abhishek.chouhan] [~gjacoby] [~vjasani] > MapReduce on snapshot restores and opens all regions in each mapper > --- > > Key: HBASE-24028 > URL: https://issues.apache.org/jira/browse/HBASE-24028 > Project: HBase > Issue Type: Bug >Affects Versions: 2.3.0, 1.6.0 >Reporter: Xu Cang >Priority: Major > > Given this scenario: one MR job scans a table (with many regions). I will use > 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper. > In the code > [https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183] > Seems there is no way to only restore relevant regions from snapshot to > region. > This leads to extreme slowness and waste of resource. > Please correct me if I am wrong or miss anything. thanks. > > One quick example I san show as below, in my test, there are 2 regions in a > testing table. and each mapper opens and iterates 2 regions. 
> 2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector > class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50* > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *69dd3fdba3698f827f8883ed911161ef* > 2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone > region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50 > > So if I misunderstood anything, can anyone point to me where in this class, > can distinguish which region to go through for different mappers? > > btw the original implementation for MR on Snapshot is here, there weren't too > many big changes after that HBASE-8369 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24028) MapReduce on snapshot restores and opens all regions in each mapper
[ https://issues.apache.org/jira/browse/HBASE-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063651#comment-17063651 ] Xu Cang commented on HBASE-24028: - Fyi [~apurtell] [~akshita.malhotra] [~abhishek.chouhan] [~gjacoby] [~vjasani] > MapReduce on snapshot restores and opens all regions in each mapper > --- > > Key: HBASE-24028 > URL: https://issues.apache.org/jira/browse/HBASE-24028 > Project: HBase > Issue Type: Bug >Affects Versions: 2.3.0, 1.6.0 >Reporter: Xu Cang >Priority: Major > > Given this scenario: one MR job scans a table (with many regions). I will use > 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper. > In the code > [https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183] > Seems there is no way to only restore relevant regions from snapshot to > region. > This leads to extreme slowness and waste of resource. > Please correct me if I am wrong or miss anything. thanks. > > One quick example I san show as below, in my test, there are 2 regions in a > testing table. and each mapper opens and iterates 2 regions. > 2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector > class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50* > 2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region > to add: *69dd3fdba3698f827f8883ed911161ef* > 2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone > region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50 > > So if I misunderstood anything, can anyone point to me where in this class, > can distinguish which region to go through for different mappers? 
> > btw the original implementation for MR on Snapshot is here, there weren't too > many big changes after that HBASE-8369 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24028) MapReduce on snapshot restores and opens all regions in each mapper
Xu Cang created HBASE-24028:
-------------------------------

             Summary: MapReduce on snapshot restores and opens all regions in each mapper
                 Key: HBASE-24028
                 URL: https://issues.apache.org/jira/browse/HBASE-24028
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.6.0, 2.3.0
            Reporter: Xu Cang


Given this scenario: one MR job scans a table (with many regions). I will use 'RestoreSnapshotHelper' to restore snapshot for all regions in each mapper.

In the code
[https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183]
there seems to be no way to restore only the relevant regions from the snapshot.

This leads to extreme slowness and a waste of resources.

Please correct me if I am wrong or missing anything. Thanks.

One quick example is shown below: in my test there are 2 regions in a testing table, and each mapper opens and iterates 2 regions.

2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50*
2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *69dd3fdba3698f827f8883ed911161ef*
2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50

So if I misunderstood anything, can anyone point me to where this class distinguishes which regions each mapper should go through?

BTW, the original implementation for MR on snapshot is HBASE-8369; there weren't many big changes after that.
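The behavior the report expects (each mapper touching only the regions that overlap its split) can be sketched as a key-range overlap check. This is an illustrative sketch only, not how RestoreSnapshotHelper works or was later changed: the class `SplitRegionFilterSketch`, the `KeyRange` type, the region names, and the use of plain strings for row keys are all assumptions. As a further assumption, an empty end key means "unbounded".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose the regions whose key range overlaps a mapper's split.
class SplitRegionFilterSketch {
    // A region's [startKey, endKey) range; an empty endKey means "no upper bound".
    static final class KeyRange {
        final String name;
        final String startKey;
        final String endKey;

        KeyRange(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Two half-open ranges overlap when each one starts before the other ends.
    static boolean overlaps(KeyRange region, String splitStart, String splitEnd) {
        boolean regionEndsAfterSplitStart =
            region.endKey.isEmpty() || region.endKey.compareTo(splitStart) > 0;
        boolean regionStartsBeforeSplitEnd =
            splitEnd.isEmpty() || region.startKey.compareTo(splitEnd) < 0;
        return regionEndsAfterSplitStart && regionStartsBeforeSplitEnd;
    }

    // Returns the names of the regions a mapper with this split would need.
    static List<String> regionsForSplit(List<KeyRange> regions, String splitStart, String splitEnd) {
        List<String> selected = new ArrayList<>();
        for (KeyRange region : regions) {
            if (overlaps(region, splitStart, splitEnd)) {
                selected.add(region.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Two made-up regions covering the whole key space, split at "m".
        List<KeyRange> regions = Arrays.asList(
            new KeyRange("regionA", "", "m"),
            new KeyRange("regionB", "m", ""));
        System.out.println(regionsForSplit(regions, "", "m"));
        System.out.println(regionsForSplit(regions, "m", ""));
    }
}
```

In the two-region test from the report, a filter like this would leave each mapper with one region instead of two. Whether the restore step can actually be restricted this way is the open question the issue raises.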
[jira] [Comment Edited] (HBASE-21394) Restore snapshot in parallel
[ https://issues.apache.org/jira/browse/HBASE-21394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063006#comment-17063006 ]

Xu Cang edited comment on HBASE-21394 at 3/20/20, 12:19 AM:
------------------------------------------------------------

While debugging a snapshot-related issue, I found this JIRA.
From my observation, the method RestoreSnapshotHelper#restoreHdfsRegions() will always try to iterate all regions and open all hfiles for the table from all mappers.
So suppose we have 500 mappers scanning a snapshot of the table: all 500 mappers are iterating all regions/hfiles. (Even though the splitting was correct for the mappers, this scanner makes all mappers scan all regions.)
Was this the same symptom you saw, and was that by design? (BTW, I am using branch-1 code.) [~openinx] Thanks!

was (Author: xucang):
While debugging a snapshot-related issue, I found this JIRA.
From my observation, the method RestoreSnapshotHelper#restoreHdfsRegions() will always try to iterate all regions and open all hfiles for the table from all mappers.
So suppose we have 500 mappers scanning a snapshot of the table: all 500 mappers are iterating all regions/hfiles. (Even though the splitting was correct for the mappers, this scanner makes all mappers scan all regions.)
Was this the same symptom you saw, and was that by design? (BTW, I am using branch-1 code and haven't tried this parallel improvement.) [~openinx] Thanks!

> Restore snapshot in parallel
> ----------------------------
>
>                 Key: HBASE-21394
>                 URL: https://issues.apache.org/jira/browse/HBASE-21394
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.2
>
>
> Our MapReduce/Spark job is highly dependent on SnapshotScanner. When restoring
> a big table for SnapshotScanner, it'll take hours.
> Restoring the snapshot in parallel will help a lot.
[jira] [Comment Edited] (HBASE-21394) Restore snapshot in parallel
[ https://issues.apache.org/jira/browse/HBASE-21394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063006#comment-17063006 ]

Xu Cang edited comment on HBASE-21394 at 3/20/20, 12:09 AM:
------------------------------------------------------------

While debugging a snapshot-related issue, I found this JIRA.
From my observation, the method RestoreSnapshotHelper#restoreHdfsRegions() will always try to iterate all regions and open all hfiles for the table from all mappers.
So suppose we have 500 mappers scanning a snapshot of the table: all 500 mappers are iterating all regions/hfiles. (Even though the splitting was correct for the mappers, this scanner makes all mappers scan all regions.)
Was this the same symptom you saw, and was that by design? (BTW, I am using branch-1 code and haven't tried this parallel improvement.) [~openinx] Thanks!

was (Author: xucang):
While debugging a snapshot-related issue, I found this JIRA.
From my observation, the method RestoreSnapshotHelper#restoreHdfsRegions() will always try to iterate all regions and open all hfiles for the table from all mappers.
So suppose we have 500 mappers scanning a snapshot of the table: all 500 mappers are iterating all regions/hfiles.
Was this the same symptom you saw, and was that by design? (BTW, I am using branch-1 code and haven't tried this parallel improvement.) [~openinx] Thanks!

> Restore snapshot in parallel
> ----------------------------
>
>                 Key: HBASE-21394
>                 URL: https://issues.apache.org/jira/browse/HBASE-21394
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.2
>
>
> Our MapReduce/Spark job is highly dependent on SnapshotScanner. When restoring
> a big table for SnapshotScanner, it'll take hours.
> Restoring the snapshot in parallel will help a lot.
[jira] [Commented] (HBASE-21394) Restore snapshot in parallel
[ https://issues.apache.org/jira/browse/HBASE-21394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063006#comment-17063006 ]

Xu Cang commented on HBASE-21394:
---------------------------------

While debugging a snapshot-related issue, I found this JIRA.
From my observation, the method RestoreSnapshotHelper#restoreHdfsRegions() will always try to iterate all regions and open all hfiles for the table from all mappers.
So suppose we have 500 mappers scanning a snapshot of the table: all 500 mappers are iterating all regions/hfiles.
Was this the same symptom you saw, and was that by design? (BTW, I am using branch-1 code and haven't tried this parallel improvement.) [~openinx] Thanks!

> Restore snapshot in parallel
> ----------------------------
>
>                 Key: HBASE-21394
>                 URL: https://issues.apache.org/jira/browse/HBASE-21394
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.2
>
>
> Our MapReduce/Spark job is highly dependent on SnapshotScanner. When restoring
> a big table for SnapshotScanner, it'll take hours.
> Restoring the snapshot in parallel will help a lot.
[jira] [Commented] (HBASE-23774) Announce user-zh list
[ https://issues.apache.org/jira/browse/HBASE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026968#comment-17026968 ] Xu Cang commented on HBASE-23774: - good idea. [~elserj] +1 Has this email been setup? > Announce user-zh list > - > > Key: HBASE-23774 > URL: https://issues.apache.org/jira/browse/HBASE-23774 > Project: HBase > Issue Type: Task > Components: website >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Trivial > Attachments: HBASE-23774.001.patch > > > Let folks know about the new user-zh list that is dedicated for user > questions in chinese (as opposed to the norm of english on user) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23578) [UI] Master UI shows long stack traces when table is broken
[ https://issues.apache.org/jira/browse/HBASE-23578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026342#comment-17026342 ] Xu Cang commented on HBASE-23578: - LGTM, please fix a minor conflict. [~Shuhei Yamasaki] > [UI] Master UI shows long stack traces when table is broken > --- > > Key: HBASE-23578 > URL: https://issues.apache.org/jira/browse/HBASE-23578 > Project: HBase > Issue Type: Improvement > Components: master, UI >Reporter: Shuhei Yamasaki >Priority: Minor > Attachments: stackCompact1_short.png, table_jsp.png > > > The table.jsp in Master UI shows long stack traces when table is broken. > (shown as table_jsp.png) > This messages are hard to read and web page is very wide because stack traces > displayed in a single line. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23744) FastPathBalancedQueueRpcExecutor should enforce queue length of 0
[ https://issues.apache.org/jira/browse/HBASE-23744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025408#comment-17025408 ]

Xu Cang commented on HBASE-23744:
---------------------------------

The idea you showed in the PR makes sense to me.
Just wondering, is there another way to properly "temporarily prevent writes on cluster", such as disabling RPC handling?
Setting the callqueue.length to 0 is a rather subtle way to indicate that we want to disable writes. If we do so, could you please add a one-sentence comment in the FastPathBalancedQueueRpcExecutor class? Thanks. [~gjacoby]

> FastPathBalancedQueueRpcExecutor should enforce queue length of 0
> -----------------------------------------------------------------
>
>                 Key: HBASE-23744
>                 URL: https://issues.apache.org/jira/browse/HBASE-23744
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Minor
>
> FastPathBalancedQueueRpcExecutor allows RPC requests to skip the RPC queue
> and get worked by an available handler under certain circumstances.
> Relatedly, the hbase.ipc.server.max.callqueue.length parameter can be set to
> 0, including dynamically. This can be useful to temporarily prevent writes on
> a cluster.
> When this is the case the executor is supposed to block all dispatching.
> However, the FastPathBalancedQueueRpcExecutor will still dispatch the request
> if one of the "fast path" handlers is available on its stack. This both isn't
> the desired effect, and also makes
> TestSimpleRpcScheduler.testSoftAndHardQueueLimits unstable when it checks the
> queue length 0 behavior.
> A simple fix is just to check max queue length > 0 before
> FastPathBalancedQueueRpcExecutor pops the fast handler off the stack.
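The fix described in HBASE-23744 (check that the maximum queue length is greater than 0 before popping a fast-path handler) can be modeled in a few lines. The class below is a simplified, hypothetical model and not the HBase executor: `FastPathDispatchSketch`, `handlerBecameIdle`, and the string-based handlers and calls are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of a fast-path RPC executor; not the HBase class.
class FastPathDispatchSketch {
    private final Deque<String> idleFastHandlers = new ArrayDeque<>();
    private final Deque<String> callQueue = new ArrayDeque<>();
    private final int maxQueueLength;

    FastPathDispatchSketch(int maxQueueLength) {
        this.maxQueueLength = maxQueueLength;
    }

    // An idle handler parks itself on the stack, waiting for a direct hand-off.
    void handlerBecameIdle(String handlerName) {
        idleFastHandlers.push(handlerName);
    }

    // Returns true if the call was accepted (handed to a handler or queued).
    boolean dispatch(String call) {
        // Guard from the issue: skip the fast path entirely when the limit is 0.
        if (maxQueueLength > 0) {
            String handler = idleFastHandlers.poll();
            if (handler != null) {
                return true; // the idle handler takes the call directly
            }
        }
        if (callQueue.size() >= maxQueueLength) {
            return false; // queue is full; with a limit of 0 every call is rejected
        }
        callQueue.add(call);
        return true;
    }

    public static void main(String[] args) {
        FastPathDispatchSketch blocked = new FastPathDispatchSketch(0);
        blocked.handlerBecameIdle("handler-1");
        System.out.println("limit 0 accepted? " + blocked.dispatch("put"));

        FastPathDispatchSketch open = new FastPathDispatchSketch(10);
        open.handlerBecameIdle("handler-1");
        System.out.println("limit 10 accepted? " + open.dispatch("put"));
    }
}
```

Without the `maxQueueLength > 0` guard, the first executor would hand the call to the idle handler even though its limit is 0, which is the behavior the issue reports.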
[jira] [Commented] (HBASE-21593) closing flags show be set false in HRegion
[ https://issues.apache.org/jira/browse/HBASE-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024718#comment-17024718 ]

Xu Cang commented on HBASE-21593:
---------------------------------

Where is the GitHub PR for this issue? [~stack]
I came to clean up my Jira issues and wanted to follow up on this one. If it has been solved, I can close it. Otherwise I can give it another try. Thanks.

> closing flags show be set false in HRegion
> ------------------------------------------
>
>                 Key: HBASE-21593
>                 URL: https://issues.apache.org/jira/browse/HBASE-21593
>             Project: HBase
>          Issue Type: Bug
>            Reporter: xiaolerzheng
>            Assignee: Xu Cang
>            Priority: Minor
>         Attachments: HBASE-21593.branch-1.001.patch,
> image-2018-12-13-16-04-51-892.png, image-2018-12-13-16-05-09-246.png,
> image-2018-12-13-16-05-36-404.png
>
>
> in HRegion.java
>
> 1429 // block waiting for the lock for closing
> 1430 lock.writeLock().lock();
> 1431 this.closing.set(true);
> 1432 status.setStatus("Disabling writes for close");
>
> 1557 } finally {
> {color:#FF} //should here add {color}
> {color:#FF} this.closing.set(false); {color}
> 1558 lock.writeLock().unlock();
> 1559 }
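The change proposed by the reporter of HBASE-21593 can be read as the following miniature. It is only an illustration of the proposal as written in the issue, not a committed fix and not real HRegion code: the class `ClosingFlagSketch`, the `doClose` method, and the `closeWork` parameter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical miniature of the close path; names mirror the quoted snippet only.
class ClosingFlagSketch {
    final AtomicBoolean closing = new AtomicBoolean(false);
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // closeWork stands in for the flush and store-close steps, which may throw.
    void doClose(Runnable closeWork) {
        // block waiting for the lock for closing
        lock.writeLock().lock();
        closing.set(true);
        try {
            closeWork.run();
        } finally {
            closing.set(false); // the reset proposed in the issue
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ClosingFlagSketch region = new ClosingFlagSketch();
        try {
            region.doClose(() -> { throw new IllegalStateException("flush failed"); });
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        System.out.println("closing flag after failure: " + region.closing.get());
    }
}
```

The point of the sketch is that when the close work throws, the flag does not stay stuck at true. Whether the flag should also be reset after a successful close is a design question the issue text does not settle.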
[jira] [Comment Edited] (HBASE-21893) Space quota: Usage is not calculated correctly if snapshot is created on a table then table is deleted and created again
[ https://issues.apache.org/jira/browse/HBASE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024593#comment-17024593 ]

Xu Cang edited comment on HBASE-21893 at 1/27/20 7:23 PM:
----------------------------------------------------------

[~a00408367]
Archived files will be deleted after the TTL. (The setting is hbase.master.hfilecleaner.ttl: the period (in milliseconds) to keep store files in the archive folder before deleting them from the file system.)
I think the question comes down to whether the quota should take archived files into account. If so, there is nothing wrong with the above behavior.
Judging by https://issues.apache.org/jira/browse/HBASE-18135 , this seems to be by design.

was (Author: xucang):
[~a00408367]
Archived files will be deleted after the TTL. (The setting is hbase.master.hfilecleaner.ttl: the period (in milliseconds) to keep store files in the archive folder before deleting them from the file system.)
I think the question comes down to whether the quota should take archived files into account. If so, there is nothing wrong with the above behavior.

> Space quota: Usage is not calculated correctly if snapshot is created on a table then table is deleted and created again
> ------------------------------------------------------------------------
>
>                 Key: HBASE-21893
>                 URL: https://issues.apache.org/jira/browse/HBASE-21893
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ajeet Rai
>            Priority: Minor
>              Labels: Quota, Space
>
> *Steps to reproduce:*
> 1: ./hbase pe --table="bugatti" --nomapred --rows=400 sequentialWrite 10 (will put 4 mb data)
> 2: set_quota TYPE => SPACE, TABLE => 'bugatti', LIMIT => '7M', POLICY => NO_WRITES_COMPACTIONS
> 3: snapshot 'bugatti','bugatti_snapshot'
> 4: disable 'bugatti'
> 5: drop 'bugatti'
> 6: create 'bugatti','info0'
> 7: set_quota TYPE => SPACE, TABLE => 'bugatti', LIMIT => '5M', POLICY => NO_WRITES_COMPACTIONS
> 8: scan 'bugatti'
> >> Observe that no data here and original snapshot size was 4 MB. but current usage is shown as 8 MB
[jira] [Commented] (HBASE-21893) Space quota: Usage is not calculated correctly if snapshot is created on a table then table is deleted and created again
[ https://issues.apache.org/jira/browse/HBASE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024593#comment-17024593 ] Xu Cang commented on HBASE-21893: - [~a00408367] Archived file will be deleted after TTL. (setting is this : hbase.master.hfilecleaner.ttl , The period (in milliseconds) to keep store files in the archive folder before deleting them from the file system) I think the question comes down to if Quota should take archived file into account. If so, there is nothing wrong with above behavior. > Space quota: Usage is not calculated correctly if snapshot is created on a > table then table is deleted and created again > -- > > Key: HBASE-21893 > URL: https://issues.apache.org/jira/browse/HBASE-21893 > Project: HBase > Issue Type: Bug >Reporter: Ajeet Rai >Priority: Minor > Labels: Quota, Space > > *Steps to reproduce:* > 1: ./hbase pe --table="bugatti" --nomapred --rows=400 sequentialWrite 10 > (will put 4 mb data) > 2: set_quota TYPE => SPACE, TABLE => 'bugatti', LIMIT => '7M', POLICY => > NO_WRITES_COMPACTIONS > 3: snapshot 'bugatti','bugatti_snapshot' > 4: disable 'bugatti' > 5: drop 'bugatti' > 6: create 'bugatti','info0' > 7: set_quota TYPE => SPACE, TABLE => 'bugatti', LIMIT => '5M', POLICY => > NO_WRITES_COMPACTIONS > 8: scan 'bugatti' > >> Observe that no data here and original snapshot size was 4 MB. but current > >> usage is shown as 8 MB -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-21902) TestCompactionWithCoprocessor doesn't invoke coprocessor logic
[ https://issues.apache.org/jira/browse/HBASE-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024581#comment-17024581 ] Xu Cang edited comment on HBASE-21902 at 1/27/20 7:07 PM: -- Hi [~dcz99] thanks for working on this JIRA. Please see this doc how to submit patch correctly. [https://hbase.apache.org/book.html#developing] And by looking at your patch, if you think NoOpScanPolicyObserver is not instantiated, why does throwing the RuntimeException in that class help? thanks. was (Author: xucang): Hi [~dcz99] thanks for working on this JIRA. Please see this doc how to submit patch correctly. [https://hbase.apache.org/book.html#developing] > TestCompactionWithCoprocessor doesn't invoke coprocessor logic > -- > > Key: HBASE-21902 > URL: https://issues.apache.org/jira/browse/HBASE-21902 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.0.4 >Reporter: david zhang >Priority: Minor > Attachments: hbase-test.patch > > > TestCompactionWithCoprocessor is designed to invoke NoOpScanPolicyObserver > which implements default behavior. in reality NoOpScanPolicyObserver isn't > instantiated, TestCompactionWithCoprocessor runs/passes trivially, without > increasing test coverage beyond TestCompaction. > See patch which passes TestCompactionWithCoprocessor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-21902) TestCompactionWithCoprocessor doesn't invoke coprocessor logic
[ https://issues.apache.org/jira/browse/HBASE-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024581#comment-17024581 ] Xu Cang edited comment on HBASE-21902 at 1/27/20 7:06 PM: -- Hi [~dcz99] thanks for working on this JIRA. Please see this doc how to submit patch correctly. [https://hbase.apache.org/book.html#developing] was (Author: xucang): Hi [~dcz99] thanks for working on this patch. Please see this doc how to submit patch correctly. [https://hbase.apache.org/book.html#developing] > TestCompactionWithCoprocessor doesn't invoke coprocessor logic > -- > > Key: HBASE-21902 > URL: https://issues.apache.org/jira/browse/HBASE-21902 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.0.4 >Reporter: david zhang >Priority: Minor > Attachments: hbase-test.patch > > > TestCompactionWithCoprocessor is designed to invoke NoOpScanPolicyObserver > which implements default behavior. in reality NoOpScanPolicyObserver isn't > instantiated, TestCompactionWithCoprocessor runs/passes trivially, without > increasing test coverage beyond TestCompaction. > See patch which passes TestCompactionWithCoprocessor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-21902) TestCompactionWithCoprocessor doesn't invoke coprocessor logic
[ https://issues.apache.org/jira/browse/HBASE-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024581#comment-17024581 ] Xu Cang commented on HBASE-21902: - Hi [~dcz99] thanks for working on this patch. Please see this doc how to submit patch correctly. [https://hbase.apache.org/book.html#developing] > TestCompactionWithCoprocessor doesn't invoke coprocessor logic > -- > > Key: HBASE-21902 > URL: https://issues.apache.org/jira/browse/HBASE-21902 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.0.4 >Reporter: david zhang >Priority: Minor > Attachments: hbase-test.patch > > > TestCompactionWithCoprocessor is designed to invoke NoOpScanPolicyObserver > which implements default behavior. in reality NoOpScanPolicyObserver isn't > instantiated, TestCompactionWithCoprocessor runs/passes trivially, without > increasing test coverage beyond TestCompaction. > See patch which passes TestCompactionWithCoprocessor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23578) [UI] Master UI shows long stack traces when table is broken
[ https://issues.apache.org/jira/browse/HBASE-23578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997710#comment-16997710 ] Xu Cang commented on HBASE-23578: - Thank you, [~Shuhei Yamasaki]. You could submit a pull request against this repo: [https://github.com/apache/hbase] PRs will be seen here: [https://github.com/apache/hbase/pulls] Or you can submit a patch file directly to this Jira. > [UI] Master UI shows long stack traces when table is broken > --- > > Key: HBASE-23578 > URL: https://issues.apache.org/jira/browse/HBASE-23578 > Project: HBase > Issue Type: Improvement > Components: master, UI >Reporter: Shuhei Yamasaki >Priority: Minor > Attachments: stackCompact1_short.png, table_jsp.png > > > The table.jsp in Master UI shows long stack traces when a table is broken. > (shown as table_jsp.png) > These messages are hard to read and the web page is very wide because the stack traces > are displayed in a single line. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22978) Online slow response log
[ https://issues.apache.org/jira/browse/HBASE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997702#comment-16997702 ] Xu Cang commented on HBASE-22978: - Left some comments in PR. thanks [~vjasani] > Online slow response log > > > Key: HBASE-22978 > URL: https://issues.apache.org/jira/browse/HBASE-22978 > Project: HBase > Issue Type: New Feature > Components: Admin, Operability, regionserver, shell >Affects Versions: 3.0.0, 2.3.0, 1.5.1 >Reporter: Andrew Kyle Purtell >Assignee: Viraj Jasani >Priority: Minor > Fix For: 3.0.0, 2.3.0, 1.6.0 > > Attachments: Screen Shot 2019-10-19 at 2.31.59 AM.png, Screen Shot > 2019-10-19 at 2.32.54 AM.png, Screen Shot 2019-10-19 at 2.34.11 AM.png, > Screen Shot 2019-10-19 at 2.36.14 AM.png > > > Today when an individual RPC exceeds a configurable time bound we log a > complaint by way of the logging subsystem. These log lines look like: > {noformat} > 2019-08-30 22:10:36,195 WARN [,queue=15,port=60020] ipc.RpcServer - > (responseTooSlow): > {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)", > "starttimems":1567203007549, > "responsesize":6819737, > "method":"Scan", > "param":"region { type: REGION_NAME value: > \"tsdb,\\000\\000\\215\\f)o\\024\\302\\220\\000\\000\\000\\000\\000\\001\\000\\000\\000\\000\\000\\006\\000\\000\\000\\000\\000\\005\\000\\000", > "processingtimems":28646, > "client":"10.253.196.215:41116", > "queuetimems":22453, > "class":"HRegionServer"} > {noformat} > Unfortunately we often truncate the request parameters, like in the above > example. We do this because the human readable representation is verbose, the > rate of too slow warnings may be high, and the combination of these things > can overwhelm the log capture system. The truncation is unfortunate because > it eliminates much of the utility of the warnings. 
For example, the region > name, the start and end keys, and the filter hierarchy are all important > clues for debugging performance problems caused by moderate to low > selectivity queries or queries made at a high rate. > We can maintain an in-memory ring buffer of requests that were judged to be > too slow in addition to the responseTooSlow logging. The in-memory > representation can be complete and compressed. A new admin API and shell > command can provide access to the ring buffer for online performance > debugging. A modest sizing of the ring buffer will prevent excessive memory > utilization for a minor performance debugging feature by limiting the total > number of retained records. There is some chance a high rate of requests will > cause information on other interesting requests to be overwritten before it > can be read. This is the nature of a ring buffer and an acceptable trade off. > The write request types do not require us to retain all information submitted > in the request. We don't need to retain all key-values in the mutation, which > may be too large to comfortably retain. We only need a unique set of row > keys, or even a min/max range, and total counts. > The consumers of this information will be debugging tools. We can afford to > apply fast compression to ring buffer entries (if codec support is > available), something like snappy or zstandard, and decompress on the fly > when servicing the retrieval API request. This will minimize the impact of > retaining more information about slow requests than we do today. > This proposal is for retention of request information only, the same > information provided by responseTooSlow warnings. Total size of response > serialization, possibly also total cell or row counts, should be sufficient > to characterize the response. > Optionally persist new entries added to the ring buffer into one or more > files in HDFS in a write-behind manner. 
If the HDFS writer blocks or falls > behind and we are unable to persist an entry before it is overwritten, that > is fine. Response too slow logging is best effort. If we can detect this make > a note of it in the log file. Provide a tool for parsing, dumping, filtering, > and pretty printing the slow logs written to HDFS. The tool and the shell can > share and reuse some utility classes and methods for accomplishing that. > — > New shell commands: > {{get_slow_responses [ ... , ] [ , \{ > } ]}} > Retrieve, decode, and pretty print the contents of the too slow response ring > buffer maintained by the given list of servers; or all servers in the cluster > if no list is provided. Optionally provide a map of parameters for filtering > as additional argument. The TABLE filter, which expects a string containing a > table name, will include only entries pertaining to that table. The REGION > filter, which expects a string containing a
[jira] [Commented] (HBASE-23359) RS going down with NPE when splitting a region with compaction disabled in branch-1
[ https://issues.apache.org/jira/browse/HBASE-23359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988167#comment-16988167 ] Xu Cang commented on HBASE-23359: - I see this bug is also in branch-1.4, should we also push the fix there? [~brfrn169] > RS going down with NPE when splitting a region with compaction disabled in > branch-1 > --- > > Key: HBASE-23359 > URL: https://issues.apache.org/jira/browse/HBASE-23359 > Project: HBase > Issue Type: Bug >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 1.6.0 > > > Trying to backport HBASE-22096 to branch-1, I faced the issue where a RS goes > down with NPE when splitting a region with compaction disabled. > The steps to reproduce this issue are as follows: > {code} > compaction_switch false > create "test", "cf" > (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val#{i}"} > split "test" > {code} > Looking at the regionserver log, I saw the following log: > {code} > 2019-12-03 22:25:38,611 INFO [RS:0;10.0.1.11:53504-splits-0] > regionserver.SplitRequest: Running rollback/cleanup of failed split of > test,,1575379535506.50e322ec68162025e17cddffdc2fb17e.; null > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.HStore.cancelRequestedCompaction(HStore.java:1834) > at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$Rejection.rejectedExecution(CompactSplitThread.java:656) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) > at > org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompactionInternal(CompactSplitThread.java:401) > at > org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSystemCompaction(CompactSplitThread.java:348) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2111) > at > 
org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2097) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughters(SplitTransactionImpl.java:478) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsAfterPONR(SplitTransactionImpl.java:549) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:532) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:153) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-12-03 22:25:38,613 FATAL [RS:0;10.0.1.11:53504-splits-0] > regionserver.HRegionServer: ABORTING region server > 10.0.1.11,53504,1575379011279: Abort; we got an error after point-of-no-return > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23236) Upgrade to yetus 0.11.1
[ https://issues.apache.org/jira/browse/HBASE-23236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967991#comment-16967991 ] Xu Cang commented on HBASE-23236: - [~zhangduo] Are you also planning to make this change for branch-1? The branch-1 precommit Jenkins jobs are failing because of this. Thanks! I can lend some help if needed. > Upgrade to yetus 0.11.1 > --- > > Key: HBASE-23236 > URL: https://issues.apache.org/jira/browse/HBASE-23236 > Project: HBase > Issue Type: Task > Components: build >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.1.8, 2.2.3 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23212) Provide config reload for Auto Region Reopen based on storeFile ref count
[ https://issues.apache.org/jira/browse/HBASE-23212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-23212: Attachment: HBASE-23212.branch-2.000.patch HBASE-23212.branch-1.000.patch > Provide config reload for Auto Region Reopen based on storeFile ref count > - > > Key: HBASE-23212 > URL: https://issues.apache.org/jira/browse/HBASE-23212 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.0, 1.6.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Fix For: 3.0.0, 2.3.0, 1.6.0 > > Attachments: HBASE-23212.branch-1.000.patch, > HBASE-23212.branch-1.000.patch, HBASE-23212.branch-2.000.patch, > HBASE-23212.branch-2.000.patch > > > We should provide flexibility to tune max storeFile Ref Count threshold that > is considered for auto region reopen as it represents leak on store file. > While running some perf tests, user can bring ref count very high if > required, but this config change should be dynamic and should not require > HMaster restart. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23212) Provide config reload for Auto Region Reopen based on storeFile ref count
[ https://issues.apache.org/jira/browse/HBASE-23212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966960#comment-16966960 ] Xu Cang commented on HBASE-23212: - I gave +1 and merged the master branch patch. Thank you [~vjasani]! Are you going to port this to branch-1 and branch-2 too? thanks. > Provide config reload for Auto Region Reopen based on storeFile ref count > - > > Key: HBASE-23212 > URL: https://issues.apache.org/jira/browse/HBASE-23212 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.0, 1.6.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Fix For: 3.0.0, 2.3.0, 1.6.0 > > > We should provide flexibility to tune max storeFile Ref Count threshold that > is considered for auto region reopen as it represents leak on store file. > While running some perf tests, user can bring ref count very high if > required, but this config change should be dynamic and should not require > HMaster restart. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23143) Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
[ https://issues.apache.org/jira/browse/HBASE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948899#comment-16948899 ] Xu Cang commented on HBASE-23143: - It comes during flush, I believe. Related stack trace (I had to remove some company-confidential table names and so on): 2019-10-03 09:44:11,600 FATAL [MemStoreFlusher.0] regionserver.HRegionServer - ABORTING region server xxx: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: TABLEXXX,00D7F00rGWE0D57F25zuZT0057F03biB1T\x00057F03biB1,1555980511237.d333ca887c6c78385dc44c0e7ddf97df. at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2621) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2297) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2259) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2143) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2068) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Added a key not lexically larger than previous. 
Current cell = 00D7F00xG5g0D52v8UYAOg0F92v00oOMnG\x000DB7F00Cc3q000/0:COLUMNX/1570095189616/DeleteColumn/vlen=0/seqid=1981610, lastCell = 00D7F00xG5g0D52v8UYAOg0F92v00oOMnG\x000DB7F00Cc3q000/0:COLUMNX/1570095164786/DeleteColumn/vlen=0/seqid=1981620 at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:267) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1184) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:138) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:991) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2506) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2567) ... 9 more > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > > Key: HBASE-23143 > URL: https://issues.apache.org/jira/browse/HBASE-23143 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Xu Cang >Priority: Major > > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > Caused by: java.io.IOException: Added a key not lexically larger than > previous. > Current cell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, > > lastCell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* > > > I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 > Though it's slightly different, HBASE-22862 issue was caused One Delete and > One Put. > This issue I am reporting is caused by 2 Deletes > > Has anyone seen this issue? 
> > After I read the code and debugged the test cases. > In AbstractHFileWriter.java > {code:java} > int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} > This call will always ignore SequenceId. And time stamps are in the correct > order (above case) > And since these 2 cells have same KEY. The comparison result should be 0. > *only possible issue I can think of is, in this code piece: in > CellComparator.java:* > {code:java} > Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), > right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} > The getRowLength() returns a wrong value. > Or the offset is messed up. (?) -- This message was sent by Atlassian Jira (v8.3.4#803005)
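The key-only comparison discussed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in AbstractHFileWriter or CellComparator: `SimpleCellKey` and `compareKeyOnly` are hypothetical names, row and column are plain strings, the type byte is left out, and the sketch assumes the usual HBase key ordering in which the timestamp is part of the key and newer timestamps sort first, while the sequence ID is not consulted.

```java
/** Hypothetical, simplified cell key; real HBase cells carry more fields. */
final class SimpleCellKey {
    final String row;
    final String column;
    final long timestamp;
    final long seqId;

    SimpleCellKey(String row, String column, long timestamp, long seqId) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.seqId = seqId;
    }

    /**
     * Key-only ordering: row ascending, column ascending, timestamp descending.
     * The sequence ID is deliberately ignored, as in the call quoted above.
     */
    static int compareKeyOnly(SimpleCellKey a, SimpleCellKey b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.column.compareTo(b.column);
        if (c != 0) {
            return c;
        }
        // Assumption: newer (larger) timestamps sort before older ones.
        return Long.compare(b.timestamp, a.timestamp);
    }
}
```

Under these assumptions, comparing a lastCell with timestamp 1570095165147 against a current cell with timestamp 1570095189597 on the same row and column gives a positive result rather than 0, regardless of the sequence IDs. Whether that is what happens in the 1.3.2 comparator for the reported cells is not confirmed here.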
[jira] [Updated] (HBASE-23143) Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
[ https://issues.apache.org/jira/browse/HBASE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-23143: Description: Region Server Crash due to 2 cells out of order ( between 2 DELETEs) Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, lastCell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 Though it's slightly different, HBASE-22862 issue was caused One Delete and One Put. This issue I am reporting is caused by 2 Deletes Has anyone seen this issue? After I read the code and debugged the test cases. In AbstractHFileWriter.java {code:java} int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} This call will always ignore SequenceId. And time stamps are in the correct order (above case) And since these 2 cells have same KEY. The comparison result should be 0. *only possible issue I can think of is, in this code piece: in CellComparator.java:* {code:java} Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} The getRowLength() returns a wrong value. Or the offset is messed up. (?) was: Region Server Crash due to 2 cells out of order ( between 2 DELETEs) Caused by: java.io.IOException: Added a key not lexically larger than previous. 
Current cell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, lastCell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 Though it's slightly different, HBASE-22862 issue was caused One Delete and One Put. This issue I am reporting is caused by 2 Deletes Has anyone seen this issue? After I read the code and debugged the test cases. In AbstractHFileWriter.java {code:java} int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} This call will always ignore SequenceId. And time stamps are in the correct order (above case) And since these 2 cells have same KEY. T*he comparison result should be 0.* *only possible issue I can think of is, in this code piece: in CellComparator.java:* {code:java} Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} The getRowLength() returns a wrong value. Or the offset is messed up. (?) > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > > Key: HBASE-23143 > URL: https://issues.apache.org/jira/browse/HBASE-23143 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Xu Cang >Priority: Major > > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > Caused by: java.io.IOException: Added a key not lexically larger than > previous. > Current cell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, > > lastCell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* > > > I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 > Though it's slightly different, HBASE-22862 issue was caused One Delete and > One Put. 
> This issue I am reporting is caused by 2 Deletes > > Has anyone seen this issue? > > After I read the code and debugged the test cases. > In AbstractHFileWriter.java > {code:java} > int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} > This call will always ignore SequenceId. And time stamps are in the correct > order (above case) > And since these 2 cells have same KEY. The comparison result should be 0. > *only possible issue I can think of is, in this code piece: in > CellComparator.java:* > {code:java} > Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), > right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} > The getRowLength() returns a wrong value. > Or the offset is messed up. (?) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23143) Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
[ https://issues.apache.org/jira/browse/HBASE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-23143: Description: Region Server Crash due to 2 cells out of order ( between 2 DELETEs) Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, lastCell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 Though it's slightly different, HBASE-22862 issue was caused One Delete and One Put. This issue I am reporting is caused by 2 Deletes Has anyone seen this issue? After I read the code and debugged the test cases. In AbstractHFileWriter.java {code:java} int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} This call will always ignore SequenceId. And time stamps are in the correct order (above case) And since these 2 cells have same KEY. T*he comparison result should be 0.* *only possible issue I can think of is, in this code piece: in CellComparator.java:* {code:java} Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} The getRowLength() returns a wrong value. Or the offset is messed up. (?) was: Region Server Crash due to 2 cells out of order ( between 2 DELETEs) Caused by: java.io.IOException: Added a key not lexically larger than previous. 
Current cell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, lastCell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* I am aware https://issues.apache.org/jira/browse/HBASE-22862 but it's slightly different, this issue is not caused by One Delete and One Put. This issue I am seeing is caused by 2 Deletes Has anyone seen this issue? > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > > Key: HBASE-23143 > URL: https://issues.apache.org/jira/browse/HBASE-23143 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Xu Cang >Priority: Major > > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > Caused by: java.io.IOException: Added a key not lexically larger than > previous. > Current cell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, > > lastCell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* > > > I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 > Though it's slightly different, HBASE-22862 issue was caused One Delete and > One Put. > This issue I am reporting is caused by 2 Deletes > > Has anyone seen this issue? > > After I read the code and debugged the test cases. > In AbstractHFileWriter.java > {code:java} > int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} > This call will always ignore SequenceId. And time stamps are in the correct > order (above case) > > And since these 2 cells have same KEY. 
T*he comparison result should be 0.* > *only possible issue I can think of is, in this code piece: in > CellComparator.java:* > {code:java} > Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), > right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} > The getRowLength() returns a wrong value. > Or the offset is messed up. (?) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23143) Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
Xu Cang created HBASE-23143: --- Summary: Region Server Crash due to 2 cells out of order ( between 2 DELETEs) Key: HBASE-23143 URL: https://issues.apache.org/jira/browse/HBASE-23143 Project: HBase Issue Type: Bug Affects Versions: 1.3.2 Reporter: Xu Cang Region Server Crash due to 2 cells out of order ( between 2 DELETEs) Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, lastCell = 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* I am aware https://issues.apache.org/jira/browse/HBASE-22862 but it's slightly different, this issue is not caused by One Delete and One Put. This issue I am seeing is caused by 2 Deletes Has anyone seen this issue? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22660) Probabilistic end to end tracking of cross cluster replication latency
[ https://issues.apache.org/jira/browse/HBASE-22660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936268#comment-16936268 ] Xu Cang commented on HBASE-22660: - regarding "wall clock time" in different RSes across a cluster, does HBase require some wall time difference upper limit? > Probabilistic end to end tracking of cross cluster replication latency > -- > > Key: HBASE-22660 > URL: https://issues.apache.org/jira/browse/HBASE-22660 > Project: HBase > Issue Type: New Feature >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Major > > ageOfLastShippedOp tracks replication latency forward from the point where a > source process tailing a WAL has found an edit to ship. This is not an end to > end measure. > To achieve a holistic end to end measure we should have an active process > that periodically injects sentinel values at commit time adjacent to the > WALedits carrying application data at the source and records when they are > finally processed at the sink, using a timestamp embedded in the sentinel to > measure true end to end latency for the adjacent commit. This could be done > for a configurable (and small) percentage of commits so would give a > probabilistic measure with confidence controlled by sample rate. It should be > done this way rather than by passively sampling cell timestamps because cell > timestamps can be set by the user and may not correspond to wall clock time. > We could introduce a new type of synthetic WALedit, a new global metric, and > because the adjacent commit from which we build the sentinel contains table > information we could track that too and add a per table metric. -- This message was sent by Atlassian Jira (v8.3.4#803005)
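The sampling idea in the description above can be sketched as follows. This is only an illustration under assumptions: `ReplicationLatencySampler`, `shouldInject`, and `latencyMillis` are hypothetical names, no real WALEdit or metrics API is used, and clamping negative values to zero is one possible way to handle clock skew between source and sink, not something the issue specifies.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sampler for sentinel injection; not part of HBase. */
final class ReplicationLatencySampler {
    private final double sampleRate; // fraction of commits that get a sentinel, in [0, 1]

    ReplicationLatencySampler(double sampleRate) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in [0, 1]");
        }
        this.sampleRate = sampleRate;
    }

    /** Decides from a uniform draw in [0, 1) whether this commit gets a sentinel. */
    boolean shouldInject(double draw) {
        return draw < sampleRate;
    }

    /** Convenience overload using a random draw at commit time. */
    boolean shouldInject() {
        return shouldInject(ThreadLocalRandom.current().nextDouble());
    }

    /**
     * End-to-end latency from the timestamp embedded in the sentinel at the source
     * to the time the sink processed it. Assumption: if the sink clock is behind
     * the source clock, the negative difference is reported as 0.
     */
    static long latencyMillis(long sentinelCommitMillis, long sinkAppliedMillis) {
        return Math.max(0L, sinkAppliedMillis - sentinelCommitMillis);
    }
}
```

Because the measurement subtracts wall clock readings taken on two different hosts, its accuracy is bounded by the clock difference between them, which is the concern raised in the comment.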
[jira] [Comment Edited] (HBASE-22969) A new binary component comparator(BinaryComponentComparator) to perform comparison of arbitrary length and position
[ https://issues.apache.org/jira/browse/HBASE-22969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936209#comment-16936209 ] Xu Cang edited comment on HBASE-22969 at 9/23/19 10:06 PM: --- Skimmed the patch and it looks good! One question is, do you think "BinaryComponentComparator" represents the purpose of this -filter- comparator? Or maybe it's a bit not intuitive to me? Thanks. was (Author: xucang): Skimmed the patch and it looks good! One question is, do you think "BinaryComponentComparator" represents the purpose of this filter? Or maybe it's a bit not intuitive to me? Thanks. > A new binary component comparator(BinaryComponentComparator) to perform > comparison of arbitrary length and position > --- > > Key: HBASE-22969 > URL: https://issues.apache.org/jira/browse/HBASE-22969 > Project: HBase > Issue Type: Improvement > Components: Filters >Reporter: Udai Bhan Kashyap >Assignee: Udai Bhan Kashyap >Priority: Minor > Attachments: HBASE-22969.0003.patch, HBASE-22969.0004.patch, > HBASE-22969.0005.patch, HBASE-22969.0006.patch, HBASE-22969.0007.patch, > HBASE-22969.0008.patch, HBASE-22969.0009.patch, > HBASE-22969.HBASE-22969.0001.patch, HBASE-22969.master.0001.patch > > > Lets say you have composite key: a+b+c+d. And for simplicity assume that > a,b,c, and d all are 4 byte integers. > Now, if you want to execute a query which is semantically same to following > sql: > {{"SELECT * from table where a=1 and b > 10 and b < 20 and c > 90 and c < 100 > and d=1"}} > The only choice you have is to do client side filtering. That could be lots > of unwanted data going through various software components and network. 
> Solution:
> We can create a "component" comparator which takes the value of the
> "component" and its relative position in the key to pass to the 'Filter'
> subsystem of the server:
> {code}
> FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
> int bOffset = 4;
> byte[] b10 = Bytes.toBytes(10);
> Filter b10Filter = new RowFilter(CompareFilter.CompareOp.GREATER,
>     new BinaryComponentComparator(b10,bOffset));
> filterList.addFilter(b10Filter);
> byte[] b20 = Bytes.toBytes(20);
> Filter b20Filter = new RowFilter(CompareFilter.CompareOp.LESS,
>     new BinaryComponentComparator(b20,bOffset));
> filterList.addFilter(b20Filter);
> int cOffset = 8;
> byte[] c90 = Bytes.toBytes(90);
> Filter c90Filter = new RowFilter(CompareFilter.CompareOp.GREATER,
>     new BinaryComponentComparator(c90,cOffset));
> filterList.addFilter(c90Filter);
> byte[] c100 = Bytes.toBytes(100);
> Filter c100Filter = new RowFilter(CompareFilter.CompareOp.LESS,
>     new BinaryComponentComparator(c100,cOffset));
> filterList.addFilter(c100Filter);
> int dOffset = 12;
> byte[] d1 = Bytes.toBytes(1);
> Filter dFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
>     new BinaryComponentComparator(d1,dOffset));
> filterList.addFilter(dFilter);
> //build start and end key for scan
> int aOffset = 0;
> byte[] startKey = new byte[16]; //key size with four ints
> Bytes.putInt(startKey,aOffset,1); //a=1
> Bytes.putInt(startKey,bOffset,11); //b=11, takes care of b > 10
> Bytes.putInt(startKey,cOffset,91); //c=91,
> Bytes.putInt(startKey,dOffset,1); //d=1,
> byte[] endKey = new byte[16];
> Bytes.putInt(endKey,aOffset,1); //a=1
> Bytes.putInt(endKey,bOffset,20); //b=20, takes care of b < 20
> Bytes.putInt(endKey,cOffset,100); //c=100,
> Bytes.putInt(endKey,dOffset,1); //d=1,
> //setup scan
> Scan scan = new Scan(startKey,endKey);
> scan.setFilter(filterList);
> //The scanner below now should give only desired rows.
> //No client side filtering is required.
> ResultScanner scanner = table.getScanner(scan);
> {code}
> The comparator can be used with any filter which makes use of
> ByteArrayComparable. Most notably it can be used with ValueFilter to filter
> out KV based on partial comparison of 'values':
> {code}
> byte[] partialValue = Bytes.toBytes("partial_value");
> int partialValueOffset =
> Filter partialValueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER,
>     new BinaryComponentComparator(partialValue,partialValueOffset));
> {code}
> Which in turn can be combined with RowFilter to create a powerful predicate:
> {code}
> RowFilter rowFilter = new RowFilter(GREATER,
>     new BinaryComponentComparator(Bytes.toBytes("a"),1));
> FilterList fl = new FilterList(MUST_PASS_ALL,rowFilter,partialValueFilter);
> {code}
[jira] [Commented] (HBASE-22969) A new binary component comparator(BinaryComponentComparator) to perform comparison of arbitrary length and position
[ https://issues.apache.org/jira/browse/HBASE-22969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936209#comment-16936209 ] Xu Cang commented on HBASE-22969: - Skimmed the patch and it looks good! One question is, do you think "BinaryComponentComparator" represents the purpose of this filter? Or maybe it's a bit not intuitive to me? Thanks. > A new binary component comparator(BinaryComponentComparator) to perform > comparison of arbitrary length and position > --- > > Key: HBASE-22969 > URL: https://issues.apache.org/jira/browse/HBASE-22969 > Project: HBase > Issue Type: Improvement > Components: Filters >Reporter: Udai Bhan Kashyap >Assignee: Udai Bhan Kashyap >Priority: Minor > Attachments: HBASE-22969.0003.patch, HBASE-22969.0004.patch, > HBASE-22969.0005.patch, HBASE-22969.0006.patch, HBASE-22969.0007.patch, > HBASE-22969.0008.patch, HBASE-22969.0009.patch, > HBASE-22969.HBASE-22969.0001.patch, HBASE-22969.master.0001.patch > > > Lets say you have composite key: a+b+c+d. And for simplicity assume that > a,b,c, and d all are 4 byte integers. > Now, if you want to execute a query which is semantically same to following > sql: > {{"SELECT * from table where a=1 and b > 10 and b < 20 and c > 90 and c < 100 > and d=1"}} > The only choice you have is to do client side filtering. That could be lots > of unwanted data going through various software components and network. 
> Solution:
> We can create a "component" comparator which takes the value of the
> "component" and its relative position in the key to pass to the 'Filter'
> subsystem of the server:
> {code}
> FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
> int bOffset = 4;
> byte[] b10 = Bytes.toBytes(10);
> Filter b10Filter = new RowFilter(CompareFilter.CompareOp.GREATER,
>     new BinaryComponentComparator(b10,bOffset));
> filterList.addFilter(b10Filter);
> byte[] b20 = Bytes.toBytes(20);
> Filter b20Filter = new RowFilter(CompareFilter.CompareOp.LESS,
>     new BinaryComponentComparator(b20,bOffset));
> filterList.addFilter(b20Filter);
> int cOffset = 8;
> byte[] c90 = Bytes.toBytes(90);
> Filter c90Filter = new RowFilter(CompareFilter.CompareOp.GREATER,
>     new BinaryComponentComparator(c90,cOffset));
> filterList.addFilter(c90Filter);
> byte[] c100 = Bytes.toBytes(100);
> Filter c100Filter = new RowFilter(CompareFilter.CompareOp.LESS,
>     new BinaryComponentComparator(c100,cOffset));
> filterList.addFilter(c100Filter);
> int dOffset = 12;
> byte[] d1 = Bytes.toBytes(1);
> Filter dFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
>     new BinaryComponentComparator(d1,dOffset));
> filterList.addFilter(dFilter);
> //build start and end key for scan
> int aOffset = 0;
> byte[] startKey = new byte[16]; //key size with four ints
> Bytes.putInt(startKey,aOffset,1); //a=1
> Bytes.putInt(startKey,bOffset,11); //b=11, takes care of b > 10
> Bytes.putInt(startKey,cOffset,91); //c=91,
> Bytes.putInt(startKey,dOffset,1); //d=1,
> byte[] endKey = new byte[16];
> Bytes.putInt(endKey,aOffset,1); //a=1
> Bytes.putInt(endKey,bOffset,20); //b=20, takes care of b < 20
> Bytes.putInt(endKey,cOffset,100); //c=100,
> Bytes.putInt(endKey,dOffset,1); //d=1,
> //setup scan
> Scan scan = new Scan(startKey,endKey);
> scan.setFilter(filterList);
> //The scanner below now should give only desired rows.
> //No client side filtering is required.
> ResultScanner scanner = table.getScanner(scan);
> {code}
> The comparator can be used with any filter which makes use of
> ByteArrayComparable. Most notably it can be used with ValueFilter to filter
> out KV based on partial comparison of 'values':
> {code}
> byte[] partialValue = Bytes.toBytes("partial_value");
> int partialValueOffset =
> Filter partialValueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER,
>     new BinaryComponentComparator(partialValue,partialValueOffset));
> {code}
> Which in turn can be combined with RowFilter to create a powerful predicate:
> {code}
> RowFilter rowFilter = new RowFilter(GREATER,
>     new BinaryComponentComparator(Bytes.toBytes("a"),1));
> FilterList fl = new FilterList(MUST_PASS_ALL,rowFilter,partialValueFilter);
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
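On the naming question, it may help to spell out what "component" comparison means in the description above: compare a fixed slice of the key, starting at a given offset, against a reference value. The following is a standalone sketch of that idea only. It is not the code from the patch, and `ComponentCompareSketch`, `compareAt` and `matches` are hypothetical names. It assumes big-endian, non-negative 4-byte integers, so that unsigned byte order agrees with numeric order.

```java
import java.nio.ByteBuffer;

/** Illustrative only: comparing one fixed-position component of a composite key. */
final class ComponentCompareSketch {

    /** Big-endian encoding of a 4-byte int. */
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Composite key a+b+c+d, each a 4-byte int. */
    static byte[] key(int a, int b, int c, int d) {
        return ByteBuffer.allocate(16).putInt(a).putInt(b).putInt(c).putInt(d).array();
    }

    /**
     * Compares the slice of key starting at offset with component, byte by byte
     * (unsigned). Negative, zero or positive as the slice is less than, equal to
     * or greater than the component. Assumes the key is long enough.
     */
    static int compareAt(byte[] key, int offset, byte[] component) {
        for (int i = 0; i < component.length; i++) {
            int k = key[offset + i] & 0xff;
            int c = component[i] & 0xff;
            if (k != c) {
                return Integer.compare(k, c);
            }
        }
        return 0;
    }

    /** a = 1 and 10 < b < 20 and 90 < c < 100 and d = 1, as in the SQL in the description. */
    static boolean matches(byte[] key) {
        return compareAt(key, 0, intBytes(1)) == 0
            && compareAt(key, 4, intBytes(10)) > 0
            && compareAt(key, 4, intBytes(20)) < 0
            && compareAt(key, 8, intBytes(90)) > 0
            && compareAt(key, 8, intBytes(100)) < 0
            && compareAt(key, 12, intBytes(1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(matches(key(1, 15, 95, 1)));
        System.out.println(matches(key(1, 10, 95, 1)));
    }
}
```

Seen this way the class compares a byte range inside a larger array rather than acting as a filter itself, which is roughly what the "Component" in the name is meant to convey.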
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934561#comment-16934561 ] Xu Cang commented on HBASE-23058: - +1 Thank you! > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang resolved HBASE-22804. - Fix Version/s: 1.4.12 2.2.2 2.1.7 1.3.6 2.3.1 3.0.0 Resolution: Fixed > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Fix For: 3.0.0, 1.5.0, 2.3.1, 1.3.6, 2.1.7, 2.2.2, 1.4.12 > > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-1.005.patch, > HBASE-22804.branch-1.006.patch, HBASE-22804.branch-1.007.patch, > HBASE-22804.branch-1.008.patch, HBASE-22804.branch-1.009.patch, > HBASE-22804.branch-1.009.patch, HBASE-22804.branch-1.010.patch, > HBASE-22804.branch-2.001.patch, HBASE-22804.branch-2.002.patch, > HBASE-22804.branch-2.003.patch, HBASE-22804.branch-2.004.patch, > HBASE-22804.branch-2.005.patch, HBASE-22804.branch-2.006.patch, > HBASE-22804.master.001.patch, HBASE-22804.master.002.patch, > HBASE-22804.master.003.patch, HBASE-22804.master.004.patch, > HBASE-22804.master.005.patch, HBASE-22804.master.006.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
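For readers wondering how the two numbers would be used: with the list of successful regions and the total number of expected regions, an availability estimate is a simple ratio. A minimal sketch follows; the class and method names are invented for illustration and are not the Canary API added by this issue.

```java
import java.util.Arrays;
import java.util.Collection;

/** Illustrative only: availability estimate from canary results. */
final class CanaryAvailabilitySketch {

    /** Fraction of expected regions that were read successfully. */
    static double availability(Collection<String> successfulRegions, int totalExpectedRegions) {
        if (totalExpectedRegions <= 0) {
            throw new IllegalArgumentException("totalExpectedRegions must be positive");
        }
        return (double) successfulRegions.size() / totalExpectedRegions;
    }

    public static void main(String[] args) {
        // Region names here are placeholders.
        Collection<String> ok = Arrays.asList("region-a", "region-b", "region-c");
        System.out.println("availability: " + availability(ok, 4));
    }
}
```

Without the total, a region that was never probed is indistinguishable from one that does not exist, which is why the issue asks for both values.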
[jira] [Comment Edited] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930732#comment-16930732 ] Xu Cang edited comment on HBASE-22804 at 9/16/19 6:31 PM: -- pushed to branch-1.4 branch-2 branch-2.1 branch-2.2 master branch. branch-1.3 was (Author: xucang): pushed to branch-1.4 branch-2 branch-2.1 branch-2.2 master branch. Will do for branch-1.3 (patch needs some minor work) > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Fix For: 1.5.0 > > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-1.005.patch, > HBASE-22804.branch-1.006.patch, HBASE-22804.branch-1.007.patch, > HBASE-22804.branch-1.008.patch, HBASE-22804.branch-1.009.patch, > HBASE-22804.branch-1.009.patch, HBASE-22804.branch-1.010.patch, > HBASE-22804.branch-2.001.patch, HBASE-22804.branch-2.002.patch, > HBASE-22804.branch-2.003.patch, HBASE-22804.branch-2.004.patch, > HBASE-22804.branch-2.005.patch, HBASE-22804.branch-2.006.patch, > HBASE-22804.master.001.patch, HBASE-22804.master.002.patch, > HBASE-22804.master.003.patch, HBASE-22804.master.004.patch, > HBASE-22804.master.005.patch, HBASE-22804.master.006.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930732#comment-16930732 ] Xu Cang edited comment on HBASE-22804 at 9/16/19 6:15 PM: -- pushed to branch-1.4 branch-2 branch-2.1 branch-2.2 master branch. Will do for branch-1.3 (patch needs some minor work) was (Author: xucang): pushed to branch-1.4 > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Fix For: 1.5.0 > > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-1.005.patch, > HBASE-22804.branch-1.006.patch, HBASE-22804.branch-1.007.patch, > HBASE-22804.branch-1.008.patch, HBASE-22804.branch-1.009.patch, > HBASE-22804.branch-1.009.patch, HBASE-22804.branch-1.010.patch, > HBASE-22804.branch-2.001.patch, HBASE-22804.branch-2.002.patch, > HBASE-22804.branch-2.003.patch, HBASE-22804.branch-2.004.patch, > HBASE-22804.branch-2.005.patch, HBASE-22804.branch-2.006.patch, > HBASE-22804.master.001.patch, HBASE-22804.master.002.patch, > HBASE-22804.master.003.patch, HBASE-22804.master.004.patch, > HBASE-22804.master.005.patch, HBASE-22804.master.006.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930732#comment-16930732 ] Xu Cang commented on HBASE-22804: - pushed to branch-1.4 > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Fix For: 1.5.0 > > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-1.005.patch, > HBASE-22804.branch-1.006.patch, HBASE-22804.branch-1.007.patch, > HBASE-22804.branch-1.008.patch, HBASE-22804.branch-1.009.patch, > HBASE-22804.branch-1.009.patch, HBASE-22804.branch-1.010.patch, > HBASE-22804.branch-2.001.patch, HBASE-22804.branch-2.002.patch, > HBASE-22804.branch-2.003.patch, HBASE-22804.branch-2.004.patch, > HBASE-22804.branch-2.005.patch, HBASE-22804.branch-2.006.patch, > HBASE-22804.master.001.patch, HBASE-22804.master.002.patch, > HBASE-22804.master.003.patch, HBASE-22804.master.004.patch, > HBASE-22804.master.005.patch, HBASE-22804.master.006.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930681#comment-16930681 ] Xu Cang commented on HBASE-22804: - [~psomogyi] thanks! Not sure it was closed. Right, it was only committed to branch-1, will do for other branches today. > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Fix For: 1.5.0 > > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-1.005.patch, > HBASE-22804.branch-1.006.patch, HBASE-22804.branch-1.007.patch, > HBASE-22804.branch-1.008.patch, HBASE-22804.branch-1.009.patch, > HBASE-22804.branch-1.009.patch, HBASE-22804.branch-1.010.patch, > HBASE-22804.branch-2.001.patch, HBASE-22804.branch-2.002.patch, > HBASE-22804.branch-2.003.patch, HBASE-22804.branch-2.004.patch, > HBASE-22804.branch-2.005.patch, HBASE-22804.branch-2.006.patch, > HBASE-22804.master.001.patch, HBASE-22804.master.002.patch, > HBASE-22804.master.003.patch, HBASE-22804.master.004.patch, > HBASE-22804.master.005.patch, HBASE-22804.master.006.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (HBASE-22660) Probabilistic end to end tracking of cross cluster replication latency
[ https://issues.apache.org/jira/browse/HBASE-22660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang reassigned HBASE-22660: --- Assignee: Xu Cang > Probabilistic end to end tracking of cross cluster replication latency > -- > > Key: HBASE-22660 > URL: https://issues.apache.org/jira/browse/HBASE-22660 > Project: HBase > Issue Type: New Feature >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Major > > ageOfLastShippedOp tracks replication latency forward from the point where a > source process tailing a WAL has found an edit to ship. This is not an end to > end measure. > To achieve a holistic end to end measure we should have an active process > that periodically injects sentinel values at commit time adjacent to the > WALedits carrying application data at the source and records when they are > finally processed at the sink, using a timestamp embedded in the sentinel to > measure true end to end latency for the adjacent commit. This could be done > for a configurable (and small) percentage of commits so would give a > probabilistic measure with confidence controlled by sample rate. It should be > done this way rather than by passively sampling cell timestamps because cell > timestamps can be set by the user and may not correspond to wall clock time. > We could introduce a new type of synthetic WALedit, a new global metric, and > because the adjacent commit from which we build the sentinel contains table > information we could track that too and add a per table metric. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22866) Multiple slf4j-log4j provider versions included in binary package (branch-1)
[ https://issues.apache.org/jira/browse/HBASE-22866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928782#comment-16928782 ] Xu Cang commented on HBASE-22866: - question for [~vjasani] and [~apurtell] this patch's hadoop-qa run had "shadedjars" with -1 and the patch is still committed. Do you already know the reason of this shadedjars failure? The reason I am asking is I am trying to get HBASE-22804 committed but having the same shadedjars failure. And I am not sure how to add flag to show more info on HBase Jenkins to reveal which rule was offended. Thanks! cc. [~caroliney14] > Multiple slf4j-log4j provider versions included in binary package (branch-1) > > > Key: HBASE-22866 > URL: https://issues.apache.org/jira/browse/HBASE-22866 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Andrew Purtell >Assignee: Viraj Jasani >Priority: Minor > Fix For: 1.5.0, 1.3.6, 1.4.11 > > Attachments: HBASE-22866.branch-1.000.patch, > HBASE-22866.branch-1.000.patch > > > Examining binary assembly results there are multiple versions of slf4j-log4j > in lib/ > {noformat} > slf4j-api-1.7.7.jar > slf4j-log4j12-1.6.1.jar > slf4j-log4j12-1.7.10.jar > slf4j-log4j12-1.7.7.jar > {noformat} > We aren't managing slf4j-log4j12 dependency versions correctly, somehow. -- This message was sent by Atlassian Jira (v8.3.2#803003)
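As an aside, the duplicate-version condition from the description is easy to check mechanically on an assembled lib/ directory. The sketch below groups jar file names by artifact and reports any artifact that appears with more than one version. It is a standalone illustration with a hypothetical class name and a deliberately simple file name pattern; it is not part of the HBase build or of the shadedjars check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: find artifacts present in more than one version. */
final class DuplicateJarSketch {

    // artifact name, a dash, a version starting with a digit, then ".jar"
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d[\\w.]*)\\.jar$");

    static Map<String, List<String>> duplicateVersions(List<String> jarNames) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        // Keep only artifacts that occur with at least two versions.
        byArtifact.values().removeIf(versions -> versions.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        // File names copied from the issue description.
        List<String> lib = Arrays.asList(
            "slf4j-api-1.7.7.jar",
            "slf4j-log4j12-1.6.1.jar",
            "slf4j-log4j12-1.7.10.jar",
            "slf4j-log4j12-1.7.7.jar");
        System.out.println(duplicateVersions(lib));
    }
}
```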
[jira] [Commented] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921768#comment-16921768 ] Xu Cang commented on HBASE-22775: - [~apurtell] Do you think my explanation above makes sense? Thanks! > Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22775.master.001.patch, > HBASE-22775.master.002.patch > > > Currently we don't have good logging for peer operations; for example, addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira aims to enhance this area. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22836) MemcachedBlockCache parameter error
[ https://issues.apache.org/jira/browse/HBASE-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921753#comment-16921753 ] Xu Cang commented on HBASE-22836: - I was able to build HBase locally with your patch. Resubmitting the same patch to trigger Hadoop QA to run again. > MemcachedBlockCache parameter error > --- > > Key: HBASE-22836 > URL: https://issues.apache.org/jira/browse/HBASE-22836 > Project: HBase > Issue Type: Bug > Components: BlockCache >Affects Versions: 1.4.9, 1.4.10 >Reporter: zbq.dean >Priority: Major > Attachments: HBASE-22836.branch-1.0001.patch, > HBASE-22836.branch-1.0001.patch > > > When caching a block, the expiration is always set to MAX_SIZE (which is a static > final field). > MAX_SIZE was mistakenly thought to be the max size of a block. In fact, this > parameter represents the expiration of the cached block. MAX_SIZE should be > set to 0, which means forever. -- This message was sent by Atlassian Jira (v8.3.2#803003)
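Some background on why the value matters: as I understand the memcached protocol, the expiration argument of a store command is 0 for "never expire", a number of seconds relative to now when it is at most 30 days, and an absolute Unix timestamp when it is larger than that. The sketch below only encodes that interpretation. It is not HBase or memcached client code, the class name is hypothetical, and negative values are deliberately not modeled.

```java
/** Illustrative only: how a memcached expiration value is interpreted. */
final class MemcachedExpirySketch {

    /** Values above this many seconds are treated as absolute Unix time. */
    static final int THIRTY_DAYS_SECONDS = 60 * 60 * 24 * 30;

    static String describe(int exptime) {
        if (exptime < 0) {
            throw new IllegalArgumentException("negative expiration is not modeled in this sketch");
        }
        if (exptime == 0) {
            return "never expires";
        }
        if (exptime <= THIRTY_DAYS_SECONDS) {
            return "expires " + exptime + " seconds after being stored";
        }
        return "expires at absolute Unix time " + exptime;
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
        System.out.println(describe(3600));
        System.out.println(describe(THIRTY_DAYS_SECONDS + 1));
    }
}
```

So a size constant passed in the expiration position is read as a time value, while 0 gives the "forever" behavior the reporter asks for.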
[jira] [Updated] (HBASE-22836) MemcachedBlockCache parameter error
[ https://issues.apache.org/jira/browse/HBASE-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22836: Attachment: HBASE-22836.branch-1.0001.patch > MemcachedBlockCache parameter error > --- > > Key: HBASE-22836 > URL: https://issues.apache.org/jira/browse/HBASE-22836 > Project: HBase > Issue Type: Bug > Components: BlockCache >Affects Versions: 1.4.9, 1.4.10 >Reporter: zbq.dean >Priority: Major > Attachments: HBASE-22836.branch-1.0001.patch, > HBASE-22836.branch-1.0001.patch > > > When caching a block, the expiration is always set to MAX_SIZE (which is a static > final field). > MAX_SIZE was mistakenly thought to be the max size of a block. In fact, this > parameter represents the expiration of the cached block. MAX_SIZE should be > set to 0, which means forever. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921552#comment-16921552 ] Xu Cang commented on HBASE-22823: - I was on vacation and off the grid for 2 weeks. Thank you for handling this [~Apache9] [~apurtell] ! > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908482#comment-16908482 ] Xu Cang commented on HBASE-22823: - pushed to branch-1 branch-1.3 branch-1.4 branch-2 branch-2.1 branch-2.2 master > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22823: Resolution: Fixed Status: Resolved (was: Patch Available) > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22823: Fix Version/s: 1.4.11 1.3.6 2.1.6 2.2.1 1.5.0 3.0.0 > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908467#comment-16908467 ] Xu Cang commented on HBASE-22804: - still one more checkstyle issue (import order). you can refer to other files to see how they are ordered. [~caroliney14] > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-2.001.patch, > HBASE-22804.branch-2.002.patch, HBASE-22804.branch-2.003.patch, > HBASE-22804.branch-2.004.patch, HBASE-22804.master.001.patch, > HBASE-22804.master.002.patch, HBASE-22804.master.003.patch, > HBASE-22804.master.004.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907683#comment-16907683 ] Xu Cang commented on HBASE-22823: - +1 I will commit this tomorrow if there is no objection. thanks > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22804) Provide an API to get list of successful regions and total expected regions in Canary
[ https://issues.apache.org/jira/browse/HBASE-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907675#comment-16907675 ] Xu Cang commented on HBASE-22804: - I have several nits: 1. {code:java} import static org.junit.Assert.*;{code} Usually we do not use * in import. 2. please remove the extra empty line after {code:java} for (String regionName : regionMap.keySet()) {{code} Otherwise all look good to me! thanks! I'd suggest get the subtask committed before committing this one. > Provide an API to get list of successful regions and total expected regions > in Canary > - > > Key: HBASE-22804 > URL: https://issues.apache.org/jira/browse/HBASE-22804 > Project: HBase > Issue Type: Improvement > Components: canary >Affects Versions: 3.0.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0, 2.1.5, 2.2.1 >Reporter: Caroline >Assignee: Caroline >Priority: Minor > Labels: Canary > Attachments: HBASE-22804.branch-1.001.patch, > HBASE-22804.branch-1.002.patch, HBASE-22804.branch-1.003.patch, > HBASE-22804.branch-1.004.patch, HBASE-22804.branch-2.001.patch, > HBASE-22804.branch-2.002.patch, HBASE-22804.branch-2.003.patch, > HBASE-22804.branch-2.004.patch, HBASE-22804.master.001.patch, > HBASE-22804.master.002.patch, HBASE-22804.master.003.patch, > HBASE-22804.master.004.patch > > > At present HBase Canary tool only prints the successes as part of logs. > Providing an API to get the list of successes, as well as total number of > expected regions, will make it easier to get a more accurate availability > estimate. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905686#comment-16905686 ] Xu Cang commented on HBASE-15867: - Since this Jira is blocked by a [dead loop issue|https://issues.apache.org/jira/browse/HBASE-20166], should we take one step back and ask: what's *the real issue we want to solve* by moving it from ZK to an HBase table? (I guess we want to remove the dependency on ZK for this use case and save this info in a more reliable medium?) Can we tackle those issues in different ways? I tried to read through the Jira descriptions and comments to find my answer but failed. Can you shed some light? Thanks [~openinx] > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 2.1.0 >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > Fix For: 2.3.0 > > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-20166) Make sure the RS/Master can works fine when using table based replication storage layer
[ https://issues.apache.org/jira/browse/HBASE-20166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905680#comment-16905680 ] Xu Cang commented on HBASE-20166: - From a high level, this difficulty sounds like HBase needs external storage to keep the peer info (source/sink) regardless of whether the HBase service is up or down, and HBase startup relies on this info being present to keep the replication logic correct without compromising data integrity. This "external storage" could be HFiles, but without bringing regions online, this data cannot be read or understood by HBase. Another thought: "try to assign all system table to a rs which only accept regions of system table assignment" -- this sounds like a downgrade of high availability for HBase to me. Thinking aloud: does it make sense to have a hybrid solution, letting ZK keep track of peer info (so that source/sink initialization can finish) and storing the rest in an HBase table? We could also sync peer info into a table when the replication table is online. [~openinx] > Make sure the RS/Master can works fine when using table based replication > storage layer > --- > > Key: HBASE-20166 > URL: https://issues.apache.org/jira/browse/HBASE-20166 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: HBASE-20166.v1.patch > > > Currently, we cannot set up the HBase cluster because the master lists > peers before finishing its initialization. If the master cannot finish > initialization, meta cannot come online; on the other hand, if meta cannot be > online, listing peers will never succeed when using table based replication. > A dead loop happens. > {code} > 2018-03-09 15:03:50,531 ERROR [M:0;huzheng-xiaomi:46549] > helpers.MarkerIgnoringBase(159): * ABORTING master > huzheng-xiaomi,46549,1520579026550: Unhandled exception. Starting shutdown. 
> * > java.io.UncheckedIOException: > org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the > location for replica 0 > at > org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55) > at > org.apache.hadoop.hbase.replication.TableReplicationPeerStorage.listPeerIds(TableReplicationPeerStorage.java:124) > at > org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.create(ReplicationPeerManager.java:335) > at > org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:737) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:830) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2014) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:557) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22819) Automatically migrate the rs group config for table after HBASE-22695
[ https://issues.apache.org/jira/browse/HBASE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905669#comment-16905669 ] Xu Cang commented on HBASE-22819: - Could you elaborate on this issue a bit more? Thanks [~Apache9] > Automatically migrate the rs group config for table after HBASE-22695 > - > > Key: HBASE-22819 > URL: https://issues.apache.org/jira/browse/HBASE-22819 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Priority: Major > > It used to be stored in the rsgroup table, so we need to migrate it to the > new place. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22695) Store the rsgroup of a table in table configuration
[ https://issues.apache.org/jira/browse/HBASE-22695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905660#comment-16905660 ] Xu Cang commented on HBASE-22695: - I am late to the discussion, but I would still love to know: what is the rationale for doing this? Thanks! > Store the rsgroup of a table in table configuration > --- > > Key: HBASE-22695 > URL: https://issues.apache.org/jira/browse/HBASE-22695 > Project: HBase > Issue Type: Sub-task > Components: rsgroup >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: HBASE-22514 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905647#comment-16905647 ] Xu Cang commented on HBASE-22823: - I agree with this proposal and I don't see any practical downside about this. FYI, The private interface was introduced here : https://issues.apache.org/jira/browse/HBASE-17930 which removed using Canary from HBaseTestingUtility > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Minor > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22836) MemcachedBlockCache parameter error
[ https://issues.apache.org/jira/browse/HBASE-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905645#comment-16905645 ] Xu Cang commented on HBASE-22836: - So, if I understand you correctly, you mean that {code:java} public int getMaxSize() { return MAX_SIZE; }{code} should be {code:java} public int getMaxSize() { return CachedData.MAX_SIZE; }{code} Correct? Do you want to submit a patch for this? Thanks. > MemcachedBlockCache parameter error > --- > > Key: HBASE-22836 > URL: https://issues.apache.org/jira/browse/HBASE-22836 > Project: HBase > Issue Type: Bug > Components: BlockCache >Affects Versions: 1.4.9, 1.4.10 >Reporter: Zhao >Priority: Major > > When a block is cached, the expiration is always set to MAX_SIZE (which is a static > final field). > MAX_SIZE was mistakenly thought to be the maximum size of a block. In fact, this > parameter represents the expiration of the cached block. MAX_SIZE should be > set to 0, which means the block never expires. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22837) Move "Custom WAL Directory" section from "Bulk Loading" to "Write Ahead Log (WAL)" chapter
[ https://issues.apache.org/jira/browse/HBASE-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905639#comment-16905639 ] Xu Cang commented on HBASE-22837: - +1 > Move "Custom WAL Directory" section from "Bulk Loading" to "Write Ahead Log > (WAL)" chapter > -- > > Key: HBASE-22837 > URL: https://issues.apache.org/jira/browse/HBASE-22837 > Project: HBase > Issue Type: Bug > Components: documentation >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: HBASE-22837.master.001.patch > > > Currently, explanation about *Custom WAL Directory* configuration is a > sub-topic of *Bulk Loading,* chapter, yet this subject has not much relation > with bulk loading at all. It should rather be moved to a sub-section of the > *Write Ahead Log (WAL)* chapter. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22839) Provide Serial Replication in HBase 1.3 to fix "row keys and timestamps are the same but the values are different in the presence of cross-cluster replication"
[ https://issues.apache.org/jira/browse/HBASE-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905612#comment-16905612 ] Xu Cang commented on HBASE-22839: - FYI, keep in mind that there is a discussion about "Considering immediate EOL of branch-1.3 and branch-1.4 " proposed by [~apurtell] on hbase mailling list. > Provide Serial Replication in HBase 1.3 to fix "row keys and timestamps are > the same but the values are different in the presence of cross-cluster > replication" > --- > > Key: HBASE-22839 > URL: https://issues.apache.org/jira/browse/HBASE-22839 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 1.3.4, 1.3.5 >Reporter: Bin Shi >Priority: Major > Fix For: 1.3.4, 1.3.5 > > > Problem Statement: > In the cross-cluster replication validation, we found some cells in > master(source) cluster and slave(destination) cluster can have the same row > key, the same timestamp but different values. The happens when mutations with > the same row key are submitted in batch without specifying the timestamp, and > the same timestamp in the unit of millisecond is assigned at the time when > they are committed to the WAL. > When this happens, if the major compaction hasn’t happened yet and you scan > the table, you can find some cells have the same row key, the same timestamps > but different values, like the first three rows in the following table. 
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 1|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 2|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 3|
> |Row Key 2|CF0::Column 1|Timestamp 2|Value 4|
> |Row Key 3|CF0::Column 1|Timestamp 4|Value 5|
> The ordering of the first three rows is indeterminate in the presence of the > cross-replication, so after compaction, in the master cluster you will see > “Row Key 1, CF0::Column1, Timestamp1” having the value 3, but in the slave > cluster, you might see the cell having one of the three possible values 1, 2, > 3, which results in data inconsistency between the master and slave > clusters. > Root Cause Analysis: > In HBaseInterClusterReplicationEndpoint.createBatches() of branch-1.3, the > WAL entries from the same region could be split into different batches > according to replication RPC limit and these batches are shipped by > ReplicationSource concurrently, so the batches for the same region could > arrive at the sink on the region servers in the slave clusters and then be applied to > the region in an indeterminate order due to synchronous nature of cross-cluster > replication. > Solution: > In HBase 3.0.0 and 2.1.0, we provided Serial Replication HBASE-20046 which > guarantees the order of pushing logs to slave clusters is the same as the order > of requests from client in the master cluster. It contains mainly two changes: > # Recording the replication "barriers" in ZooKeeper to synchronize the > replication across old/failed RS and new RS to provide strict ordering > semantics even in the presence of region-move or RS failure. > # Make sure the batches within one region are shipped to the slave clusters > in order. > The second part of change is exactly what we need and the minimal change to > fix the issue in this JIRA. > To fix the issue in this JIRA, we have two options: > # Cherry-Pick HBASE-20046 to branch 1.3. 
Pros: It also fixes the data > inconsistency issue when there is region-move or RS failure and help to > reduce the noises in our cross-cluster replication/backup validation which is > our ultimate goal. Cons: the change is big and I'm not sure for now whether > the change is self-contained or it has other dependencies which need to port > to branch 1.3 too; and we need longer time to validate and stabilize. > # Port the minimal change or make the equivalent change as the second part > of HBASE-20046 to make sure the batches within one region are shipped to the > slave clusters in order." > I prefer option 2 because of cons of option 1. Thoughts? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
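[Editor's illustration, not part of the original message.] The second part of the change described above (shipping the batches of one region in order) can be sketched roughly as follows. This is a minimal, self-contained model and not the HBASE-20046 implementation or the branch-1.3 code: the class PerRegionOrderedShipper, the WalEntry type, and the in-memory "sink" map are all hypothetical stand-ins, and the replication RPC is replaced by appending to a list. The point is only that different regions may be shipped concurrently while the batches of a single region are shipped sequentially by one task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: ship batches of the same region strictly in order. */
final class PerRegionOrderedShipper {

  /** Hypothetical, simplified WAL entry: the region it belongs to and its sequence id. */
  static final class WalEntry {
    final String region;
    final long seqId;

    WalEntry(String region, long seqId) {
      this.region = region;
      this.seqId = seqId;
    }
  }

  /** Groups entries by region, preserving the original order within each region. */
  static Map<String, List<WalEntry>> groupByRegion(List<WalEntry> entries) {
    Map<String, List<WalEntry>> byRegion = new LinkedHashMap<>();
    for (WalEntry e : entries) {
      byRegion.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
    }
    return byRegion;
  }

  /** Stand-in for the replication RPC: records the sequence ids in arrival order. */
  private static void applyBatch(List<WalEntry> batch, List<Long> applied) {
    for (WalEntry e : batch) {
      applied.add(e.seqId);
    }
  }

  /**
   * Ships all entries. Regions are handled by separate tasks and may run
   * concurrently; the batches of one region are shipped one after another
   * by a single task, so their relative order is preserved.
   * Returns, per region, the sequence ids in the order the sink saw them.
   */
  static Map<String, List<Long>> ship(List<WalEntry> entries, int batchSize, int threads) {
    Map<String, List<Long>> sink = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (Map.Entry<String, List<WalEntry>> r : groupByRegion(entries).entrySet()) {
        final String region = r.getKey();
        final List<WalEntry> regionEntries = r.getValue();
        pending.add(pool.submit(() -> {
          List<Long> applied = new ArrayList<>();
          for (int i = 0; i < regionEntries.size(); i += batchSize) {
            int end = Math.min(i + batchSize, regionEntries.size());
            applyBatch(regionEntries.subList(i, end), applied);
          }
          sink.put(region, applied);
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // wait for every region to finish
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return sink;
  }
}
```

The trade-off in this sketch is that a region with a large backlog is drained by a single task, which limits parallelism for that region in exchange for a deterministic apply order. It does not address ordering across region moves or RS failures, which is what the replication "barriers" in the first part of the change are described as handling.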
[jira] [Updated] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22775: Attachment: HBASE-22775.master.002.patch > Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22775.master.001.patch, > HBASE-22775.master.002.patch > > > Now we don't have good logging regarding peer operations, for example addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira is aiming to enhancing this area -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905567#comment-16905567 ] Xu Cang edited comment on HBASE-22775 at 8/12/19 8:54 PM: -- So, ZKReplicationPeerStorage extends _ZKReplicationStorageBase_ and implements interface _ReplicationPeerStorage_. Another Class _ZKReplicationQueueStorage_ also extends ZKReplicationStorageBase. I only want to log "peer related operations" such as addPeer, removePeer and so on. So, _ZKReplicationQueueStorage_ is not something I need/want to change here. (since it doesn't store Peer ops info inside) And for now, there is only one real implementation for interface ReplicationPeerStorage. And the _ReplicationPeerStorage_ is an interface in which I don't want to put any logs. [~apurtell] thanks! was (Author: xucang): So, ZKReplicationPeerStorage extends _ZKReplicationStorageBase_ and implements interface _ReplicationPeerStorage_. Another Class _ZKReplicationQueueStorage_ also extends ZKReplicationStorageBase. I only want to log "peer related operations" such ad addPeer, removePeer and so on. So, _ZKReplicationQueueStorage_ is not something I need/want to change here. (since it doesn't store Peer ops info inside) And for now, there is only one real implementation for interface ReplicationPeerStorage. And the _ReplicationPeerStorage_ is a interface. So I don't want to put any logs in the interface. [~apurtell] thanks! 
> Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22775.master.001.patch > > > Now we don't have good logging regarding peer operations, for example addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira is aiming to enhancing this area -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905567#comment-16905567 ] Xu Cang commented on HBASE-22775: - So, ZKReplicationPeerStorage extends _ZKReplicationStorageBase_ and implements interface _ReplicationPeerStorage_. Another class, _ZKReplicationQueueStorage_, also extends ZKReplicationStorageBase. I only want to log "peer related operations" such as addPeer, removePeer and so on. So, _ZKReplicationQueueStorage_ is not something I need/want to change here (since it doesn't store Peer ops info inside). For now, there is only one real implementation of the interface ReplicationPeerStorage. And _ReplicationPeerStorage_ is an interface, so I don't want to put any logs in it. [~apurtell] thanks! > Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22775.master.001.patch > > > Now we don't have good logging regarding peer operations, for example addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira aims to enhance this area -- This message was sent by Atlassian JIRA (v7.6.14#76016)
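[Editor's illustration, not part of the original message.] The layering described in the comment above (the interface stays free of logging, the single concrete implementation logs each peer operation) might look roughly like the sketch below. PeerStorage and LoggingPeerStorage are hypothetical, simplified stand-ins and not the HBase classes; the in-memory map replaces ZooKeeper, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

/** Hypothetical stand-in for the peer storage interface: declares operations only, no logging. */
interface PeerStorage {
  void addPeer(String peerId, String clusterKey);

  void removePeer(String peerId);
}

/** Hypothetical concrete storage: the only place where peer operations are logged. */
final class LoggingPeerStorage implements PeerStorage {
  private static final Logger LOG = Logger.getLogger(LoggingPeerStorage.class.getName());

  // In-memory stand-in for the real backing store.
  private final Map<String, String> peers = new ConcurrentHashMap<>();

  @Override
  public void addPeer(String peerId, String clusterKey) {
    if (peers.putIfAbsent(peerId, clusterKey) != null) {
      throw new IllegalArgumentException("Peer already exists: " + peerId);
    }
    LOG.info("Added peer " + peerId + ", clusterKey=" + clusterKey);
  }

  @Override
  public void removePeer(String peerId) {
    if (peers.remove(peerId) == null) {
      throw new IllegalArgumentException("No such peer: " + peerId);
    }
    LOG.info("Removed peer " + peerId);
  }

  /** Returns a sorted snapshot of the known peer ids. */
  Set<String> peerIds() {
    return new TreeSet<>(peers.keySet());
  }
}
```

Logging after the state change succeeds means a failed operation produces an exception rather than a misleading "added" or "removed" line; whether the actual patch logs before or after the operation is not stated in this thread.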
[jira] [Commented] (HBASE-22828) Log a region close journal
[ https://issues.apache.org/jira/browse/HBASE-22828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905438#comment-16905438 ] Xu Cang commented on HBASE-22828: - +1 > Log a region close journal > -- > > Key: HBASE-22828 > URL: https://issues.apache.org/jira/browse/HBASE-22828 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > Attachments: HBASE-22828-branch-1.patch, HBASE-22828.patch > > > We already track region close activity with a MonitoredTask. Enable the > status journal and dump it at DEBUG log level so if for some reasons region > closes are taking a long time we have a timestamped journal of the activity > and how long each step took. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22775: Attachment: HBASE-22775.master.001.patch Status: Patch Available (was: Open) > Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22775.master.001.patch > > > Now we don't have good logging regarding peer operations, for example addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira is aiming to enhancing this area -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22780) Missing table descriptor may cause STUCK Region-In-Transition forever
[ https://issues.apache.org/jira/browse/HBASE-22780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901559#comment-16901559 ] Xu Cang commented on HBASE-22780: - The patch looks good to me. (The unit test failure is irrelevant; that's one of the flaky tests we always have.) Though I would suggest adding a test for this case if possible. [~yu-huiyang] > Missing table descriptor may cause STUCK Region-In-Transition forever > -- > > Key: HBASE-22780 > URL: https://issues.apache.org/jira/browse/HBASE-22780 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: yuhuiyang >Priority: Major > Attachments: HBASE-22780-branch-2.1-01.patch > > > AssignProcedure plans to open a region R1 on regionserver RS1, and RS1 fails > due to a missing table descriptor exception (throw new IOException("Missing > table descriptor for " + region.getEncodedName())). If the region is planned > to this regionserver RS1 again, it ignores this request and does nothing. > So the region R1 will be in Region-In-Transition state forever even if the > missing table descriptor problem has been solved! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900520#comment-16900520 ] Xu Cang commented on HBASE-22775: - Not yet [~apurtell] > Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > > Now we don't have good logging regarding peer operations, for example addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira is aiming to enhancing this area -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22780) Missing table descriptor may cause STUCK Region-In-Transition forever
[ https://issues.apache.org/jira/browse/HBASE-22780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899556#comment-16899556 ] Xu Cang commented on HBASE-22780: - Looks good! Could you please click the "Submit Patch" button to submit this patch? It will trigger hadoop-qa testing. Thanks! Also, is it possible to add a unit test? > Missing table descriptor may cause STUCK Region-In-Transition forever > -- > > Key: HBASE-22780 > URL: https://issues.apache.org/jira/browse/HBASE-22780 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: yuhuiyang >Priority: Major > Attachments: HBASE-22780-branch-2.1-01.patch > > > AssignProcedure plans to open a region R1 on regionserver RS1, and RS1 fails > due to a missing table descriptor exception (throw new IOException("Missing > table descriptor for " + region.getEncodedName())). If the region is planned > to this regionserver RS1 again, it ignores this request and does nothing. > So the region R1 will be in Region-In-Transition state forever even if the > missing table descriptor problem has been solved! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (HBASE-22780) Missing table descriptor may cause STUCK Region-In-Transition forever
[ https://issues.apache.org/jira/browse/HBASE-22780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899556#comment-16899556 ] Xu Cang edited comment on HBASE-22780 at 8/4/19 4:52 AM: - looks good! Could you please click the "submit patch"button to submit this patch. It will trigger hadoop-qa testing. Thanks! Also, is that possible to add a unit test? [~yu-huiyang] was (Author: xucang): looks good! Could you please click the "submit patch"button to submit this patch. It will trigger hadoop-qa testing. Thanks! Also, is that possible to add a unit test? > Missing table descriptor may cause STUCK Region-In-Transition forever > -- > > Key: HBASE-22780 > URL: https://issues.apache.org/jira/browse/HBASE-22780 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: yuhuiyang >Priority: Major > Attachments: HBASE-22780-branch-2.1-01.patch > > > AssignProcedure plan to open a region R1 on regionserver RS1 and RS1 failed > due to missing table descriptor exception (throw new IOException("Missing > table descriptor for " + region.getEncodedName()). If the region is planned > to this regionserver RS1 again , it will ignor this request and do nothing . > So the region R1 will be in Region-In-Transition state forever even if the > missing table descriptor problem has been solved! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898465#comment-16898465 ] Xu Cang commented on HBASE-22762: - +1 > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 1.5.0, 1.3.6, 1.4.11 > > Attachments: HBASE-22762-branch-1-addendum.patch, > HBASE-22762.branch-1.001.patch, HBASE-22762.branch-1.002.patch, > HBASE-22762.branch-1.004.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
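[Editor's illustration, not part of the original message.] The improvement described above (printing the delta from the previous phase's start timestamp next to each journal entry) could be sketched as below. This is not the committed patch: JournalDeltas and its Entry type are hypothetical, and the output format simply mirrors the "... at <timestamp> (+N ms)" lines quoted in the HBASE-24099 message at the top of this archive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: format journal entries with the delta from the previous entry. */
final class JournalDeltas {

  /** Hypothetical journal entry: a status message and the time it was recorded. */
  static final class Entry {
    final String status;
    final long timeMs;

    Entry(String status, long timeMs) {
      this.status = status;
      this.timeMs = timeMs;
    }
  }

  /**
   * Formats each entry as "status at time". Every entry after the first
   * also gets " (+N ms)", where N is the time since the previous entry.
   */
  static List<String> format(List<Entry> journal) {
    List<String> lines = new ArrayList<>();
    Entry prev = null;
    for (Entry e : journal) {
      StringBuilder sb = new StringBuilder(e.status).append(" at ").append(e.timeMs);
      if (prev != null) {
        sb.append(" (+").append(e.timeMs - prev.timeMs).append(" ms)");
      }
      lines.add(sb.toString());
      prev = e;
    }
    return lines;
  }

  public static void main(String[] args) {
    // Made-up phases and timestamps, for illustration only.
    List<Entry> journal = new ArrayList<>();
    journal.add(new Entry("Starting", 1000L));
    journal.add(new Entry("Flushing", 1060L));
    journal.add(new Entry("Done", 1987L));
    for (String line : format(journal)) {
      System.out.println(line);
    }
  }
}
```

With the deltas printed inline, an operator reading the journal no longer has to subtract timestamps by hand to see which phase took the time.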
[jira] [Assigned] (HBASE-22775) Enhance logging for peer related operations
[ https://issues.apache.org/jira/browse/HBASE-22775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang reassigned HBASE-22775: --- Assignee: Xu Cang > Enhance logging for peer related operations > --- > > Key: HBASE-22775 > URL: https://issues.apache.org/jira/browse/HBASE-22775 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > > Now we don't have good logging regarding peer operations, for example addPeer > does not log itself: > [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] > This Jira is aiming to enhancing this area -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22775) Enhance logging for peer related operations
Xu Cang created HBASE-22775: --- Summary: Enhance logging for peer related operations Key: HBASE-22775 URL: https://issues.apache.org/jira/browse/HBASE-22775 Project: HBase Issue Type: Improvement Reporter: Xu Cang Currently we don't have good logging for peer operations; for example, addPeer does not log itself: [https://github.com/apache/hbase/blob/master/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ZKReplicationPeerStorage.java#L102] This Jira aims to enhance this area. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22642) Make move operations of RSGroup idempotent
[ https://issues.apache.org/jira/browse/HBASE-22642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897438#comment-16897438 ] Xu Cang commented on HBASE-22642: - [~Xiaolin Ha] Thanks! I re-reviewed the patch. Looks good. But there is one unit test's still failing. Could you please take a look? thanks > Make move operations of RSGroup idempotent > -- > > Key: HBASE-22642 > URL: https://issues.apache.org/jira/browse/HBASE-22642 > Project: HBase > Issue Type: Bug > Components: rsgroup >Reporter: Xiaolin Ha >Assignee: Xiaolin Ha >Priority: Major > > Currently, when moving tables or servers to a group, only groupInfo is > checked. And in RSGroup implementation, groupinfo is written to disk before > regions movements are done. If there are some problems caused move regions > abort, some regions will be on wrong regionservers. What's the worse, retry > the move operation will be rejected because of the correct groupinfo. > We think when moving, not only groupInfo should be checked, but also relevant > region assignments should be checked and corrected. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22703) [Balancer] Regions no longer exists but there are 'RegionPlan's in balancer.
[ https://issues.apache.org/jira/browse/HBASE-22703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897430#comment-16897430 ] Xu Cang commented on HBASE-22703: - I am thinking it aloud, maybe first step is to add a unit test to reproduce this issue. It could help understanding why this is happening. thanks > [Balancer] Regions no longer exists but there are 'RegionPlan's in balancer. > > > Key: HBASE-22703 > URL: https://issues.apache.org/jira/browse/HBASE-22703 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 1.4.6 >Reporter: Reid Chan >Priority: Major > > {code} > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] master.HMaster: > balance > hri=feed:content,05f92c5b,1561537720857.8f1747020ed2442e93387c770f343596., > src=, dest= > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] > master.AssignmentManager: Ignored moving region not assigned: {ENCODED => > 8f1747020ed2442e93387c770f343596, NAME => > 'feed:content,05f92c5b,1561537720857.8f1747020ed2442e93387c770f343596.', > STARTKEY => '05f92c5b', ENDKEY => '06d3a068'}, not in region states > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] master.HMaster: > balance > hri=feed:content,a5e7,1561537720857.2c48781dcb5e21490820d5b8f4d50e9f., > src=, dest= > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] > master.AssignmentManager: Ignored moving region not assigned: {ENCODED => > 2c48781dcb5e21490820d5b8f4d50e9f, NAME => > 'feed:content,a5e7,1561537720857.2c48781dcb5e21490820d5b8f4d50e9f.', > STARTKEY => 'a5e7', ENDKEY => 'a740d9f4'}, not in region states > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] master.HMaster: > balance > hri=feed:content,94e3,1561537720857.0aa7fc2d59fc68d05d4695f3b6834915., > src=, dest= > 2019-07-17 
14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] > master.AssignmentManager: Ignored moving region not assigned: {ENCODED => > 0aa7fc2d59fc68d05d4695f3b6834915, NAME => > 'feed:content,94e3,1561537720857.0aa7fc2d59fc68d05d4695f3b6834915.', > STARTKEY => '94e3', ENDKEY => '962fc8f0'}, not in region states > {code} > These logs are pasted from the master log. In fact, table {{feed:content}} and > its regions no longer exist in the cluster, but they seem to be kept in > {{RegionPlan}} or {{Balancer}} forever. > BTW, the balancer is {{RSGroupBasedLoadBalancer}} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22731) ReplicationSource and HBaseInterClusterReplicationEndpoint log messages should include a target Peer identifier
[ https://issues.apache.org/jira/browse/HBASE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897396#comment-16897396 ] Xu Cang commented on HBASE-22731: - Very useful stuff. One thing I'd like to see is some example log lines. Thanks. > ReplicationSource and HBaseInterClusterReplicationEndpoint log messages > should include a target Peer identifier > --- > > Key: HBASE-22731 > URL: https://issues.apache.org/jira/browse/HBASE-22731 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: HBASE-22731.master.001.patch, > HBASE-22731.master.002.patch > > > _ReplicationSource_ and _HBaseInterClusterReplicationEndpoint_ already > include a good number of helpful DEBUG and TRACE log messages to help us > troubleshoot typical replication problems, such as lags or mysteriously > missing edits on the target peer. > However, for each configured peer there is an individual > _ReplicationSource_/_HBaseInterClusterReplicationEndpoint_ pair running in > parallel. In scenarios where we need to investigate issues between a source and > a specific peer, we can't tell which peer a given > _ReplicationSource_/_HBaseInterClusterReplicationEndpoint_ log message belongs to. For such cases it > would be nice to have an identifier for the specific peer the given > _ReplicationSource_/_HBaseInterClusterReplicationEndpoint_ is related to. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22773) when set blockSize option in Performance Evaluation tool, error occurs:ERROR: Unrecognized option/command: --blockSize=131072
[ https://issues.apache.org/jira/browse/HBASE-22773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897337#comment-16897337 ] Xu Cang commented on HBASE-22773: - +1 > when set blockSize option in Performance Evaluation tool, error occurs:ERROR: > Unrecognized option/command: --blockSize=131072 > - > > Key: HBASE-22773 > URL: https://issues.apache.org/jira/browse/HBASE-22773 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 2.1.0, 2.2.0, 2.3.0 > Environment: OS:CentOS7.6 > CPU: 6148 Processor >Reporter: dingwei2019 >Assignee: dingwei2019 >Priority: Minor > Attachments: HBASE-22773.rel-2.1.0.patch > > > I believe "blockSize" is a new option for PE in HBase 2.0. When I try to set > the blockSize, an error occurs: ERROR: Unrecognized option/command: > --blockSize=131072. > The error occurs because a "continue;" is missing when we match the option > "blockSize". Without the "continue", the program falls through to the final > "printUsageAndExit" branch. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
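For readers unfamiliar with this kind of bug, here is a minimal sketch of the fall-through described in HBASE-22773. It is not the actual PerformanceEvaluation code: the class OptionParseSketch, the Opts holder, and the default block size are hypothetical names chosen for illustration. The point is only that each matched option must end with "continue;", otherwise control reaches the final "unrecognized" branch.

```java
public class OptionParseSketch {
  /** Hypothetical holder for parsed options; not the real PerformanceEvaluation type. */
  static final class Opts {
    int blockSize = 65536; // assumed default, for illustration only
    String command;
  }

  static Opts parse(String[] args) {
    Opts opts = new Opts();
    for (String arg : args) {
      final String blockSize = "--blockSize=";
      if (arg.startsWith(blockSize)) {
        opts.blockSize = Integer.parseInt(arg.substring(blockSize.length()));
        // Without this continue, a matched option would fall through to the
        // error at the bottom of the loop, as described in the issue.
        continue;
      }
      if (!arg.startsWith("--")) {
        opts.command = arg; // treat a bare word as the command name
        continue;
      }
      // Stand-in for the "printUsageAndExit" branch.
      throw new IllegalArgumentException("Unrecognized option/command: " + arg);
    }
    return opts;
  }
}
```

Removing the first "continue;" in this sketch makes "--blockSize=131072" raise the same kind of "Unrecognized option/command" error that the reporter saw.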
[jira] [Updated] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22762: Attachment: HBASE-22762.branch-1.004.patch > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch, HBASE-22762.branch-1.004.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896643#comment-16896643 ] Xu Cang commented on HBASE-22762: - This is the new version:
2019-07-30 17:07:33,779 INFO [RS:0;xcang-ltm:65524-splits-1564531627716] regionserver.SplitRequest(143): Split transaction journal:
STARTED at 1564531651882PREPARED at 1564531651884 ( +2 ms)
BEFORE_PRE_SPLIT_HOOK at 1564531651884
AFTER_PRE_SPLIT_HOOK at 1564531651884
SET_SPLITTING at 1564531651885 ( +1 ms)
CREATE_SPLIT_DIR at 1564531651993 ( +108 ms)
CLOSED_PARENT_REGION at 1564531651999 ( +6 ms)
OFFLINED_PARENT at 1564531651999
STARTED_REGION_A_CREATION at 1564531652820 ( +821 ms)
STARTED_REGION_B_CREATION at 1564531653234 ( +414 ms)
PONR at 1564531653643 ( +409 ms)
OPENED_REGION_A at 1564531653665 ( +22 ms)
OPENED_REGION_B at 1564531653665
BEFORE_POST_SPLIT_HOOK at 1564531653777 ( +112 ms)
AFTER_POST_SPLIT_HOOK at 1564531653777
COMPLETED at 1564531653777
I misunderstood the timestamp: I thought it was the start of the phase, but it is the end of the phase. > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
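As a rough illustration of the journal format discussed in HBASE-22762, the sketch below prints each phase with its timestamp and appends the delta from the previous entry only when that delta is non-zero. It is not the attached patch: JournalDeltaSketch, Entry, and format are hypothetical names, and the exact spacing of the real output may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class JournalDeltaSketch {
  /** Hypothetical journal entry: a phase name and the time it was recorded. */
  static final class Entry {
    final String phase;
    final long timestamp;

    Entry(String phase, long timestamp) {
      this.phase = phase;
      this.timestamp = timestamp;
    }
  }

  /** Formats one line per entry, adding "(+N ms)" when time passed since the previous entry. */
  static String format(List<Entry> journal) {
    StringBuilder sb = new StringBuilder();
    long previous = -1;
    for (Entry e : journal) {
      if (sb.length() > 0) {
        sb.append('\n');
      }
      sb.append(e.phase).append(" at ").append(e.timestamp);
      if (previous >= 0) {
        long delta = e.timestamp - previous;
        if (delta != 0) {
          sb.append(" (+").append(delta).append(" ms)");
        }
      }
      previous = e.timestamp;
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Entry> journal = new ArrayList<>();
    journal.add(new Entry("STARTED", 1564531651882L));
    journal.add(new Entry("PREPARED", 1564531651884L));
    journal.add(new Entry("BEFORE_PRE_SPLIT_HOOK", 1564531651884L));
    System.out.println(format(journal));
  }
}
```

Because each timestamp marks the end of a phase, the delta on a line is the time that phase took, measured from the previous entry.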
[jira] [Commented] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896630#comment-16896630 ] Xu Cang commented on HBASE-22762: - One small question, for this order issue you mentioned:
OPENED_REGION_A at 1564524100504 (0ms)
OPENED_REGION_B at 1564524100504 (114ms)
BEFORE_POST_SPLIT_HOOK at 1564524100618 (0ms)
I think it's accurate. What it means is, it spent 0ms on OPENED_REGION_A and it spent 114ms on OPENED_REGION_B. [~apurtell] thanks. > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896614#comment-16896614 ] Xu Cang commented on HBASE-22762: - thanks! [~apurtell] will fix. > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22703) [Balancer] Regions no longer exists but there are 'RegionPlan's in balancer.
[ https://issues.apache.org/jira/browse/HBASE-22703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896604#comment-16896604 ] Xu Cang commented on HBASE-22703: - cleanupRSGroupForTable should be called when table is deleted and this will make sure plans don't contain deleted regions. Can I ask, by saying "table and its regions no longer exists in cluster", did you mean the table was deleted? thanks! [~reidchan] > [Balancer] Regions no longer exists but there are 'RegionPlan's in balancer. > > > Key: HBASE-22703 > URL: https://issues.apache.org/jira/browse/HBASE-22703 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 1.4.6 >Reporter: Reid Chan >Priority: Major > > {code} > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] master.HMaster: > balance > hri=feed:content,05f92c5b,1561537720857.8f1747020ed2442e93387c770f343596., > src=, dest= > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] > master.AssignmentManager: Ignored moving region not assigned: {ENCODED => > 8f1747020ed2442e93387c770f343596, NAME => > 'feed:content,05f92c5b,1561537720857.8f1747020ed2442e93387c770f343596.', > STARTKEY => '05f92c5b', ENDKEY => '06d3a068'}, not in region states > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] master.HMaster: > balance > hri=feed:content,a5e7,1561537720857.2c48781dcb5e21490820d5b8f4d50e9f., > src=, dest= > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] > master.AssignmentManager: Ignored moving region not assigned: {ENCODED => > 2c48781dcb5e21490820d5b8f4d50e9f, NAME => > 'feed:content,a5e7,1561537720857.2c48781dcb5e21490820d5b8f4d50e9f.', > STARTKEY => 'a5e7', ENDKEY => 'a740d9f4'}, not in region states > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] master.HMaster: > balance > 
hri=feed:content,94e3,1561537720857.0aa7fc2d59fc68d05d4695f3b6834915., > src=, dest= > 2019-07-17 14:20:00,693 INFO > [bd-hbase001.bx.momo.com,16000,1561099443129_ChoreService_1] > master.AssignmentManager: Ignored moving region not assigned: {ENCODED => > 0aa7fc2d59fc68d05d4695f3b6834915, NAME => > 'feed:content,94e3,1561537720857.0aa7fc2d59fc68d05d4695f3b6834915.', > STARTKEY => '94e3', ENDKEY => '962fc8f0'}, not in region states > {code} > These logs are pasted from the master log. In fact, table {{feed:content}} and > its regions no longer exist in the cluster, but they seem to be kept in > {{RegionPlan}} or {{Balancer}} forever. > BTW, the balancer is {{RSGroupBasedLoadBalancer}} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22727) The canary table should be placed on all regionservers
[ https://issues.apache.org/jira/browse/HBASE-22727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896587#comment-16896587 ] Xu Cang commented on HBASE-22727: - Good point! [~dmanning] FYI > The canary table should be placed on all regionservers > -- > > Key: HBASE-22727 > URL: https://issues.apache.org/jira/browse/HBASE-22727 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Priority: Major > > Not within a single rs group. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22762: Attachment: HBASE-22762.branch-1.002.patch > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-22762: Attachment: HBASE-22762.branch-1.001.patch > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-22762.branch-1.001.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896563#comment-16896563 ] Xu Cang commented on HBASE-22762: - Patch is ready; the output looks like this:
2019-07-30 15:01:40,620 INFO [RS:0;xcang-ltm:62046-splits-1564524100198] regionserver.SplitRequest(143): Split transaction journal:
STARTED at 1564524100204 (5ms)
PREPARED at 1564524100209 (2ms)
BEFORE_PRE_SPLIT_HOOK at 1564524100211 (0ms)
AFTER_PRE_SPLIT_HOOK at 1564524100211 (2ms)
SET_SPLITTING at 1564524100213 (106ms)
CREATE_SPLIT_DIR at 1564524100319 (30ms)
CLOSED_PARENT_REGION at 1564524100349 (0ms)
OFFLINED_PARENT at 1564524100349 (40ms)
STARTED_REGION_A_CREATION at 1564524100389 (20ms)
STARTED_REGION_B_CREATION at 1564524100409 (14ms)
PONR at 1564524100423 (81ms)
OPENED_REGION_A at 1564524100504 (0ms)
OPENED_REGION_B at 1564524100504 (114ms)
BEFORE_POST_SPLIT_HOOK at 1564524100618 (0ms)
AFTER_POST_SPLIT_HOOK at 1564524100618 (0ms)
COMPLETED at 1564524100618
> Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang reassigned HBASE-22762: --- Assignee: Xu Cang > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (HBASE-22639) Unexpected split when a big table has only one region on a regionServer
[ https://issues.apache.org/jira/browse/HBASE-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886604#comment-16886604 ] Xu Cang edited comment on HBASE-22639 at 7/17/19 1:30 AM: -- I don't quite understand what you meant by "don't split the region when table is big enough even it is the only one on a node." Can you please elaborate why you think this is a bad thing? Why is splitting a "big enough" table a bad thing? Thanks! was (Author: xucang): I don't quite understand what you meant by "don't split the region when table is big enough even it is the only one on a node." Can you please elaborate why you think this is a bad thing? Thanks! > Unexpected split when a big table has only one region on a regionServer > > > Key: HBASE-22639 > URL: https://issues.apache.org/jira/browse/HBASE-22639 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0 >Reporter: Zheng Wang >Priority: Minor > > I am using the default policy, SteppingSplitPolicy. > If some nodes are restarted, this may occur, because the policy does not check whether the > table is actually big enough. > It produces some unexpected small regions. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22639) Unexpected split when a big table has only one region on a regionServer
[ https://issues.apache.org/jira/browse/HBASE-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886604#comment-16886604 ] Xu Cang commented on HBASE-22639: - I don't quite understand what you meant by "don't split the region when table is big enough even it is the only one on a node." Can you please elaborate why you think this is a bad thing? Thanks! > Unexpected split when a big table has only one region on a regionServer > > > Key: HBASE-22639 > URL: https://issues.apache.org/jira/browse/HBASE-22639 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0 >Reporter: Zheng Wang >Priority: Minor > > I am using the default policy, SteppingSplitPolicy. > If some nodes are restarted, this may occur, because the policy does not check whether the > table is actually big enough. > It produces some unexpected small regions. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22697) when RegionServerStoppedException is received, the client should clear meta cache
[ https://issues.apache.org/jira/browse/HBASE-22697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886601#comment-16886601 ] Xu Cang commented on HBASE-22697: - Is HBASE-4412's idea similar to what you proposed? > when RegionServerStoppedException is received, the client should clear meta > cache > -- > > Key: HBASE-22697 > URL: https://issues.apache.org/jira/browse/HBASE-22697 > Project: HBase > Issue Type: Improvement >Reporter: Junhong Xu >Assignee: Junhong Xu >Priority: Major > > but now it will retry until exhausted -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22698) [hbase-connectors] Add license header to README.md
[ https://issues.apache.org/jira/browse/HBASE-22698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886600#comment-16886600 ] Xu Cang commented on HBASE-22698: - +1 Since it's merged, closing this JIRA? :) > [hbase-connectors] Add license header to README.md > -- > > Key: HBASE-22698 > URL: https://issues.apache.org/jira/browse/HBASE-22698 > Project: HBase > Issue Type: Task > Components: hbase-connectors >Affects Versions: connector-1.0.0 >Reporter: Balazs Meszaros >Assignee: Balazs Meszaros >Priority: Major > > Build https://builds.apache.org/job/PreCommit-HBASE-CONNECTORS-Build/55/ > failed because {{spark/hbase-spark/README.md}} does not have license header. > {noformat} > Lines that start with ? in the ASF License report indicate files that do > not have an Apache license header: > !? /testptch/hbase-connectors/spark/hbase-spark/README.md > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22699) refactor isMetaClearingException
[ https://issues.apache.org/jira/browse/HBASE-22699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886598#comment-16886598 ] Xu Cang commented on HBASE-22699: - Agreed, just by reading this method, it's kind of cryptic. Will review your patch when it's ready. Thanks [~Joseph295] > refactor isMetaClearingException > > > Key: HBASE-22699 > URL: https://issues.apache.org/jira/browse/HBASE-22699 > Project: HBase > Issue Type: Improvement >Reporter: Junhong Xu >Assignee: Junhong Xu >Priority: Minor > > It is not so readable -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22630) Restore TestReplicationDroppedTables coverage to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885746#comment-16885746 ] Xu Cang commented on HBASE-22630: - ACK, will work on it. > Restore TestReplicationDroppedTables coverage to branch-1 > - > > Key: HBASE-22630 > URL: https://issues.apache.org/jira/browse/HBASE-22630 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Major > Fix For: 1.6.0 > > > TestReplicationDroppedTables was dropped from branch-1. Restore the test > coverage with a test that is not flaky. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22640) Random init hstore lastFlushTime
[ https://issues.apache.org/jira/browse/HBASE-22640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885744#comment-16885744 ] Xu Cang commented on HBASE-22640: - Nice idea. I'd suggest adding a one-line comment above your change to explain the rationale. > Random init hstore lastFlushTime > - > > Key: HBASE-22640 > URL: https://issues.apache.org/jira/browse/HBASE-22640 > Project: HBase > Issue Type: Improvement >Reporter: Bing Xiao >Assignee: Bing Xiao >Priority: Major > Fix For: 3.0.0, 2.2.1 > > Attachments: HBASE-22640-master-v1.patch > > > When a region is opened, the current time is used as the hstore last flush time. If not > much data is put, so that nothing else triggers a memstore flush, then after flushCheckInterval all memstores > flush together, causing concentrated IO and compaction and high request > latency. So initialize lastFlushTime randomly. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
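To make the idea in HBASE-22640 concrete, here is a minimal sketch of one way to randomize the initial last flush time. It is an assumption about the approach, not the attached patch: FlushJitterSketch and initialLastFlushTime are hypothetical names, and subtracting (rather than adding) the random offset is a choice made here for illustration.

```java
import java.util.Random;

public class FlushJitterSketch {
  /**
   * Returns an initial "last flush time" somewhere in the interval
   * (now - flushCheckIntervalMs, now], so that stores opened at the same
   * moment do not all become due for a periodic flush at the same moment.
   */
  static long initialLastFlushTime(long now, long flushCheckIntervalMs, Random rng) {
    if (flushCheckIntervalMs <= 0) {
      return now; // periodic flush disabled or misconfigured: no jitter
    }
    // nextDouble() is in [0, 1), so the offset is in [0, flushCheckIntervalMs).
    long offset = (long) (rng.nextDouble() * flushCheckIntervalMs);
    return now - offset;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    Random rng = new Random();
    for (int i = 0; i < 3; i++) {
      System.out.println(initialLastFlushTime(now, 3_600_000L, rng));
    }
  }
}
```

With this kind of jitter, regions opened together after a restart reach the periodic flush threshold at different times, which spreads out the resulting IO and compaction.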
[jira] [Comment Edited] (HBASE-22642) Make move operations of RSGroup idempotent
[ https://issues.apache.org/jira/browse/HBASE-22642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885741#comment-16885741 ] Xu Cang edited comment on HBASE-22642 at 7/16/19 1:03 AM: -- This is an interesting angle from which to look at this issue. Thanks. I don't see any other overhead besides the one you mentioned. [~Xiaolin Ha] Can you fix the conflict and let HadoopQA run it? was (Author: xucang): this is an interesting angle to look at this issue. thanks. can you elaborate on this comment "repeatedly moving tables/servers to a group might not make regions be moved repeatedly,"? [~Xiaolin Ha] > Make move operations of RSGroup idempotent > -- > > Key: HBASE-22642 > URL: https://issues.apache.org/jira/browse/HBASE-22642 > Project: HBase > Issue Type: Bug > Components: rsgroup >Reporter: Xiaolin Ha >Assignee: Xiaolin Ha >Priority: Major > > Currently, when moving tables or servers to a group, only groupInfo is > checked. And in the RSGroup implementation, groupInfo is written to disk before > the region movements are done. If some problem causes the region moves to > abort, some regions will be left on the wrong regionservers. What's worse, retrying > the move operation will be rejected because groupInfo is already correct. > We think that when moving, not only groupInfo should be checked, but the relevant > region assignments should also be checked and corrected. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22642) Make move operations of RSGroup idempotent
[ https://issues.apache.org/jira/browse/HBASE-22642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885741#comment-16885741 ] Xu Cang commented on HBASE-22642: - This is an interesting angle from which to look at this issue. Thanks. Can you elaborate on this comment: "repeatedly moving tables/servers to a group might not make regions be moved repeatedly,"? [~Xiaolin Ha] > Make move operations of RSGroup idempotent > -- > > Key: HBASE-22642 > URL: https://issues.apache.org/jira/browse/HBASE-22642 > Project: HBase > Issue Type: Bug > Components: rsgroup >Reporter: Xiaolin Ha >Assignee: Xiaolin Ha >Priority: Major > > Currently, when moving tables or servers to a group, only groupInfo is > checked. And in the RSGroup implementation, groupInfo is written to disk before > the region movements are done. If some problem causes the region moves to > abort, some regions will be left on the wrong regionservers. What's worse, retrying > the move operation will be rejected because groupInfo is already correct. > We think that when moving, not only groupInfo should be checked, but the relevant > region assignments should also be checked and corrected. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22644) if region split fails, the directories of daughterRegions will not be deleted
[ https://issues.apache.org/jira/browse/HBASE-22644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885734#comment-16885734 ] Xu Cang commented on HBASE-22644: - Agree with [~stack] that we should examine all the possibilities that lead to this DNRIOE. At the same time, can you generate a patch and submit it to this Jira so that HADOOP-QA can run? [~Bo Cui] thanks! > if region split fails, the directories of daughterRegions will not be deleted > - > > Key: HBASE-22644 > URL: https://issues.apache.org/jira/browse/HBASE-22644 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1 >Reporter: Bo Cui >Priority: Critical > Attachments: HBaseFsck.PNG, SplitTransactionImpl.PNG, log.PNG, split > code.PNG > > > If SplitTransactionImpl#createDaughters throws DoNotRetryIOException and the > regionserver is stopping, the directories of daughterRegions will not be > deleted. > !split code.PNG! > the rs log information > !log.PNG! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22650) NPE in AssignmentManager (master crash on startup)
[ https://issues.apache.org/jira/browse/HBASE-22650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885727#comment-16885727 ] Xu Cang commented on HBASE-22650: - [~sveyrie] good catch. Can you please rename patch name to HBASE-22650.branch-1.001.patch and re-submit by clicking the "submit patch" button to trigger Hadoop-QA. Thanks. > NPE in AssignmentManager (master crash on startup) > -- > > Key: HBASE-22650 > URL: https://issues.apache.org/jira/browse/HBASE-22650 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.2.12, 1.3.5 >Reporter: Sylvain Veyrié >Priority: Critical > Labels: patch > Attachments: AssignmentManager-NPE.patch > > > On HMaster Startup: > > {quote}2019-07-02 12:38:11,312 FATAL [orc3:16000.activeMasterManager] > master.HMaster: Failed to become active master > java.lang.NullPointerException > at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) > at > java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.contains(ConcurrentHashMap.java:4558) > at > java.util.Collections$UnmodifiableCollection.contains(Collections.java:1032) > at > org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:3094) > at > org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:495) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:830) > at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:202) > at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1883) > at java.lang.Thread.run(Thread.java:748) > 2019-07-02 12:38:11,312 FATAL [orc3:16000.activeMasterManager] > master.HMaster: Master server abort: loaded coprocessors are: [] > 2019-07-02 12:38:11,312 FATAL [orc3:16000.activeMasterManager] > master.HMaster: Unhandled exception. Starting shutdown. 
> java.lang.NullPointerException > at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) > at > java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964) > at > java.util.concurrent.ConcurrentHashMap$KeySetView.contains(ConcurrentHashMap.java:4558) > at > java.util.Collections$UnmodifiableCollection.contains(Collections.java:1032) > at > org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:3094) > at > org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:495) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:830) > at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:202) > at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1883) > at java.lang.Thread.run(Thread.java:748) > {quote} > It happens when regionLocation is null, which may happen just above on line > 3086 (or as returned by getRegionServer) > We had this on 1.2.12 with the corresponding patch, but since it is not > supported anymore, did not submit it. > Attached is the patch for 1.3.5. Did not test it in 1.4+ > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
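The stack trace in the report above passes through Collections$UnmodifiableCollection.contains into ConcurrentHashMap.get, which rejects null keys. The following is a minimal standalone sketch of that failure mode and of a null guard, not the attached patch. The class NullSafeLookup, the helper isOnline, and the server name strings are hypothetical and exist only for illustration; the real code in AssignmentManager works with HBase's own server types.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NullSafeLookup {

    // Hypothetical helper: check membership only when the location is known.
    // A null regionLocation is treated as "not online" instead of being
    // passed down to the ConcurrentHashMap-backed set.
    static boolean isOnline(Set<String> onlineServers, String regionLocation) {
        return regionLocation != null && onlineServers.contains(regionLocation);
    }

    // Builds an unmodifiable view over a ConcurrentHashMap key set, the same
    // combination of JDK classes that appears in the reported stack trace.
    static Set<String> newOnlineServers(String... names) {
        Set<String> backing = ConcurrentHashMap.newKeySet();
        Collections.addAll(backing, names);
        return Collections.unmodifiableSet(backing);
    }

    public static void main(String[] args) {
        Set<String> online = newOnlineServers("rs1.example.org,16020,1");

        // Unguarded lookup: ConcurrentHashMap does not permit null keys.
        try {
            online.contains(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded contains(null) threw NullPointerException");
        }

        // Guarded lookups never reach the map with a null key.
        System.out.println("null location online? " + isOnline(online, null));
        System.out.println("known location online? "
            + isOnline(online, "rs1.example.org,16020,1"));
    }
}
```

Whether a region with a null location should be skipped, logged, or queued for reassignment is a separate decision that the sketch does not try to answer.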
[jira] [Commented] (HBASE-21606) Document use of the meta table load metrics added in HBASE-19722
[ https://issues.apache.org/jira/browse/HBASE-21606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881423#comment-16881423 ] Xu Cang commented on HBASE-21606: - Will do [~symat] thanks! > Document use of the meta table load metrics added in HBASE-19722 > > > Key: HBASE-21606 > URL: https://issues.apache.org/jira/browse/HBASE-21606 > Project: HBase > Issue Type: Task > Components: documentation, meta, metrics, Operability >Affects Versions: 3.0.0, 1.5.0, 1.4.6, 2.2.0, 2.0.2, 2.1.3 >Reporter: Sean Busbey >Assignee: Szalay-Beko Mate >Priority: Critical > Attachments: HBASE-21606-v1.png > > > HBASE-19722 added a great new tool for figuring out where cluster load is > coming from. Needs a section in the ref guide > * When should I use this? > * Why shouldn't I use it all the time? > * What does using it look like? > * How do I use it? > I think all the needed info for making something to answer these questions is > in the discussion on HBASE-19722 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875277#comment-16875277 ] Xu Cang edited comment on HBASE-22349 at 6/28/19 11:28 PM: --- This is a very good observation. One of my co-workers observed and debugged a similar issue in our environment. If we don't want an RS to hold 0 regions, then besides tweaking 'minCostNeedBalance', maybe we can introduce a rule that triggers balancing whenever an RS holds 0 regions. Or we can adjust cost() for the class PrimaryRegionCountSkewCostFunction (a static class) so that this factor has more impact than the others? was (Author: xucang): This is a very good observation. One of my co-worker observed and debugged the similar issue in our environment. Obviously we don't want RS holds 0 regions and LB still think it is 'balanced'. Besides tweaking 'minCostNeedBalance', maybe we can introduce a rule that when RS holds 0 region, it sill trigger balancing regardless. Or, we can adjust cost() for this class : static class PrimaryRegionCountSkewCostFunction to make this factor impacting more than others? > Stochastic Load Balancer skips balancing when node is replaced in cluster > - > > Key: HBASE-22349 > URL: https://issues.apache.org/jira/browse/HBASE-22349 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.3.0, 1.4.4, 2.0.0 >Reporter: Suthan Phillips >Priority: Major > Attachments: Hbase-22349.pdf > > > In EMR cluster, whenever I replace one of the nodes, the regions never get > rebalanced. > The default minCostNeedBalance set to 0.05 is too high. 
> The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 > = 203 > Once a node(region server) got replaced with a new node (terminated and EMR > recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, > 22, 22, 23, 23, 23 = 203 > From hbase-master-logs, I can see the below WARN which indicates that the > default minCostNeedBalance does not hold good for these scenarios. > ## > 2019-04-29 09:31:37,027 WARN > [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] > cleaner.CleanerChore: WALs outstanding under > hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 > 09:31:42,920 INFO > [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost > which need balance is 0.05 > ## > To mitigate this, I had to modify the default minCostNeedBalance to lower > value like 0.01f and restart Region Servers and Hbase Master. After modifying > this value to 0.01f I could see the regions getting re-balanced. > This has led me to the following questions which I would like to get it > answered from the HBase experts. > 1)What are the factors that affect the value of total cost and sum > multiplier? How could we determine the right minCostNeedBalance value for any > cluster? > 2)How did Hbase arrive at setting the default value to 0.05f? Is it optimal > value? If yes, then what is the recommended way to mitigate this scenario? > Attached: Steps to reproduce > > Note: HBase-17565 patch is already applied. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
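The numbers in the log line above can be checked by hand. The sketch below assumes the balancer skips a run when the total cost divided by the sum multiplier falls below minCostNeedBalance, which is what the logged values suggest; it is not copied from StochasticLoadBalancer. The class BalanceThresholdCheck and both method names are hypothetical. The hasIdleServer rule illustrates the proposal from the comment (balance whenever a region server holds 0 regions) and is not existing HBase behavior.

```java
public class BalanceThresholdCheck {

    // Assumed skip condition: normalized cost below the configured minimum.
    static boolean skipsBalancing(double totalCost, double sumMultiplier,
                                  float minCostNeedBalance) {
        if (totalCost <= 0 || sumMultiplier <= 0) {
            return true;
        }
        return (totalCost / sumMultiplier) < minCostNeedBalance;
    }

    // Hypothetical extra rule from the comment: an empty server forces a run.
    static boolean hasIdleServer(int[] regionCounts) {
        for (int count : regionCounts) {
            if (count == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double totalCost = 52.041826194833405; // from the master log
        double sumMultiplier = 1102.0;         // from the master log
        int[] afterReplacement = {23, 0, 23, 22, 22, 22, 22, 23, 23, 23};

        // 52.04 / 1102 is roughly 0.0472: below 0.05, above 0.01.
        System.out.println("normalized cost = " + (totalCost / sumMultiplier));
        System.out.println("skip at 0.05f? "
            + skipsBalancing(totalCost, sumMultiplier, 0.05f));
        System.out.println("skip at 0.01f? "
            + skipsBalancing(totalCost, sumMultiplier, 0.01f));
        System.out.println("idle server present? "
            + hasIdleServer(afterReplacement));
    }
}
```

Under that assumption the normalized cost of about 0.0472 sits just under the 0.05 default, which is consistent with the reporter seeing regions move only after lowering the threshold to 0.01f.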