Re: [PR] HBASE-28064: Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1765748929

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 49s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 17s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 43s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 16s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m  2s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 44s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 13s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 13s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 44s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 27s |  hbase-protocol-shaded in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   1m 28s |  hbase-client in the patch passed.  
|
   | -1 :x: |  unit  | 223m 21s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |   6m 37s |  hbase-thrift in the patch passed.  
|
   | +1 :green_heart: |  unit  |   7m 41s |  hbase-shell in the patch passed.  |
   |  |   | 271m 45s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6d9597def5b8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/testReport/
 |
   | Max. process+thread count | 4632 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift 
hbase-shell U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28158 Decouple RIT list management from TRSP invocation [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5464:
URL: https://github.com/apache/hbase/pull/5464#issuecomment-1765745358

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 49s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  3s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  4s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m  0s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 27s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  9s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  9s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 21s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 240m  1s |  hbase-server in the patch failed.  |
   |  |   | 268m 27s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5464 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 5dc037f38998 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 
13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/testReport/
 |
   | Max. process+thread count | 4250 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064: Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1765742128

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 37s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 23s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 50s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 40s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   4m 48s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 38s |  hbase-protocol-shaded in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   1m 29s |  hbase-client in the patch passed.  
|
   | -1 :x: |  unit  | 218m 43s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |   8m 34s |  hbase-thrift in the patch passed.  
|
   | +1 :green_heart: |  unit  |   7m  4s |  hbase-shell in the patch passed.  |
   |  |   | 265m 31s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 1b5f3089cbd8 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 
07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/testReport/
 |
   | Max. process+thread count | 4248 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift 
hbase-shell U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28158 Decouple RIT list management from TRSP invocation [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5464:
URL: https://github.com/apache/hbase/pull/5464#issuecomment-1765725078

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 32s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m  6s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 11s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 36s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m  6s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 226m 57s |  hbase-server in the patch failed.  |
   |  |   | 248m 21s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5464 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6bb6c17b8131 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/testReport/
 |
   | Max. process+thread count | 4636 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HBASE-28064) Implement truncate_region command to truncate region directly from FS

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-28064:
--------------------------------------
Component/s: Admin
 shell

> Implement truncate_region command to truncate region directly from FS
> ----------------------------------------------------------------------
>
> Key: HBASE-28064
> URL: https://issues.apache.org/jira/browse/HBASE-28064
> Project: HBase
>  Issue Type: New Feature
>  Components: Admin, shell
>Reporter: Ankit Singhal
>Assignee: Vaibhav Joshi
>Priority: Major
>
> One of our users has brought up a use case where they need to truncate a 
> region to delete data within a specific range. There are two scenarios to 
> consider:
> * In the first scenario, the region boundaries encode a time range defined 
> through pre-splitting, and the user wants to efficiently clean out old data. 
> If HBase can truncate the region directly on the file system, the user can 
> then merge the empty region with adjacent regions to eliminate it, which is 
> much cheaper than deleting the data through the Delete API.
> * In the other case, the HFile for a region has become corrupted for some 
> reason, and the user wants to get rid of the HFile and reload the entire 
> region to avoid consistency issues and restore performance.
> We can do this by taking the region offline and holding a write lock, which 
> avoids race conditions involving Regions In Transition (RIT), region 
> re-opening, and merge/split scenarios. 
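
As a rough illustration of the workflow described above, here is a minimal Java
sketch. It assumes the PR exposes an Admin-level truncateRegion call mirroring
the new shell command; the method name, its signature, and the "events" table
are assumptions, not confirmed by this thread:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;

public class TruncateRegionSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("events"); // hypothetical pre-split table
      // Pick the region whose old time-range data should be dropped, and a
      // neighbour to merge with afterwards (assumes at least two regions).
      RegionInfo stale = admin.getRegions(table).get(0);
      RegionInfo neighbour = admin.getRegions(table).get(1);
      // Assumed API: drop the region's store files directly on the filesystem.
      admin.truncateRegion(stale.getRegionName());
      // Merge the now-empty region into its neighbour to eliminate it.
      admin.mergeRegionsAsync(new byte[][] {
          stale.getEncodedNameAsBytes(), neighbour.getEncodedNameAsBytes() }, false).get();
    }
  }
}
```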



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28158 Decouple RIT list management from TRSP invocation [hbase]

2023-10-16 Thread via GitHub


apurtell commented on PR #5464:
URL: https://github.com/apache/hbase/pull/5464#issuecomment-1765654611

   Based on my local tests there are three test failures to resolve. 
   
   This first one is a false positive:
   - TestAssignmentManagerUtil.testCreateUnassignProceduresForMergeFail
   
   These two are timeouts, so waiters need changes:
   - TestAssignmentManagerMetrics.testRITAssignmentManagerMetrics
   - TestRegionReplicaSplit.testAssignFakeReplicaRegion
   
   I think these are test issues but will look in detail tomorrow.
   Let's see if precommit here has more. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HBASE-28064) Implement truncate_region command to truncate region directly from FS

2023-10-16 Thread Vaibhav Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Joshi updated HBASE-28064:
--------------------------------------
Status: Patch Available  (was: In Progress)

> Implement truncate_region command to truncate region directly from FS
> ----------------------------------------------------------------------
>
> Key: HBASE-28064
> URL: https://issues.apache.org/jira/browse/HBASE-28064
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ankit Singhal
>Assignee: Vaibhav Joshi
>Priority: Major
>
> One of our users has brought up a use case where they need to truncate a 
> region to delete data within a specific range. There are two scenarios to 
> consider:
> * In the first scenario, the region boundaries encode a time range defined 
> through pre-splitting, and the user wants to efficiently clean out old data. 
> If HBase can truncate the region directly on the file system, the user can 
> then merge the empty region with adjacent regions to eliminate it, which is 
> much cheaper than deleting the data through the Delete API.
> * In the other case, the HFile for a region has become corrupted for some 
> reason, and the user wants to get rid of the HFile and reload the entire 
> region to avoid consistency issues and restore performance.
> We can do this by taking the region offline and holding a write lock, which 
> avoids race conditions involving Regions In Transition (RIT), region 
> re-opening, and merge/split scenarios. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HBASE-28064) Implement truncate_region command to truncate region directly from FS

2023-10-16 Thread Vaibhav Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-28064 started by Vaibhav Joshi.
---------------------------------------------
> Implement truncate_region command to truncate region directly from FS
> ----------------------------------------------------------------------
>
> Key: HBASE-28064
> URL: https://issues.apache.org/jira/browse/HBASE-28064
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ankit Singhal
>Assignee: Vaibhav Joshi
>Priority: Major
>
> One of our users has brought up a use case where they need to truncate a 
> region to delete data within a specific range. There are two scenarios to 
> consider:
> * In the first scenario, the region boundaries encode a time range defined 
> through pre-splitting, and the user wants to efficiently clean out old data. 
> If HBase can truncate the region directly on the file system, the user can 
> then merge the empty region with adjacent regions to eliminate it, which is 
> much cheaper than deleting the data through the Delete API.
> * In the other case, the HFile for a region has become corrupted for some 
> reason, and the user wants to get rid of the HFile and reload the entire 
> region to avoid consistency issues and restore performance.
> We can do this by taking the region offline and holding a write lock, which 
> avoids race conditions involving Regions In Transition (RIT), region 
> re-opening, and merge/split scenarios. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28043) Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow

2023-10-16 Thread Becker Ewing (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776008#comment-17776008
 ] 

Becker Ewing commented on HBASE-28043:
--------------------------------------

After some offline discussion with [~bbeaudreault], I trialed a "lazy 
optimization" path that kicks in on the 2nd consecutive call to 
"seekToPreviousRow" as opposed to the 1st: 
[https://gist.github.com/jbewing/e721a1f1f2615591512385f4b37da496] 

 

I also decided to benchmark _everything_ a lot more, since a single benchmark 
run gives some signal, but not a very clear one in our case. I repeated the 
above benchmarking methodology (and have included run #1 for "master" and 
"patch" in this table), along with "patch w/ lazy opt", which is the "lazy 
optimization" patch mentioned above.

 

I got the following results:

 
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead #1|master|839|11898|
|metaRandomRead #2|master|789|12656|
|metaRandomRead #3|master|785|12714|
|metaRandomRead #1|patch|891|11203|
|metaRandomRead #2|patch|843|11840|
|metaRandomRead #3|patch|894|11181|
|metaRandomRead #1|patch w/ lazy opt|896|11134|
|metaRandomRead #2|patch w/ lazy opt|845|11825|
|metaRandomRead #3|patch w/ lazy opt|815|12239|
|metaRandomRead #4|patch w/ lazy opt|850|11749|

 

If anyone is interested in looking at per-thread results/response-time 
histograms, I've pasted the raw output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for the newer results above (#2-3 master, 
#2-3 patch, #1-4 patch w/ lazy opt) in [this 
gist|https://gist.github.com/jbewing/185d7f45d1f43e4f1ee87a45fb3206bf].

 

I'm not entirely sure what to make of this. Both patches seem to be roughly 
equal in the benchmarks, with master about 50 microseconds faster on average. 
Given that 50 microseconds is a fraction of a datacenter RTT, maybe the 
regression won't be that apparent. On the other hand, I'm not sure I can even 
say with much authority that master and the patch (and its variation) differ by 
only 50 microseconds, as the benchmarking numbers are all over the place.

> Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow
> --------------------------------------------------------------------------
>
> Key: HBASE-28043
> URL: https://issues.apache.org/jira/browse/HBASE-28043
> Project: HBase
>  Issue Type: Improvement
>Reporter: Becker Ewing
>Assignee: Becker Ewing
>Priority: Major
> Attachments: Current_SeekToPreviousRowBehavior.png, 
> Proposed_SeekToPreviousRowBehavior.png
>
>
> Currently, for non-RIV1 DBE encodings, each call to 
> [StoreFileScanner.seekToPreviousRow|https://github.com/apache/hbase/blob/89ca7f4ade84c84a246281c71898543b6161c099/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L493-L506]
>  (a common operation in reverse scans) results in two seeks: 
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to the first cell of the prior row
> So if there are "N" rows in a block, a reverse scan through each row results 
> in seeking past 2(N-1)! rows.
>  
> This is a particularly expensive operation for tall tables that have many 
> rows in a block.
>  
> By introducing a state variable "previousRow" to StoreFileScanner, I believe 
> that we could modify the seeking algorithm to be:
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to before the row that is before the 
> row that was just seeked to (i.e. 2 rows back). _Save_ this as a hint for 
> where the prior row is in "previousRow"
>  # Reseek from "previousRow" (2 rows back from start) to 1 row back from 
> start (to the actual previousRow)
> Then, for the rest of the calls, where a "previousRow" hint is present, you 
> only need to seek to the beginning of the block once instead of twice, i.e.: 
>  # Seek from the beginning of the block to right before the beginning of your 
> "previousRow" marker. Save this as the new "previousRow" marker
>  # Reseek to the next row (i.e. your previous "previousRow" marker)
>  
> If there are "N" rows in a block, a reverse scan from row N to row 0 results 
> in seeking past approximately (N-1)! rows i.e. 50% less than the current 
> behavior.
>  
> See the attached diagrams for the current and proposed behavior. 
>  
>  
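
To make the proposed seek pattern concrete, here is a toy Java model
(illustrative only: the cost model and all names are invented, and this is not
the actual StoreFileScanner code). A "seek" is modeled as a walk from the start
of the block, and rows passed are counted for both strategies:

```java
import java.util.ArrayList;
import java.util.List;

public class SeekToPreviousRowModel {
  static long rowsPassed = 0;

  // Model of a seek: walk from the start of the block to just before 'row',
  // counting every row passed; returns the index of the prior row (or -1).
  static int seekBefore(List<String> block, String row) {
    int i = 0;
    while (i < block.size() && block.get(i).compareTo(row) < 0) {
      i++;
      rowsPassed++;
    }
    return i - 1;
  }

  public static void main(String[] args) {
    List<String> block = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      block.add(String.format("row%03d", i));
    }

    // Current behavior: every seekToPreviousRow costs two start-of-block seeks.
    rowsPassed = 0;
    String cur = "row999"; // start past the last row in the block
    while (true) {
      int prev = seekBefore(block, cur);  // 1. find the prior row
      if (prev < 0) {
        break;
      }
      seekBefore(block, block.get(prev)); // 2. re-seek to its first cell
      cur = block.get(prev);
    }
    System.out.println("current:  rows passed = " + rowsPassed);

    // Proposed behavior: the first call seeds a "previousRow" hint two rows
    // back; every later call needs one start-of-block seek plus a cheap
    // forward reseek from the hint.
    rowsPassed = 0;
    int prev = seekBefore(block, "row999");        // find the prior row
    int hint = seekBefore(block, block.get(prev)); // land two rows back, keep hint
    rowsPassed += prev - hint;                     // forward reseek to prior row
    while (hint >= 0) {
      int target = hint;                           // the saved previousRow marker
      hint = seekBefore(block, block.get(target)); // one start-of-block seek
      rowsPassed += target - hint;                 // cheap forward reseek
    }
    System.out.println("proposed: rows passed = " + rowsPassed);
  }
}
```

Under this toy cost model the hint-based strategy passes roughly half as many
rows as the current two-seeks-per-call behavior, in line with the ~50% estimate
in the description above.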



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28157. hbck should report previously reported regions with null region location [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5463:
URL: https://github.com/apache/hbase/pull/5463#issuecomment-1765608554

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 36s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 54s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 54s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 34s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 49s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   4m 53s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 224m 23s |  hbase-server in the patch passed.  
|
   |  |   | 247m 16s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5463 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 599f47fc230e 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 
07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/testReport/
 |
   | Max. process+thread count | 4901 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28157. hbck should report previously reported regions with null region location [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5463:
URL: https://github.com/apache/hbase/pull/5463#issuecomment-1765604537

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 24s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 35s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 11s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 36s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m  9s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 219m 44s |  hbase-server in the patch passed.  
|
   |  |   | 241m  5s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5463 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 757f320b032b 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/testReport/
 |
   | Max. process+thread count | 4630 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28158 Decouple RIT list management from TRSP invocation [hbase]

2023-10-16 Thread via GitHub


Apache9 commented on PR #5464:
URL: https://github.com/apache/hbase/pull/5464#issuecomment-1765592627

   So the intention here is that, when a region is in an abnormally closed 
state, we will use operator tools to bypass the TRSP, but then we will consider 
the region as no longer in RIT, confusing the operators?
   
   Seems reasonable. But we should be careful not to mess up the normal logic. 
Let me take a careful look at the related code.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064: Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1765590780

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 40s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any 
anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 52s |  master passed  |
   | +1 :green_heart: |  compile  |   5m 36s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 32s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 56s |  branch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   7m 13s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m  5s |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m 46s |  the patch passed  |
   | +1 :green_heart: |  cc  |   7m 46s |  the patch passed  |
   | +1 :green_heart: |  javac  |   7m 46s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 30s |  the patch passed  |
   | -0 :warning: |  rubocop  |   0m 18s |  The patch generated 8 new + 766 
unchanged - 0 fixed = 774 total (was 766)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  hadoopcheck  |  14m 16s |  Patch does not cause any 
errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  hbaseprotoc  |   3m  3s |  the patch passed  |
   | +1 :green_heart: |  spotless  |   1m 17s |  patch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   9m 33s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 59s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  72m 29s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti 
spotless checkstyle compile cc hbaseprotoc prototool rubocop |
   | uname | Linux a6ba8bf3d82b 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 
13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | rubocop | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/artifact/yetus-general-check/output/diff-patch-rubocop.txt
 |
   | Max. process+thread count | 80 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift 
hbase-shell U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/3/console 
|
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 rubocop=1.37.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28158 Decouple RIT list management from TRSP invocation [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5464:
URL: https://github.com/apache/hbase/pull/5464#issuecomment-1765564325

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 35s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any 
anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 17s |  master passed  |
   | +1 :green_heart: |  compile  |   3m  6s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 42s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 51s |  branch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 46s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 25s |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 13s |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 13s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 42s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  hadoopcheck  |  12m 30s |  Patch does not cause any 
errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  spotless  |   1m  8s |  patch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   2m 29s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 13s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  43m 56s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5464 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti 
spotless checkstyle compile |
   | uname | Linux 625f8eec7b40 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | Max. process+thread count | 78 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HBASE-28158) Decouple RIT list management from TRSP invocation

2023-10-16 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28158:

Description: 
Operators bypassed some in-progress TRSPs, leading to a state where some 
regions were persistently in transition but hidden. Because the master builds 
its list of regions in transition by tracking TRSPs, bypassing a TRSP removed 
the regions from the RIT list.

Although I can see from reading the code that this is the expected behavior, it 
is surprising for operators and should be changed. Operators expect regions 
that should be open but are not to appear in the master's RIT list, as provided 
by /rits.jsp, the output of the shell's 'rit' command, and ClusterStatus.

We should only remove a region from the RIT map when assignment reaches a 
suitable terminal state.

  was:
Operators bypassed some in-progress TRSPs, leading to a state where some 
regions were persistently in transition but hidden. Because the master builds 
its list of regions in transition by tracking TRSPs, bypassing a TRSP removed 
the regions from the RIT list.

Although I can see from reading the code that this is the expected behavior, it 
is surprising for operators and should be changed.

We should only remove a region from the RIT map when assignment reaches a 
suitable terminal state.


> Decouple RIT list management from TRSP invocation
> -------------------------------------------------
>
> Key: HBASE-28158
> URL: https://issues.apache.org/jira/browse/HBASE-28158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.6
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7
>
>
> Operators bypassed some in-progress TRSPs, leading to a state where some 
> regions were persistently in transition but hidden. Because the master builds 
> its list of regions in transition by tracking TRSPs, bypassing a TRSP removed 
> the regions from the RIT list. 
> Although I can see from reading the code that this is the expected behavior, 
> it is surprising for operators and should be changed. Operators expect 
> regions that should be open but are not to appear in the master's RIT list, 
> as provided by /rits.jsp, the output of the shell's 'rit' command, and 
> ClusterStatus.
> We should only remove a region from the RIT map when assignment reaches a 
> suitable terminal state.
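
A minimal sketch of the proposed decoupling, with invented types and names
(this is not HBase's actual AssignmentManager code): RIT membership is driven
by region state changes, so bypassing or killing a TRSP cannot silently drop a
region from the list:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RitTracker {
  enum State { OPENING, OPEN, CLOSING, CLOSED, ABNORMALLY_CLOSED }

  private final Map<String, State> regionsInTransition = new ConcurrentHashMap<>();

  // Called on every region state change, independent of any procedure's
  // lifecycle, so a bypassed TRSP cannot hide a stuck region.
  public void onStateChange(String encodedRegionName, State next) {
    if (isTerminal(next)) {
      regionsInTransition.remove(encodedRegionName); // reached a stable state
    } else {
      regionsInTransition.put(encodedRegionName, next); // still in transition
    }
  }

  // Only genuinely stable states are terminal; a region left
  // ABNORMALLY_CLOSED by a bypassed TRSP stays visible in the RIT list.
  private static boolean isTerminal(State s) {
    return s == State.OPEN || s == State.CLOSED;
  }

  // What /rits.jsp or the shell's 'rit' command would render.
  public Map<String, State> snapshot() {
    return Map.copyOf(regionsInTransition);
  }
}
```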



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28158) Decouple RIT list management from TRSP invocation

2023-10-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28158:
---------------------------------------

 Summary: Decouple RIT list management from TRSP invocation
 Key: HBASE-28158
 URL: https://issues.apache.org/jira/browse/HBASE-28158
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.6
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7


Operators bypassed some in-progress TRSPs, leading to a state where some 
regions were persistently in transition but hidden. Because the master builds 
its list of regions in transition by tracking TRSPs, bypassing a TRSP removed 
the regions from the RIT list.

Although I can see from reading the code that this is the expected behavior, it 
is surprising for operators and should be changed.

We should only remove a region from the RIT map when assignment reaches a 
suitable terminal state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28156) Intra-process client connections cause netty EventLoop deadlock

2023-10-16 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775971#comment-17775971
 ] 

Duo Zhang commented on HBASE-28156:
-----------------------------------

Would you mind uploading the jstack result here? Then we could see why a netty 
event loop can be blocked on Socket.accept...

> Intra-process client connections cause netty EventLoop deadlock
> ---------------------------------------------------------------
>
> Key: HBASE-28156
> URL: https://issues.apache.org/jira/browse/HBASE-28156
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We've had a few operational incidents over the past few months where our 
> HMaster stops accepting new connections, but can continue processing requests 
> from existing ones. Finally I was able to get heap and thread dumps to 
> confirm what's happening.
> The core trigger is HBASE-24687, where the MobFileCleanerChore is not using 
> ClusterConnection. I've prodded the linked PR to get that resolved and will 
> take it over if I don't hear soon.
> In this case, the chore is using the NettyRpcClient to make a local rpc call 
> to the same NettyRpcServer in the process. Due to 
> [NettyEventLoopGroupConfig|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java#L98],
>  we use the same EventLoopGroup for both the RPC Client and the RPC Server.
> What happens rarely is that the local client for MobFileCleanerChore gets 
> assigned to RS-EventLoopGroup-1-1. Since we share the EventLoopGroupConfig, 
> and [we don't specify a separate parent 
> group|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java#L155],
>  that group is also the group which processes new connections.
> What we see in this case is that RS-EventLoopGroup-1-1 gets hung in 
> Socket.accept. Since the client side is on the same EventLoop, its tasks get 
> stuck in a queue waiting for the executor. So the client can't send the 
> request that the server Socket is waiting for.
> Further, the client/chore gets stuck waiting on BlockingRpcCallback.get(). We 
> use an HWT TimerTask to cancel overdue requests, but it only gets scheduled 
> [once NettyRpcConnection.sendRequest0 is 
> executed|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L371].
>  But sendRequest0 [executes on the 
> EventLoop|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L393],
>  and thus gets similarly stuck. So we never schedule a timeout and the chore 
> gets stuck forever.
> While fixing HBASE-24687 will fix this case, I think we should improve our 
> netty configuration here so we can avoid problems like this if we ever do 
> intra-process RPC calls again (there may already be others, not sure).
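
For illustration, a stripped-down Netty sketch of the sharing hazard described
above (this is not the actual NettyRpcServer/NettyRpcClient wiring; handler
bodies are elided). With a single shared group and the one-argument
ServerBootstrap.group(...), the acceptor, the server-side channels, and an
in-process client channel can all land on the same event loop:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import java.net.InetSocketAddress;

public class SharedEventLoopSketch {
  public static void main(String[] args) throws Exception {
    EventLoopGroup shared = new NioEventLoopGroup(1); // one thread: worst case

    ServerBootstrap server = new ServerBootstrap()
        .group(shared) // single-arg group(): acceptor group == worker group
        .channel(NioServerSocketChannel.class)
        .childHandler(new ChannelInitializer<SocketChannel>() {
          @Override protected void initChannel(SocketChannel ch) { /* elided */ }
        });
    int port = ((InetSocketAddress) server.bind(0).syncUninterruptibly()
        .channel().localAddress()).getPort();

    Bootstrap client = new Bootstrap()
        .group(shared) // in-process client registers on the very same loop
        .channel(NioSocketChannel.class)
        .handler(new ChannelInitializer<SocketChannel>() {
          @Override protected void initChannel(SocketChannel ch) { /* elided */ }
        });
    client.connect("127.0.0.1", port).syncUninterruptibly();

    // The deadlock shape reported here arises when a task already running on
    // that one loop blocks waiting for this client's response: the loop can
    // neither accept the connection nor flush the request.
    shared.shutdownGracefully();
  }
}
```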



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28156) Intra-process client connections cause netty EventLoop deadlock

2023-10-16 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775969#comment-17775969
 ] 

Duo Zhang commented on HBASE-28156:
-----------------------------------

{quote}
I think we should use a separate EventLoopGroup for the server parent 
(acceptor). I also think we should fix our HWT timer to schedule prior to the 
event loop.
{quote}

No, these are all by design.

We should always try to share the same EventLoopGroup within a process, to 
better share resources. If you find that we cannot consume all the CPUs, just 
increase the thread count of the EventLoopGroup instead of introducing a new 
EventLoopGroup. And canceling a request is also designed to be executed inside 
the channel handler, so there are no multi-threading problems, which simplifies 
the logic a lot.
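
As a concrete example of "increase the thread number", a minimal sketch: the
hbase.netty.worker.count key is assumed here to be the setting read by
NettyEventLoopGroupConfig, and should be verified against the code, as it is
not stated in this thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class EventLoopSizingSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Assumed key: size the shared netty event loop group instead of
    // creating a second EventLoopGroup (0 typically means "netty default").
    conf.setInt("hbase.netty.worker.count", 16);
    System.out.println(conf.getInt("hbase.netty.worker.count", 0));
  }
}
```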

> Intra-process client connections cause netty EventLoop deadlock
> ---------------------------------------------------------------
>
> Key: HBASE-28156
> URL: https://issues.apache.org/jira/browse/HBASE-28156
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We've had a few operational incidents over the past few months where our 
> HMaster stops accepting new connections, but can continue processing requests 
> from existing ones. Finally I was able to get heap and thread dumps to 
> confirm what's happening.
> The core trigger is HBASE-24687, where the MobFileCleanerChore is not using 
> ClusterConnection. I've prodded the linked PR to get that resolved and will 
> take it over if I don't hear soon.
> In this case, the chore is using the NettyRpcClient to make a local rpc call 
> to the same NettyRpcServer in the process. Due to 
> [NettyEventLoopGroupConfig|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java#L98],
>  we use the same EventLoopGroup for both the RPC Client and the RPC Server.
> What happens rarely is that the local client for MobFileCleanerChore gets 
> assigned to RS-EventLoopGroup-1-1. Since we share the EventLoopGroupConfig, 
> and [we don't specify a separate parent 
> group|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java#L155],
>  that group is also the group which processes new connections.
> What we see in this case is that RS-EventLoopGroup-1-1 gets hung in 
> Socket.accept. Since the client side is on the same EventLoop, its tasks get 
> stuck in a queue waiting for the executor. So the client can't send the 
> request that the server Socket is waiting for.
> Further, the client/chore gets stuck waiting on BlockingRpcCallback.get(). We 
> use an HWT TimerTask to cancel overdue requests, but it only gets scheduled 
> [once NettyRpcConnection.sendRequest0 is 
> executed|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L371].
>  But sendRequest0 [executes on the 
> EventLoop|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L393],
>  and thus gets similarly stuck. So we never schedule a timeout and the chore 
> gets stuck forever.
> While fixing HBASE-24687 will fix this case, I think we should improve our 
> netty configuration here so we can avoid problems like this if we ever do 
> intra-process RPC calls again (there may already be others, not sure).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28156) Intra-process client connections cause netty EventLoop deadlock

2023-10-16 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775968#comment-17775968
 ] 

Duo Zhang commented on HBASE-28156:
-----------------------------------

Strange. A netty event loop should not be blocked on any operation, otherwise 
there will be big problems. The question is why RS-EventLoopGroup-1-1 gets hung 
in Socket.accept: netty will only call Socket.accept once the Selector reports 
that the socket can be accepted. And shouldn't it be SocketChannel? We should 
not be using Socket with NIO...

> Intra-process client connections cause netty EventLoop deadlock
> ---------------------------------------------------------------
>
> Key: HBASE-28156
> URL: https://issues.apache.org/jira/browse/HBASE-28156
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We've had a few operational incidents over the past few months where our 
> HMaster stops accepting new connections, but can continue processing requests 
> from existing ones. Finally I was able to get heap and thread dumps to 
> confirm what's happening.
> The core trigger is HBASE-24687, where the MobFileCleanerChore is not using 
> ClusterConnection. I've prodded the linked PR to get that resolved and will 
> take it over if I don't hear soon.
> In this case, the chore is using the NettyRpcClient to make a local rpc call 
> to the same NettyRpcServer in the process. Due to 
> [NettyEventLoopGroupConfig|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java#L98],
>  we use the same EventLoopGroup for both the RPC Client and the RPC Server.
> What happens rarely is that the local client for MobFileCleanerChore gets 
> assigned to RS-EventLoopGroup-1-1. Since we share the EventLoopGroupConfig, 
> and [we don't specify a separate parent 
> group|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java#L155],
>  that group is also the group which processes new connections.
> What we see in this case is that RS-EventLoopGroup-1-1 gets hung in 
> Socket.accept. Since the client side is on the same EventLoop, its tasks get 
> stuck in a queue waiting for the executor. So the client can't send the 
> request that the server Socket is waiting for.
> Further, the client/chore gets stuck waiting on BlockingRpcCallback.get(). We 
> use an HWT TimerTask to cancel overdue requests, but it only gets scheduled 
> [once NettyRpcConnection.sendRequest0 is 
> executed|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L371].
>  But sendRequest0 [executes on the 
> EventLoop|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L393],
>  and thus gets similarly stuck. So we never schedule a timeout and the chore 
> gets stuck forever.
> While fixing HBASE-24687 will fix this case, I think we should improve our 
> netty configuration here so we can avoid problems like this if we ever do 
> intra-process RPC calls again (there may already be others, not sure).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28157. hbck should report previously reported regions with null region location [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5463:
URL: https://github.com/apache/hbase/pull/5463#issuecomment-1765446691

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 43s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any 
anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m  0s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 23s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 42s |  branch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 24s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 33s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  hadoopcheck  |   9m 29s |  Patch does not cause any 
errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  spotless  |   0m 38s |  patch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 29s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  9s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  32m  5s |   |
   
   
   | Subsystem | Report/Notes |
   |------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5463 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti 
spotless checkstyle compile |
   | uname | Linux 351ccc9b3a48 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | Max. process+thread count | 79 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5463/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28043 Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5373:
URL: https://github.com/apache/hbase/pull/5373#issuecomment-1765430811

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|------:|:--------|:--------|
   | +0 :ok: |  reexec  |   0m 46s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 16s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 16s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 41s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   7m  2s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 18s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 10s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 38s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 38s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 48s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 222m 19s |  hbase-server in the patch passed.  
|
   | +1 :green_heart: |  unit  |  13m 20s |  hbase-mapreduce in the patch 
passed.  |
   |  |   | 269m 22s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5373 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux ad2625b7d438 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/testReport/
 |
   | Max. process+thread count | 4706 (vs. ulimit of 3) |
   | modules | C: hbase-server hbase-mapreduce U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28043 Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5373:
URL: https://github.com/apache/hbase/pull/5373#issuecomment-1765420871

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 26s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 29s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 13s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 15s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 11s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 219m 33s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |  13m 41s |  hbase-mapreduce in the patch 
passed.  |
   |  |   | 256m 45s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5373 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 5331f75dd1d2 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/testReport/
 |
   | Max. process+thread count | 4624 (vs. ulimit of 3) |
   | modules | C: hbase-server hbase-mapreduce U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HBASE-28157) hbck should report previously reported regions with null region location

2023-10-16 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28157:

Assignee: Andrew Kyle Purtell
  Status: Patch Available  (was: Open)

> hbck should report previously reported regions with null region location
> 
>
> Key: HBASE-28157
> URL: https://issues.apache.org/jira/browse/HBASE-28157
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.6
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0, 4.0.0-alpha-1, 2.5.7
>
>
> Operators bypassed some in-progress TRSPs, leading to a state where some 
> regions were persistently in transition but hidden. 
> Because the master builds its list of regions in transition by tracking TRSP, 
> the bypass of TRSP removed the regions from the RIT list. This was expected, 
> but I will propose a change to RIT tracking on another issue. 
> The online hbck chore also did not report the inconsistency. This was not 
> expected.
> HBASE-28144 was another issue related to this incident, already fixed. 
> Ensure that hbck will report as inconsistent any region for which a location 
> was previously reported but is now null, if the region is not expected to be 
> offline.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28157) hbck should report previously reported regions with null region location

2023-10-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28157:
---

 Summary: hbck should report previously reported regions with null 
region location
 Key: HBASE-28157
 URL: https://issues.apache.org/jira/browse/HBASE-28157
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.6
Reporter: Andrew Kyle Purtell
 Fix For: 2.6.0, 2.4.18, 3.0.0, 4.0.0-alpha-1, 2.5.7


Operators bypassed some in-progress TRSPs, leading to a state where some regions 
were persistently in transition but hidden. 

Because the master builds its list of regions in transition by tracking TRSP, 
the bypass of TRSP removed the regions from the RIT list. This was expected, 
but I will propose a change to RIT tracking on another issue. 

The online hbck chore also did not report the inconsistency. This was not 
expected.

HBASE-28144 was another issue related to this incident, already fixed. 

Ensure that hbck will report as inconsistent any region for which a location 
was previously reported but is now null, if the region is not expected to be 
offline.
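
As a concrete sketch of the check being proposed, consider the following 
minimal example; all names here (NullLocationCheck, lastReportedLocation, 
isInconsistent) are hypothetical illustrations, and the real chore keeps its 
state in master-side report structures rather than a plain map:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed hbck check; not HBase's actual chore.
final class NullLocationCheck {
  // Last non-null location reported per encoded region name.
  private final Map<String, String> lastReportedLocation = new HashMap<>();

  // True when the region should be flagged inconsistent: a location was
  // reported previously, the location is now null, and the region is not
  // expected to be offline (e.g. a split parent or merged-away region).
  boolean isInconsistent(String encodedName, String location, boolean expectedOffline) {
    if (location != null) {
      lastReportedLocation.put(encodedName, location); // remember for the next run
      return false;
    }
    return lastReportedLocation.containsKey(encodedName) && !expectedOffline;
  }
}
{code}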



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1765292411

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 40s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 48s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 25s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 53s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 40s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 26s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 26s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   4m 49s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 38s |  hbase-protocol-shaded in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   1m 33s |  hbase-client in the patch passed.  
|
   | -1 :x: |  unit  | 224m 18s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |   8m 30s |  hbase-thrift in the patch passed.  
|
   | +1 :green_heart: |  unit  |   7m 44s |  hbase-shell in the patch passed.  |
   |  |   | 272m 40s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux f5705fc5cb1b 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 
13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/testReport/
 |
   | Max. process+thread count | 4594 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift 
hbase-shell U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1765272464

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 47s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 10s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 47s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 47s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m  8s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 25s |  hbase-protocol-shaded in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   1m 21s |  hbase-client in the patch passed.  
|
   | -1 :x: |  unit  | 217m 47s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |   6m 33s |  hbase-thrift in the patch passed.  
|
   | +1 :green_heart: |  unit  |   7m 42s |  hbase-shell in the patch passed.  |
   |  |   | 260m  2s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 5128cba3b006 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/testReport/
 |
   | Max. process+thread count | 4595 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift 
hbase-shell U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28113 Modify the way of acquiring the RegionStateNode lock in checkOnlineRegionsReport to tryLock [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5442:
URL: https://github.com/apache/hbase/pull/5442#issuecomment-1765214167

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 41s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 29s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   7m 12s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 38s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 40s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 14s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 226m 55s |  hbase-server in the patch failed.  |
   |  |   | 253m  9s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5442 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 2dc319e19259 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 
07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   | unit | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/testReport/
 |
   | Max. process+thread count | 4210 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (HBASE-28156) Intra-process client connections cause netty EventLoop deadlock

2023-10-16 Thread Bryan Beaudreault (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775903#comment-17775903
 ] 

Bryan Beaudreault edited comment on HBASE-28156 at 10/16/23 8:23 PM:
-

I think we should use a separate EventLoopGroup for the server parent 
(acceptor). I also think we should fix our HWT timer to schedule prior to the 
event loop.


was (Author: bbeaudreault):
I think we should use a separate EventLoopGroup for the server parent 
(acceptor). I also think we should fix our HWT timer to schedule prior to the 
event loop. I think it might still be possible for a server child task to get 
blocked at that point, but not the acceptor. Do we need a server side hard 
timeout as well?

> Intra-process client connections cause netty EventLoop deadlock
> ---
>
> Key: HBASE-28156
> URL: https://issues.apache.org/jira/browse/HBASE-28156
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We've had a few operational incidents over the past few months where our 
> HMaster stops accepting new connections, but can continue processing requests 
> from existing ones. Finally I was able to get heap and thread dumps to 
> confirm what's happening.
> The core trigger is HBASE-24687, where the MobFileCleanerChore is not using 
> ClusterConnection. I've prodded the linked PR to get that resolved and will 
> take it over if I don't hear soon.
> In this case, the chore is using the NettyRpcClient to make a local rpc call 
> to the same NettyRpcServer in the process. Due to 
> [NettyEventLoopGroupConfig|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java#L98],
>  we use the same EventLoopGroup for both the RPC Client and the RPC Server.
> What happens rarely is that the local client for MobFileCleanerChore gets 
> assigned to RS-EventLoopGroup-1-1. Since we share the EventLoopGroupConfig, 
> and [we don't specify a separate parent 
> group|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java#L155],
>  that group is also the group which processes new connections.
> What we see in this case is that RS-EventLoopGroup-1-1 gets hung in 
> Socket.accept. Since the client side is on the same EventLoop, its tasks get 
> stuck in a queue waiting for the executor. So the client can't send the 
> request that the server Socket is waiting for.
> Further, the client/chore gets stuck waiting on BlockingRpcCallback.get(). We 
> use an HWT TimerTask to cancel overdue requests, but it only gets scheduled 
> [once NettyRpcConnection.sendRequest0 is 
> executed|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L371].
>  But sendRequest0 [executes on the 
> EventLoop|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L393],
>  and thus gets similarly stuck. So we never schedule a timeout and the chore 
> gets stuck forever.
> While fixing HBASE-24687 will fix this case, I think we should improve our 
> netty configuration here so we can avoid problems like this if we ever do 
> intra-process RPC calls again (there may already be others, not sure).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28113 Modify the way of acquiring the RegionStateNode lock in checkOnlineRegionsReport to tryLock [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5442:
URL: https://github.com/apache/hbase/pull/5442#issuecomment-1765213563

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 27s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): 
--brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list 
--whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 31s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 10s |  branch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 30s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 44s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m  9s |  patch has no errors when 
building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 230m 34s |  hbase-server in the patch passed.  
|
   |  |   | 253m 11s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5442 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 8dca7a963799 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   |  Test Results | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/testReport/
 |
   | Max. process+thread count | 4683 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/console 
|
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HBASE-28064) Implement truncate_region command to truncate region directly from FS

2023-10-16 Thread Ankit Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal reassigned HBASE-28064:
-

Assignee: Vaibhav Joshi

> Implement truncate_region command to truncate region directly from FS
> -
>
> Key: HBASE-28064
> URL: https://issues.apache.org/jira/browse/HBASE-28064
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ankit Singhal
>Assignee: Vaibhav Joshi
>Priority: Major
>
> One of our users has brought up a use case where they need to truncate a 
> region to delete data within a specific range. There are two scenarios to 
> consider:
> * In the first scenario, the region boundaries involve a time range defined 
> through pre-splitting, and the user is looking to efficiently clean out old 
> data. If HBase can truncate the region directly from the file system, the 
> user can then merge the empty region with adjacent regions to effectively 
> eliminate it, which is more efficient than deleting the data through the 
> Delete API.
> * In another case, if the HFile for that region becomes corrupted for some 
> reason, the user wants to get rid of the HFile and reload the entire region 
> to avoid consistency issues and ensure performance.
> We can do this by taking the region offline and taking a write lock, which 
> avoids race conditions involving Regions In Transition (RITs), region 
> re-opening, and merge/split scenarios. 
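
For illustration, a hypothetical shell session for the first scenario; 
truncate_region is the command this issue proposes, while the region name 
arguments below are made-up placeholders (merge_region is the existing shell 
command for the follow-up merge):
{code:java}
$ hbase shell
# Proposed command: truncate one region's data directly from the filesystem.
# The argument shape is an assumption; this is a placeholder region name.
> truncate_region 'events,20230101,1690000000000.0123456789abcdef0123456789abcdef.'
# Existing command: merge the now-empty region into an adjacent region,
# identified by encoded region names (placeholders here).
> merge_region '0123456789abcdef0123456789abcdef', 'fedcba9876543210fedcba9876543210'
{code}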



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28064) Implement truncate_region command to truncate region directly from FS

2023-10-16 Thread Ankit Singhal (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775908#comment-17775908
 ] 

Ankit Singhal commented on HBASE-28064:
---

[~rbhatta], Vaibhav has created a pull request, in case you want to contribute 
by reviewing it: 
https://github.com/apache/hbase/pull/5462/files


> Implement truncate_region command to truncate region directly from FS
> -
>
> Key: HBASE-28064
> URL: https://issues.apache.org/jira/browse/HBASE-28064
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ankit Singhal
>Assignee: Vaibhav Joshi
>Priority: Major
>
> One of our users has brought up a use case where they need to truncate a 
> region to delete data within a specific range. There are two scenarios to 
> consider:
> * In the first scenario, the region boundaries involve a time range defined 
> through pre-splitting, and the user is looking to efficiently clean out old 
> data. If HBase can truncate the region directly from the file system, the 
> user can then merge the empty region with adjacent regions to effectively 
> eliminate it, which is more efficient than deleting the data through the 
> Delete API.
> * In another case, if the HFile for that region becomes corrupted for some 
> reason, the user wants to get rid of the HFile and reload the entire region 
> to avoid consistency issues and ensure performance.
> We can do this by taking the region offline and taking a write lock, which 
> avoids race conditions involving Regions In Transition (RITs), region 
> re-opening, and merge/split scenarios. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28156) Intra-process client connections cause netty EventLoop deadlock

2023-10-16 Thread Bryan Beaudreault (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775903#comment-17775903
 ] 

Bryan Beaudreault commented on HBASE-28156:
---

I think we should use a separate EventLoopGroup for the server parent 
(acceptor). I also think we should fix our HWT timer to schedule prior to the 
event loop. I think it might still be possible for a server child task to get 
blocked at that point, but not the acceptor. Do we need a server side hard 
timeout as well?
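
A minimal sketch of the HWT idea using plain Netty types; the names here 
(TimeoutBeforeDispatch, sendWithTimeout) are hypothetical, this is not 
HBase's NettyRpcConnection, and a real client would cancel the timeout on 
response rather than right after the send:
{code:java}
import java.util.concurrent.TimeUnit;
import io.netty.channel.EventLoop;
import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;

final class TimeoutBeforeDispatch {
  private final HashedWheelTimer timer = new HashedWheelTimer();

  void sendWithTimeout(EventLoop loop, Runnable sendRequest, Runnable onTimeout, long timeoutMs) {
    // Arm the timeout from the caller's thread, before touching the event
    // loop, so a wedged loop cannot prevent the timeout from being scheduled.
    Timeout t = timer.newTimeout(ignored -> onTimeout.run(), timeoutMs, TimeUnit.MILLISECONDS);
    loop.execute(() -> {
      sendRequest.run();
      // Simplification: a real client cancels when the response arrives.
      t.cancel();
    });
  }
}
{code}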

> Intra-process client connections cause netty EventLoop deadlock
> ---
>
> Key: HBASE-28156
> URL: https://issues.apache.org/jira/browse/HBASE-28156
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We've had a few operational incidents over the past few months where our 
> HMaster stops accepting new connections, but can continue processing requests 
> from existing ones. Finally I was able to get heap and thread dumps to 
> confirm what's happening.
> The core trigger is HBASE-24687, where the MobFileCleanerChore is not using 
> ClusterConnection. I've prodded the linked PR to get that resolved and will 
> take it over if I don't hear soon.
> In this case, the chore is using the NettyRpcClient to make a local rpc call 
> to the same NettyRpcServer in the process. Due to 
> [NettyEventLoopGroupConfig|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java#L98],
>  we use the same EventLoopGroup for both the RPC Client and the RPC Server.
> What happens rarely is that the local client for MobFileCleanerChore gets 
> assigned to RS-EventLoopGroup-1-1. Since we share the EventLoopGroupConfig, 
> and [we don't specify a separate parent 
> group|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java#L155],
>  that group is also the group which processes new connections.
> What we see in this case is that RS-EventLoopGroup-1-1 gets hung in 
> Socket.accept. Since the client side is on the same EventLoop, its tasks get 
> stuck in a queue waiting for the executor. So the client can't send the 
> request that the server Socket is waiting for.
> Further, the client/chore gets stuck waiting on BlockingRpcCallback.get(). We 
> use an HWT TimerTask to cancel overdue requests, but it only gets scheduled 
> [once NettyRpcConnection.sendRequest0 is 
> executed|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L371].
>  But sendRequest0 [executes on the 
> EventLoop|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L393],
>  and thus gets similarly stuck. So we never schedule a timeout and the chore 
> gets stuck forever.
> While fixing HBASE-24687 will fix this case, I think we should improve our 
> netty configuration here so we can avoid problems like this if we ever do 
> intra-process RPC calls again (there may already be others, not sure).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28043 Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5373:
URL: https://github.com/apache/hbase/pull/5373#issuecomment-1765162740

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 25s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any 
anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 48s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 55s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 43s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 44s |  branch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 51s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 10s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 35s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 49s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 49s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 34s |  hbase-server: The patch 
generated 3 new + 6 unchanged - 0 fixed = 9 total (was 6)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  hadoopcheck  |   9m 49s |  Patch does not cause any 
errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  spotless  |   0m 40s |  patch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   2m  7s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 16s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  34m 53s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5373 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti 
spotless checkstyle compile |
   | uname | Linux 48fac20599a0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 
23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | checkstyle | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
 |
   | Max. process+thread count | 79 (vs. ulimit of 3) |
   | modules | C: hbase-server hbase-mapreduce U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5373/5/console 
|
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HBASE-28156) Intra-process client connections cause netty EventLoop deadlock

2023-10-16 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-28156:
-

 Summary: Intra-process client connections cause netty EventLoop 
deadlock
 Key: HBASE-28156
 URL: https://issues.apache.org/jira/browse/HBASE-28156
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault


We've had a few operational incidents over the past few months where our 
HMaster stops accepting new connections, but can continue processing requests 
from existing ones. Finally I was able to get heap and thread dumps to confirm 
what's happening.

The core trigger is HBASE-24687, where the MobFileCleanerChore is not using 
ClusterConnection. I've prodded the linked PR to get that resolved and will 
take it over if I don't hear soon.

In this case, the chore is using the NettyRpcClient to make a local rpc call to 
the same NettyRpcServer in the process. Due to 
[NettyEventLoopGroupConfig|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java#L98],
 we use the same EventLoopGroup for both the RPC Client and the RPC Server.

What happens rarely is that the local client for MobFileCleanerChore gets 
assigned to RS-EventLoopGroup-1-1. Since we share the EventLoopGroupConfig, and 
[we don't specify a separate parent 
group|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java#L155],
 that group is also the group which processes new connections.

What we see in this case is that RS-EventLoopGroup-1-1 gets hung in 
Socket.accept. Since the client side is on the same EventLoop, its tasks get 
stuck in a queue waiting for the executor. So the client can't send the request 
that the server Socket is waiting for.

Further, the client/chore gets stuck waiting on BlockingRpcCallback.get(). We 
use an HWT TimerTask to cancel overdue requests, but it only gets scheduled 
[once NettyRpcConnection.sendRequest0 is 
executed|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L371].
 But sendRequest0 [executes on the 
EventLoop|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java#L393],
 and thus gets similarly stuck. So we never schedule a timeout and the chore 
gets stuck forever.

While fixing HBASE-24687 will fix this case, I think we should improve our 
netty configuration here so we can avoid problems like this if we ever do 
intra-process RPC calls again (there may already be others, not sure).
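
As a rough sketch of the first suggestion, using plain Netty APIs rather than 
HBase's NettyRpcServer (class name and port below are placeholders): giving 
ServerBootstrap a dedicated single-thread parent group keeps accepts off the 
worker loops that may be shared with the in-process client.
{code:java}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public final class SeparateAcceptorServer {
  public static void main(String[] args) throws InterruptedException {
    NioEventLoopGroup boss = new NioEventLoopGroup(1);   // acceptor only
    NioEventLoopGroup worker = new NioEventLoopGroup();  // may be shared with clients
    try {
      new ServerBootstrap()
        .group(boss, worker) // two-arg form keeps accept() off the shared workers
        .channel(NioServerSocketChannel.class)
        .childHandler(new ChannelInitializer<SocketChannel>() {
          @Override
          protected void initChannel(SocketChannel ch) {
            // request/response handlers would be added here
          }
        })
        .bind(16020).sync().channel().closeFuture().sync();
    } finally {
      boss.shutdownGracefully();
      worker.shutdownGracefully();
    }
  }
}
{code}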



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28043) Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow

2023-10-16 Thread Becker Ewing (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775901#comment-17775901
 ] 

Becker Ewing edited comment on HBASE-28043 at 10/16/23 7:32 PM:


I've done some benchmarking with the "hbase pe" tool to test how 1-row reverse 
scans, specifically those done for meta lookups, regress with this change, 
since the change focuses on improving throughput for reverse scans over many 
rows. I used the following methodology:
 # Start a local HBase cluster with "hbase master start --localRegionServers=1"
 # Prepare the meta table with test state for benchmarking via "hbase pe 
--nomapred=true metaWrite 1". Additionally, flush the meta table and run a 
major compaction on it (I did this to keep the benchmarking numbers as 
comparable as possible, so that both runs were already reading over a 
consistent single storefile). I used the following command sequence to flush & 
compact the meta:
{code:java}
$ hbase shell
> flush 'hbase:meta'
> major_compact 'hbase:meta'
{code}
 

 # Run "hbase pe --nomapred=true metaRandomRead 10" to benchmark meta lookup 
performance by scanning over all test rows inserted into the meta w/ 10 
concurrent threads

 

I got the following results (averaged over all 10 threads):
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead|master|839|11898|
|metaRandomRead|patch|891|11203|

 

If anyone is interested in looking at per-thread results/response time 
histograms, I've pasted the raw results output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for this patch + master in [this 
gist|https://gist.github.com/jbewing/b06aaf71326c323e3dfd85157c3cfcde]. 

 

As expected, this patch does bring a slight regression to the single-row lookup 
case. That regression looks to be about 5%. I'm theorizing that we're not 
seeing a huge regression here because the "hbase:meta" table has RIV1 
data-block encoding applied by default, which makes the extra reseek pretty 
cheap.


was (Author: JIRAUSER301708):
I've done some benchmarking with the "hbase pe" tool to test how 1-row reverse 
scans, specifically those done for meta lookups, regress with this change, 
since the change focuses on improving throughput for reverse scans over many 
rows. I used the following methodology:
 # Start a local HBase cluster with "hbase master start --localRegionServers=1"
 # Prepare the meta table with test state for benchmarking via "hbase pe 
--nomapred=true metaWrite 1". Additionally, flush the meta table and run a 
major compaction on it (I did this to keep the benchmarking numbers as 
comparable as possible, so that both runs were already reading over a 
consistent single storefile). I used the following command sequence to flush & 
compact the meta:
{code:java}
$ hbase shell
> flush 'hbase:meta'
> major_compact 'hbase:meta'
{code}
 

 # Run "hbase pe --nomapred=true metaRandomRead 10" to benchmark meta lookup 
performance by scanning over all test rows inserted into the meta w/ 10 
concurrent threads

 

I got the following results (averaged over all 10 threads):
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead|master|839|11898|
|metaRandomRead|patch|891|11203|

 

If anyone is interested in looking at per-thread results/response time 
histograms, I've pasted the raw results output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for this patch + master in [this 
gist|https://gist.github.com/jbewing/b06aaf71326c323e3dfd85157c3cfcde]. 

 

As expected, this patch does bring a slight regression to the single-row lookup 
case. That regression looks to be about 5%. I'm theorizing that we're not 
seeing a huge regression here because the "hbase:meta" table has RIV1 
data-block encoding applied by default, which makes the extra reseek pretty 
cheap.

> Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow
> --
>
> Key: HBASE-28043
> URL: https://issues.apache.org/jira/browse/HBASE-28043
> Project: HBase
>  Issue Type: Improvement
>Reporter: Becker Ewing
>Assignee: Becker Ewing
>Priority: Major
> Attachments: Current_SeekToPreviousRowBehavior.png, 
> Proposed_SeekToPreviousRowBehavior.png
>
>
> Currently, for non-RIV1 DBE encodings, each call to 
> [StoreFileScanner.seekToPreviousRow|https://github.com/apache/hbase/blob/89ca7f4ade84c84a246281c71898543b6161c099/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L493-L506]
>  (a common operation in reverse scans) results in two seeks: 
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to the first cell of the prior row

[jira] [Comment Edited] (HBASE-28043) Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow

2023-10-16 Thread Becker Ewing (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775901#comment-17775901
 ] 

Becker Ewing edited comment on HBASE-28043 at 10/16/23 7:31 PM:


I've done some benchmarking with the "hbase pe" tool to test how 1-row reverse 
scans, specifically those done for meta lookups, regress with this change, 
since the change focuses on improving throughput for reverse scans over many 
rows. I used the following methodology:
 # Start a local HBase cluster with "hbase master start --localRegionServers=1"
 # Prepare the meta table with test state for benchmarking via "hbase pe 
--nomapred=true metaWrite 1". Additionally, flush the meta table and run a 
major compaction on it (I did this to keep the benchmarking numbers as 
comparable as possible, so that both runs were already reading over a 
consistent single storefile). I used the following command sequence to flush & 
compact the meta:
{code:java}
$ hbase shell
> flush 'hbase:meta'
> major_compact 'hbase:meta'
{code}
 

 # Run "hbase pe --nomapred=true metaRandomRead 10" to benchmark meta lookup 
performance by scanning over all test rows inserted into the meta w/ 10 
concurrent threads

 

I got the following results (averaged over all 10 threads):
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead|master|839|11898|
|metaRandomRead|patch|891|11203|

 

If anyone is interested in looking at per-thread results/response time 
histograms, I've pasted the raw results output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for this patch + master in [this 
gist|https://gist.github.com/jbewing/b06aaf71326c323e3dfd85157c3cfcde]. 

 

As expected, this patch does bring a slight regression to the single-row lookup 
case. That regression looks to be about 5%. I'm theorizing that we're not 
seeing a huge regression here because the "hbase:meta" table has RIV1 
data-block encoding applied by default, which makes the extra reseek pretty 
cheap.


was (Author: JIRAUSER301708):
I've done some benchmarking with the "hbase pe" tool to test how 1-row reverse 
scans, specifically those done for meta lookups, regress with this change, 
since the change focuses on improving throughput for reverse scans over many 
rows. I used the following methodology:
 # Start a local HBase cluster with "hbase master start --localRegionServers=1"
 # Prepare the meta table with test state for benchmarking via "hbase pe 
--nomapred=true metaWrite 1". Additionally, flush the meta table and run a 
major compaction on it (I did this to keep the benchmarking numbers as 
comparable as possible, so that both runs were already reading over a 
consistent single storefile). I used the following command sequence to flush & 
compact the meta:
{code:java}
$ hbase shell
> flush 'hbase:meta'
> major_compact 'hbase:meta'
{code}
 

 # Run "hbase pe --nomapred=true metaRandomRead 10" to benchmark meta lookup 
performance by scanning over all test rows inserted into the meta w/ 10 
concurrent threads

 

I got the following results (averaged over all 10 threads):
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead|master|839|11898|
|metaRandomRead|patch|891|11203|

 

If anyone is interested in looking at per-thread results/response time 
histograms, I've pasted the raw results output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for this patch + master in [this 
gist|https://gist.github.com/jbewing/b06aaf71326c323e3dfd85157c3cfcde]. 

 

As expected, this patch does bring a slight regression to the single-row lookup 
case. That regression looks to be about 5%. I'm theorizing that we're not 
seeing a huge regression here because the "hbase:meta" table has RIV1 
data-block encoding applied by default, which makes the extra reseek pretty 
cheap.

> Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow
> --
>
> Key: HBASE-28043
> URL: https://issues.apache.org/jira/browse/HBASE-28043
> Project: HBase
>  Issue Type: Improvement
>Reporter: Becker Ewing
>Assignee: Becker Ewing
>Priority: Major
> Attachments: Current_SeekToPreviousRowBehavior.png, 
> Proposed_SeekToPreviousRowBehavior.png
>
>
> Currently, for non-RIV1 DBE encodings, each call to 
> [StoreFileScanner.seekToPreviousRow|https://github.com/apache/hbase/blob/89ca7f4ade84c84a246281c71898543b6161c099/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L493-L506]
>  (a common operation in reverse scans) results in two seeks: 
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to the first cell of the prior row

Re: [PR] HBASE-28043 Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow [hbase]

2023-10-16 Thread via GitHub


jbewing commented on PR #5373:
URL: https://github.com/apache/hbase/pull/5373#issuecomment-1765144225

   Thanks for the 👀 & poke @bbeaudreault. I've addressed your review comments. 
I've also run benchmarks for the single-row meta lookup case and summarized the 
results in [this JIRA 
comment](https://issues.apache.org/jira/browse/HBASE-28043?focusedCommentId=17775901&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17775901).
 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HBASE-28043) Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow

2023-10-16 Thread Becker Ewing (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775901#comment-17775901
 ] 

Becker Ewing commented on HBASE-28043:
--

I've done some benchmarking with the "hbase pe" tool to test how 1-row reverse 
scans, specifically those done for meta lookups, regress with this change, 
since the change focuses on improving throughput for reverse scans over many 
rows. I used the following methodology:
 # Start a local HBase cluster with "hbase master start --localRegionServers=1"
 # Prepare the meta table with test state for benchmarking via "hbase pe 
--nomapred=true metaWrite 1". Additionally, flush the meta table and run a 
major compaction on it (I did this to keep the benchmarking numbers as 
comparable as possible, so that both runs were already reading over a 
consistent single storefile). I used the following command sequence to flush & 
compact the meta:
{code:java}
$ hbase shell
> flush 'hbase:meta'
> major_compact 'hbase:meta'
{code}
 

 # Run "hbase pe --nomapred=true metaRandomRead 10" to benchmark meta lookup 
performance by scanning over all test rows inserted into the meta w/ 10 
concurrent threads

 

I got the following results (averaged over all 10 threads):
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead|master|839|11898|
|metaRandomRead|patch|891|11203|

 

If anyone is interested in looking at per-thread results/response time 
histograms, I've pasted the raw results output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for this patch + master in [this 
gist|https://gist.github.com/jbewing/b06aaf71326c323e3dfd85157c3cfcde]. 

 

As expected, this patch does bring a slight regression to the single-row lookup 
case. That regression looks to be about 5%. I'm theorizing that we're not 
seeing a huge regression here because the "hbase:meta" table has RIV1 
data-block encoding applied by default, which makes the extra reseek pretty 
cheap.

> Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow
> --
>
> Key: HBASE-28043
> URL: https://issues.apache.org/jira/browse/HBASE-28043
> Project: HBase
>  Issue Type: Improvement
>Reporter: Becker Ewing
>Assignee: Becker Ewing
>Priority: Major
> Attachments: Current_SeekToPreviousRowBehavior.png, 
> Proposed_SeekToPreviousRowBehavior.png
>
>
> Currently, for non-RIV1 DBE encodings, each call to 
> [StoreFileScanner.seekToPreviousRow|https://github.com/apache/hbase/blob/89ca7f4ade84c84a246281c71898543b6161c099/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L493-L506]
>  (a common operation in reverse scans) results in two seeks: 
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to the first cell of the prior row
> So if there are "N" rows in a block, a reverse scan through each row results 
> in seeking past 2(N-1)! rows.
>  
> This is a particularly expensive operation for tall tables that have many 
> rows in a block.
>  
> By introducing a state variable "previousRow" to StoreFileScanner, I believe 
> that we could modify the seeking algorithm to be:
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to before the row that is before the 
> row that was just seeked to (i.e. 2 rows back). _Save_ this as a hint for 
> where the prior row is in "previousRow"
>  # Reseek from "previousRow" (2 rows back from start) to 1 row back from 
> start (to the actual previousRow)
> Then for the rest of the calls, where a "previousRow" is present, you just 
> need to seek to the beginning of the block once instead of twice, i.e. 
>  # seek from the beginning of the block to right before the beginning of your 
> "previousRow" marker. Save this as the new "previousRow" marker
>  # Reseek to the next row (i.e. your previous "previousRow" marker)
>  
> If there are "N" rows in a block, a reverse scan from row N to row 0 results 
> in seeking past approximately (N-1)! rows i.e. 50% less than the current 
> behavior.
>  
> See the attached diagrams for the current and proposed behavior. 
>  
>  
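
A compact control-flow sketch of the proposal above; RowSeeker and its two 
methods are hypothetical stand-ins for the HFile seeking primitives, so this 
mirrors the described steps rather than the actual StoreFileScanner patch:
{code:java}
import java.util.Arrays;

// Hinted reverse seek: trade the second block-start seek for a saved hint.
final class PreviousRowHintSketch {

  interface RowSeeker {
    byte[] seekBeforeRow(byte[] row);    // expensive: scans forward from the block start
    byte[] reseekToRowStart(byte[] row); // cheap: continues from the current position
  }

  private byte[] previousRow; // hint: the row preceding the row we last returned

  byte[] seekToPreviousRow(byte[] originalRow, RowSeeker s) {
    if (previousRow == null || Arrays.compare(previousRow, originalRow) > 0) {
      // No usable hint: two block-start seeks, as today, but save the second
      // result so subsequent calls only need one.
      byte[] target = s.seekBeforeRow(originalRow); // find the prior row
      previousRow = s.seekBeforeRow(target);        // two rows back, kept as the hint
      return s.reseekToRowStart(target);            // cheap forward reseek to the prior row
    }
    // Hint present: one block-start seek refreshes the hint, then a cheap
    // forward reseek lands on the row we return.
    byte[] target = previousRow;
    previousRow = s.seekBeforeRow(target);
    return s.reseekToRowStart(target);
  }
}
{code}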



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28043 Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow [hbase]

2023-10-16 Thread via GitHub


jbewing commented on code in PR #5373:
URL: https://github.com/apache/hbase/pull/5373#discussion_r1361148558


##
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java:
##
@@ -486,44 +490,51 @@ public boolean shouldUseScanner(Scan scan, HStore store, long oldestUnexpiredTS)
   @Override
   public boolean seekToPreviousRow(Cell originalKey) throws IOException {
     try {
-      try {
-        boolean keepSeeking = false;
-        Cell key = originalKey;
-        do {
-          Cell seekKey = PrivateCellUtil.createFirstOnRow(key);
-          if (seekCount != null) seekCount.increment();
-          if (!hfs.seekBefore(seekKey)) {
-            this.cur = null;
-            return false;
-          }
-          Cell curCell = hfs.getCell();
-          Cell firstKeyOfPreviousRow = PrivateCellUtil.createFirstOnRow(curCell);
-
-          if (seekCount != null) seekCount.increment();
-          if (!seekAtOrAfter(hfs, firstKeyOfPreviousRow)) {
-            this.cur = null;
-            return false;
-          }
-
-          setCurrentCell(hfs.getCell());
-          this.stopSkippingKVsIfNextRow = true;
-          boolean resultOfSkipKVs;
-          try {
-            resultOfSkipKVs = skipKVsNewerThanReadpoint();
-          } finally {
-            this.stopSkippingKVsIfNextRow = false;
-          }
-          if (!resultOfSkipKVs || getComparator().compareRows(cur, firstKeyOfPreviousRow) > 0) {
-            keepSeeking = true;
-            key = firstKeyOfPreviousRow;
-            continue;
-          } else {
-            keepSeeking = false;
-          }
-        } while (keepSeeking);
-        return true;
-      } finally {
-        realSeekDone = true;
+      if (previousRow == null || getComparator().compareRows(previousRow, originalKey) > 0) {
+        return seekToPreviousRowWithoutHint(originalKey);
+      } else {

Review Comment:
   Done in 
https://github.com/apache/hbase/pull/5373/commits/194ea0aacef3a2c9c439397aac8e57d7fdcaa042



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1764953424

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 34s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  9s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 41s |  master passed  |
   | +1 :green_heart: |  compile  |   4m 34s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 34s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 43s |  branch has no errors when running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   5m 22s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 34s |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 35s |  the patch passed  |
   | +1 :green_heart: |  cc  |   4m 35s |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m 35s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m 33s |  the patch passed  |
   | -0 :warning: |  rubocop  |   0m 11s |  The patch generated 8 new + 766 unchanged - 0 fixed = 774 total (was 766)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |   9m  8s |  Patch does not cause any errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  hbaseprotoc  |   1m 54s |  the patch passed  |
   | +1 :green_heart: |  spotless  |   0m 41s |  patch has no errors when running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   5m 56s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 47s |  The patch does not generate ASF License warnings.  |
   |  |   |  49m 42s |   |


   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile cc hbaseprotoc prototool rubocop |
   | uname | Linux ecff407a0924 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | rubocop | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/artifact/yetus-general-check/output/diff-patch-rubocop.txt |
   | Max. process+thread count | 80 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: . |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/2/console |
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 rubocop=1.37.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28113 Modify the way of acquiring the RegionStateNode lock in checkOnlineRegionsReport to tryLock [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5442:
URL: https://github.com/apache/hbase/pull/5442#issuecomment-1764874427

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 55s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 26s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 42s |  branch has no errors when running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 31s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 46s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 39s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 39s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 42s |  hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |   9m 55s |  Patch does not cause any errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  spotless  |   0m 39s |  patch has no errors when running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   1m 28s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  8s |  The patch does not generate ASF License warnings.  |
   |  |   |  33m  3s |   |


   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/5442 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
   | uname | Linux a726034acb16 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | checkstyle | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 78 (vs. ulimit of 3) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5442/4/console |
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1764873248

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 47s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 26s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m 52s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 37s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 25s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 25s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   4m 50s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 38s |  hbase-protocol-shaded in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 29s |  hbase-client in the patch passed.  |
   | -1 :x: |  unit  | 225m  2s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |   8m 46s |  hbase-thrift in the patch passed.  |
   | +1 :green_heart: |  unit  |   7m  4s |  hbase-shell in the patch passed.  |
   |  |   | 272m 29s |   |


   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 4de732e8260b 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/testReport/ |
   | Max. process+thread count | 4271 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: . |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/console |
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1764854597

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 33s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 16s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 27s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 46s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 12s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 13s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 44s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m  8s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 26s |  hbase-protocol-shaded in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 20s |  hbase-client in the patch passed.  |
   | -1 :x: |  unit  | 217m 23s |  hbase-server in the patch failed.  |
   | +1 :green_heart: |  unit  |   6m 45s |  hbase-thrift in the patch passed.  |
   | +1 :green_heart: |  unit  |   7m 41s |  hbase-shell in the patch passed.  |
   |  |   | 260m 10s |   |


   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6f953eec680d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Temurin-1.8.0_352-b08 |
   | unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/testReport/ |
   | Max. process+thread count | 4603 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: . |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/console |
   | versions | git=2.34.1 maven=3.8.6 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28043 Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow [hbase]

2023-10-16 Thread via GitHub


bbeaudreault commented on PR #5373:
URL: https://github.com/apache/hbase/pull/5373#issuecomment-1764836044

   @wchevreuil I don't think my nitpick about if/else needs to be a blocker. But I do think we should at least perf-eval 1-row reverse scans to make sure this isn't a regression for them. They're very common in meta lookups, which are in the critical path of clients (we sometimes see meta hotspotting as is).
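
   For reference, a 1-row reverse scan has the following shape ("find the row at or before this key", the same pattern a meta lookup uses). A minimal sketch assuming an open `connection`; the table name and row are placeholders:

   ```java
   import java.io.IOException;
   import org.apache.hadoop.hbase.TableName;
   import org.apache.hadoop.hbase.client.Connection;
   import org.apache.hadoop.hbase.client.Result;
   import org.apache.hadoop.hbase.client.ResultScanner;
   import org.apache.hadoop.hbase.client.Scan;
   import org.apache.hadoop.hbase.client.Table;
   import org.apache.hadoop.hbase.util.Bytes;

   // Returns the single row at or before the given key, or null if none.
   static Result previousRow(Connection connection, byte[] row) throws IOException {
     Scan scan = new Scan().withStartRow(row).setReversed(true).setLimit(1);
     try (Table table = connection.getTable(TableName.valueOf("t1"));
          ResultScanner scanner = table.getScanner(scan)) {
       return scanner.next();
     }
   }
   ```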


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-24687 MobFileCleanerChore uses a new Connection for each table … [hbase]

2023-10-16 Thread via GitHub


bbeaudreault commented on PR #2038:
URL: https://github.com/apache/hbase/pull/2038#issuecomment-1764825284

   @ArthurSXL8 any chance you can rebase and fix the merge conflicts on this? I can help get it merged; this recently became an issue for us.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28113 Modify the way of acquiring the RegionStateNode lock in checkOnlineRegionsReport to tryLock [hbase]

2023-10-16 Thread via GitHub


hiping-tech commented on code in PR #5442:
URL: https://github.com/apache/hbase/pull/5442#discussion_r1360913265


##
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java:
##
@@ -1398,29 +1398,37 @@ private void checkOnlineRegionsReport(ServerStateNode serverNode, Set re
         continue;
       }
       final long lag = 1000;
-      regionNode.lock();
-      try {
-        long diff = EnvironmentEdgeManager.currentTime() - regionNode.getLastUpdate();
-        if (regionNode.isInState(State.OPENING, State.OPEN)) {
-          // This is possible as a region server has just closed a region but the region server
-          // report is generated before the closing, but arrive after the closing. Make sure there
-          // is some elapsed time so less false alarms.
-          if (!regionNode.getRegionLocation().equals(serverName) && diff > lag) {
-            LOG.warn("Reporting {} server does not match {} (time since last "
-              + "update={}ms); closing...", serverName, regionNode, diff);
-            closeRegionSilently(serverNode.getServerName(), regionName);
-          }
-        } else if (!regionNode.isInState(State.CLOSING, State.SPLITTING)) {
-          // So, we can get report that a region is CLOSED or SPLIT because a heartbeat
-          // came in at about same time as a region transition. Make sure there is some
-          // elapsed time so less false alarms.
-          if (diff > lag) {
-            LOG.warn("Reporting {} state does not match {} (time since last update={}ms)",
-              serverName, regionNode, diff);
+      // It is likely that another thread is currently holding the lock. To avoid deadlock, use

Review Comment:
   Thank you for your advice; the suggestions from ChatGPT are very valuable and worth considering.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HBASE-28155) RecoveredReplicationSource quit when there are still unfinished groups

2023-10-16 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775796#comment-17775796
 ] 

Duo Zhang commented on HBASE-28155:
---

I think this is a common problem for all branches.

When a shipper for a RecoveredReplicationSource finishes, we try to clean up everything. In ReplicationSource.initialize, we create and start the shippers one by one, so it is possible that one shipper has already started and finished while the second shipper has not yet been added to the workers map, and then the logic in the tryFinish method cleans up everything.

On branch-2.x, there is a sleep in the tryFinish method which reduces the likelihood a lot, but it could still happen in theory. For master and branch-3, there is no sleep, so the likelihood is much greater than on branch-2.x.

The code for master and branch-3

{code}
if (workerThreads.isEmpty()) {
  this.getSourceMetrics().clear();
  manager.finishRecoveredSource(this);
}
{code}

For branch-2.x

{code}
synchronized (workerThreads) {
  Threads.sleep(100); // wait a short while for other worker thread to fully exit
  boolean allTasksDone = workerThreads.values().stream().allMatch(w -> w.isFinished());
  if (allTasksDone) {
    this.getSourceMetrics().clear();
    manager.removeRecoveredSource(this);
    LOG.info("Finished recovering queue {} with the following stats: {}", queueId, getStats());
  }
}
{code}
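
A sketch of the invariant that would close this window (illustrative only, assuming the number of wal groups is known before any shipper starts; this is not the committed fix): finish the source only once every expected group is both registered and finished.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race: shipper "groupA" can finish before shipper "groupB"
// is even registered, so "no unfinished workers" alone is not a safe exit
// condition. Tracking the expected number of wal groups up front closes it.
public class TryFinishRace {
  static final Map<String, Boolean> workers = new ConcurrentHashMap<>(); // shipper -> finished
  static final int expectedGroups = 2; // known before any shipper starts

  static boolean tryFinish() {
    return workers.size() == expectedGroups
      && workers.values().stream().allMatch(Boolean::booleanValue);
  }

  public static void main(String[] args) {
    workers.put("groupA", true);     // first shipper started and already finished
    System.out.println(tryFinish()); // false: groupB not registered yet
    workers.put("groupB", true);
    System.out.println(tryFinish()); // true: safe to clean up the source
  }
}
{code}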



> RecoveredReplicationSource quit when there are still unfinished groups
> --
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.6.0, 3.0.0-beta-1
>
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28155) RecoveredReplicationSource quit when there are still unfinished groups

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-28155:
--
Fix Version/s: 2.6.0

> RecoveredReplicationSource quit when there are still unfinished groups
> --
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.6.0, 3.0.0-beta-1
>
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28155) RecoveredReplicationSource quit when there are still unfinished groups

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-28155:
--
Summary: RecoveredReplicationSource quit when there are still unfinished 
groups  (was: NPE in ReplicationSourceManager.cleanOldLogs when sync 
replication is enabled)

> RecoveredReplicationSource quit when there are still unfinished groups
> --
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0-beta-1
>
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-28155:
--
Priority: Critical  (was: Major)

> NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled
> -
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-28155:
--
Fix Version/s: 3.0.0-beta-1

> NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled
> -
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0-beta-1
>
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775793#comment-17775793
 ] 

Duo Zhang commented on HBASE-28155:
---

It seems that there is a race...

Actually we have two wal groups for the recovered replication queue, but when the first one finishes, the second one has not been added to the source yet, so the source is considered finished and everything is cleaned up, which then causes the shipper thread for the second group to throw an NPE.

{noformat}
2023-10-15T16:04:39,020 INFO  
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.shipper8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.ReplicationSourceShipper(98): Running 
ReplicationSourceShipper Thread for wal group: 
8c5d72df62de%2C38435%2C1697385859175
2023-10-15T16:04:39,020 INFO  
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.ReplicationSourceWALReader(109): 
peerClusterZnode=1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175,
 ReplicationSourceWALReaderThread : 1 inited, 
replicationBatchSizeCapacity=102400, replicationBatchCountCapacity=25000, 
replicationBatchQueueCapacity=1
2023-10-15T16:04:39,021 DEBUG 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.wal-reader.8c5d72df62de%2C38435%2C1697385859175.rep,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.WALEntryStream(249): Creating new reader 
hdfs://localhost:39027/user/jenkins/test-data/48e99b63-a68f-156c-32f2-68ddc02602e1/oldWALs/8c5d72df62de%2C38435%2C1697385859175.rep.1697385865402,
 startPosition=0, beingWritten=false
2023-10-15T16:04:39,022 DEBUG 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.wal-reader.8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.WALEntryStream(451): EOF, closing 
hdfs://localhost:39027/user/jenkins/test-data/48e99b63-a68f-156c-32f2-68ddc02602e1/oldWALs/8c5d72df62de%2C38435%2C1697385859175.1697385867465
2023-10-15T16:04:39,022 DEBUG 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.wal-reader.8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.WALEntryStream(236): No more WAL files in queue
2023-10-15T16:04:39,022 DEBUG 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.wal-reader.8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.ReplicationSourceWALReader(118): Stopping the replication 
source wal reader
2023-10-15T16:04:39,022 DEBUG 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.shipper8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.ReplicationSourceShipper(110): Shipper from source 
1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175 got entry 
batch from reader: WALEntryBatch [walEntries=[], lastWalPath=null, 
lastWalPosition=0, nbRowKeys=0, nbHFiles=0, heapSize=0, lastSeqIds={}, 
endOfFile=false,usedBufferSize=0]
2023-10-15T16:04:39,022 DEBUG 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.shipper8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.ReplicationSourceShipper(138): Finished recovering queue for 
group 8c5d72df62de%2C38435%2C1697385859175 of peer 
1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
2023-10-15T16:04:39,022 INFO  
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175.replicationSource.shipper8c5d72df62de%2C38435%2C1697385859175,1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
 {}] regionserver.ReplicationSourceManager(540): Done with the recovered queue 
1-8c5d72df62de,45999,1697385859285/8c5d72df62de,38435,1697385859175
2023-10-15T16:04:39,024 INFO  
[RS_CLAIM_REPLICATION_QUEUE-regio

[jira] [Work started] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-28155 started by Duo Zhang.
-
> NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled
> -
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reassigned HBASE-28155:
-

Assignee: Duo Zhang

> NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled
> -
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-28155:
--
Attachment: 
org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt

> NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled
> -
>
> Key: HBASE-28155
> URL: https://issues.apache.org/jira/browse/HBASE-28155
> Project: HBase
>  Issue Type: Bug
>  Components: Recovery, Replication
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.replication.TestSyncReplicationStandbyKillRS-output.txt
>
>
> Need to dig more but it seems to be related to how we deal with
> RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28155) NPE in ReplicationSourceManager.cleanOldLogs when sync replication is enabled

2023-10-16 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-28155:
-

 Summary: NPE in ReplicationSourceManager.cleanOldLogs when sync 
replication is enabled
 Key: HBASE-28155
 URL: https://issues.apache.org/jira/browse/HBASE-28155
 Project: HBase
  Issue Type: Bug
  Components: Recovery, Replication
Reporter: Duo Zhang


Need to dig more but it seems to be related to how we deal with
RecoveredReplicationSource and queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28114 Add more comments to explain why replication log queue could never be empty for normal replication queue [hbase]

2023-10-16 Thread via GitHub


Apache9 commented on PR #5443:
URL: https://github.com/apache/hbase/pull/5443#issuecomment-1764571509

   For `TestZooKeeper`, I filed HBASE-28154.
   
   And for `TestSyncReplicationStandbyKillRS`, this should be a bug, as we hit an NPE...
   
   Let me dig more...
   
   ```
   2023-10-15T16:04:54,666 ERROR 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222.replicationSource.shipper8c5d72df62de%2C46527%2C1697385859222,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222
 {}] regionserver.ReplicationSource(452): Unexpected exception in 
RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222.replicationSource.shipper8c5d72df62de%2C46527%2C1697385859222,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222
 currentPath=null
   java.lang.NullPointerException: null
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:662)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:649)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceInterface.logPositionAndCleanOldLogs(ReplicationSourceInterface.java:211)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.updateLogPosition(ReplicationSourceShipper.java:266)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:158)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:119)
 ~[classes/:?]
   2023-10-15T16:04:54,666 ERROR 
[RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222.replicationSource.shipper8c5d72df62de%2C46527%2C1697385859222,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222
 {}] regionserver.HRegionServer(2389): * ABORTING region server 
8c5d72df62de,35419,1697385888915: Unexpected exception in 
RS_CLAIM_REPLICATION_QUEUE-regionserver/8c5d72df62de:0-0.replicationSource,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222.replicationSource.shipper8c5d72df62de%2C46527%2C1697385859222,1-8c5d72df62de,35419,1697385888915/8c5d72df62de,46527,1697385859222
 *
   java.lang.NullPointerException: null
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:662)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:649)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceInterface.logPositionAndCleanOldLogs(ReplicationSourceInterface.java:211)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.updateLogPosition(ReplicationSourceShipper.java:266)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:158)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:119)
 ~[classes/:?]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HBASE-28154) TestZooKeeper could hang forever

2023-10-16 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-28154:
-

 Summary: TestZooKeeper could hang forever
 Key: HBASE-28154
 URL: https://issues.apache.org/jira/browse/HBASE-28154
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Duo Zhang


Recently saw this several times in pre-commit results.

Checking the log output, it is stuck in testRegionServerSessionExpired.

When replaying the edits for the meta region, in the end we need to flush the memstore, and the flush is stuck, which causes the test to time out.

This is the last log message for opening hbase:meta
{noformat}
2023-10-15T14:37:46,704 INFO  [RS_OPEN_META-regionserver/2c0085825d5f:0-0 
{event_type=M_RS_OPEN_META, pid=9}] regionserver.HRegion(2885): Flushing 
1588230740 4/4 column families, dataSize=74 B heapSize=1.22 KB
{noformat}

And when the test timed out, we saw this
{noformat}
2023-10-15T14:47:57,360 WARN  [RS_OPEN_META-regionserver/2c0085825d5f:0-0 
{event_type=M_RS_OPEN_META, pid=9}] regionserver.HStore(846): Failed flushing 
store file for 1588230740/ns, retrying num=0
java.nio.channels.ClosedChannelException: null
at 
org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73)
 ~[hadoop-hdfs-client-3.2.4.jar:?]
at 
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) 
~[hadoop-hdfs-client-3.2.4.jar:?]
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) 
~[hadoop-common-3.2.4.jar:?]
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
 ~[hadoop-common-3.2.4.jar:?]
at java.io.DataOutputStream.write(DataOutputStream.java:107) 
~[?:1.8.0_352]
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:1045)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1032)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:539)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:615)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:70)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828) 
~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1969)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:3012)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2720)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5458)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1032)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:966) 
~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7774) 
~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7729)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7704) 
~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7663) 
~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7619) 
~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:138)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) 
~[classes/:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_352]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_352]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
{noformat}

It is stuck on writing data to HDFS...

Not sure what the root cause is; need to dig more...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28113 Modify the way of acquiring the RegionStateNode lock in checkOnlineRegionsReport to tryLock [hbase]

2023-10-16 Thread via GitHub


Apache9 commented on code in PR #5442:
URL: https://github.com/apache/hbase/pull/5442#discussion_r1360675962


##
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java:
##
@@ -1398,29 +1398,37 @@ private void checkOnlineRegionsReport(ServerStateNode serverNode, Set re
         continue;
       }
       final long lag = 1000;
-      regionNode.lock();
-      try {
-        long diff = EnvironmentEdgeManager.currentTime() - regionNode.getLastUpdate();
-        if (regionNode.isInState(State.OPENING, State.OPEN)) {
-          // This is possible as a region server has just closed a region but the region server
-          // report is generated before the closing, but arrive after the closing. Make sure there
-          // is some elapsed time so less false alarms.
-          if (!regionNode.getRegionLocation().equals(serverName) && diff > lag) {
-            LOG.warn("Reporting {} server does not match {} (time since last "
-              + "update={}ms); closing...", serverName, regionNode, diff);
-            closeRegionSilently(serverNode.getServerName(), regionName);
-          }
-        } else if (!regionNode.isInState(State.CLOSING, State.SPLITTING)) {
-          // So, we can get report that a region is CLOSED or SPLIT because a heartbeat
-          // came in at about same time as a region transition. Make sure there is some
-          // elapsed time so less false alarms.
-          if (diff > lag) {
-            LOG.warn("Reporting {} state does not match {} (time since last update={}ms)",
-              serverName, regionNode, diff);
+      // It is likely that another thread is currently holding the lock. To avoid deadlock, use

Review Comment:
   The comment is incorrect now? It would also be better to mention that this is just a fallback check for catching unexpected inconsistency. This is what ChatGPT suggested:
   ```
   This is just a fallback check designed to identify unexpected data inconsistencies, so we use tryLock to attempt to acquire the lock, and if the lock cannot be acquired, we skip the check. This will not cause any additional problems and also prevents the regionServerReport call from being stuck for too long, which may cause deadlock on region assignment.
   ```
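
   To make the suggested shape concrete, a minimal sketch of the tryLock pattern, with a plain `ReentrantLock` standing in for the `RegionStateNode` lock and the check body elided:

   ```java
   import java.util.concurrent.locks.ReentrantLock;

   // Sketch only: skip the fallback consistency check when the lock is
   // contended instead of blocking the regionServerReport call.
   public class TryLockSketch {
     private final ReentrantLock regionNodeLock = new ReentrantLock();

     void checkOnlineRegion() {
       if (regionNodeLock.tryLock()) {
         try {
           // ... run the fallback state/location consistency check ...
         } finally {
           regionNodeLock.unlock();
         }
       }
       // else: another thread is transitioning this region. Since this is
       // only a fallback check, skipping it is safe and avoids stalling.
     }
   }
   ```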



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28146: Make ServerManager rsAdmins map thread safe [hbase]

2023-10-16 Thread via GitHub


Apache9 commented on code in PR #5461:
URL: https://github.com/apache/hbase/pull/5461#discussion_r1360664367


##
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:
##
@@ -718,14 +720,19 @@ public static void closeRegionSilentlyAndWait(ClusterConnection connection, Serv
   public AdminService.BlockingInterface getRsAdmin(final ServerName sn) throws IOException {
     AdminService.BlockingInterface admin = this.rsAdmins.get(sn);
     if (admin == null) {
-      LOG.debug("New admin connection to " + sn.toString());
-      if (sn.equals(master.getServerName()) && master instanceof HRegionServer) {
-        // A master is also a region server now, see HBASE-10569 for details
-        admin = ((HRegionServer) master).getRSRpcServices();
-      } else {
-        admin = this.connection.getAdmin(sn);
-      }
-      this.rsAdmins.put(sn, admin);
+      return this.rsAdmins.computeIfAbsent(sn, server -> {
+        LOG.debug("New admin connection to " + server.toString());
+        if (server.equals(master.getServerName()) && master instanceof HRegionServer) {
+          // A master is also a region server now, see HBASE-10569 for details
+          return ((HRegionServer) master).getRSRpcServices();
+        } else {
+          try {
+            return this.connection.getAdmin(server);
+          } catch (IOException e) {
+            throw new RuntimeException(e);

Review Comment:
   Ah, I checked the code: as @bbeaudreault mentioned, we already have a cache in `ConnectionImplementation`, see https://github.com/apache/hbase/blob/18a38aeec6b6db4abf24936bc8a403bca58e7e32/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L1434
   
   So I think we can just remove the cache here.
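
   As an aside on the `RuntimeException` wrapper: `ConcurrentMap.computeIfAbsent` cannot throw checked exceptions, so the usual workaround is to tunnel the `IOException` out and rethrow it at the call site. A minimal sketch of that pattern; this helper is illustrative, not HBase's actual `computeIfAbsentEx` utility:

   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;

   // Minimal helper: run an IOException-throwing supplier inside
   // computeIfAbsent by tunneling the checked exception.
   public final class MapUtil {
     public interface IOSupplier<V> {
       V get() throws IOException;
     }

     public static <K, V> V computeIfAbsentEx(ConcurrentMap<K, V> map, K key, IOSupplier<V> supplier)
       throws IOException {
       try {
         return map.computeIfAbsent(key, k -> {
           try {
             return supplier.get();
           } catch (IOException e) {
             throw new UncheckedIOException(e); // tunnel out of the lambda
           }
         });
       } catch (UncheckedIOException e) {
         throw e.getCause(); // restore the original IOException for the caller
       }
     }

     public static void main(String[] args) throws IOException {
       ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
       System.out.println(computeIfAbsentEx(cache, "a", () -> 1)); // prints 1
     }
   }
   ```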



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28146: Make ServerManager rsAdmins map thread safe [hbase]

2023-10-16 Thread via GitHub


rmdmattingly commented on code in PR #5461:
URL: https://github.com/apache/hbase/pull/5461#discussion_r1360631177


##
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:
##
@@ -718,14 +720,19 @@ public static void closeRegionSilentlyAndWait(ClusterConnection connection, Serv
   public AdminService.BlockingInterface getRsAdmin(final ServerName sn) throws IOException {
     AdminService.BlockingInterface admin = this.rsAdmins.get(sn);
     if (admin == null) {
-      LOG.debug("New admin connection to " + sn.toString());
-      if (sn.equals(master.getServerName()) && master instanceof HRegionServer) {
-        // A master is also a region server now, see HBASE-10569 for details
-        admin = ((HRegionServer) master).getRSRpcServices();
-      } else {
-        admin = this.connection.getAdmin(sn);
-      }
-      this.rsAdmins.put(sn, admin);
+      return this.rsAdmins.computeIfAbsent(sn, server -> {
+        LOG.debug("New admin connection to " + server.toString());
+        if (server.equals(master.getServerName()) && master instanceof HRegionServer) {
+          // A master is also a region server now, see HBASE-10569 for details
+          return ((HRegionServer) master).getRSRpcServices();
+        } else {
+          try {
+            return this.connection.getAdmin(server);
+          } catch (IOException e) {
+            throw new RuntimeException(e);

Review Comment:
   `computeIfAbsentEx` seems like a good idea here. I don't have an informed opinion regarding whether the caching is really necessary, but I'm happy to rip it out if we think that's worth doing now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


Apache-HBase commented on PR #5462:
URL: https://github.com/apache/hbase/pull/5462#issuecomment-1764436486

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 36s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any 
anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  8s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 44s |  master passed  |
   | +1 :green_heart: |  compile  |   4m 36s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 35s |  master passed  |
   | +1 :green_heart: |  spotless  |   0m 44s |  branch has no errors when 
running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   5m 22s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 10s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 38s |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 35s |  the patch passed  |
   | +1 :green_heart: |  cc  |   4m 35s |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m 35s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 16s |  hbase-client: The patch 
generated 8 new + 4 unchanged - 0 fixed = 12 total (was 4)  |
   | -0 :warning: |  checkstyle  |   0m 35s |  hbase-server: The patch 
generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7)  |
   | -0 :warning: |  rubocop  |   0m 10s |  The patch generated 8 new + 766 
unchanged - 0 fixed = 774 total (was 766)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  hadoopcheck  |   9m 13s |  Patch does not cause any 
errors with Hadoop 3.2.4 3.3.6.  |
   | +1 :green_heart: |  hbaseprotoc  |   1m 55s |  the patch passed  |
   | -1 :x: |  spotless  |   0m 19s |  patch has 63 errors when running 
spotless:check, run spotless:apply to fix.  |
   | +1 :green_heart: |  spotbugs  |   5m 57s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 48s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  49m 47s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-general-check/output/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/5462 |
   | JIRA Issue | HBASE-28064 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti 
spotless checkstyle compile cc hbaseprotoc prototool rubocop |
   | uname | Linux 74972bea0cfb 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 
13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 391dfda6ad |
   | Default Java | Eclipse Adoptium-11.0.17+8 |
   | checkstyle | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-client.txt
 |
   | checkstyle | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
 |
   | rubocop | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-general-check/output/diff-patch-rubocop.txt
 |
   | spotless | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/artifact/yetus-general-check/output/patch-spotless.txt
 |
   | Max. process+thread count | 80 (vs. ulimit of 3) |
   | modules | C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift 
hbase-shell U: . |
   | Console output | 
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5462/1/console 
|
   | versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 rubocop=1.37.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] HBASE-28064:Implement truncate_region command [hbase]

2023-10-16 Thread via GitHub


vaijosh opened a new pull request, #5462:
URL: https://github.com/apache/hbase/pull/5462

   HBASE-28064: Implement a truncate_region command to truncate a region directly from the FS.
   
   This PR implements the truncate_region shell command.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions

2023-10-16 Thread Wellington Chevreuil (Jira)


[ https://issues.apache.org/jira/browse/HBASE-27389 ]


Wellington Chevreuil deleted comment on HBASE-27389:
--

was (Author: wchevreuil):
Merged into the feature branch.

> Add cost function in balancer to consider the cost of building bucket cache 
> before moving regions
> -
>
> Key: HBASE-27389
> URL: https://issues.apache.org/jira/browse/HBASE-27389
> Project: HBase
>  Issue Type: Task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase currently uses the StochasticLoadBalancer to determine the cost of
> moving regions from one RS to another. Each cost function gives a result
> between 0 and 1, with 0 being the lowest cost and 1 the highest. The
> balancer iterates through each cost function and comes up with the total
> cost. It then creates multiple balancing plans from random actions and
> computes the cost of each plan as if it were executed; if the cost of a
> plan is less than the initial cost, the plan is executed.
> Implement a new "CacheAwareCostFunction" which takes into account whether
> the region is fully cached and returns the highest cost if the plan
> suggests moving this region.
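
To illustrate the cost-function contract described above, a small editorial sketch with simplified types (a list of cached ratios stands in for the balancer's plan and region types; this is not the actual CacheAwareCostFunction):

{code}
import java.util.Arrays;
import java.util.List;

// Toy cost function in [0, 1]: a plan that moves a fully cached region
// scores the highest possible cost.
public class CacheAwareCostSketch {
  // cachedRatios: for each region the plan would move, the fraction of its
  // HFiles currently cached on the hosting server.
  static double cost(List<Double> cachedRatios) {
    return cachedRatios.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
  }

  public static void main(String[] args) {
    System.out.println(cost(Arrays.asList(0.1, 0.95))); // 0.95: expensive plan
    System.out.println(cost(Arrays.asList(0.0, 0.0)));  // 0.0: free to move
  }
}
{code}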



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27999) Implement cache aware load balancer

2023-10-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27999.
--
Resolution: Fixed

Merged into the feature branch.

> Implement cache aware load balancer
> ---
>
> Key: HBASE-27999
> URL: https://issues.apache.org/jira/browse/HBASE-27999
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase uses an ephemeral cache: blocks are read from slow storage and then
> stored in the bucket cache. This cache is warmed up every time a region
> server is started. Depending on the data size and the configured cache
> size, the cache warm-up can take anywhere from a few minutes to a few
> hours. Doing this every time the region server starts can be a very
> expensive process. To eliminate this, HBASE-27313 implemented the cache
> persistence feature, where region servers periodically persist the blocks
> cached in the bucket cache. This persisted information is then used to
> resurrect the cache when a region server restarts, whether after a normal
> restart or a crash.
> This feature builds on that capability: the balancer considers the cache
> allocation of each region on the region servers when calculating a new
> assignment plan. It uses the region/region-server cache allocation info
> reported by region servers to calculate the percentage of HFiles cached
> for each region on its hosting server, and then uses that as another
> factor when deciding on an optimal new assignment plan.
>  
> A design document describing the balancer can be found at 
> https://docs.google.com/document/d/1A8-eVeRhZjwL0hzFw9wmXl8cGP4BFomSlohX2QcaFg4/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions

2023-10-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil reopened HBASE-27389:
--

> Add cost function in balancer to consider the cost of building bucket cache 
> before moving regions
> -
>
> Key: HBASE-27389
> URL: https://issues.apache.org/jira/browse/HBASE-27389
> Project: HBase
>  Issue Type: Task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase currently uses the StochasticLoadBalancer to determine the cost of 
> moving regions from one RS to another. Each cost function gives a result 
> between 0 and 1, with 0 being the lowest cost and 1 being the highest. The 
> balancer iterates through each cost function and comes up with the total 
> cost. The balancer then creates multiple balancing plans from random 
> actions and computes the cost of each plan as if it were executed; if the 
> cost of a plan is less than the initial cost, the plan is executed.
> Implement a new "CacheAwareCostFunction" which takes into account whether 
> the region is fully cached and returns the highest cost if the plan 
> suggests moving this region.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions

2023-10-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27389.
--
Resolution: Fixed

Merged into the feature branch.

> Add cost function in balancer to consider the cost of building bucket cache 
> before moving regions
> -
>
> Key: HBASE-27389
> URL: https://issues.apache.org/jira/browse/HBASE-27389
> Project: HBase
>  Issue Type: Task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase currently uses the StochasticLoadBalancer to determine the cost of 
> moving regions from one RS to another. Each cost function gives a result 
> between 0 and 1, with 0 being the lowest cost and 1 being the highest. The 
> balancer iterates through each cost function and comes up with the total 
> cost. The balancer then creates multiple balancing plans from random 
> actions and computes the cost of each plan as if it were executed; if the 
> cost of a plan is less than the initial cost, the plan is executed.
> Implement a new "CacheAwareCostFunction" which takes into account whether 
> the region is fully cached and returns the highest cost if the plan 
> suggests moving this region.
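
Since the accept-if-cheaper loop is the heart of the description above, a toy sketch may help; the one-dimensional "plan" and its cost model are stand-ins invented for illustration, while the real balancer mutates region-to-server assignments:

import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** Illustrative sketch of a stochastic "accept plan only if cheaper" loop. */
public class StochasticLoopSketch {

  public static void main(String[] args) {
    Random rnd = new Random(42);

    // Toy state: a single number standing in for a cluster assignment,
    // with cost = distance from some optimum.
    double state = 10.0;
    DoubleUnaryOperator cost = s -> Math.abs(s - 3.0);

    double currentCost = cost.applyAsDouble(state);
    for (int step = 0; step < 1000; step++) {
      // Propose a random action and cost the resulting candidate plan
      // as if it were executed.
      double candidate = state + rnd.nextGaussian();
      double candidateCost = cost.applyAsDouble(candidate);
      // Keep the plan only if it is cheaper than the current one.
      if (candidateCost < currentCost) {
        state = candidate;
        currentCost = candidateCost;
      }
    }
    System.out.printf("final state=%.3f cost=%.3f%n", state, currentCost);
  }
}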



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28152) Replace scala.util.parsing.json with org.json4s.jackson which used in Spark too

2023-10-16 Thread Balazs Meszaros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Meszaros resolved HBASE-28152.
-
Fix Version/s: hbase-connectors-1.0.1
   Resolution: Fixed

> Replace scala.util.parsing.json with org.json4s.jackson which used in Spark 
> too
> ---
>
> Key: HBASE-28152
> URL: https://issues.apache.org/jira/browse/HBASE-28152
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Affects Versions: connector-1.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: hbase-connectors-1.0.1
>
>
> In https://issues.apache.org/jira/browse/HBASE-28137, to support Spark 
> 3.4, scala-parser-combinators was added as a direct dependency of the 
> HBase Spark Connector.
> This was needed because Spark 3.4 does not use scala-parser-combinators, 
> so it is not inherited as a transitive dependency.
> But this solution has a disadvantage: as the HBase Spark Connector 
> assembly jar does not include any third-party libraries, 
> scala-parser-combinators must be added to the Spark classpath for the 
> HBase Spark Connector to work.
> A much better solution is to replace scala.util.parsing.json with 
> org.json4s.jackson, which is used by Spark core; see 
> https://github.com/apache/spark/blob/branch-3.4/core/pom.xml#L279-L280.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28152) Replace scala.util.parsing.json with org.json4s.jackson which used in Spark too

2023-10-16 Thread Balazs Meszaros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Meszaros reassigned HBASE-28152:
---

Assignee: Attila Zsolt Piros

> Replace scala.util.parsing.json with org.json4s.jackson which used in Spark 
> too
> ---
>
> Key: HBASE-28152
> URL: https://issues.apache.org/jira/browse/HBASE-28152
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Affects Versions: connector-1.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
>
> In https://issues.apache.org/jira/browse/HBASE-28137, to support Spark 
> 3.4, scala-parser-combinators was added as a direct dependency of the 
> HBase Spark Connector.
> This was needed because Spark 3.4 does not use scala-parser-combinators, 
> so it is not inherited as a transitive dependency.
> But this solution has a disadvantage: as the HBase Spark Connector 
> assembly jar does not include any third-party libraries, 
> scala-parser-combinators must be added to the Spark classpath for the 
> HBase Spark Connector to work.
> A much better solution is to replace scala.util.parsing.json with 
> org.json4s.jackson, which is used by Spark core; see 
> https://github.com/apache/spark/blob/branch-3.4/core/pom.xml#L279-L280.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28152. Replace scala.util.parsing.json with org.json4s.jackson [hbase-connectors]

2023-10-16 Thread via GitHub


meszibalu merged PR #126:
URL: https://github.com/apache/hbase-connectors/pull/126


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org