[jira] [Comment Edited] (HBASE-28599) RowTooBigException is thrown when duplicate increment RPC call is attempted

2024-05-19 Thread youngju kim (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847675#comment-17847675
 ] 

youngju kim edited comment on HBASE-28599 at 5/19/24 12:53 PM:
---

Hello [~zhangduo], could you review this PR?

[https://github.com/apache/hbase/pull/5927]


was (Author: JIRAUSER300939):
Hello [~zhangduo], could you review this PR?

[https://github.com/apache/hbase/pull/5927]
 
 

> RowTooBigException is thrown when duplicate increment RPC call is attempted
> ---
>
> Key: HBASE-28599
> URL: https://issues.apache.org/jira/browse/HBASE-28599
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.5, 2.5.6, 2.5.7, 2.5.8
>Reporter: Robin Infant A
>Assignee: youngju kim
>Priority: Major
>  Labels: pull-request-available
> Attachments: RowTooBig_trace.txt
>
>
> *Issue:*
> `RowTooBigException` is thrown when a duplicate increment RPC call is 
> attempted.
> *Expected Behavior:*
> 1. The initial increment RPC call times out for some reason.
> 2. The duplicate RPC call should be converted to a GET request and fetch the 
> result that I am trying to increment.
> 3. The result should contain only the qualifier that I am attempting to 
> increment.
> *Actual Behavior:*
> 1. The initial RPC increment call timed out, which is expected.
> 2. The duplicate RPC call is converted to a GET request but fails to clone 
> the qualifier into the GET request.
> 3. Hence, the GET request attempts to retrieve all qualifiers for the given 
> row and column family, resulting in a `RowTooBigException`.
> *Steps to Reproduce:*
> 1. Ensure a row with a total value size exceeding `hbase.table.max.rowsize` 
> (default = 1073741824) exists.
> 2. The nonce property `hbase.client.nonces.enabled` should be enabled (it 
> defaults to true).
> 3. Attempt to increment a qualifier against the same row.
> 4. In my case, I am using a postIncrement coprocessor which can cause a 
> delay longer than the RPC timeout.
> 5. A duplicate increment call should be triggered, which tries to get the 
> value rather than increment it.
> 6. The GET request actually tries to retrieve all the qualifiers for the row, 
> resulting in a `RowTooBigException`.
> *Insights:*
> Upon further debugging, I found that qualifiers are not cloned into the GET 
> instance due to incorrect usage of 
> [CellScanner.advance|https://github.com/apache/hbase/blob/7ebd4381261fefd78fc2acf258a95184f4147cee/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3833]
> *Fix Suggestion:*
> Removing the `!` operator from `while (!CellScanner.advance)` may resolve 
> the issue.
> Attached Exception Stack Trace for reference.
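To make the fix suggestion above concrete, here is a minimal sketch of the 
qualifier-cloning loop, assuming the duplicate increment is answered by building 
a Get from the mutation's cells. The class and method names are illustrative 
placeholders, not the actual HRegion code; only CellScanner, CellUtil, Get, and 
Mutation are real HBase client API. CellScanner.advance() returns true while 
another cell is available, so negating it skips every cell, leaving the Get 
without column qualifiers and forcing it to read the whole row.

{code:java}
// Hedged sketch of the loop described above; not the exact HRegion code.
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Mutation;

public final class NonceRetryGetSketch {

  /**
   * Builds the Get used to answer a duplicate (nonce-detected) increment.
   * With the buggy condition while (!cellScanner.advance()), the loop body never
   * runs when the mutation has cells, so no columns are added and the Get reads
   * the whole row, which can exceed hbase.table.max.rowsize (RowTooBigException).
   */
  static Get buildGetForDuplicateIncrement(Mutation increment) throws IOException {
    Get get = new Get(increment.getRow());
    CellScanner cellScanner = increment.cellScanner();
    // Fixed condition: advance() returns true while another cell is available.
    while (cellScanner.advance()) {
      Cell cell = cellScanner.current();
      get.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell));
    }
    return get;
  }
}
{code}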



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28599) RowTooBigException is thrown when duplicate increment RPC call is attempted

2024-05-19 Thread youngju kim (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847675#comment-17847675
 ] 

youngju kim commented on HBASE-28599:
-

Hello [~zhangduo], could you review this PR?

[https://github.com/apache/hbase/pull/5927]
 
 

> RowTooBigException is thrown when duplicate increment RPC call is attempted
> ---
>
> Key: HBASE-28599
> URL: https://issues.apache.org/jira/browse/HBASE-28599
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.5, 2.5.6, 2.5.7, 2.5.8
>Reporter: Robin Infant A
>Assignee: youngju kim
>Priority: Major
>  Labels: pull-request-available
> Attachments: RowTooBig_trace.txt
>
>
> *Issue:*
> `RowTooBigException` is thrown when a duplicate increment RPC call is 
> attempted.
> *Expected Behavior:*
> 1. The initial increment RPC call times out for some reason.
> 2. The duplicate RPC call should be converted to a GET request and fetch the 
> result that I am trying to increment.
> 3. The result should contain only the qualifier that I am attempting to 
> increment.
> *Actual Behavior:*
> 1. The initial RPC increment call timed out, which is expected.
> 2. The duplicate RPC call is converted to a GET request but fails to clone 
> the qualifier into the GET request.
> 3. Hence, the GET request attempts to retrieve all qualifiers for the given 
> row and column family, resulting in a `RowTooBigException`.
> *Steps to Reproduce:*
> 1. Ensure a row with a total value size exceeding `hbase.table.max.rowsize` 
> (default = 1073741824) exists.
> 2. The nonce property `hbase.client.nonces.enabled` should be enabled (it 
> defaults to true).
> 3. Attempt to increment a qualifier against the same row.
> 4. In my case, I am using a postIncrement coprocessor which can cause a 
> delay longer than the RPC timeout.
> 5. A duplicate increment call should be triggered, which tries to get the 
> value rather than increment it.
> 6. The GET request actually tries to retrieve all the qualifiers for the row, 
> resulting in a `RowTooBigException`.
> *Insights:*
> Upon further debugging, I found that qualifiers are not cloned into the GET 
> instance due to incorrect usage of 
> [CellScanner.advance|https://github.com/apache/hbase/blob/7ebd4381261fefd78fc2acf258a95184f4147cee/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3833]
> *Fix Suggestion:*
> Removing the `!` operator from `while (!CellScanner.advance)` may resolve 
> the issue.
> Attached Exception Stack Trace for reference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28599) RowTooBigException is thrown when duplicate increment RPC call is attempted

2024-05-18 Thread youngju kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youngju kim reassigned HBASE-28599:
---

Assignee: youngju kim

> RowTooBigException is thrown when duplicate increment RPC call is attempted
> ---
>
> Key: HBASE-28599
> URL: https://issues.apache.org/jira/browse/HBASE-28599
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.5, 2.5.6, 2.5.7, 2.5.8
>Reporter: Robin Infant A
>Assignee: youngju kim
>Priority: Major
> Attachments: RowTooBig_trace.txt
>
>
> *Issue:*
> `RowTooBigException` is thrown when a duplicate increment RPC call is 
> attempted.
> *Expected Behavior:*
> 1. The initial increment RPC call times out for some reason.
> 2. The duplicate RPC call should be converted to a GET request and fetch the 
> result that I am trying to increment.
> 3. The result should contain only the qualifier that I am attempting to 
> increment.
> *Actual Behavior:*
> 1. The initial RPC increment call timed out, which is expected.
> 2. The duplicate RPC call is converted to a GET request but fails to clone 
> the qualifier into the GET request.
> 3. Hence, the GET request attempts to retrieve all qualifiers for the given 
> row and column family, resulting in a `RowTooBigException`.
> *Steps to Reproduce:*
> 1. Ensure a row with a total value size exceeding `hbase.table.max.rowsize` 
> (default = 1073741824) exists.
> 2. The nonce property `hbase.client.nonces.enabled` should be enabled (it 
> defaults to true).
> 3. Attempt to increment a qualifier against the same row.
> 4. In my case, I am using a postIncrement coprocessor which can cause a 
> delay longer than the RPC timeout.
> 5. A duplicate increment call should be triggered, which tries to get the 
> value rather than increment it.
> 6. The GET request actually tries to retrieve all the qualifiers for the row, 
> resulting in a `RowTooBigException`.
> *Insights:*
> Upon further debugging, I found that qualifiers are not cloned into the GET 
> instance due to incorrect usage of 
> [CellScanner.advance|https://github.com/apache/hbase/blob/7ebd4381261fefd78fc2acf258a95184f4147cee/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3833]
> *Fix Suggestion:*
> Removing the `!` operator from `while (!CellScanner.advance)` may resolve 
> the issue.
> Attached Exception Stack Trace for reference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28599) RowTooBigException is thrown when duplicate increment RPC call is attempted

2024-05-18 Thread youngju kim (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847554#comment-17847554
 ] 

youngju kim edited comment on HBASE-28599 at 5/18/24 3:23 PM:
--

Hi [~robiee17], this is youngju kim, and I operate HBase clusters at my 
company in Korea. Thank you for explaining the detailed problem definition and 
reproduction process with fix suggestions. I'd like to start contributing to 
HBase. Is it okay if I try to solve this issue?


was (Author: JIRAUSER300939):
Hi [~robiee17], this is youngju kim, and I operate HBase clusters at my 
company in Korea. Thank you for explaining the detailed problem definition and 
reproduction process with some hints. I'd like to start contributing to HBase. 
Is it okay if I try to solve this issue?

> RowTooBigException is thrown when duplicate increment RPC call is attempted
> ---
>
> Key: HBASE-28599
> URL: https://issues.apache.org/jira/browse/HBASE-28599
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.5, 2.5.6, 2.5.7, 2.5.8
>Reporter: Robin Infant A
>Priority: Major
> Attachments: RowTooBig_trace.txt
>
>
> *Issue:*
> `RowTooBigException` is thrown when a duplicate increment RPC call is 
> attempted.
> *Expected Behavior:*
> 1. The initial increment RPC call times out for some reason.
> 2. The duplicate RPC call should be converted to a GET request and fetch the 
> result that I am trying to increment.
> 3. The result should contain only the qualifier that I am attempting to 
> increment.
> *Actual Behavior:*
> 1. The initial RPC increment call timed out, which is expected.
> 2. The duplicate RPC call is converted to a GET request but fails to clone 
> the qualifier into the GET request.
> 3. Hence, the GET request attempts to retrieve all qualifiers for the given 
> row and column family, resulting in a `RowTooBigException`.
> *Steps to Reproduce:*
> 1. Ensure a row with a total value size exceeding `hbase.table.max.rowsize` 
> (default = 1073741824) exists.
> 2. The nonce property `hbase.client.nonces.enabled` should be enabled (it 
> defaults to true).
> 3. Attempt to increment a qualifier against the same row.
> 4. In my case, I am using a postIncrement coprocessor which can cause a 
> delay longer than the RPC timeout.
> 5. A duplicate increment call should be triggered, which tries to get the 
> value rather than increment it.
> 6. The GET request actually tries to retrieve all the qualifiers for the row, 
> resulting in a `RowTooBigException`.
> *Insights:*
> Upon further debugging, I found that qualifiers are not cloned into the GET 
> instance due to incorrect usage of 
> [CellScanner.advance|https://github.com/apache/hbase/blob/7ebd4381261fefd78fc2acf258a95184f4147cee/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3833]
> *Fix Suggestion:*
> Removing the `!` operator from `while (!CellScanner.advance)` may resolve 
> the issue.
> Attached Exception Stack Trace for reference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28599) RowTooBigException is thrown when duplicate increment RPC call is attempted

2024-05-18 Thread youngju kim (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847554#comment-17847554
 ] 

youngju kim edited comment on HBASE-28599 at 5/18/24 3:20 PM:
--

Hi [~robiee17], this is youngju kim, and I operate HBase clusters at my 
company in Korea. Thank you for explaining the detailed problem definition and 
reproduction process with some hints. I'd like to start contributing to HBase. 
Is it okay if I try to solve this issue?


was (Author: JIRAUSER300939):
Hi [~robiee17], I operate HBase clusters at my company in Korea. Thank you for 
explaining the detailed problem definition and reproduction process with some 
hints. I'd like to start contributing to HBase. Is it okay if I try to solve 
this issue?

> RowTooBigException is thrown when duplicate increment RPC call is attempted
> ---
>
> Key: HBASE-28599
> URL: https://issues.apache.org/jira/browse/HBASE-28599
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.5, 2.5.6, 2.5.7, 2.5.8
>Reporter: Robin Infant A
>Priority: Major
> Attachments: RowTooBig_trace.txt
>
>
> *Issue:*
> `RowTooBigException` is thrown when a duplicate increment RPC call is 
> attempted.
> *Expected Behavior:*
> 1. The initial increment RPC call times out for some reason.
> 2. The duplicate RPC call should be converted to a GET request and fetch the 
> result that I am trying to increment.
> 3. The result should contain only the qualifier that I am attempting to 
> increment.
> *Actual Behavior:*
> 1. The initial RPC increment call timed out, which is expected.
> 2. The duplicate RPC call is converted to a GET request but fails to clone 
> the qualifier into the GET request.
> 3. Hence, the GET request attempts to retrieve all qualifiers for the given 
> row and column family, resulting in a `RowTooBigException`.
> *Steps to Reproduce:*
> 1. Ensure a row with a total value size exceeding `hbase.table.max.rowsize` 
> (default = 1073741824) exists.
> 2. The nonce property `hbase.client.nonces.enabled` should be enabled (it 
> defaults to true).
> 3. Attempt to increment a qualifier against the same row.
> 4. In my case, I am using a postIncrement coprocessor which can cause a 
> delay longer than the RPC timeout.
> 5. A duplicate increment call should be triggered, which tries to get the 
> value rather than increment it.
> 6. The GET request actually tries to retrieve all the qualifiers for the row, 
> resulting in a `RowTooBigException`.
> *Insights:*
> Upon further debugging, I found that qualifiers are not cloned into the GET 
> instance due to incorrect usage of 
> [CellScanner.advance|https://github.com/apache/hbase/blob/7ebd4381261fefd78fc2acf258a95184f4147cee/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3833]
> *Fix Suggestion:*
> Removing the `!` operator from `while (!CellScanner.advance)` may resolve 
> the issue.
> Attached Exception Stack Trace for reference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28599) RowTooBigException is thrown when duplicate increment RPC call is attempted

2024-05-18 Thread youngju kim (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847554#comment-17847554
 ] 

youngju kim commented on HBASE-28599:
-

Hi [~robiee17], I operate HBase clusters at my company in Korea. Thank you for 
explaining the detailed problem definition and reproduction process with some 
hints. I'd like to start contributing to HBase. Is it okay if I try to solve 
this issue?

> RowTooBigException is thrown when duplicate increment RPC call is attempted
> ---
>
> Key: HBASE-28599
> URL: https://issues.apache.org/jira/browse/HBASE-28599
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.5, 2.5.6, 2.5.7, 2.5.8
>Reporter: Robin Infant A
>Priority: Major
> Attachments: RowTooBig_trace.txt
>
>
> *Issue:*
> `RowTooBigException` is thrown when a duplicate increment RPC call is 
> attempted.
> *Expected Behavior:*
> 1. The initial increment RPC call times out for some reason.
> 2. The duplicate RPC call should be converted to a GET request and fetch the 
> result that I am trying to increment.
> 3. The result should contain only the qualifier that I am attempting to 
> increment.
> *Actual Behavior:*
> 1. The initial RPC increment call timed out, which is expected.
> 2. The duplicate RPC call is converted to a GET request but fails to clone 
> the qualifier into the GET request.
> 3. Hence, the GET request attempts to retrieve all qualifiers for the given 
> row and column family, resulting in a `RowTooBigException`.
> *Steps to Reproduce:*
> 1. Ensure a row with a total value size exceeding `hbase.table.max.rowsize` 
> (default = 1073741824) exists.
> 2. The nonce property `hbase.client.nonces.enabled` should be enabled (it 
> defaults to true).
> 3. Attempt to increment a qualifier against the same row.
> 4. In my case, I am using a postIncrement coprocessor which can cause a 
> delay longer than the RPC timeout.
> 5. A duplicate increment call should be triggered, which tries to get the 
> value rather than increment it.
> 6. The GET request actually tries to retrieve all the qualifiers for the row, 
> resulting in a `RowTooBigException`.
> *Insights:*
> Upon further debugging, I found that qualifiers are not cloned into the GET 
> instance due to incorrect usage of 
> [CellScanner.advance|https://github.com/apache/hbase/blob/7ebd4381261fefd78fc2acf258a95184f4147cee/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3833]
> *Fix Suggestion:*
> Removing the `!` operator from `while (!CellScanner.advance)` may resolve 
> the issue.
> Attached Exception Stack Trace for reference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28294) Support to skip Kerberos authentication for metric endpoints

2024-05-18 Thread youngju kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youngju kim reassigned HBASE-28294:
---

Assignee: (was: youngju kim)

> Support to skip Kerberos authentication for metric endpoints
> 
>
> Key: HBASE-28294
> URL: https://issues.apache.org/jira/browse/HBASE-28294
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics, UI
>Affects Versions: 2.5.5
>Reporter: YUBI LEE
>Priority: Major
>
> Make HBase support skipping Kerberos authentication for metric endpoints 
> (e.g. /jvm, /prometheus, /metrics).
> Since HBase uses KerberosAuthenticationHandler.java, a whitelist configuration 
> can be used, as in 
> [HADOOP-16527|https://issues.apache.org/jira/browse/HADOOP-16527].
>  
>  
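As a purely illustrative sketch of the whitelist idea above, skipping 
authentication for a fixed set of metric paths could look like the following. 
None of these names are the actual HBase or Hadoop authentication API, and the 
hard-coded set stands in for whatever configuration property the feature would 
actually read.

{code:java}
// Illustrative only: not the HBase/Hadoop authentication filter API.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public final class MetricsEndpointWhitelist {

  // Hypothetical whitelist; a real implementation would read this from configuration.
  private static final Set<String> SKIP_AUTH_PATHS =
    new HashSet<>(Arrays.asList("/jvm", "/prometheus", "/metrics"));

  private MetricsEndpointWhitelist() {
  }

  /** Returns true when the request path should bypass Kerberos authentication. */
  public static boolean skipAuthentication(String requestPath) {
    return SKIP_AUTH_PATHS.contains(requestPath);
  }
}
{code}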



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28294) Support to skip Kerberos authentication for metric endpoints

2024-05-18 Thread youngju kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youngju kim reassigned HBASE-28294:
---

Assignee: youngju kim

> Support to skip Kerberos authentication for metric endpoints
> 
>
> Key: HBASE-28294
> URL: https://issues.apache.org/jira/browse/HBASE-28294
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics, UI
>Affects Versions: 2.5.5
>Reporter: YUBI LEE
>Assignee: youngju kim
>Priority: Major
>
> Make HBase support skipping Kerberos authentication for metric endpoints 
> (e.g. /jvm, /prometheus, /metrics).
> Since HBase uses KerberosAuthenticationHandler.java, a whitelist configuration 
> can be used, as in 
> [HADOOP-16527|https://issues.apache.org/jira/browse/HADOOP-16527].
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28015) rpc read handler can get stuck on LruBlockCache#getBlock

2024-05-17 Thread youngju kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youngju kim reassigned HBASE-28015:
---

Assignee: (was: youngju kim)

> rpc read handler can get stuck on LruBlockCache#getBlock
> 
>
> Key: HBASE-28015
> URL: https://issues.apache.org/jira/browse/HBASE-28015
> Project: HBase
>  Issue Type: Bug
>  Components: BlockCache
>Affects Versions: 3.0.0-alpha-4
>Reporter: ruanhui
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> In our production environment, we found many read handlers stuck on 
> LruBlockCache#getBlock. This may be caused by a bug in the JDK 8 
> ConcurrentHashMap. To keep the common case fast, I think we'd better get and 
> check the entry before calling ConcurrentHashMap#computeIfPresent.
>  
>  
> "RpcServer.priority.RWQ.Fifo.scan.handler=190,queue=57,port=60020" #1807 
> daemon prio=5 os_prio=0 cpu=9703.28ms elapsed=88160.93s 
> tid=0x7f38d338a800 nid=0x8f4 waiting for monitor entry 
> [0x7f0af4baa000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1760)
>         - waiting to lock <0x7f2fc6495fe0> (a 
> java.util.concurrent.ConcurrentHashMap$Node)
>         at 
> org.apache.hadoop.hbase.io.hfile.LruBlockCache.getBlock(LruBlockCache.java:538)
>         at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:88)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1124)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1300)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:331)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:679)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:631)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.backwardSeek(StoreFileScanner.java:561)
>         at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.backwardSeek(ReversedKeyValueHeap.java:117)
>         at 
> org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.backwardSeek(ReversedStoreScanner.java:134)
>         at 
> org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekAsDirection(ReversedStoreScanner.java:94)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:821)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:727)
>         at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7515)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7683)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7447)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3403)
>         - locked <0x7f2ff1fc8f40> (a 
> org.apache.hadoop.hbase.regionserver.ReversedRegionScannerImpl)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3662)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45253)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:447)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:136)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>    Locked ownable synchronizers:
>         - None
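A minimal sketch of the get-then-computeIfPresent pattern proposed in the 
description above, assuming a map from cache key to cached value; the class and 
field names are illustrative and do not match the real LruBlockCache internals.

{code:java}
// Illustrative only: names and fields do not match the real LruBlockCache.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public final class GetBeforeComputeCache<K, V> {

  private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
  private final AtomicLong hits = new AtomicLong();

  public void put(K key, V value) {
    map.put(key, value);
  }

  /**
   * Lock-free get() first: a miss returns immediately without ever calling
   * computeIfPresent(), which locks the bucket node (and can block behind the
   * JDK 8 ConcurrentHashMap issue mentioned above). Only a hit pays that cost.
   */
  public V getBlock(K key) {
    V cached = map.get(key);
    if (cached == null) {
      return null; // common case: miss, no bucket lock taken
    }
    return map.computeIfPresent(key, (k, v) -> {
      hits.incrementAndGet(); // e.g. update access bookkeeping while holding the bucket lock
      return v;
    });
  }
}
{code}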



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28015) rpc read handler can get stuck on LruBlockCache#getBlock

2024-05-17 Thread youngju kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

youngju kim reassigned HBASE-28015:
---

Assignee: youngju kim

> rpc read handler can get stuck on LruBlockCache#getBlock
> 
>
> Key: HBASE-28015
> URL: https://issues.apache.org/jira/browse/HBASE-28015
> Project: HBase
>  Issue Type: Bug
>  Components: BlockCache
>Affects Versions: 3.0.0-alpha-4
>Reporter: ruanhui
>Assignee: youngju kim
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> In our production environment, we found many read handlers stuck on 
> LruBlockCache#getBlock. This may be caused by a bug in the JDK 8 
> ConcurrentHashMap. To keep the common case fast, I think we'd better get and 
> check the entry before calling ConcurrentHashMap#computeIfPresent.
>  
>  
> "RpcServer.priority.RWQ.Fifo.scan.handler=190,queue=57,port=60020" #1807 
> daemon prio=5 os_prio=0 cpu=9703.28ms elapsed=88160.93s 
> tid=0x7f38d338a800 nid=0x8f4 waiting for monitor entry 
> [0x7f0af4baa000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1760)
>         - waiting to lock <0x7f2fc6495fe0> (a 
> java.util.concurrent.ConcurrentHashMap$Node)
>         at 
> org.apache.hadoop.hbase.io.hfile.LruBlockCache.getBlock(LruBlockCache.java:538)
>         at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:88)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1124)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1300)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:331)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:679)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:631)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.backwardSeek(StoreFileScanner.java:561)
>         at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.backwardSeek(ReversedKeyValueHeap.java:117)
>         at 
> org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.backwardSeek(ReversedStoreScanner.java:134)
>         at 
> org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekAsDirection(ReversedStoreScanner.java:94)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:821)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:727)
>         at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7515)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7683)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7447)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3403)
>         - locked <0x7f2ff1fc8f40> (a 
> org.apache.hadoop.hbase.regionserver.ReversedRegionScannerImpl)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3662)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45253)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:447)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:136)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>    Locked ownable synchronizers:
>         - None



--
This message was sent by Atlassian Jira
(v8.20.10#820010)