[ 
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16198:
------------------------------
          Component/s: block placement
         Hadoop Flags: Reviewed
     Target Version/s: 3.2.4, 3.3.2, 2.10.2, 3.4.0
    Affects Version/s: 3.2.4
                       3.3.2
                       2.10.2
                       3.4.0

> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-16198
>                 URL: https://issues.apache.org/jira/browse/HDFS-16198
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>            Reporter: Eungsop Yoo
>            Assignee: Eungsop Yoo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>         Attachments: HDFS-16198.patch, screenshot-2.png
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With 
> this configuration, a SecretManager.InvalidToken exception may be thrown if 
> the access token expires during a short circuit read. The read itself is not 
> affected, because failed reads are retried, but each such failure leaks a 
> ShortCircuitShm.Slot object.
>  
> We found this problem in our secure HBase clusters. The number of open file 
> descriptors held by RegionServers using short circuit reads kept increasing.
> !screenshot-2.png!
>  
> The increase was caused by leaked shared memory segments used by short 
> circuit reads.
> {code:java}
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | wc -l
> 3925
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | head -5
> java 86309 hbase DEL REG 0,19 2308279984 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
> java 86309 hbase DEL REG 0,19 2306359893 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
> java 86309 hbase DEL REG 0,19 2305496758 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
> java 86309 hbase DEL REG 0,19 2304784261 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
> java 86309 hbase DEL REG 0,19 2302621988 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590
> {code}
>  
> We finally traced the root cause to leaked ShortCircuitShm.Slot objects.
>  
> The fix is trivial: free the slot when an InvalidToken exception is thrown.
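
The shape of the leak and of the fix described above can be illustrated with a small, self-contained sketch. It does not use the Hadoop classes and is not the attached patch: SlotPool, InvalidTokenException, openReplica and tryRead are hypothetical stand-ins invented for this example, and the real client keeps a slot for as long as the replica is in use rather than releasing it right after the read.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for SecretManager.InvalidToken; not the Hadoop class.
final class InvalidTokenException extends Exception {
    InvalidTokenException(String msg) { super(msg); }
}

/** Toy slot pool (assumption for illustration) that counts allocated slots. */
final class SlotPool {
    private final Deque<Integer> free = new ArrayDeque<>();
    private int allocated = 0;

    SlotPool(int size) {
        for (int i = 0; i < size; i++) free.push(i);
    }

    int allocate() {
        if (free.isEmpty()) throw new IllegalStateException("no free slots");
        allocated++;
        return free.pop();
    }

    void release(int slot) {
        allocated--;
        free.push(slot);
    }

    int allocatedCount() { return allocated; }
}

final class SlotLeakSketch {
    /** Simulates a block access that fails when the access token has expired. */
    static void openReplica(boolean tokenExpired) throws InvalidTokenException {
        if (tokenExpired) throw new InvalidTokenException("access token expired");
    }

    /**
     * Attempts one simulated short circuit read and returns true on success.
     * When freeOnInvalidToken is false, the slot is never released on failure,
     * which models the leak; when true, the slot is freed in the catch block.
     */
    static boolean tryRead(SlotPool pool, boolean tokenExpired, boolean freeOnInvalidToken) {
        int slot = pool.allocate();
        try {
            openReplica(tokenExpired);
            pool.release(slot); // simplified: release as soon as the read is done
            return true;
        } catch (InvalidTokenException e) {
            if (freeOnInvalidToken) pool.release(slot); // the fix: free the slot
            return false; // the caller is expected to retry with a fresh token
        }
    }

    /** Runs `attempts` failing reads and returns how many slots stay allocated. */
    static int leakedAfter(int attempts, boolean freeOnInvalidToken) {
        SlotPool pool = new SlotPool(attempts);
        for (int i = 0; i < attempts; i++) tryRead(pool, true, freeOnInvalidToken);
        return pool.allocatedCount();
    }

    public static void main(String[] args) {
        System.out.println("leaked without fix: " + leakedAfter(5, false));
        System.out.println("leaked with fix:    " + leakedAfter(5, true));
    }
}
```

In this toy model every failed read without the release in the catch block leaves one slot allocated, so the count grows with the number of token expirations, which matches the steadily increasing /dev/shm entries reported above.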


