[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15273:
---
    Fix Version/s: 3.4.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> CacheReplicationMonitor hold lock for long time and lead to NN out of service
> -
>
>                 Key: HDFS-15273
>                 URL: https://issues.apache.org/jira/browse/HDFS-15273
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching, namenode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: HDFS-15273.001.patch, HDFS-15273.002.patch,
> HDFS-15273.003.patch
>
>
> CacheReplicationMonitor periodically scans the cache directives and the
> cached BlockMap. As more and more cache directives are added,
> CacheReplicationMonitor takes a very long time to rescan all cache
> directives and cached blocks. The scan holds the global write lock for its
> whole duration, so the NameNode cannot process any other request while it runs.
> I think we should warn end users who turn on the CacheManager feature about
> this risk until the implementation is improved.
> {code:java}
>   private void rescan() throws InterruptedException {
>     scannedDirectives = 0;
>     scannedBlocks = 0;
>     try {
>       // Global namesystem write lock, held until the outer finally block.
>       namesystem.writeLock();
>       try {
>         lock.lock();
>         if (shutdown) {
>           throw new InterruptedException("CacheReplicationMonitor was " +
>               "shut down.");
>         }
>         curScanCount = completedScanCount + 1;
>       } finally {
>         lock.unlock();
>       }
>       // Both full scans below run with the namesystem write lock held.
>       resetStatistics();
>       rescanCacheDirectives();
>       rescanCachedBlockMap();
>       blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
>     } finally {
>       namesystem.writeUnlock();
>     }
>   }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
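The quoted rescan() holds the namesystem write lock from the start of the scan to the end, so scan time grows directly into lock hold time. One general way to bound that is to release and re-acquire the lock whenever a time budget is used up. The Java sketch below illustrates only that idea; it is not the HDFS-15273 patch, and the class YieldingRescanSketch, its maxLockHoldMillis budget and its process step are hypothetical, with a plain fair ReentrantReadWriteLock standing in for the namesystem lock.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch (not Hadoop code): a rescan that periodically releases
 * a global write lock so other requests are not blocked for the whole scan.
 */
public class YieldingRescanSketch {
  // Fair mode: after a release, threads already queued for the lock are
  // granted it before this thread can re-acquire it.
  private final ReentrantReadWriteLock globalLock = new ReentrantReadWriteLock(true);
  private final long maxLockHoldNanos;
  private int lockReleases;
  private long checksum;

  public YieldingRescanSketch(long maxLockHoldMillis) {
    this.maxLockHoldNanos = TimeUnit.MILLISECONDS.toNanos(maxLockHoldMillis);
  }

  /** Scans every item and returns how many were scanned. */
  public int rescan(List<Integer> items) {
    int scanned = 0;
    lockReleases = 0;
    globalLock.writeLock().lock();
    long acquiredAt = System.nanoTime();
    try {
      for (Integer item : items) {
        process(item);
        scanned++;
        // Time budget used up: let queued lock waiters run, then resume.
        if (System.nanoTime() - acquiredAt >= maxLockHoldNanos) {
          globalLock.writeLock().unlock();
          lockReleases++;
          globalLock.writeLock().lock();
          acquiredAt = System.nanoTime();
        }
      }
    } finally {
      globalLock.writeLock().unlock();
    }
    return scanned;
  }

  // Hypothetical per-item work standing in for directive/block processing.
  private void process(Integer item) {
    checksum += item;
  }

  public int getLockReleases() {
    return lockReleases;
  }

  public boolean isWriteLockHeldByCurrentThread() {
    return globalLock.isWriteLockedByCurrentThread();
  }
}
```

The hard part in a real NameNode is that cache directives and blocks can change while the lock is released, so a scan built this way must tolerate or revalidate concurrent modifications. The sketch avoids that question by scanning a list that no other thread touches.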
[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-15273:
---
    Attachment: HDFS-15273.003.patch
[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-15273:
---
    Attachment: HDFS-15273.002.patch
[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15273:
---
    Status: Open  (was: Patch Available)
[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15273:
---
    Status: Patch Available  (was: Open)

The patch still applies. Submitting the patch to go through the precommit tests.
[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-15273:
---
    Attachment: HDFS-15273.001.patch
        Status: Patch Available  (was: Open)

Submitting a demo patch to trigger Jenkins.