[ https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=755199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755199 ]
ASF GitHub Bot logged work on HDFS-16484:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Apr/22 12:19
            Start Date: 11/Apr/22 12:19
    Worklog Time Spent: 10m
      Work Description: liubingxing commented on code in PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#discussion_r847262936

##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/BlockStorageMovementNeeded.java:
##########
@@ -248,13 +250,18 @@ public void run() {
                 pendingWorkForDirectory.get(startINode);
             if (dirPendingWorkInfo != null
                 && dirPendingWorkInfo.isDirWorkDone()) {
-              ctxt.removeSPSHint(startINode);
               pendingWorkForDirectory.remove(startINode);
+              ctxt.removeSPSHint(startINode);

Review Comment:
   @tasanuma Thanks for your suggestion.
   Catching the FileNotFoundException here is also a good way to solve this problem.
   I updated the code according to your suggestion.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 755199)
    Time Spent: 2.5h  (was: 2h 20m)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
> -------------------------------------------------------------
>
>                 Key: HDFS-16484
>                 URL: https://issues.apache.org/jira/browse/HDFS-16484
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-02-25-14-35-42-255.png
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found this log. The SPSPathIdProcessor thread
> enters an infinite loop and prints the same log all the time.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not
> exist, the thread enters an infinite loop and cannot work normally.
> The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does
> not exist. The inodeId is never reset to null, so the thread holds this
> inodeId forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> } {code}
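
The retry behaviour described in the issue can be illustrated outside HDFS with a small sketch. In the quoted run() method, an exception thrown while scanning skips the `startINode = null` assignment, and the catch block does not clear it, so the same id is picked up again on the next pass. The code below is only an illustration under assumptions, not the code from PR #4032: the class SpsRetrySketch, the methods scan and drain, and the parameters dropMissing and maxIterations are all hypothetical, and a bounded for loop stands in for `while (ctxt.isRunning())` so that the stuck case terminates.

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpsRetrySketch {

  // Hypothetical stand-in for scanning an inode: fails when the id has no path.
  static void scan(long inodeId, Set<Long> existing)
      throws FileNotFoundException {
    if (!existing.contains(inodeId)) {
      throw new FileNotFoundException("No path for inode " + inodeId);
    }
  }

  // Processes queued inode ids and returns the ids that were scanned.
  // dropMissing == false mimics the reported behaviour (the id is kept and
  // retried); dropMissing == true clears the id when its path is missing.
  static List<Long> drain(Deque<Long> queue, Set<Long> existing,
      boolean dropMissing, int maxIterations) {
    List<Long> scanned = new ArrayList<>();
    Long startINode = null;
    for (int i = 0; i < maxIterations; i++) {
      try {
        if (startINode == null) {
          startINode = queue.poll();
        } // else same id will be retried
        if (startINode == null) {
          break; // queue is empty; a real thread would sleep and wait
        }
        scan(startINode, existing);
        scanned.add(startINode);
        startINode = null; // current inode successfully scanned
      } catch (FileNotFoundException e) {
        if (dropMissing) {
          startINode = null; // give up on the missing inode and move on
        }
      }
    }
    return scanned;
  }

  public static void main(String[] args) {
    Set<Long> existing = new HashSet<>(Arrays.asList(1L, 3L));
    // Inode 2 has no path. Without the reset the loop never gets past it.
    System.out.println(drain(
        new ArrayDeque<>(Arrays.asList(1L, 2L, 3L)), existing, false, 10));
    System.out.println(drain(
        new ArrayDeque<>(Arrays.asList(1L, 2L, 3L)), existing, true, 10));
  }
}
```

With dropMissing set to false, every iteration after the first is spent retrying inode 2 and inode 3 is never reached. With it set to true, the missing inode is skipped and the remaining ids are processed.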