[ https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=754417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754417 ]
ASF GitHub Bot logged work on HDFS-16484:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 05:43
            Start Date: 08/Apr/22 05:43
    Worklog Time Spent: 10m
      Work Description: tasanuma commented on code in PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#discussion_r845753947


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/BlockStorageMovementNeeded.java:
##########
@@ -232,6 +233,7 @@ public synchronized void clearQueuesWithNotification() {
     public void run() {
       LOG.info("Starting SPSPathIdProcessor!.");
       Long startINode = null;
+      int retryCount = 0;

Review Comment:
   Do we still need to implement the retry logic after catching the FileNotFoundException?


Issue Time Tracking
-------------------

    Worklog Id:     (was: 754417)
    Time Spent: 1h 40m  (was: 1.5h)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
> -------------------------------------------------------------
>
>                 Key: HDFS-16484
>                 URL: https://issues.apache.org/jira/browse/HDFS-16484
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-02-25-14-35-42-255.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The
> SPSPathIdProcessor thread enters an infinite loop and prints the same log
> message over and over.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist,
> it enters an infinite loop and can no longer do any useful work.
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does
> not exist. Scanning that inode throws before startINode is reset to null, so
> the thread holds on to this inodeId and retries it forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> }
> {code}
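
For readers weighing the review question about retries, here is a minimal, self-contained sketch of one way the loop could stop holding a bad inode id forever. It is not the code from PR #4032. The class `SpsRetrySketch`, the `InodeScanner` interface (a stand-in for `ctxt.scanAndCollectFiles`), the `process` method, and the `MAX_RETRIES` value are all hypothetical names chosen for illustration. The sketch assumes a missing path surfaces as a `FileNotFoundException`, as the review comment suggests, and drops such an inode immediately, while other I/O errors get a bounded number of retries.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class SpsRetrySketch {
  /** Hypothetical stand-in for ctxt.scanAndCollectFiles(inodeId). */
  interface InodeScanner {
    void scan(long inodeId) throws IOException;
  }

  /** Assumed retry limit; the value is illustrative only. */
  static final int MAX_RETRIES = 3;

  /** Drains the pending queue and returns the inode ids that were given up on. */
  static List<Long> process(Queue<Long> pending, InodeScanner scanner) {
    List<Long> dropped = new ArrayList<>();
    Long startINode = null;
    int retryCount = 0;
    while (true) {
      if (startINode == null) {
        startINode = pending.poll();
        retryCount = 0; // fresh inode, fresh retry budget
      }
      if (startINode == null) {
        break; // the real thread would sleep and wait for more SPS paths
      }
      try {
        scanner.scan(startINode);
        startINode = null; // current inode successfully scanned
      } catch (FileNotFoundException e) {
        // The path no longer exists, so retrying cannot succeed: drop it now.
        dropped.add(startINode);
        startINode = null;
      } catch (IOException e) {
        // Possibly transient failure: retry the same id, but only a few times.
        retryCount++;
        if (retryCount >= MAX_RETRIES) {
          dropped.add(startINode);
          startINode = null;
        }
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    Queue<Long> pending = new ArrayDeque<>(Arrays.asList(1L, 2L, 3L));
    // Inode 2 simulates an id whose path has been deleted.
    InodeScanner scanner = id -> {
      if (id == 2L) {
        throw new FileNotFoundException("no path for inode " + id);
      }
    };
    System.out.println("dropped inodes: " + process(pending, scanner));
  }
}
```

The point of the sketch is only that `startINode` is cleared on every exit path from the `try` block. Whether a retry counter is still worthwhile once `FileNotFoundException` is handled separately is the open question raised in the review.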