[ https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
qinyuren updated HDFS-16484:
----------------------------
    Description:
We ran SPS in our cluster and found the log below. The SPSPathIdProcessor thread enters an infinite loop and prints the same log message continuously.

!image-2022-02-25-14-35-42-255.png|width=682,height=195!

If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, the thread enters an infinite loop and can no longer work normally.

The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does not exist. When scanning that inodeId throws an exception, the statement startINode = null is skipped, so the inodeId is never reset to null and the thread holds on to the same inodeId forever.
{code:java}
public void run() {
  LOG.info("Starting SPSPathIdProcessor!.");
  Long startINode = null;
  while (ctxt.isRunning()) {
    try {
      if (!ctxt.isInSafeMode()) {
        if (startINode == null) {
          startINode = ctxt.getNextSPSPath();
        } // else same id will be retried
        if (startINode == null) {
          // Waiting for SPS path
          Thread.sleep(3000);
        } else {
          ctxt.scanAndCollectFiles(startINode);
          // check if directory was empty and no child added to queue
          DirPendingWorkInfo dirPendingWorkInfo =
              pendingWorkForDirectory.get(startINode);
          if (dirPendingWorkInfo != null
              && dirPendingWorkInfo.isDirWorkDone()) {
            ctxt.removeSPSHint(startINode);
            pendingWorkForDirectory.remove(startINode);
          }
        }
        startINode = null; // Current inode successfully scanned.
      }
    } catch (Throwable t) {
      String reClass = t.getClass().getName();
      if (InterruptedException.class.getName().equals(reClass)) {
        LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
        break;
      }
      LOG.warn("Exception while scanning file inodes to satisfy the policy",
          t);
      try {
        Thread.sleep(3000);
      } catch (InterruptedException e) {
        LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
        break;
      }
    }
  }
}
{code}

  was:
If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, the thread enters an infinite loop and can no longer work normally.

!image-2022-02-25-14-35-42-255.png|width=682,height=195!
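The failure mode in the description can be reproduced outside Hadoop with a small stand-alone sketch. It is an illustration only, not the patch attached to this issue: the class SpsLoopSketch, the interface InodeScanner, the method drain and the maxRetries parameter are all hypothetical names, and a plain queue stands in for ctxt.getNextSPSPath(). The sketch shows one possible way to avoid the endless retry, namely giving up on an inodeId after a bounded number of failed scans.

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the SPSPathIdProcessor loop; not Hadoop code. */
public class SpsLoopSketch {

  /** Hypothetical scanner: throws when the path for an inode ID is missing. */
  public interface InodeScanner {
    void scan(long inodeId) throws Exception;
  }

  /**
   * Drains the queue and returns the IDs that were scanned successfully.
   * An ID whose scan keeps failing is dropped after maxRetries attempts,
   * so a single bad ID cannot block the IDs queued behind it.
   */
  public static List<Long> drain(Queue<Long> queue, InodeScanner scanner,
      int maxRetries) {
    List<Long> scanned = new ArrayList<>();
    Long startINode = null;
    int failures = 0;
    while (true) {
      if (startINode == null) {
        startINode = queue.poll();   // plays the role of getNextSPSPath()
        failures = 0;
      }
      if (startINode == null) {
        break;                       // the real thread would sleep and poll again
      }
      try {
        scanner.scan(startINode);
        scanned.add(startINode);
        startINode = null;           // current inode successfully scanned
      } catch (Exception e) {
        failures++;
        if (failures >= maxRetries) {
          startINode = null;         // give up instead of retrying forever
        }
      }
    }
    return scanned;
  }

  public static void main(String[] args) {
    Queue<Long> queue = new ArrayDeque<>(Arrays.asList(1L, 2L, 3L));
    // Assumption for the demo: inode 2 has no path, so every scan of it fails.
    InodeScanner scanner = id -> {
      if (id == 2L) {
        throw new FileNotFoundException("no path for inode " + id);
      }
    };
    System.out.println(drain(queue, scanner, 3)); // prints [1, 3]
  }
}
```

Without the failures counter, the loop would keep the failing ID in startINode and never reach ID 3, which matches the behaviour reported above. Whether a missing path should be retried at all, or dropped immediately, is a design choice this sketch does not settle.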
> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
> -------------------------------------------------------------
>
>                 Key: HDFS-16484
>                 URL: https://issues.apache.org/jira/browse/HDFS-16484
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-02-25-14-35-42-255.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira (v8.20.1#820001)