[ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=754417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754417
 ]

ASF GitHub Bot logged work on HDFS-16484:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/22 05:43
            Start Date: 08/Apr/22 05:43
    Worklog Time Spent: 10m 
      Work Description: tasanuma commented on code in PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#discussion_r845753947


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/BlockStorageMovementNeeded.java:
##########
@@ -232,6 +233,7 @@ public synchronized void clearQueuesWithNotification() {
     public void run() {
       LOG.info("Starting SPSPathIdProcessor!.");
       Long startINode = null;
+      int retryCount = 0;

Review Comment:
   Do we still need to implement the retry logic after catching the 
FileNotFoundException?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 754417)
    Time Spent: 1h 40m  (was: 1.5h)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -------------------------------------------------------------
>
>                 Key: HDFS-16484
>                 URL: https://issues.apache.org/jira/browse/HDFS-16484
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-02-25-14-35-42-255.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log shown below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log 
> message repeatedly.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, 
> it enters an infinite loop and cannot work normally. 
> The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist, so the scan throws an exception before startINode is reset to 
> null. The catch block only logs and sleeps, so the thread holds on to this 
> inodeId forever and retries it on every iteration.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> } {code}
>  
>  
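
The loop quoted above never clears startINode when the scan throws. The following is a minimal sketch of one way to break that cycle, combining the two ideas raised in the review (a retry counter and special handling of FileNotFoundException). It is not the code merged in PR #4032. PathSource, drain, and MAX_RETRY are hypothetical names introduced for illustration only; PathSource is a stand-in, not Hadoop's real SPS context interface, and the retry limit of 3 is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class SpsRetrySketch {
  /** Hypothetical stand-in for the SPS context used by the processor thread. */
  interface PathSource {
    Long getNextSPSPath();
    void scanAndCollectFiles(long inodeId) throws Exception;
  }

  /** Assumed retry limit; the value is illustrative only. */
  static final int MAX_RETRY = 3;

  /**
   * Processes inode ids until the source returns null and reports the ids
   * that were given up on. The real thread would sleep and poll again
   * instead of returning.
   */
  static List<Long> drain(PathSource ctxt) {
    List<Long> dropped = new ArrayList<>();
    Long startINode = null;
    int retryCount = 0;
    while (true) {
      if (startINode == null) {
        startINode = ctxt.getNextSPSPath();
        if (startINode == null) {
          return dropped;
        }
      }
      try {
        ctxt.scanAndCollectFiles(startINode);
      } catch (FileNotFoundException e) {
        // The path is gone, so retrying the same id can never succeed.
        dropped.add(startINode);
      } catch (Exception e) {
        if (++retryCount < MAX_RETRY) {
          continue; // retry the same id, but only a bounded number of times
        }
        dropped.add(startINode);
      }
      // Reached on success or after giving up: always move to the next id.
      startINode = null;
      retryCount = 0;
    }
  }

  public static void main(String[] args) {
    Queue<Long> ids = new ArrayDeque<>(List.of(1L, 2L, 3L));
    PathSource src = new PathSource() {
      public Long getNextSPSPath() {
        return ids.poll();
      }
      public void scanAndCollectFiles(long id) throws Exception {
        if (id == 2) {
          throw new FileNotFoundException("path no longer exists");
        }
        if (id == 3) {
          throw new IOException("simulated persistent failure");
        }
      }
    };
    System.out.println("Dropped inode ids: " + drain(src));
  }
}
```

With this shape, the question in the review comment comes down to whether failures other than FileNotFoundException can persist for a given inodeId. If they can, the bounded retry still prevents the thread from being stuck on one id; if they cannot, the FileNotFoundException branch alone is enough.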



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
