[jira] [Commented] (HDFS-10904) Need a new Result state for DiskBalancerWorkStatus to indicate the final Plan step errors and stuck rebalancing

Manoj Govindassamy (JIRA) Mon, 31 Oct 2016 12:13:16 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623069#comment-15623069
 ]


Manoj Govindassamy commented on HDFS-10904:
-------------------------------------------

[~anu],

I explored the code a little deeper and found that the {{Future}} object in 
{{DiskBalancer}}, and the way it is used in 
{{DiskBalancer#queryWorkStatus}}, mitigates the need for an extra 
result state to indicate explicit errors. 

DiskBalancer#queryWorkStatus():
{code}
      // if we had a plan in progress, check if it is finished.
      if (this.currentResult == Result.PLAN_UNDER_PROGRESS &&
          this.future != null &&
          this.future.isDone()) {
        this.currentResult = Result.PLAN_DONE;
      }
{code}

So, even when the last MoveStep hits a serious error, the future still moves 
to the done state, and the Result state is set to PLAN_DONE, contrary to my 
earlier assumption that it would stay at PLAN_UNDER_PROGRESS. 
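
The behaviour relied on here is standard {{java.util.concurrent.Future}} semantics: {{isDone()}} returns true whether the task completed normally, threw an exception, or was cancelled. Below is a minimal standalone sketch of that, not DiskBalancer code. The class name {{FutureDoneDemo}}, the method {{runFailingStep}}, and the failing task are hypothetical stand-ins for a MoveStep that errors out.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureDoneDemo {

  /**
   * Runs a task that always throws (a hypothetical stand-in for a failing
   * MoveStep) and reports what the Future says afterwards.
   */
  static String runFailingStep() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Runnable failingStep = () -> {
        throw new IllegalStateException("Unable to find dest volume");
      };
      Future<?> future = executor.submit(failingStep);

      String cause = "none";
      try {
        future.get(); // waits for the task to finish
      } catch (ExecutionException e) {
        // The task's exception only surfaces here, wrapped by get().
        cause = e.getCause().getMessage();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        cause = "interrupted";
      }
      // isDone() is true for normal completion, failure, and cancellation.
      return "isDone=" + future.isDone() + ", cause=" + cause;
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    // prints "isDone=true, cause=Unable to find dest volume"
    System.out.println(runFailingStep());
  }
}
```

So a check that looks only at {{isDone()}} cannot tell a successful plan from a failed one; the failure is visible only through {{get()}} or through whatever the task recorded itself, such as an errMsg.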

{noformat}
2016-10-31 11:58:02,625 [main] INFO  diskbalancer.TestDiskBalancer (TestDiskBalancer.java:get(569)) - Work Status: 
{"currentState":[{"sourcePath":"/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/","destPath":"/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/","workItem":{"startTime":0,"secondsElapsed":0,"bytesToCopy":51469,"bytesCopied":0,"errorCount":0,"errMsg":"Disk Balancer - Unable to find dest volume: /Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/","blocksCopied":0,"maxDiskErrors":0,"tolerancePercent":10,"bandwidth":0}}],"result":"PLAN_DONE","planID":"147408705a52443d183b4415e318bc6283fe5fe6","planFile":"/system/current.plan.json"}
{noformat}

So, a user looking at the detailed query result can find out what went 
wrong from the errMsg in the result. It is therefore not very important to 
introduce a new state. Please feel free to move this bug out of the parent 
jira or close it, as the priority now looks very low to me. Your thoughts, 
please?
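
As a rough illustration of that client-side check, here is a sketch of how a caller might flag a plan that finished with errors. The class {{WorkStatusCheck}}, its nested {{WorkItem}} type, and the method {{finishedWithErrors}} are all hypothetical and only mirror the field names seen in the query output; they are not the Hadoop classes, and the error message is shortened.

```java
import java.util.Collections;
import java.util.List;

class WorkStatusCheck {

  /** Hypothetical, trimmed-down view of one "workItem" from the query output. */
  static final class WorkItem {
    final long bytesToCopy;
    final long bytesCopied;
    final long errorCount;
    final String errMsg;

    WorkItem(long bytesToCopy, long bytesCopied, long errorCount, String errMsg) {
      this.bytesToCopy = bytesToCopy;
      this.bytesCopied = bytesCopied;
      this.errorCount = errorCount;
      this.errMsg = errMsg;
    }
  }

  /** True when the plan reports PLAN_DONE but some work item carries an errMsg. */
  static boolean finishedWithErrors(String result, List<WorkItem> items) {
    if (!"PLAN_DONE".equals(result)) {
      return false;
    }
    for (WorkItem item : items) {
      // errorCount was 0 in the log above even though errMsg was set,
      // so errMsg is the field checked here.
      if (item.errMsg != null && !item.errMsg.isEmpty()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Values taken from the work status above; message shortened.
    WorkItem item = new WorkItem(51469, 0, 0,
        "Disk Balancer - Unable to find dest volume");
    System.out.println(
        finishedWithErrors("PLAN_DONE", Collections.singletonList(item)));
  }
}
```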



> Need a new Result state for DiskBalancerWorkStatus to indicate the final Plan 
> step errors and stuck rebalancing
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10904
>                 URL: https://issues.apache.org/jira/browse/HDFS-10904
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 2.9.0
>
>
> * A DiskBalancer {{NodePlan}} might include a single {{MoveStep}} or a list 
> of MoveSteps to perform the requested disk balancing operation.
> * {{DiskBalancerWorkStatus}} tracks the current disk balancing operation 
> status for the {{Plan}} just submitted. 
> * {{DiskBalancerWorkStatus#Result}} has the following states, and the state 
> machine movement for the {{currentResult}} state doesn't seem to be driven 
> entirely by the disk balancing operation. In particular, the state movement 
> to DONE happens only upon QueryResult, which can be improved. {code}
>   /** Various result values. **/
>   public enum Result {
>     NO_PLAN(0),
>     PLAN_UNDER_PROGRESS(1),
>     PLAN_DONE(2),
>     PLAN_CANCELLED(3);
> DiskBalancer
> cancelPlan(String)
>         this.currentResult = Result.PLAN_CANCELLED;
> DiskBalancer(String, Configuration, BlockMover)
>     this.currentResult = Result.NO_PLAN;
> queryWorkStatus()
>         this.currentResult = Result.PLAN_DONE;
> shutdown()
>       this.currentResult = Result.NO_PLAN;
>         this.currentResult = Result.PLAN_CANCELLED;
> submitPlan(String, long, String, String, boolean)
>       this.currentResult = Result.PLAN_UNDER_PROGRESS;
> {code}
> * More importantly, when the final {{MoveStep}} of the {{NodePlan}} fails, 
> the currentResult state is stuck in {{PLAN_UNDER_PROGRESS}} forever. A user 
> querying the status will assume the operation is in progress when in reality 
> it's not making any progress. The user can also run the {{Query}} command 
> with the _verbose_ option, which displays more details about the operation, 
> including details about errors encountered.
> **  Query Output: {code}
> Plan File:  <_file_path_>
> Plan ID: <_plan_hash_>
> Result: PLAN_UNDER_PROGRESS
> {code}
> ** {code}
> "sourcePath" : "/data/disk2/hdfs/dn",
>   "destPath" : "/data/disk3/hdfs/dn",
>   "workItem" :
>     .. .. ..
>     "errorCount" : 0,
>     "errMsg" : null,
>     .. .. 
>     "maxDiskErrors" : 5,
>     .. .. ..
> {code}
> ** But the user has to decipher these details to work out that the disk 
> balancing operation is stuck, as the top-level Result still says 
> {{PLAN_UNDER_PROGRESS}}. So, we want the DiskBalancer to differentiate 
> between an in-progress operation and one that is stuck or ended in error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
