[ https://issues.apache.org/jira/browse/HDFS-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450677#comment-15450677 ]
Anu Engineer commented on HDFS-10808: ------------------------------------- [~eddyxu] Thank you very much for taking time out to look at the code and ask me these pertinent questions. bq. Wouldn't break is the simplest way to exit the while loop? if you had one or two breaks may be. But as the number of breaks increase, the control flow graph becomes quite complex. So it becomes harder to reason about the resources and other states from each exit point. An easy way of thinking about copyBlocks would be to think of it as a simple state machine. if you were implementing a state machine, you would probably move to state "exit" and use the default mechanisms of the state machine to handle the exit state instead of breaking out at each error condition. copyBlocks is following that pattern. So while in generic case I agree with you, in this specific case I think this pattern produces code that is easier to reason. bq. on the comment. {noformat} // check if someone told us exit, treat this as an interruption // point for the thread, since both getNextBlock and moveBlocAcrossVolume // can take some time. {noformat} I was under the impression that comment is quite clear, may be I am mistaken. There are several conditions under which we would like to exit in the copy blocks thread. Some of them are states. Some are actions with clear side effects. What we are trying to do is minimize the effects of both. So we introduce the notion of "interruption points" in our copy thread. That is when we invoke a function and if we encounter a failure condition, we flag that information so that at the next safe point to bail, we will. In other words, we don't exit at the point of error, but simply set the state so that thread can proceed to a point where it considers that it is safe for it to exit. Examples of action with side effects are, copy of data block but metadata is not still copied or getting a bunch of disk errors (we wait till 5) before we can get out etc, or finding a block and before we can get to it, it disappears underneath us. Since we have all these kinds of external conditions to take care of, we simply set up a flag telling the system to exit cleanly. This paradigm gives us a centralized exit handler, so if the thread had to do some specific cleanup based on certain error, it is still possible to chain those error handlers at the exit point. Yes, the Atomic nature of shouldRun flag is confusing and perhaps not needed. It is an artifact of playing around with copying multiple blocks when I was developing code. It had a different structure, but then I found that enforcing bandwidth was harder and decided to do single block copy at a time. I really appreciate you taking time out to ask these questions and helping to make sure that I am in the right path. > DiskBalancer does not execute multi-steps plan-redux > ---------------------------------------------------- > > Key: HDFS-10808 > URL: https://issues.apache.org/jira/browse/HDFS-10808 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover > Reporter: Anu Engineer > Assignee: Anu Engineer > Attachments: HDFS-10808.001.patch, HDFS-10808.002.patch > > > This is to redo of the fix in HDFS-10598 -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org