[ https://issues.apache.org/jira/browse/HDFS-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105563#comment-17105563 ]
Yiqun Lin commented on HDFS-15340: ---------------------------------- Thanks for addressing the comments, [~LiJinglun]. Some places are confused to understand, also you need to take care for some typos, :). Some more review comments: *BalanceJob.java* Typo: {{Each procedure need}} --> {{Each procedure needs}} Typo: jos -> job Can you rewrite this: {noformat} Start procedure {}. The last procedure is ... -- > Start procedure {}, last procedure is {noformat} How about adding sleep interval time when curProcedure.execute returning false that means curProcedure executed failed? It can avoid frequently executing failed procedure. {noformat} if (curProcedure.execute(lastProcedure)) { + lastProcedure = curProcedure; + curProcedure = next(); + } {noformat} The balancer job is same, why we always write its journal info to HDFS? Only once time is enough I think. {noformat} if (!scheduler.writeJournal(this)) { + quit = true; // Write journal failed. Simply quit because this job + // has already been added to the recoverQueue. + LOG.debug("Write journal failed. Quit and wait for recovery."); + } {noformat} I see this is also used for testing, can you add @VisibleForTesting annotation for this? {noformat} + public boolean removeAfterDone() { + return removeAfterDone; + }{noformat} It would be better to add comment for this method, we don't know why we pass exception here. {code:java} private synchronized void finish(Exception exception) {code} *BalanceJournalInfoHDFS.java* Found many minor grammar rule issues, can you have a quick fix? * "list all jobs from journal"; – > "List all jobs from journal" * Need to leave one white space: builder.append(",") -> builder.append(", "); * clear journal of job - > Clear journal of job *BalanceProcedure.java* {code:java} + /** + * The main process. This is called by the ProcedureScheduler. + + * Make sure the process quits fast when it's interrupted and the scheduler is + * shut down. + * + * @param lastProcedure the last procedure. + * @throws RetryException if this procedure needs delay a while then retry. + * @return true if the procedure has done and the job will go to the next + * procedure, otherwise false. + */ + public abstract boolean execute(T lastProcedure) + throws RetryException, IOException; {code} lastProcedure here is only used for testing, I suggest to remove this as an input parameter. It seems too confused that we pass lastProcedure but do nothing in actual BalanceProcedure class. The major function methods need be clear for others to understand, :). *BalanceProcedureScheduler.java* For the elapse time calculation on Hadoop world, we will use Time.monotonicNow() not Time.now or System.currentTimeMillis(). Can you update for this? {noformat} this.time = Time.now() + delayInMilliseconds; long delay = time - System.currentTimeMillis(); {noformat} *UnrecoverableProcedure.java* {code:java} + @Override + public boolean execute(BalanceProcedure lastProcedure) throws RetryException, + IOException { + if (handler != null) { + return handler.execute(lastProcedure); + } else { + return true; + } + } {code} We could use mock to throw exception, not depend on BalanceProcedure passed to throw exception. So lastProcedure can completely removed in execute method. > RBF: Balance data across federation namespaces with DistCp and snapshot diff > / Step 1: The State Machine(BalanceProcedureScheduler) > ----------------------------------------------------------------------------------------------------------------------------------- > > Key: HDFS-15340 > URL: https://issues.apache.org/jira/browse/HDFS-15340 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Jinglun > Assignee: Jinglun > Priority: Major > Attachments: HDFS-15340.001.patch, HDFS-15340.002.patch, > HDFS-15340.003.patch, HDFS-15340.004.patch > > > Patch in HDFS-15294 is too big to review so we split it into 2 patches. This > is the first one. Detail can be found at HDFS-15294. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org