[ 
https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598582#comment-14598582
 ] 

Andrew Wang commented on HDFS-7645:
-----------------------------------

Hey Vinay,

It might be okay to sneak in this incompat change, I doubt there are many users 
of this API. It's also possible to write an "after" check will work with both 
old and new NNs to check for finalization:

{code}
// before
if (ruinfo == null)
// after
if (ruinfo == null || ruinfo.isFinalized())
{code}

One related change we could also make is adding boolean isStarted and 
isFinalized to the JMX output, since that way callers won't have to do a "!= 0" 
check. Essentially all the normal benefits of a getter. I just filed HDFS-8656 
to do this.

In hindsight it would have been nice to always return an RUInfo so the check 
could just be {{if (ruinfo.isFinalized)}}. The need for null checking is a bit 
ugly.

> Rolling upgrade is restoring blocks from trash multiple times
> -------------------------------------------------------------
>
>                 Key: HDFS-7645
>                 URL: https://issues.apache.org/jira/browse/HDFS-7645
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Keisuke Ogiwara
>             Fix For: 2.8.0
>
>         Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, 
> HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, 
> HDFS-7645.06.patch, HDFS-7645.07.patch
>
>
> When performing an HDFS rolling upgrade, the trash directory is getting 
> restored twice when under normal circumstances it shouldn't need to be 
> restored at all. iiuc, the only time these blocks should be restored is if we 
> need to rollback a rolling upgrade. 
> On a busy cluster, this can cause significant and unnecessary block churn 
> both on the datanodes, and more importantly in the namenode.
> The two times this happens are:
> 1) restart of DN onto new software
> {code}
>   private void doTransition(DataNode datanode, StorageDirectory sd,
>       NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
>     if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
>       Preconditions.checkState(!getTrashRootDir(sd).exists(),
>           sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not 
> " +
>           " both be present.");
>       doRollback(sd, nsInfo); // rollback if applicable
>     } else {
>       // Restore all the files in the trash. The restored files are retained
>       // during rolling upgrade rollback. They are deleted during rolling
>       // upgrade downgrade.
>       int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
>       LOG.info("Restored " + restored + " block files from trash.");
>     }
> {code}
> 2) When heartbeat response no longer indicates a rollingupgrade is in progress
> {code}
>   /**
>    * Signal the current rolling upgrade status as indicated by the NN.
>    * @param inProgress true if a rolling upgrade is in progress
>    */
>   void signalRollingUpgrade(boolean inProgress) throws IOException {
>     String bpid = getBlockPoolId();
>     if (inProgress) {
>       dn.getFSDataset().enableTrash(bpid);
>       dn.getFSDataset().setRollingUpgradeMarker(bpid);
>     } else {
>       dn.getFSDataset().restoreTrash(bpid);
>       dn.getFSDataset().clearRollingUpgradeMarker(bpid);
>     }
>   }
> {code}
> HDFS-6800 and HDFS-6981 were modifying this behavior making it not completely 
> clear whether this is somehow intentional. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to