[ https://issues.apache.org/jira/browse/HDFS-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106137#comment-14106137 ]
Colin Patrick McCabe commented on HDFS-6800:
--------------------------------------------

Thanks for fixing the DataNode usage message.  That bugged me.

The overall strategy looks good.  Starting the DN with {{\-rollback}} matches how we handle starting the NN during rollback.

{code}
   private void doTransition(DataNode datanode, StorageDirectory sd,
       NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
-    if (startOpt == StartupOption.ROLLBACK) {
+    if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
       doRollback(sd, nsInfo); // rollback if applicable
+      // we have already restored everything in the trash by rolling back to
+      // the previous directory, so we must delete the trash to ensure
+      // that it's not restored by BPOfferService.signalRollingUpgrade()
+      FileUtil.fullyDelete(getTrashRootDir(sd));
     } else {
{code}

What if the rename inside doRollback succeeds, but the deletion of the trash fails?  I think to avoid this, we should have a process like this:

1. rename trash to trash.old.<monotonic-timestamp>
2. doRollback (renames previous to current, etc.)
3. if doRollback succeeded, delete trash.old.<monotonic-timestamp>.  If it failed, rename trash.old.<monotonic-timestamp> to trash.

This means using try/catch and/or checking return booleans as needed.

> Determine how Datanode layout changes should interact with rolling upgrade
> --------------------------------------------------------------------------
>
>                 Key: HDFS-6800
>                 URL: https://issues.apache.org/jira/browse/HDFS-6800
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: James Thomas
>         Attachments: HDFS-6800.2.patch, HDFS-6800.3.patch, HDFS-6800.4.patch,
> HDFS-6800.patch
>
>
> We need to handle attempts to rolling-upgrade the DataNode to a new storage
> directory layout.
> One approach is to disallow such upgrades.  If we choose this approach, we
> should make sure that the system administrator gets a helpful error message
> and a clean failure when trying to use rolling upgrade to a version that
> doesn't support it.  Based on the compatibility guarantees described in
> HDFS-5535, this would mean that *any* future DataNode layout changes would
> require a major version upgrade.
> Another approach would be to support rolling upgrade from an old DN storage
> layout to a new layout.  This approach requires us to change our
> documentation to explain to users that they should supply the {{\-rollback}}
> command on the command-line when re-starting the DataNodes during rolling
> rollback.  Currently the documentation just says to restart the DataNode
> normally.
> Another issue here is that the DataNode's usage message describes rollback
> options that no longer exist.  The help text says that the DN supports
> {{\-rollingupgrade rollback}}, but this option was removed by HDFS-6005.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
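
For reference, the three-step trash handling proposed in the comment above (park the trash, roll back, then delete or restore) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the class {{TrashSafeRollback}}, the {{RollbackAction}} interface, and {{rollbackWithTrash}} are hypothetical names, the rollback step is passed in as a callback standing in for doRollback, and {{System.nanoTime()}} stands in for the monotonic timestamp.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper sketching the rename / rollback / delete-or-restore sequence. */
final class TrashSafeRollback {

    /** Hypothetical stand-in for the real doRollback(sd, nsInfo) call. */
    interface RollbackAction {
        void run() throws IOException;
    }

    static void rollbackWithTrash(Path trash, RollbackAction doRollback) throws IOException {
        // Step 1: move the trash aside under a unique name so that nothing
        // looking for "trash" can restore it while the rollback is in flight.
        // Assumption: System.nanoTime() is a good enough unique suffix here.
        Path parked = null;
        if (Files.exists(trash)) {
            parked = trash.resolveSibling("trash.old." + System.nanoTime());
            Files.move(trash, parked);
        }

        try {
            // Step 2: the rollback itself (renames previous to current, etc.).
            doRollback.run();
        } catch (IOException | RuntimeException e) {
            // Step 3, failure case: put the trash back where it was.
            if (parked != null) {
                try {
                    Files.move(parked, trash);
                } catch (IOException restoreFailure) {
                    e.addSuppressed(restoreFailure);
                }
            }
            throw e;
        }

        // Step 3, success case: the parked trash is no longer needed.
        if (parked != null) {
            deleteRecursively(parked);
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With this ordering, a failed delete after a successful rollback leaves only a {{trash.old.*}} directory behind. Assuming the restore path only looks for a directory named {{trash}}, that leftover would not be restored later.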