[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-18 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904568#comment-13904568
 ] 

Arpit Agarwal commented on HDFS-5889:
-

This patch seems to have broken the rolling upgrade tests.

The new edit log ops {{OP_ROLLING_UPGRADE_START}} and 
{{OP_ROLLING_UPGRADE_FINALIZE}} trigger a {{RollingUpgradeException}} during NN 
restart. I think the fix should be to invoke 
{{startRollingUpgrade}}/{{finalizeRollingUpgrade}}, and to write to the edit 
log only when they are invoked via RPC.

I filed HDFS-5960.
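
One way to read the suggested fix, as a minimal sketch only: the RPC entry point validates the request and writes the edit log op, while the path used when ops are loaded from the edit log updates in-memory state without logging again and without rejecting the op. Every name below ({{RollingUpgradeState}}, {{startViaRpc}}, {{applyStartFromEditLog}}, the in-memory list standing in for the edit log) is hypothetical and is not taken from the HDFS source or from HDFS-5960.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the namesystem's rolling upgrade state.
class RollingUpgradeState {
    private boolean inProgress = false;
    // Stand-in for the edit log: records op names only.
    private final List<String> editLog = new ArrayList<>();

    // RPC path: reject invalid requests, update state, then log the op.
    void startViaRpc() {
        if (inProgress) {
            throw new IllegalStateException("rolling upgrade already in progress");
        }
        applyStart();
        editLog.add("OP_ROLLING_UPGRADE_START");
    }

    void finalizeViaRpc() {
        if (!inProgress) {
            throw new IllegalStateException("no rolling upgrade in progress");
        }
        applyFinalize();
        editLog.add("OP_ROLLING_UPGRADE_FINALIZE");
    }

    // Replay path: the op is already in the log, so only update state.
    void applyStartFromEditLog() { applyStart(); }
    void applyFinalizeFromEditLog() { applyFinalize(); }

    private void applyStart() { inProgress = true; }
    private void applyFinalize() { inProgress = false; }

    boolean isInProgress() { return inProgress; }
    List<String> loggedOps() { return editLog; }
}
```

The point of the split is that replaying a logged op never appends a second copy of that op and never fails the restart because of a state check meant for clients.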

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch, h5889_20140213.patch
>
>
> After rolling upgrade is started and checkpoint is disabled, the edit log may 
> grow to a huge size.  It is not a problem if rolling upgrade is finalized 
> normally since NN keeps the current state in memory and it writes a new 
> checkpoint during finalize.  However, it is a problem if admin decides to 
> downgrade.  It could take a long time to apply edit log.  Rollback does not 
> have such problem.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900117#comment-13900117
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5889:
--

Created HDFS-5945 for fsimage change.



[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-12 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900080#comment-13900080
 ] 

Vinayakumar B commented on HDFS-5889:
-

Hi,
I forgot to mention one trivial nit:
{code}
+  private synchronized void saveFSImageInAllDirs(FSNamesystem source,
+  NameNodeFile nnf, long txid, Canceler canceler) throws IOException {
{code}
These lines have trailing spaces. Could you please remove them before commit? 
Thanks.

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-12 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900073#comment-13900073
 ] 

Vinayakumar B commented on HDFS-5889:
-

Yes, +1 from my side too.

bq. We also need to change FSImage format for adding upgrade info. Otherwise, 
the upgrade info will be lost if it uses a checkpoint to restart. Let's commit 
this first and do the work separately so that this will unblock HDFS-5920. Do 
you agree?

I was thinking about the same problem too and then saw it already mentioned 
here. It's OK to commit now and handle this separately.

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899796#comment-13899796
 ] 

Jing Zhao commented on HDFS-5889:
-

Yes. +1 for the current patch.

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-12 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899775#comment-13899775
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5889:
--

Jing, thanks for making the change.  It looks good.

We also need to change FSImage format for adding upgrade info.  Otherwise, the 
upgrade info will be lost if it uses a checkpoint to restart.  Let's commit 
this first and do the work separately so that this will unblock HDFS-5920.  Do 
you agree?

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899644#comment-13899644
 ] 

Jing Zhao commented on HDFS-5889:
-

Another issue is that when checkpointing for the rollback image, we did not 
rename the md5 file. This can cause a failure when loading the fsimage for 
rollback.

We can also fix this in HDFS-5920, since I have both a fix and a unit test for 
this failure.
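
To illustrate the failure mode, here is a minimal sketch of renaming an image together with its digest file. It assumes the digest is a sidecar named "<image file name>.md5" in the same directory; {{ImageRenamer}}, {{md5For}} and {{renameWithMd5}} are hypothetical names, not the HDFS-5920 fix. A real digest file may also record the image name inside it, which a plain rename like this would not update.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ImageRenamer {
    // Assumed sidecar naming: "<image file name>.md5" next to the image.
    static Path md5For(Path image) {
        return image.resolveSibling(image.getFileName().toString() + ".md5");
    }

    // Renames the image and, if present, its md5 sidecar, so a later load
    // that looks for the digest under the new name can find it.
    static void renameWithMd5(Path from, Path to) throws IOException {
        Path fromMd5 = md5For(from);
        Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
        if (Files.exists(fromMd5)) {
            Files.move(fromMd5, md5For(to), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```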

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, 
> h5889_20140212c.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899437#comment-13899437
 ] 

Jing Zhao commented on HDFS-5889:
-

The new patch looks good to me. One question: the current patch calls 
"purgeOldStorage(NameNodeFile.IMAGE_ROLLBACK)" when finalizing rolling upgrade. 
It looks like this method will still retain the IMAGE_ROLLBACK checkpoint (by 
default, it keeps up to 2 checkpoints)? Do we want to make sure the 
IMAGE_ROLLBACK checkpoint gets purged here?
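
The concern can be shown with a small sketch that contrasts a retention-based purge with an unconditional one. {{PurgeSketch}} and both method names are hypothetical; this does not reproduce the real purgeOldStorage logic, only the general idea that a purge which keeps the newest N checkpoints never deletes a single rollback image.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PurgeSketch {
    // Retention-based purge: returns the txids to delete, keeping the
    // newest `retain` checkpoints.
    static List<Long> purgeKeepingNewest(List<Long> txids, int retain) {
        List<Long> sorted = new ArrayList<>(txids);
        Collections.sort(sorted);
        int cut = Math.max(0, sorted.size() - retain);
        return new ArrayList<>(sorted.subList(0, cut));
    }

    // Unconditional purge: every checkpoint of the given kind is deleted.
    static List<Long> purgeAll(List<Long> txids) {
        return new ArrayList<>(txids);
    }
}
```

With one rollback image and a retention of 2, the first method selects nothing for deletion, while the second selects the rollback image.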

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898898#comment-13898898
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5889:
--

Actually, we do not have to change the existing checkpoint code much if we 
create a rollback image (fsimage_rollback) instead of creating upgrade images 
(fsimage_upgrade).  The rollback image is created when starting rolling upgrade 
and deleted after finalize/downgrade.  For rollback, the rollback image should 
be renamed to the normal image name.
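
The lifecycle described above can be sketched as three file operations. This is an illustration under assumptions, not the patch: the class {{RollbackImageLifecycle}}, its method names, and the "fsimage_rollback_<txid>" / "fsimage_<txid>" file names are hypothetical stand-ins chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RollbackImageLifecycle {
    private final Path dir;

    RollbackImageLifecycle(Path dir) { this.dir = dir; }

    // Assumed naming scheme, for illustration only.
    Path rollbackImage(long txid) { return dir.resolve("fsimage_rollback_" + txid); }
    Path normalImage(long txid) { return dir.resolve("fsimage_" + txid); }

    // Start of rolling upgrade: save the pre-upgrade namespace as the rollback image.
    void onStart(long txid, byte[] namespace) throws IOException {
        Files.write(rollbackImage(txid), namespace);
    }

    // Finalize or downgrade: the rollback image is no longer needed.
    void onFinalizeOrDowngrade(long txid) throws IOException {
        Files.deleteIfExists(rollbackImage(txid));
    }

    // Rollback: promote the rollback image to the normal image name.
    void onRollback(long txid) throws IOException {
        Files.move(rollbackImage(txid), normalImage(txid));
    }
}
```

Because regular checkpoints keep the normal image name throughout, the existing checkpoint code path is left alone and only these three events touch the rollback image.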

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898184#comment-13898184
 ] 

Jing Zhao commented on HDFS-5889:
-

The patch looks good to me. Besides image transfer, I have only one comment: 
in the StandbyCheckpointer, if we are not in rolling upgrade, shouldn't the nnf 
still be NameNodeFile.IMAGE instead of IMAGE_NEW?
{code}
-  img.saveNamespace(namesystem, canceler);
+  final NameNodeFile nnf = namesystem.isRollingUpgrade()?
+  NameNodeFile.IMAGE_UPGRADE: NameNodeFile.IMAGE_NEW;
+  img.saveNamespace(namesystem, nnf, canceler);
{code}
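
As a sketch of the alternative being asked about (not the committed code), the selection would fall back to the normal image type outside a rolling upgrade. The enum below is a hypothetical mirror of the three NameNodeFile values named in this thread, and {{CheckpointFileChooser}} is an invented helper.

```java
class CheckpointFileChooser {
    // Hypothetical mirror of the NameNodeFile values named in this thread.
    enum NameNodeFile { IMAGE, IMAGE_NEW, IMAGE_UPGRADE }

    // Picks the file type for a standby checkpoint: the upgrade image while a
    // rolling upgrade is in progress, the normal image otherwise.
    static NameNodeFile choose(boolean rollingUpgradeInProgress) {
        return rollingUpgradeInProgress
            ? NameNodeFile.IMAGE_UPGRADE
            : NameNodeFile.IMAGE;
    }
}
```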

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897992#comment-13897992
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5889:
--

Thanks.  It actually cannot send a NameNodeFile.IMAGE_UPGRADE file, since it 
can only send NameNodeFile.IMAGE files.  I will also change it to support 
IMAGE_UPGRADE.

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-11 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897697#comment-13897697
 ] 

Vinayakumar B commented on HDFS-5889:
-

Hi Nicholas, 
Good work. 

I have one question.
The current patch saves the fsimage as NameNodeFile.IMAGE_UPGRADE only in the 
StandbyCheckpointer on the Standby NameNode. But when the image is sent to the 
Active NameNode, it is stored as NameNodeFile.IMAGE, and purging also happens 
on NameNodeFile.IMAGE. 

Do you think the Active NameNode (even though its software is not upgraded 
yet) should also store the checkpoint as NameNodeFile.IMAGE_UPGRADE when a 
rolling upgrade is in progress?

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-11 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897626#comment-13897626
 ] 

Vinayakumar B commented on HDFS-5889:
-

Sounds good, Nicholas.

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897625#comment-13897625
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5889:
--

We could save the images with a different naming scheme and then put them into 
the same directory.  How does it sound to you?

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-05 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893033#comment-13893033
 ] 

Vinay commented on HDFS-5889:
-

Hi Nicholas,
How are you planning to keep these checkpointed images? In some special 
directory, so that they can be ignored during rollback and included during 
downgrade?

> When rolling upgrade is in progress, standby NN should create checkpoint for 
> downgrade.
> ---
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE