[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-10-15 Thread Bertrand Dechoux (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476248#comment-13476248
 ] 

Bertrand Dechoux commented on HADOOP-4885:
--

Sorry for the misunderstanding, but it seems it was indeed added (in 1.0.2, as 
the backport says), so maybe this JIRA could once again be updated to reflect 
that. (I can't modify the fix versions.) I will probably test this feature out 
of curiosity soon.

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.branch-1.patch, 
> HADOOP-4885.branch-1.patch.2, HADOOP-4885.branch-1.patch.3, 
> HADOOP-4885.patch, HADOOP-4885.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-09-28 Thread Bertrand Dechoux (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465706#comment-13465706
 ] 

Bertrand Dechoux commented on HADOOP-4885:
--

grep -R "dfs.namenode.name.dir.restore" *

src/hdfs/org/apache/hadoop/hdfs/DFSConfigKeys.java:  public static final String DFS_NAMENODE_NAME_DIR_RESTORE_KEY = "dfs.namenode.name.dir.restore";

Great! I will test it. The documentation does not seem to have been updated, 
but that's a detail (the same goes for the description of this JIRA).





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-09-28 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465698#comment-13465698
 ] 

Brandon Li commented on HADOOP-4885:


{quote}I did a grep -R "dfs.name.dir.restore" src on a downloaded version of 
Hadoop 1.0.3 and found no match.{quote}
The property name is dfs.namenode.name.dir.restore.
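
The mismatch comes down to the exact key string. As a minimal sketch, assuming 
a plain java.util.Properties object as a stand-in for Hadoop's configuration 
(the class and method names below are hypothetical, not Hadoop APIs), a lookup 
that treats the restore as off unless the full key is set could look like this:

```java
import java.util.Properties;

// Hypothetical helper; Properties stands in for Hadoop's configuration object.
final class NameDirRestoreSetting {
    // Key string as quoted from DFSConfigKeys in this thread.
    static final String RESTORE_KEY = "dfs.namenode.name.dir.restore";

    // Returns true only when the full key is present and set to "true".
    static boolean isRestoreEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(RESTORE_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The shorter name that was grepped for is a different key in this sketch.
        conf.setProperty("dfs.name.dir.restore", "true");
        System.out.println("short key only: " + isRestoreEnabled(conf));
        conf.setProperty(RESTORE_KEY, "true");
        System.out.println("full key set:   " + isRestoreEnabled(conf));
    }
}
```

Searching the source tree for the full key string is therefore the reliable 
check for whether a given release carries the feature.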



[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-09-28 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465661#comment-13465661
 ] 

Harsh J commented on HADOOP-4885:
-

Let's visit HDFS-3075 for the backport. I removed the versioning from here as it 
was erroneous.



[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-09-28 Thread Bertrand Dechoux (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465633#comment-13465633
 ] 

Bertrand Dechoux commented on HADOOP-4885:
--

I did a
grep -R "dfs.name.dir.restore" src
on a downloaded version of Hadoop 1.0.3 and found no match.
Maybe the fix version should be updated.

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 1.0.3, 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.branch-1.patch, 
> HADOOP-4885.branch-1.patch.2, HADOOP-4885.branch-1.patch.3, 
> HADOOP-4885.patch, HADOOP-4885.patch
>
>




[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-21 Thread Brandon Li (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235194#comment-13235194
 ] 

Brandon Li commented on HADOOP-4885:


bq. since we haven't started the edits.new file yet, something may actually 
block??
Yes, filesystem modifications can be blocked during this step, so it should be 
optimized. Note that automatic restore is disabled by default.

bq. It would be better if the resyncing up to the last closed edit log is done 
asynchronously.
This could be a way to optimize the operation. Another way is not to copy over 
the files but to wait for the checkpoint process to populate the new image and 
edit logs. For the second approach, the storage directories being restored 
should have a new state (e.g., formatted or restoring) rather than "active".

bq. Ideally any exceptions from dealing with the removed dirs should be ignored.
Agreed.
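
To make the second approach concrete, here is a minimal sketch of an extra 
"restoring" state. Everything in it is hypothetical (the state names, the 
class, and the checkpoint callback are assumptions for illustration, not the 
NameNode's actual storage classes): a restored directory is reported as ACTIVE 
only after a checkpoint has populated it.

```java
// Hypothetical tracker for the status reported for one storage directory.
final class StorageDirStatus {
    // Hypothetical states; the thread only suggests "formatted" or "restoring".
    enum State { FAILED, RESTORING, ACTIVE }

    private State state = State.ACTIVE;

    State state() {
        return state;
    }

    // The directory stopped accepting writes and was taken out of service.
    void markFailed() {
        state = State.FAILED;
    }

    // A restore attempt formatted the directory, but it holds no image or edits yet.
    void beginRestore() {
        if (state != State.FAILED) {
            throw new IllegalStateException("only a failed directory can be restored");
        }
        state = State.RESTORING;
    }

    // Only a successful checkpoint promotes a restoring directory to ACTIVE.
    void onCheckpointFinished(boolean succeeded) {
        if (succeeded && state == State.RESTORING) {
            state = State.ACTIVE;
        }
    }

    public static void main(String[] args) {
        StorageDirStatus dir = new StorageDirStatus();
        dir.markFailed();
        dir.beginRestore();
        dir.onCheckpointFinished(false);
        System.out.println("after failed checkpoint: " + dir.state());
        dir.onCheckpointFinished(true);
        System.out.println("after good checkpoint:   " + dir.state());
    }
}
```

With a state like this, a failed checkpoint leaves the directory visibly 
"restoring" instead of showing an "Active" status it has not earned.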



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-21 Thread Kihwal Lee (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234460#comment-13234460
 ] 

Kihwal Lee commented on HADOOP-4885:


- It would be better if the resyncing up to the last closed edit log is done 
asynchronously. That way, the NN only needs to sync one or two edits while 
rolling the log.

- It seems that if a restore fails, rollEditLog() also fails even if there are 
healthy directories. Ideally any exceptions from dealing with the removed dirs 
should be ignored.
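
The second point can be sketched as follows. This is only an illustration 
under stated assumptions: the Restorer interface and restoreAll method are 
hypothetical names, not the code in the patch. Each removed directory is tried 
independently, and an IOException from one of them is recorded instead of 
being propagated to the caller that rolls the log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RestoreIsolation {
    // Hypothetical callback standing in for whatever restores one directory.
    interface Restorer {
        void restore(String dir) throws IOException;
    }

    // Tries every removed directory; returns the ones that still could not be restored.
    static List<String> restoreAll(List<String> removedDirs, Restorer restorer) {
        List<String> stillFailed = new ArrayList<>();
        for (String dir : removedDirs) {
            try {
                restorer.restore(dir);
            } catch (IOException e) {
                // Swallow the failure so healthy directories are unaffected.
                stillFailed.add(dir);
            }
        }
        return stillFailed;
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        removed.add("/data/1/name");
        removed.add("/mnt/nfs/name");
        List<String> failed = restoreAll(removed, dir -> {
            if (dir.startsWith("/mnt/nfs")) {
                throw new IOException("mount not reachable: " + dir);
            }
        });
        System.out.println("still failed: " + failed);
    }
}
```

The caller can then proceed with the log roll on the healthy directories and 
retry the remaining ones at the next checkpoint.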





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-20 Thread Nathan Roberts (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233868#comment-13233868
 ] 

Nathan Roberts commented on HADOOP-4885:


Thanks for the response Brandon. My real concern is whether or not the namenode 
can continue completely normal operation during a long running restoration 
(several minutes for an image of 10s of GB). Or, since we haven't started the 
edits.new file yet, something may actually block??





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-20 Thread Brandon Li (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233633#comment-13233633
 ] 

Brandon Li commented on HADOOP-4885:


Hi Nathan,
It could be slow if the image is very large, though currently the image size is 
limited by the memory size. 

Thanks,
Brandon





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-20 Thread Nathan Roberts (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233469#comment-13233469
 ] 

Nathan Roberts commented on HADOOP-4885:


Quick question on this patch. Are there any negative effects if the images 
being restored are very large or the restore is otherwise very slow? Just 
wondering because at first glance it looks like the restoration is being done 
after closing the current edits log and before starting edits.new.





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-15 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230483#comment-13230483
 ] 

Eli Collins commented on HADOOP-4885:
-

Ah, that makes sense. Thanks for the explanation!   +1 to the latest patch





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-15 Thread Brandon Li (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230297#comment-13230297
 ] 

Brandon Li commented on HADOOP-4885:


I did manual tests before including the unit tests in the backport patch. 

The new edit log is immediately created in the new storage directory, but the 
rolled edit log doesn't exist in the recovered storage directory. From the NN 
UI, the healthy storage directory and recovered directory both have "Active" 
status. This is why I said it's "misleading".

It would be a more obvious problem when the storage directory is an fsimage-only 
directory. From the NN UI/JMX, the administrator can't tell which "Active" 
storage directory has an fsimage inside and which doesn't. The same "Active" state 
here means different things at different times for different directories.





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229929#comment-13229929
 ] 

Eli Collins commented on HADOOP-4885:
-

bq. The format-addStorageDir solution makes the failed directory "active" 
immediately even though it's not in a real active state. The state is visible from 
the NN UI and JMX. If the checkpoint fails, the fake "Active" state can be misleading.

Not sure I'm following.. when you roll the log and it restores the storage 
directory it creates a new empty storage dir, the directory is added to the 
list of storage dirs and a new edit log is immediately created on it (see 
FSEditLog#rollEditLog), ie it is immediately "active" right?

Have you done any testing of this patch aside from running the unit tests?





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Brandon Li (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229819#comment-13229819
 ] 

Brandon Li commented on HADOOP-4885:


The format-addStorageDir solution makes the failed directory "active" 
immediately even though it's not in a real active state. The state is visible from 
the NN UI and JMX. If the checkpoint fails, the fake "Active" state can be misleading.

The copy-over solution may do some extra work, but it puts the recovered storage 
directories into a real active state. 


I agree those 3 JIRA issues you mentioned should be back ported too to branch 
1.02 (the backport patch here is for branch-1 not 1.02). 

It's a good point about the network mount problem. :-)  
It's also a problem with the original patch: the "format-addStorageDir" approach 
creates the storage directory if it doesn't exist. However, if this storage 
directory is a mount point, it shouldn't be created automatically. HDFS-3095 has 
been filed for this issue.
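
A minimal sketch of the kind of guard this implies, assuming the check is done 
with java.io.File before any restore attempt (the class and method names are 
hypothetical, and this is not the fix tracked in HDFS-3095): the directory is 
restored only if it already exists and is writable, and it is never created.

```java
import java.io.File;

final class MountPointGuard {
    // Returns true only if the directory already exists and is writable.
    // It deliberately never calls mkdir/mkdirs: a missing directory may be
    // an unmounted network mount, and creating it would write to local disk.
    static boolean isSafeToRestore(File storageDir) {
        return storageDir.isDirectory() && storageDir.canWrite();
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        File missing = new File(tmp, "hadoop-4885-missing-" + System.nanoTime());
        System.out.println("existing dir: " + isSafeToRestore(tmp));
        System.out.println("missing dir:  " + isSafeToRestore(missing));
    }
}
```

A directory that fails this check would simply stay in the removed list until 
a later attempt finds it present again.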






[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229792#comment-13229792
 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4885:


Hi Eli,

Brandon addressed all [your earlier 
comment|https://issues.apache.org/jira/browse/HADOOP-4885?focusedCommentId=13228915&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13228915]
 last night.  I did not see your further comment, so I committed the patch.

You made some good points in [your previous 
comment|https://issues.apache.org/jira/browse/HADOOP-4885?focusedCommentId=13229774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229774].
  As always, we could file a JIRA for them.

Does it sound good?





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229774#comment-13229774
 ] 

Eli Collins commented on HADOOP-4885:
-

bq. I didn't get your second question: my patch uses addStorageDir too. 

What I meant was the trunk patch does the following, which is much shorter:
{code}
sd.clearDirectory();
addStorageDir(sd);
{code}

and leverages the fact that checkpoint populates the directory. Why not use the 
same approach here?

- I'd test with a real NFS mount and disconnect/reconnect the network. I found 
some bugs that way when backporting this a while back. Also discovered 
HDFS-2701, HDFS-2702, HDFS-2703 via testing with a real build instead of the 
unit tests. 
- Nit: s/"may should be mounted"/"may be a network mount"/





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229496#comment-13229496
 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4885:


+1  The patch also looks good to me.  I will commit this in HDFS-3075.





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Jitendra Nath Pandey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229457#comment-13229457
 ] 

Jitendra Nath Pandey commented on HADOOP-4885:
--

+1 the patch for branch-1 looks good to me.





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Brandon Li (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229452#comment-13229452
 ] 

Brandon Li commented on HADOOP-4885:


The new patch has the restart test. Thanks!





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-14 Thread Jitendra Nath Pandey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229442#comment-13229442
 ] 

Jitendra Nath Pandey commented on HADOOP-4885:
--

The patch looks good to me. It would be great to add a few lines to test that 
namenode can restart with just a restored edits/image directory.

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.branch-1.patch, 
> HADOOP-4885.branch-1.patch.2, HADOOP-4885.patch, HADOOP-4885.patch
>
>






[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-13 Thread Brandon Li (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229004#comment-13229004
 ] 

Brandon Li commented on HADOOP-4885:


Hi Eli,
Thanks for the comments!
The code base in branch-1 is slightly different from 0.21. 
The addition of directories to removedStorageDirs from the original patch is 
already in branch-1. 
I didn't get your second question: my patch uses addStorageDir too. 
The same test, with minor modifications (e.g., comparing md5 instead of length for 
edits files), is included in the backport patch.

Thanks.
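
As an illustration of why a digest comparison is stricter than a length 
comparison for edits files, here is a minimal sketch. It is not the test from 
the backport patch: the class name is hypothetical, and in-memory byte arrays 
stand in for the contents of two edits files.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class EditsCompare {
    // Computes the MD5 digest of the given bytes.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // True when both inputs have the same MD5 digest.
    static boolean sameDigest(byte[] a, byte[] b) {
        return Arrays.equals(md5(a), md5(b));
    }

    public static void main(String[] args) {
        // Same length, different content: a length check alone cannot tell them apart.
        byte[] healthy = "edit-0001".getBytes(StandardCharsets.UTF_8);
        byte[] restored = "edit-0002".getBytes(StandardCharsets.UTF_8);
        System.out.println("same length: " + (healthy.length == restored.length));
        System.out.println("same digest: " + sameDigest(healthy, restored));
    }
}
```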





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-13 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228915#comment-13228915
 ] 

Eli Collins commented on HADOOP-4885:
-

- In the trunk patch we're also adding directories to removedStorageDirs, seems 
like we'll need those additions here right? 
- The trunk version uses addStorageDir, any reason that it's done differently 
here?
- Testing?





[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2012-03-12 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228136#comment-13228136
 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4885:


The getRestoreRemovedDirs() below should be removed.
{code}
+  boolean getRestoreRemovedDirs() {
+return this.restoreRemovedDirs;
+  }
{code}

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.branch-1.patch, HADOOP-4885.patch, 
> HADOOP-4885.patch
>
>






[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2011-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023028#comment-13023028
 ] 

Hudson commented on HADOOP-4885:


Integrated in Hadoop-Hdfs-22-branch #35 (See 
[https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/])


> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.patch, HADOOP-4885.patch
>
>




[jira] [Commented] (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2011-04-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022466#comment-13022466
 ] 

Hudson commented on HADOOP-4885:


Integrated in Hadoop-Hdfs-trunk #643 (See 
[https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/])


> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.patch, HADOOP-4885.patch
>
>




[jira] Commented: (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992596#comment-12992596
 ] 

Hudson commented on HADOOP-4885:


Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.patch, HADOOP-4885.patch
>
>






[jira] Commented: (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2011-02-07 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991683#comment-12991683
 ] 

Boris Shkolnik commented on HADOOP-4885:


Fix submitted in HDFS-1602.

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.patch, HADOOP-4885.patch
>
>






[jira] Commented: (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2011-01-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988363#action_12988363
 ] 

Konstantin Boudnik commented on HADOOP-4885:


After a bit more investigation I noticed (duh!) the new config 
parameter {{dfs.name.dir.restore}}, which triggers restoration of removed 
storage. The fsimage files on both volumes (NFS-mounted and non-NFS), as well 
as the secondary NN's checkpoints, have the same md5sums. So it seems (as 
Hairong pointed out elsewhere) that without HDFS-903 this feature kind of works.
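
The md5sum comparison described above can be sketched in Java as follows. This is a minimal illustration using only the JDK, not a Hadoop tool; the class name FsImageDigestCheck and the methods md5Hex and allMatch are hypothetical, and the caller is assumed to pass the paths of the fsimage copies to compare.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: checks that several copies of a file have the same MD5. */
public class FsImageDigestCheck {

    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True when every copy has the same digest (also true for an empty list). */
    public static boolean allMatch(List<Path> copies) throws IOException, NoSuchAlgorithmException {
        Set<String> digests = new HashSet<>();
        for (Path copy : copies) {
            digests.add(md5Hex(copy));
        }
        return digests.size() <= 1;
    }
}
```

Calling allMatch with the fsimage path from each name directory and from the secondary NN's checkpoint directory gives the same yes/no answer as comparing md5sum output by eye.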

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.patch, HADOOP-4885.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-4885) Try to restore failed replicas of Name Node storage (at checkpoint time)

2011-01-27 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987912#action_12987912
 ] 

Konstantin Boudnik commented on HADOOP-4885:


This feature does not seem to be completely working: a storage volume that has 
been removed is not added back as expected (see HDFS-1496). I'd suggest 
disabling this new feature for now.

> Try to restore failed replicas of Name Node storage (at checkpoint time)
> 
>
> Key: HADOOP-4885
> URL: https://issues.apache.org/jira/browse/HADOOP-4885
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-4885-1.patch, HADOOP-4885-3.patch, 
> HADOOP-4885-3.patch, HADOOP-4885.patch, HADOOP-4885.patch
>
>

