[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2013-03-14 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3277:
-

   Resolution: Fixed
Fix Version/s: 2.0.5-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Colin and Andrew.

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
 Fix For: 2.0.5-beta

 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, 
 HDFS-3277.004.patch, HDFS-3277.005.patch, HDFS-3277.006.patch


 Most users store multiple copies of the FSImage in order to prevent 
 catastrophic data loss if a hard disk fails.  However, our image loading code 
 is currently not set up to start reading another FSImage if loading the first 
 one does not succeed.  We should add this capability.
 We should also be sure to remove the FSImage directory that failed from the 
 list of FSImage directories to write to, in the way we normally do when a 
 write (as opposed to read) fails.
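
A minimal sketch of the behavior requested above, in Python rather than the NameNode's Java, and not taken from any of the attached patches. The names `load_with_failover`, `CorruptImageError`, and `fake_loader`, and the `/data/N` paths, are all hypothetical. It tries each image directory in turn, falls through to the next one on failure, and reports which directories should be dropped from the write list.

```python
from typing import Callable, List, Tuple


class CorruptImageError(Exception):
    """Raised by a loader when an image copy cannot be read (hypothetical)."""


def load_with_failover(
    image_dirs: List[str],
    load_image: Callable[[str], dict],
) -> Tuple[dict, List[str], List[str]]:
    """Try each directory in order; return (image, usable_dirs, failed_dirs)."""
    failed: List[str] = []
    for directory in image_dirs:
        try:
            image = load_image(directory)
        except (CorruptImageError, OSError):
            # Remember the bad directory and fall through to the next copy.
            failed.append(directory)
            continue
        # Directories after the successful one were never tried, so they stay usable.
        usable = [d for d in image_dirs if d not in failed]
        return image, usable, failed
    raise CorruptImageError(f"no loadable image in any of: {image_dirs}")


def fake_loader(directory: str) -> dict:
    # Stand-in loader: pretend the copy under /data/1 is corrupt.
    if directory == "/data/1":
        raise CorruptImageError(directory)
    return {"loaded_from": directory}


image, usable_dirs, failed_dirs = load_with_failover(
    ["/data/1", "/data/2", "/data/3"], fake_loader
)
```

Here `usable_dirs` plays the role of the list of directories to keep writing to, and `failed_dirs` is what would be removed from it. If every copy fails, the sketch raises instead of returning a partial result.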

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2013-03-13 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-3277:
--

Attachment: HDFS-3277.006.patch

Thanks for the review, Aaron! New patch attached.

I agree about the secret manager: it looks like all the state being saved is 
just reset later anyway. I removed that bit of code, but Colin can chime in if we 
missed something.

Hopefully this addresses your other comments too.

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, 
 HDFS-3277.004.patch, HDFS-3277.005.patch, HDFS-3277.006.patch




[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2013-03-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-3277:
--

Attachment: HDFS-3277.005.patch

Forgot a license header.

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, 
 HDFS-3277.004.patch, HDFS-3277.005.patch




[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2013-02-28 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-3277:
--

Attachment: HDFS-3277.004.patch

I'm attaching a rebased version of Colin's patch. It includes a few small fixups, 
mostly to get the failing tests passing. I also added a new test that checks the 
failover case (e.g. one corrupt and one intact fsimage).
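
The failover case described above could be exercised roughly as follows. This is a toy Python sketch, not the test added in the patch: the on-disk layout (a file named `fsimage` with a `.sha256` sidecar), the directory names, and every function name are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def write_image(directory: str, payload: bytes) -> str:
    """Write a toy image plus a sidecar file holding its SHA-256 digest."""
    path = os.path.join(directory, "fsimage")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path + ".sha256", "w") as f:
        f.write(hashlib.sha256(payload).hexdigest())
    return path


def read_image(directory: str) -> bytes:
    """Return the image bytes, or raise ValueError if the digest does not match."""
    path = os.path.join(directory, "fsimage")
    with open(path, "rb") as f:
        payload = f.read()
    with open(path + ".sha256") as f:
        expected = f.read().strip()
    if hashlib.sha256(payload).hexdigest() != expected:
        raise ValueError(f"checksum mismatch in {path}")
    return payload


def load_first_good(directories):
    """Return (payload, directory) for the first copy that verifies."""
    for directory in directories:
        try:
            return read_image(directory), directory
        except (ValueError, OSError):
            continue
    raise ValueError("all image copies are corrupt")


def run_failover_case():
    """Create two copies, corrupt the first, and load whichever still verifies."""
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, name) for name in ("name1", "name2")]
        for d in dirs:
            os.mkdir(d)
            write_image(d, b"namespace-state")
        # Corrupt the first copy by overwriting its contents in place.
        with open(os.path.join(dirs[0], "fsimage"), "wb") as f:
            f.write(b"garbage")
        payload, used = load_first_good(dirs)
        return payload, os.path.basename(used)
```

Calling `run_failover_case()` should return the original payload together with the name of the second directory, showing that the loader skipped the corrupt copy.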

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, 
 HDFS-3277.004.patch




[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2013-02-28 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-3277:
--

Status: Patch Available  (was: Open)

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, 
 HDFS-3277.004.patch




[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2012-07-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3277:
--

Status: Open  (was: Patch Available)

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch






[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2012-07-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3277:
--

Affects Version/s: 3.0.0

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch






[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2012-05-08 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3277:
---

Attachment: HDFS-3277.003.patch

* fix bug where we weren't always loading the newest image(s)

* rebase on trunk
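
One way to make sure the newest image(s) are always tried first is to order the candidates by transaction ID before attempting any load. The Python sketch below is illustrative only and is not the logic from the patch. It assumes image files are named `fsimage_<txid>` with a numeric suffix; the paths and the helper name `order_candidates` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed naming scheme: "fsimage_" followed only by digits.
IMAGE_NAME = re.compile(r"^fsimage_(\d+)$")


def order_candidates(paths: List[str]) -> List[Tuple[int, str]]:
    """Return (txid, path) pairs, newest transaction ID first."""
    found = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        match = IMAGE_NAME.match(name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort by txid descending; copies with equal txids keep their input order.
    return sorted(found, key=lambda pair: pair[0], reverse=True)


candidates = order_candidates([
    "/data/1/current/fsimage_0000000000000000010",
    "/data/2/current/fsimage_0000000000000000042",
    "/data/3/current/fsimage_0000000000000000042",
    "/data/1/current/fsimage_0000000000000000042.md5",  # sidecar file, ignored
])
```

With this ordering, a failover loop walks both copies at the newest transaction ID before it ever falls back to an older image.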

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch






[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2012-04-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3277:
---

Attachment: HDFS-3277.002.patch

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3277.002.patch






[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt

2012-04-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3277:
---

Status: Patch Available  (was: Open)

 fail over to loading a different FSImage if the first one we try to load is 
 corrupt
 ---

 Key: HDFS-3277
 URL: https://issues.apache.org/jira/browse/HDFS-3277
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3277.002.patch

