[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3277:
---------------------------------
Resolution: Fixed
Fix Version/s: 2.0.5-beta
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Colin and Andrew.

fail over to loading a different FSImage if the first one we try to load is corrupt
-----------------------------------------------------------------------------------
Key: HDFS-3277
URL: https://issues.apache.org/jira/browse/HDFS-3277
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang
Fix For: 2.0.5-beta
Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, HDFS-3277.004.patch, HDFS-3277.005.patch, HDFS-3277.006.patch

Most users store multiple copies of the FSImage in order to prevent catastrophic data loss if a hard disk fails. However, our image loading code is currently not set up to start reading another FSImage if loading the first one does not succeed. We should add this capability.

We should also be sure to remove the FSImage directory that failed from the list of FSImage directories to write to, in the way we normally do when a write (as opposed to read) fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
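
The failover behaviour requested in the issue description can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: `ImageFailoverSketch`, `ImageReader`, `Result`, and `loadFirstUsable` are hypothetical names, and image copies are represented as plain path strings. The sketch tries each copy in order, returns on the first one that loads, and records every copy that failed so the caller can stop writing to those locations.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
final class ImageFailoverSketch {

    /** Stand-in for whatever reads and validates a single image copy. */
    interface ImageReader {
        void read(String imagePath) throws IOException;
    }

    /** The copy that loaded, plus the copies that failed before it. */
    static final class Result {
        final String loaded;
        final List<String> failed;

        Result(String loaded, List<String> failed) {
            this.loaded = loaded;
            this.failed = failed;
        }
    }

    /**
     * Tries each candidate in order and returns on the first success.
     * Throws only when every candidate has failed.
     */
    static Result loadFirstUsable(List<String> candidates, ImageReader reader)
            throws IOException {
        List<String> failed = new ArrayList<>();
        IOException last = null;
        for (String candidate : candidates) {
            try {
                reader.read(candidate);
                return new Result(candidate, failed);
            } catch (IOException e) {
                // Remember the bad copy and fall through to the next one.
                failed.add(candidate);
                last = e;
            }
        }
        throw new IOException("Failed to load any image; tried " + failed, last);
    }
}
```

A caller would then drop every entry in `Result.failed` from its list of locations to write to, mirroring the second request in the description.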
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-3277:
------------------------------
Attachment: HDFS-3277.006.patch

Thanks for the review, Aaron! New patch attached.

I agree about the secret manager; it looks like all the state being saved is just reset later anyway. I removed that bit of code, but Colin can chime in if we missed something. Hopefully this addresses your other comments too.
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-3277:
------------------------------
Attachment: HDFS-3277.005.patch

Forgot a license header.
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-3277:
------------------------------
Attachment: HDFS-3277.004.patch

I'm attaching a rebased version of Colin's patch. It includes a few small fixups, mostly to get the failing tests passing. I also added a new test which checks the failover case (e.g. one corrupt and one intact fsimage).
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-3277:
------------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3277:
------------------------------
Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3277:
------------------------------
Affects Version/s: 3.0.0
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3277:
---------------------------------------
Attachment: HDFS-3277.003.patch

* fix bug where we weren't always loading the newest image(s)
* rebase on trunk
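
The note above does not say how the newest image is chosen. As a rough illustration only, assuming each image copy is tagged with the transaction ID it was written at, "newest first" could be an ordering like the one below. `NewestImageFirstSketch`, `ImageCopy`, and `newestFirst` are hypothetical names, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; these names are not from the HDFS code base. */
final class NewestImageFirstSketch {

    /** One image copy, tagged with the (assumed) transaction ID it covers. */
    static final class ImageCopy {
        final String path;
        final long txId;

        ImageCopy(String path, long txId) {
            this.path = path;
            this.txId = txId;
        }
    }

    /**
     * Returns a new list ordered by descending transaction ID. The sort is
     * stable, so copies with the same ID keep their original relative order.
     */
    static List<ImageCopy> newestFirst(List<ImageCopy> copies) {
        List<ImageCopy> sorted = new ArrayList<>(copies);
        sorted.sort(Comparator.comparingLong((ImageCopy c) -> c.txId).reversed());
        return sorted;
    }
}
```

Trying the copies in this order means an older image is only considered after every newer one has been rejected.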
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3277:
---------------------------------------
Attachment: HDFS-3277.002.patch
[jira] [Updated] (HDFS-3277) fail over to loading a different FSImage if the first one we try to load is corrupt
[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3277:
---------------------------------------
Status: Patch Available (was: Open)