[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5332:
--
Resolution: Fixed
Fix Version/s: 2.3.0, 3.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I committed this to trunk and branch-2.

> Support token-preserving restart of history server
> --
>
> Key: MAPREDUCE-5332
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5332
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: jobhistoryserver
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Fix For: 3.0.0, 2.3.0
>
> Attachments: MAPREDUCE-5332-2.patch, MAPREDUCE-5332-3.patch,
> MAPREDUCE-5332-4.patch, MAPREDUCE-5332-5.patch, MAPREDUCE-5332-5.patch,
> MAPREDUCE-5332-6.patch, MAPREDUCE-5332-7.patch, MAPREDUCE-5332.patch
>
>
> To better support rolling upgrades through a cluster, the history server
> needs the ability to restart without losing track of delegation tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-7.patch

Thanks for the thorough review, Daryn! I updated the patch to address all but one of the concerns. High-level changes include:

* Added an updateToken method to the state store interface; the filesystem store uses rename to try to make the update atomic.
* Token buckets are created up front.

bq. The DTSM has the stateStore so its recovery method could load the state - instead of the caller loading the state from the stateStore and passing it in. The code may become a bit easier to follow, but just a suggestion.

I kept this as-is. It makes more sense if the history server persists more items than just these tokens in the future: you would want to load the state once and then hand out the pieces of state to the various entities that need to recover from it. Alternatively, if the state stores were separate and per-service, then I agree that recovery should be handled by each service.
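A minimal sketch of the update-by-rename idea, assuming a local POSIX filesystem and one file per token. The helper name update_token and the file layout are hypothetical, not taken from the patch. The sketch relies on os.replace, which atomically replaces the destination on POSIX; HDFS rename semantics differ from POSIX rename, so the same guarantee does not automatically carry over.

```python
import os
import tempfile


def update_token(path: str, data: bytes) -> None:
    """Replace the contents of a token file (hypothetical helper)."""
    directory = os.path.dirname(path) or "."
    # Stage the new contents in a temp file in the same directory, so the
    # final rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX: readers see either the old file or the new one,
        # never a half-written mix.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the staged file if anything failed before the rename.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

A crash before os.replace leaves the old token file untouched, plus a stray .tmp_ file that a recovery pass can discard.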
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-6.patch

Minor tweak to the patch to set the file permissions as part of the create call, which should reduce the number of RPC calls when HDFS is the filesystem.
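As a sketch, here is a local-filesystem analogue of setting permissions at create time: the mode is passed to the open call that creates the file instead of following it with a separate chmod. On HDFS, a separate permission change is another RPC to the NameNode, so folding it into the create saves a round trip. Hadoop's FileSystem API has create overloads that accept an FsPermission as well as a separate setPermission method; this message does not say which call the patch uses. The helper name create_with_mode is hypothetical.

```python
import os


def create_with_mode(path: str, data: bytes, mode: int = 0o600) -> None:
    """Create a new file with its permission bits set in the same call."""
    # O_EXCL makes creation fail if the file already exists; the mode argument
    # applies the permissions together with the create (still filtered by the
    # process umask), so no follow-up os.chmod call is needed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```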
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-5.patch

Wow, that's a lot of test breakage. None of the test failures appear to be related to this change. Many of them are failing with OOM errors due to too many threads; I suspect this is caused by lingering AMs, like what was reported in MAPREDUCE-5501 and YARN-1183. I'm also able to reproduce many of the failures on trunk without this patch. Uploading the same patch again to see if we can get a clean(er) run this time.
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-5.patch

Updated the patch to use temporary files when creating key and token files. This prevents recovery from seeing a partially written file if we crash in the middle of a write. I also extended the unit tests to check for correct behavior on redundant key and token stores.
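A sketch of the write-to-temporary-then-rename pattern with a matching recovery pass, assuming a flat local directory and a hypothetical .tmp_ prefix for in-progress files (the patch's actual file naming is not given here). Recovery discards leftovers from an interrupted write instead of trying to parse them.

```python
import os

TMP_PREFIX = ".tmp_"  # hypothetical marker for in-progress writes


def store_file(directory: str, name: str, data: bytes) -> None:
    tmp = os.path.join(directory, TMP_PREFIX + name)
    final = os.path.join(directory, name)
    # Write the complete contents under the temporary name first...
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # ...and only then expose them under the final name.
    os.replace(tmp, final)


def recover(directory: str) -> dict:
    state = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(TMP_PREFIX):
            # Left behind by a crash mid-write: never load a partial file.
            os.remove(path)
            continue
        with open(path, "rb") as f:
            state[name] = f.read()
    return state
```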
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-4.patch

Updated the patch to address Daryn's comments. Summary of changes:

* Prefixes prefixed
* getBucketPath div-to-mod fix
* State stores renamed to state store services
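To illustrate the div-to-mod fix, here is a hypothetical mapping from a token sequence number to one of a fixed number of bucket directories. The bucket count, the tb_ directory naming, and the function names are assumptions, not the patch's values. With integer division the set of bucket directories grows without bound as sequence numbers increase; with modulo it stays fixed, so every bucket can be created ahead of time.

```python
NUM_BUCKETS = 1000  # hypothetical bucket count


def bucket_by_div(seq: int) -> int:
    # Division groups consecutive numbers into one bucket, and the bucket
    # index keeps growing as sequence numbers increase.
    return seq // NUM_BUCKETS


def bucket_by_mod(seq: int) -> int:
    # Modulo spreads consecutive numbers across buckets and always yields
    # an index in [0, NUM_BUCKETS).
    return seq % NUM_BUCKETS


def get_bucket_path(root: str, seq: int) -> str:
    # Hypothetical layout: one zero-padded subdirectory per bucket.
    return f"{root}/tb_{bucket_by_mod(seq):03d}"
```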
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-3.patch

Updated the patch based on similar changes in YARN-1082:

* HistoryServerStateStorage is now a service
* Moved filesystem startup to the startStorage method
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332-2.patch

Fixing the release audit warning. The TestUberAM timeout has been occurring on trunk for quite a while and is unrelated to this change.
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server
Jason Lowe updated MAPREDUCE-5332:
--
Attachment: MAPREDUCE-5332.patch

Patch that adds token persistence in a manner similar to how it is done for the RM. One major difference is that an error in the token persistence layer is not fatal, whereas it is for the RM. My thinking is that it is better for the history server to stay up than to fall over on any filesystem hiccup. It's easy to change this if people think the history server should crash when this occurs.
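A minimal sketch of the non-fatal policy described above: a hypothetical wrapper around a token store that logs persistence failures instead of propagating them. The class, method, and logger names are illustrative only and do not come from the patch.

```python
import logging

log = logging.getLogger("history.state")  # hypothetical logger name


class NonFatalTokenStore:
    """Hypothetical wrapper: persistence failures are logged, not raised."""

    def __init__(self, backend) -> None:
        # backend: any object exposing store_token(seq, data)
        self._backend = backend

    def store_token(self, seq: int, data: bytes) -> bool:
        try:
            self._backend.store_token(seq, data)
            return True
        except OSError:
            # A filesystem hiccup costs durability for this one token, but
            # the server keeps running instead of shutting down.
            log.exception("Unable to persist token %d", seq)
            return False
```

The trade-off is that a token whose write failed remains valid in memory but will not survive a restart.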