Re: Next releases
This merge to branch-2 is complete. The changes have been merged to branch-2 and target version set to 2.4.0 (r1556076).

On Fri, Jan 3, 2014 at 4:13 PM, Arpit Agarwal aagar...@hortonworks.com wrote:

We plan to merge HDFS-2832 to branch-2 next week for inclusion in 2.4.

On Fri, Dec 6, 2013 at 1:53 PM, Arun C Murthy a...@hortonworks.com wrote:

Thanks Suresh and Colin. Please update the Roadmap wiki with your proposals. As always, we will try our best to get these in - but we can collectively decide to slip some of these to subsequent releases based on timelines.

Arun

On Dec 6, 2013, at 10:43 AM, Suresh Srinivas sur...@hortonworks.com wrote:

Arun,

I propose the following changes for 2.3:
- There have been a lot of improvements related to supporting http policy.
- There is still a discussion going on, but I would like to deprecate BackupNode in 2.3 as well.
- We are currently working on rolling-upgrade-related changes in HDFS. We might add a couple of changes that enable rolling upgrades from 2.3 onwards (hopefully we can get this done by December).

I propose the following for the 2.4 release, if they are tested and stable:
- Heterogeneous storage support - HDFS-2832
- Datanode cache related change - HDFS-4949
- HDFS ACLs - HDFS-4685
- Rolling upgrade changes

Let me know if you want me to update the wiki.

Regards,
Suresh

On Dec 6, 2013, at 12:27 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

If 2.4 is released in January, I think it's very unlikely to include symlinks. There is still a lot of work to be done before they're usable. You can look at the progress on HADOOP-10019. For some of the subtasks, it will require some community discussion before any code can be written. For better or worse, symlinks have not been requested by users as often as features like NFS export, HDFS caching, ACLs, etc, so effort has been focused on those instead.

For now, I think we should put the symlinks-disabling patches (HADOOP-10020, etc) into branch-2, so that they will be part of the next releases without additional effort.

I would like to see HDFS caching make it into 2.4. The APIs and implementation are beginning to stabilize, and around January it should be ok to backport to a stable branch.

best,
Colin

On Thu, Nov 7, 2013 at 6:42 PM, Arun C Murthy a...@hortonworks.com wrote:

Gang,

Thinking through the next couple of releases here, appreciate f/b.

# hadoop-2.2.1

I was looking through commit logs and there is a *lot* of content here (81 commits as of 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't.

I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3).

In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues.

# hadoop-2.3

I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:
* Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN.
* Bug-fixes in RM Restart
* Minimal support for long-running applications (e.g. security) via YARN-896
* RM Fail-over via ZKFC
* Anything else?

HDFS???

Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the end of the year.

Thoughts?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Next releases
Great news! Thanks Arpit! I think we should update the Roadmap wiki to include this work. :)

Thanks,
Junping

- Original Message -
From: Arpit Agarwal aagar...@hortonworks.com
To: common-dev@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org, hdfs-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org
Sent: Tuesday, January 7, 2014 8:47:32 AM
Subject: Re: Next releases

This merge to branch-2 is complete. The changes have been merged to branch-2 and target version set to 2.4.0 (r1556076).
Re: Next releases
We plan to merge HDFS-2832 to branch-2 next week for inclusion in 2.4.
Re: Next releases
If 2.4 is released in January, I think it's very unlikely to include symlinks. There is still a lot of work to be done before they're usable. You can look at the progress on HADOOP-10019. For some of the subtasks, it will require some community discussion before any code can be written. For better or worse, symlinks have not been requested by users as often as features like NFS export, HDFS caching, ACLs, etc, so effort has been focused on those instead.

For now, I think we should put the symlinks-disabling patches (HADOOP-10020, etc) into branch-2, so that they will be part of the next releases without additional effort.

I would like to see HDFS caching make it into 2.4. The APIs and implementation are beginning to stabilize, and around January it should be ok to backport to a stable branch.

best,
Colin

On Thu, Dec 5, 2013 at 3:57 PM, Arun C Murthy a...@hortonworks.com wrote:

Ok, I've updated https://wiki.apache.org/hadoop/Roadmap with an initial strawman list for hadoop-2.4 which I feel we can get out in Jan. What else would folks like to see? Please keep timeframe in mind.

thanks,
Arun

On Dec 2, 2013, at 10:55 AM, Arun C Murthy a...@hortonworks.com wrote:

On Nov 13, 2013, at 1:55 PM, Jason Lowe jl...@yahoo-inc.com wrote:

+1 to limiting checkins of patch releases to Blockers/Criticals. If necessary committers check into trunk/branch-2 only and defer to the patch release manager for the patch release merge. Then there should be fewer surprises for everyone about what ended up in a patch release and less likely the patch release becomes destabilized from the sheer amount of code churn. Maybe this won't be necessary if everyone understands that the patch release isn't the only way to get a change out in a timely manner.

I've updated https://wiki.apache.org/hadoop/Roadmap to reflect that we only put Blocker/Critical bugs into Point Releases.

Committers, from now, please exercise extreme caution when committing to a point release: they should only be limited to Blocker bugs.

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Next releases
Thanks Suresh and Colin. Please update the Roadmap wiki with your proposals. As always, we will try our best to get these in - but we can collectively decide to slip some of these to subsequent releases based on timelines.

Arun
Re: Next releases
Ok, I've updated https://wiki.apache.org/hadoop/Roadmap with an initial strawman list for hadoop-2.4 which I feel we can get out in Jan. What else would folks like to see? Please keep timeframe in mind.

thanks,
Arun
Re: Next releases
On Dec 2, 2013, at 10:31 AM, Arun C Murthy a...@hortonworks.com wrote:

Ok, looks like there are no objections. I'm starting the work to rename 2.2.1 to 2.3 now. Committers, please hold commits till I send out the all clear.

Done. I've renamed 2.3 -> 2.4 and 2.2.1 -> 2.3.

I'll create the first RC for 2.3 a week from now i.e. 12/9.

thanks,
Arun

On Nov 20, 2013, at 6:38 AM, Arun C Murthy a...@hortonworks.com wrote:

Jason,

I'm glad to see we are converging. I'll update the Roadmap wiki with details about major/minor/patch releases.

Here is a straight-forward approach for now: I'll just roll contents of branch-2.2 as a 2.3-rc0 candidate right-away. This way we don't have to get embroiled in details of individual patches (there are too many).

Next up, I'll roll 2.4 in December.

Thoughts?

thanks,
Arun

On Nov 13, 2013, at 1:55 PM, Jason Lowe jl...@yahoo-inc.com wrote:

I think a lot of confusion comes from the fact that the 2.x line is starting to mature. Before this there wasn't such a big contention of what went into patch vs. minor releases and often the lines were blurred between the two. However now we have significant customers and products starting to use 2.x as a base, which means we need to start treating it like we treat 1.x. That means getting serious about what we should put into a patch release vs. what we postpone to a minor release.

Here's my $0.02 on recent proposals:

+1 to releasing more often in general. A lot of the rush to put changes into a patch release is because it can be a very long time between any kind of release. If minor releases are more frequent then I hope there would be less of a need to rush something or hold up a release.

+1 to limiting checkins of patch releases to Blockers/Criticals. If necessary committers check into trunk/branch-2 only and defer to the patch release manager for the patch release merge. Then there should be fewer surprises for everyone about what ended up in a patch release and less likely the patch release becomes destabilized from the sheer amount of code churn. Maybe this won't be necessary if everyone understands that the patch release isn't the only way to get a change out in a timely manner.

As for 2.2.1, again I think it's expectations for what that release means. If it's really just a patch release then there shouldn't be features in it and tons of code churn, but I think many were treating it as the next vehicle to deliver changes in general. If we think 2.2.1 is just as good or better than 2.2.0 then let's wrap it up and move to a more disciplined approach for subsequent patch releases and more frequent minor releases.

Jason

On 11/13/2013 12:10 PM, Arun C Murthy wrote:

On Nov 12, 2013, at 1:54 PM, Todd Lipcon t...@cloudera.com wrote:

On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of things so I may not have the full picture. Arun, can you give a specific example of something you'd like to blow away?

There are a bunch of issues in YARN/MapReduce which clearly aren't *critical*, similarly in HDFS a cursory glance showed up some *enhancements*/*improvements* in CHANGES.txt which aren't necessary for a patch release, plus things like:

HADOOP-9623 Update jets3t dependency to 0.9.0

Having said that, the HDFS devs know their code the best.

I agree with Colin. If we've been backporting things into a patch release (third version component) which don't belong, we should explicitly call out those patches, so we can learn from our mistakes and have a discussion about what belongs.

Good point.

Here is a straw man proposal:

A patch (third version) release should only include *blocker* bugs which are critical from an operational, security or data-integrity standpoint.

This way, we can ensure that a minor series release (2.2.x or 2.3.x or 2.4.x) is always release-able, and more importantly, deploy-able at any point in time.

Sandy did bring up a related point about timing of releases and the urge for everyone to cram features/fixes into a dot release.

So, we could remedy that situation by doing a release every 4-6 weeks (2.3, 2.4 etc.) and keep the patch releases limited to blocker bugs.

Thoughts?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Next releases
On Nov 13, 2013, at 1:55 PM, Jason Lowe jl...@yahoo-inc.com wrote:

+1 to limiting checkins of patch releases to Blockers/Criticals. If necessary committers check into trunk/branch-2 only and defer to the patch release manager for the patch release merge. Then there should be fewer surprises for everyone about what ended up in a patch release and less likely the patch release becomes destabilized from the sheer amount of code churn. Maybe this won't be necessary if everyone understands that the patch release isn't the only way to get a change out in a timely manner.

I've updated https://wiki.apache.org/hadoop/Roadmap to reflect that we only put Blocker/Critical bugs into Point Releases.

Committers, from now, please exercise extreme caution when committing to a point release: they should only be limited to Blocker bugs.

thanks,
Arun
Re: Next releases
Jason,

I'm glad to see we are converging. I'll update the Roadmap wiki with details about major/minor/patch releases.

Here is a straight-forward approach for now: I'll just roll contents of branch-2.2 as a 2.3-rc0 candidate right-away. This way we don't have to get embroiled in details of individual patches (there are too many).

Next up, I'll roll 2.4 in December.

Thoughts?

thanks,
Arun
Re: Next releases
On Wed, Nov 13, 2013 at 10:10 AM, Arun C Murthy a...@hortonworks.com wrote:

There are bunch of issues in YARN/MapReduce which clearly aren't *critical*, similarly in HDFS a cursory glance showed up some *enhancements*/*improvements* in CHANGES.txt which aren't necessary for a patch release, plus things like: HADOOP-9623 Update jets3t dependency to 0.9.0

I'm fine with reverting HADOOP-9623 from branch-2.2 and pushing it out to branch-2.3. It does bring in httpcore, a dependency that wasn't there before.

Colin
Re: Next releases
On 14 November 2013 17:15, Colin McCabe cmcc...@alumni.cmu.edu wrote:

HADOOP-9623 Update jets3t dependency to 0.9.0

I'm fine with reverting HADOOP-9623 from branch-2.2 and pushing it out to branch-2.3. It does bring in httpcore, a dependency that wasn't there before.

I think http components came in 2.3 with the openstack package, if that's the same one.

Also, I'm doing a step-by-step update of all the dependencies for 2.3+. The jets3t one is a major code change, but there are lots of others to deal with, even if we leave jetty and friends alone: https://issues.apache.org/jira/browse/HADOOP-9991
Re: Next releases
On 13 November 2013 18:11, Arun C Murthy a...@hortonworks.com wrote:

HADOOP-9623 Update jets3t dependency to 0.9.0

I saw that change. I don't think it's a bad one, but I do think we need more testing of blobstores, especially big operations like 6 GB uploads (which should now work with the 0.9.0 jets3t).
Re: Next releases
Here are a few patches that I put into 2.2.1 that are minimally invasive, but that I don't think are blockers:

YARN-305. Fair scheduler logs too many Node offered to app messages.
YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
YARN-1333. Support blacklisting in the Fair Scheduler
YARN-1109. Demote NodeManager Sending out status for container logs to debug (haosdent via Sandy Ryza)
YARN-1388. Fair Scheduler page always displays blank fair share

+1 to doing releases at some fixed time interval.

-Sandy
Re: Next releases
On Nov 13, 2013, at 12:38 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Here are few patches that I put into 2.2.1 and are minimally invasive, but I don't think are blockers:

YARN-305. Fair scheduler logs too many Node offered to app messages.
YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
YARN-1333. Support blacklisting in the Fair Scheduler
YARN-1109. Demote NodeManager Sending out status for container logs to debug (haosdent via Sandy Ryza)
YARN-1388. Fair Scheduler page always displays blank fair share

+1 to doing releases at some fixed time interval.

To be clear, I still think we should be *very* clear about what features we target for each release (2.3, 2.4, etc.). However, we don't wait indefinitely for any specific feature: if it misses a 4-6 week window, it goes on the next train.

Makes sense?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Next releases
Sounds good, just a little impedance between what seem to be 2 conflicting goals:

* what features we target for each release
* train releases

If we want to do train releases at fixed times, then if a feature is not ready, it catches the next train; there are no delays of the train because of a feature. If a bug is delaying the train and a feature becomes ready in the meantime and it does not destabilize the release, it can jump on board; if it breaks something, it goes out of the window until the next train.

Also, we have to decide what we do with 2.2.1. I would say start wrapping up the current 2.2 branch and make it the first train.

thx

--
Alejandro
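The train rule described in this message can be sketched roughly as follows. This is only an illustration, not project tooling: the `Feature` record, its `ready` and `destabilizes` flags, and the feature names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    ready: bool          # hypothetical flag: done before the train leaves
    destabilizes: bool   # hypothetical flag: breaks something in the release


def board_train(features):
    """Split features into those on this train and those deferred to the next."""
    on_board, next_train = [], []
    for f in features:
        # Only ready features that do not destabilize the release get on.
        if f.ready and not f.destabilizes:
            on_board.append(f.name)
        else:
            next_train.append(f.name)
    return on_board, next_train


# Hypothetical candidates, purely for illustration.
candidates = [
    Feature("feature-a", ready=True, destabilizes=False),
    Feature("feature-b", ready=False, destabilizes=False),
    Feature("feature-c", ready=True, destabilizes=True),
]
on_board, next_train = board_train(candidates)
print(on_board)
print(next_train)
```

The point of the sketch is that the train's departure never depends on any single feature; only the feature's own state decides whether it boards.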
Re: Next releases
I think a lot of confusion comes from the fact that the 2.x line is starting to mature. Before this there wasn't much contention over what went into patch vs. minor releases, and the lines between the two were often blurred. However, now we have significant customers and products starting to use 2.x as a base, which means we need to start treating it like we treat 1.x. That means getting serious about what we should put into a patch release vs. what we postpone to a minor release.

Here's my $0.02 on recent proposals:

+1 to releasing more often in general. A lot of the rush to put changes into a patch release is because it can be a very long time between any kind of release. If minor releases are more frequent then I hope there would be less of a need to rush something or hold up a release.

+1 to limiting checkins of patch releases to Blockers/Criticals. If necessary, committers check into trunk/branch-2 only and defer to the patch release manager for the patch release merge. Then there should be fewer surprises for everyone about what ended up in a patch release, and it is less likely that the patch release becomes destabilized from the sheer amount of code churn. Maybe this won't be necessary if everyone understands that the patch release isn't the only way to get a change out in a timely manner.

As for 2.2.1, again I think it comes down to expectations for what that release means. If it's really just a patch release then there shouldn't be features in it and tons of code churn, but I think many were treating it as the next vehicle to deliver changes in general. If we think 2.2.1 is just as good or better than 2.2.0 then let's wrap it up and move to a more disciplined approach for subsequent patch releases and more frequent minor releases.

Jason

On 11/13/2013 12:10 PM, Arun C Murthy wrote:
On Nov 12, 2013, at 1:54 PM, Todd Lipcon t...@cloudera.com wrote:
On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of things so I may not have the full picture. Arun, can you give a specific example of something you'd like to blow away?

There are bunch of issues in YARN/MapReduce which clearly aren't *critical*, similarly in HDFS a cursory glance showed up some *enhancements*/*improvements* in CHANGES.txt which aren't necessary for a patch release, plus things like: HADOOP-9623 Update jets3t dependency to 0.9.0 Having said that, the HDFS devs know their code the best.

I agree with Colin. If we've been backporting things into a patch release (third version component) which don't belong, we should explicitly call out those patches, so we can learn from our mistakes and have a discussion about what belongs.

Good point. Here is a straw man proposal:

A patch (third version) release should only include *blocker* bugs which are critical from an operational, security or data-integrity issues. This way, we can ensure that a minor series release (2.2.x or 2.3.x or 2.4.x) is always release-able, and more importantly, deploy-able at any point in time.

Sandy did bring up a related point about timing of releases and the urge for everyone to cram features/fixes into a dot release. So, we could remedy that situation by doing a release every 4-6 weeks (2.3, 2.4 etc.) and keep the patch releases limited to blocker bugs.

Thoughts?

thanks,
Arun
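As a rough illustration of the straw man quoted above, a patch-release gate could look something like the sketch below. The `Issue` record, its `area` label, and the example keys are assumptions made up for this sketch; they do not correspond to any JIRA API or to real issues in the thread. The check follows the stricter "blocker only" wording of the straw man rather than the Blockers/Criticals variant.

```python
from dataclasses import dataclass

# Impact areas named in the straw man proposal.
CRITICAL_AREAS = {"operational", "security", "data-integrity"}


@dataclass
class Issue:
    key: str        # hypothetical issue key
    priority: str   # e.g. "Blocker", "Critical", "Major"
    is_bug: bool
    area: str       # hypothetical label for the kind of impact


def allowed_in_patch_release(issue):
    """True only for blocker bugs in one of the critical impact areas."""
    return (
        issue.is_bug
        and issue.priority == "Blocker"
        and issue.area in CRITICAL_AREAS
    )


# Hypothetical examples, not real issues.
security_blocker = Issue("EXAMPLE-1", "Blocker", True, "security")
usability_tweak = Issue("EXAMPLE-2", "Major", False, "usability")
print(allowed_in_patch_release(security_blocker))
print(allowed_in_patch_release(usability_tweak))
```

Anything that fails the check would simply wait for the next minor release, which is why the proposal pairs this rule with a 4-6 week release cadence.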
Re: Next releases
On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of things so I may not have the full picture. Arun, can you give a specific example of something you'd like to blow away?

I agree with Colin. If we've been backporting things into a patch release (third version component) which don't belong, we should explicitly call out those patches, so we can learn from our mistakes and have a discussion about what belongs. Otherwise we'll just end up doing it again. Saying there were a few mistakes, so let's reset back a bunch of backport work seems like a baby-with-the-bathwater situation.

Todd
Re: Next releases
Hi Arun,

Another feature that would be relevant and got deferred was the symlink work (HADOOP-10020) that Colin and Andrew were working on. Can we include this in hadoop-2.3.0 also?

thanks
hari

On Sun, Nov 10, 2013 at 2:07 PM, Alejandro Abdelnur t...@cloudera.com wrote:

Arun, thanks for jumping on this.

On hadoop branch-2.2: I've quickly scanned the commit logs starting from the 2.2.0 release and I've found around 20 JIRAs that I'd like to see in 2.2.1. Not all of them are bugs, but they don't shake anything and they improve usability.

I presume others will have their own laundry lists as well, and I wonder how close the union of all of them comes to the current 81 commits.

How about splitting the JIRAs among a few contributors to assert there is nothing risky in there? And if so, discuss getting rid of those commits for 2.2.1. IMO doing that would be cheaper than selectively applying commits on a fresh branch.

Said this, I think we should get 2.2.1 out of the door before switching main efforts to 2.3.0. I volunteer myself to drive a 2.2.1 release ASAP if you don't have the bandwidth at the moment for it.

Cheers.

Alejandro

Commits in branch-2.2 that I'd like to be in the 2.2.1 release. The ones prefixed with '*' technically are not bugs.

YARN-1284. LCE: Race condition leaves dangling cgroups entries for killed containers. (Alejandro Abdelnur via Sandy Ryza)
YARN-1265. Fair Scheduler chokes on unhealthy node reconnect (Sandy Ryza)
YARN-1044. used/min/max resources do not display info in the scheduler page (Sangjin Lee via Sandy Ryza)
YARN-305. Fair scheduler logs too many Node offered to app messages. (Lohit Vijayarenu via Sandy Ryza)
*MAPREDUCE-5463. Deprecate SLOTS_MILLIS counters. (Tsuyoshi Ozawa via Sandy Ryza)
YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps switched. (Robert Kanter via Sandy Ryza)
YARN-1295. In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file busy errors. (Sandy Ryza)
*MAPREDUCE-5457. Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators (Sandy Ryza)
*YARN-1258. Allow configuring the Fair Scheduler root queue (Sandy Ryza)
*YARN-1288. Make Fair Scheduler ACLs more user friendly (Sandy Ryza)
YARN-1330. Fair Scheduler: defaultQueueSchedulingPolicy does not take effect (Sandy Ryza)
HDFS-5403. WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306. Contributed by Aaron T. Myers.
*YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication (Sandy Ryza)
*YARN-1333. Support blacklisting in the Fair Scheduler (Tsuyoshi Ozawa via Sandy Ryza)
*MAPREDUCE-4680. Job history cleaner should only check timestamps of files in old enough directories (Robert Kanter via Sandy Ryza)
YARN-1109. Demote NodeManager Sending out status for container logs to debug (haosdent via Sandy Ryza)
*YARN-1321. Changed NMTokenCache to support both singleton and an instance usage. Contributed by Alejandro Abdelnur
YARN-1343. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs. (tucu)
YARN-1381. Same relaxLocality appears twice in exception message of AMRMClientImpl#checkLocalityRelaxationConflict() (Ted Yu via Sandy Ryza)
HADOOP-9898. Set SO_KEEPALIVE on all our sockets. Contributed by Todd Lipcon.
YARN-1388. Fair Scheduler page always displays blank fair share (Liyin Liang via Sandy Ryza)
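The '*' convention used in Alejandro's commit list above (a leading '*' marks an entry that is technically not a bug) lends itself to a mechanical split. The helper below is a hypothetical sketch written for this illustration, and it uses only a handful of keys copied from that list with their titles dropped.

```python
def split_by_marker(entries):
    """Separate entries marked with a leading '*' (not bugs) from the rest."""
    bugs, non_bugs = [], []
    for entry in entries:
        if entry.startswith("*"):
            non_bugs.append(entry.lstrip("*"))
        else:
            bugs.append(entry)
    return bugs, non_bugs


# A few keys from the list above; the marker placement matches the list.
entries = [
    "YARN-1284",
    "YARN-1265",
    "*MAPREDUCE-5463",
    "*YARN-1258",
    "HDFS-5403",
    "*YARN-1333",
]
bugs, non_bugs = split_by_marker(entries)
print(bugs)
print(non_bugs)
```

Under a strict blocker-only policy, the second group would be the first candidates to move to the next minor release, although being a bug is not by itself enough to qualify as a blocker.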
Re: Next releases
Starting afresh with 2.2.1 and keeping it as small as possible sounds reasonable to me.

Would love to get 2.3 out soon. To that end, how would people feel about having code and/or feature freeze and/or ship dates? We've been way behind our goals for recent releases. Having actual targets on the calendar would help us achieve the regular release cadence that can really benefit both downstream projects and ourselves. Without this, we get in the habit of stuffing changes into the closest release, even if minor or patch, for fear that the next will be many months away.

Vinod, Zhijie, or Mayank, please correct me if I'm wrong, but my intuition is that end-of-year is a little unrealistic for YARN-321. Will the ~30 working days before the end of the year be enough to complete the feature, stabilize it, bring APIs to the point that we won't need to break them in the future, and merge the branch? Could we either push the feature out or aim for the end of January instead? Assuming the latter, some strawman dates:

Feature freeze - 1/6
Code freeze - 1/16
Release - 1/30

Lastly, I'd like to add finer-grained CPU scheduling a la YARN-1089 to the target features.

thanks,
Sandy
Re: Next releases
Hi Arun,

Thanks for working out this list, which looks great to me. In addition, I would like to add an item to the 2.3 release: YARN-291, which enhances YARN's resource elasticity in cloud scenarios and can benefit other scenarios, e.g. graceful NM decommission (YARN-914), non job/app regression (or maintenance model) in NM rolling upgrade (YARN-671), etc. With great help from Luke, Bikas and Vinod, we have already got the first and most important piece of work (YARN-311) in. Now I am working on the remaining parts, including interfaces (RPC, CLI, REST, etc.) and a few enhancements (persistence, supporting different policies, etc.), and I am optimistic about completing most of the work by the end of 2013. Would you help to get it in if we can make it on time? :)

Thanks,
Junping

- Original Message -
From: Arun C Murthy a...@hortonworks.com
To: common-dev@hadoop.apache.org, hdfs-...@hadoop.apache.org, yarn-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org
Sent: Friday, November 8, 2013 10:42:36 AM
Subject: Next releases

Gang,

Thinking through the next couple of releases here, appreciate f/b.

# hadoop-2.2.1

I was looking through commit logs and there is a *lot* of content here (81 commits as on 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't.

I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3)

In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues.

# hadoop-2.3

I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:

* Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN.
* Bug-fixes in RM Restart
* Minimal support for long-running applications (e.g. security) via YARN-896
* RM Fail-over via ZKFC
* Anything else? HDFS???

Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the end of the year.

Thoughts?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Next releases
On 8 November 2013 02:42, Arun C Murthy a...@hortonworks.com wrote:

Gang, Thinking through the next couple of releases here, appreciate f/b.

# hadoop-2.2.1

I was looking through commit logs and there is a *lot* of content here (81 commits as on 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't. I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3) In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues.

+1

# hadoop-2.3

I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:

* Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN.
* Bug-fixes in RM Restart
* Minimal support for long-running applications (e.g. security) via YARN-896

+1. The complete set isn't going to make it, but I'm sure we can identify the key ones.

* RM Fail-over via ZKFC
* Anything else? HDFS???

- If I had the time, I'd like to do some work on the HADOOP-9361 filesystem spec tests. This is mostly some specification, the basis of a better test framework for newer FS tests, and some more tests, with a couple of minor changes to some of the FS code, mainly in terms of tightening some of the exceptions thrown (IOE - EOF).

Otherwise:

- I'd like the hadoop-openstack JAR in; it's already in branch-2, so it's a matter of ensuring testing during the release against as many providers as possible.
- There are a fair few JIRAs about updating versions of dependencies. The S3 JetS3t update went in this week, but there are more, as well as cruft in the POMs which shows up downstream. I think we could update the low-risk dependencies (test-time, log4j, &c), while avoiding those we know will be trouble (jetty). This may seem minor but it does make a big diff to the downstream projects.
Re: Next releases
Arun, what are your thoughts on test-only patches? I know I've been merging a lot of Windows test stabilization patches down to branch-2.2. These can't rightly be called blockers, but they do improve dev experience, and there is no risk to product code.

Chris Nauroth
Hortonworks
http://hortonworks.com/

On Fri, Nov 8, 2013 at 1:30 AM, Steve Loughran ste...@hortonworks.com wrote:

On 8 November 2013 02:42, Arun C Murthy a...@hortonworks.com wrote:

Gang,

Thinking through the next couple of releases here, appreciate f/b.

# hadoop-2.2.1

I was looking through commit logs and there is a *lot* of content here (81 commits as on 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't.

I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3)

In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues.

+1

# hadoop-2.3

I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:

* Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN.
* Bug-fixes in RM Restart
* Minimal support for long-running applications (e.g. security) via YARN-896

+1 - the complete set isn't going to make it, but I'm sure we can identify the key ones

* RM Fail-over via ZKFC
* Anything else? HDFS???

- If I had the time, I'd like to do some work on the HADOOP-9361 filesystem spec tests - this is mostly some specification, the basis of a better test framework for newer FS tests, and some more tests, with a couple of minor changes to some of the FS code, mainly in terms of tightening some of the exceptions thrown (IOE -> EOF)

otherwise:

- I'd like the hadoop-openstack JAR in; it's already in branch-2 so it's a matter of ensuring testing during the release against as many providers as possible.
- There are a fair few JIRAs about updating versions of dependencies - the S3 JetS3t update went in this week, but there are more, as well as cruft in the POMs which shows up downstream. I think we could update the low-risk dependencies (test-time, log4j, etc.), while avoiding those we know will be trouble (jetty). This may seem minor but it does make a big diff to the downstream projects.
Next releases
Gang,

Thinking through the next couple of releases here, appreciate f/b.

# hadoop-2.2.1

I was looking through commit logs and there is a *lot* of content here (81 commits as on 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't.

I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3)

In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues.

# hadoop-2.3

I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:

* Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN.
* Bug-fixes in RM Restart
* Minimal support for long-running applications (e.g. security) via YARN-896
* RM Fail-over via ZKFC
* Anything else? HDFS???

Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the end of the year.

Thoughts?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/