Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
The main thing is that I would not want to see an ACCUMULO-1790 *without* ACCUMULO-1795. Having 1792 alone would be insufficient for me.

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, Nov 12, 2013 at 9:22 AM, Sean Busbey bus...@clouderagovt.com wrote:

On Fri, Oct 18, 2013 at 12:29 AM, Sean Busbey bus...@cloudera.com wrote:

On Tue, Oct 15, 2013 at 10:20 AM, Sean Busbey bus...@cloudera.com wrote:

On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:

On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we?

for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203.

I mean 0.20.203.0. Ugh, Hadoop versions.

Okay, barring additional suggestions, tomorrow afternoon I'll break things down into an umbrella and 3 sub tasks:

1) addition of hadoop 2 support
- to include backports of commits
- to include making the target hadoop 2 version 2.2.0
- to include test changes that flex hadoop 2 features like fail over

2) ensuring compatibility for 0.20.203
- presuming some subset of the commits in 1) will break it since 0.20 support was left behind in 1.5

3) doc / packaging updates
- the issue of binary releases per distro
- doc patch for what version(s) the release tests are expected to run against

Once work is put against those tickets, I'd expect things to go into a branch based on the umbrella ticket until such time as the complete work can pass the test suite that we'll use at the next release. Then it can get rebased onto the 1.4.x dev branch.

-- Sean

Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to resurrect this thread to make sure everyone's concerns are addressed.
For context, here's a link to the start of the last thread: http://bit.ly/1aPqKuH From ACCUMULO-1792, ctubbsii: I'd be reluctant to support any Hadoop 2.x support in the 1.4 release line that breaks compatibility with 0.20. I don't think breaking 0.20 and then possibly fixing it again as a second step is acceptable (because that subsequent work may not ever be done, and I don't think we should break the compatibility contract that we've established with 1.4.0). Chris, I believe keeping all of the work in a branch under the umbrella jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4 release that doesn't have proper support for 0.20.203. Is there something beyond making sure the branch passes a full set of release tests on 0.20.203 that you'd like to see? In the event that the branch only ever contains the work for adding Hadoop 2, it's a simple matter to abandon without rolling into the 1.4 development line. From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii): I'm very uncomfortable with risking breaking continuity in such an old release, and I don't think managing two lines of 1.4 releases is worth the effort. Though we have no official EOL policy, 1.3 was practically dead in the water once 1.4 was around, and I hope we start encouraging more adoption of 1.5 (and soon 1.6) versus continually propping up 1.4. I'd love to get people to move off of 1.4. However, I think adding Hadoop 2 support to 1.4 encourages this more than leaving it out. Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not surprised people find relying on 0.20 for the 1.5 WAL intimidating. Upgrading both HDFS and Accumulo across major versions at once is asking them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we allow them to break the risk up into steps: they can upgrade HDFS versions first, get comfortable, then upgrade Accumulo to 1.5. 
I think the existing tickets under the umbrella of ACCUMULO-1790 should ensure that we end up with a single 1.4 line that can work with either the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or Josh or Chris), is there stronger language you'd like to see around docs / packaging (area #3 in the original plan and currently ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for 0.20.203.0? Are you looking for something beyond a full release suite to ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203? -Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 4:49 PM, Sean Busbey busbey...@clouderagovt.com wrote:

On Tue, Nov 12, 2013 at 3:14 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote:

The language of ACCUMULO-1795 indicated that an acceptable state was something that wasn't binary compatible. That's my #1 thing to avoid.

Ah. So I see, not sure why I phrased that that way. Since the default build should still be 0.20.203.0, I'm not sure how it'd end up not being binary compatible. I can update the ticket to clarify the language. Any need to compile should be limited to running Hadoop 2.2.0. Sound good?

+1 (The confusing wording was the basis for my concerns also.)

Maybe expressly only doing a binary convenience package for 0.20.203.0?

If we need an extra package, doesn't that mean a user can't just upgrade Accumulo?

By binary convenience package I mean the binary distribution tarball (or rpms, or whatevs) that we make as a part of the release process. For users of Hadoop 0.20.203.0, upgrading should be unchanged from how they would normally get their Accumulo 1.4.x distribution. ACCUMULO-1796 has some leeway about the convenience packages for people who want Hadoop 2 support. On the extreme end, they'd have to build from source and then run a normal upgrade process.

I'd prefer binary compatibility with a single build, but if that's too hard to achieve, I have no objection to providing a mechanism to perform an alternate build against 2.x (whether or not we provide a pre-built binary package for it), so long as the default build is 0.20.x

-- Christopher L Tubbs II http://gravatar.com/ctubbsii
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Thu, Nov 14, 2013 at 6:27 PM, Christopher ctubb...@apache.org wrote: The main thing is that I would not want to see an ACCUMULO-1790 *without* ACCUMULO-1795. Having 1792 alone would be insufficient for me. That is precisely the intention of ACCUMULO-1790. All of the subtasks (including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for things to get into the 1.4 branch. Until that time the work would just go into a feature branch for ACCUMULO-1790 (to make working and testing easier for those implementing the subtasks). If you wanted to see the full implementation you would just wait until all of the subtasks were committed to the feature branch. Am I missing something? -- Sean
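The gating rule described above (nothing reaches the 1.4 branch until every subtask under the umbrella has landed on the feature branch) can be sketched in a few lines of Python. This is only an illustration, not project tooling: the `Subtask` type, the `ready_to_merge` and `missing` helpers, and the statuses shown are all hypothetical. Only the ticket IDs come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """One subtask under the umbrella ticket (statuses below are made up)."""
    ticket: str
    committed_to_feature_branch: bool


def ready_to_merge(subtasks):
    """Return True only when every subtask has landed on the feature branch."""
    return len(subtasks) > 0 and all(
        s.committed_to_feature_branch for s in subtasks
    )


def missing(subtasks):
    """List the tickets still blocking the merge into the 1.4 branch."""
    return [s.ticket for s in subtasks if not s.committed_to_feature_branch]


# Hypothetical snapshot of the ACCUMULO-1790 umbrella: only 1792 is done.
umbrella_1790 = [
    Subtask("ACCUMULO-1792", True),
    Subtask("ACCUMULO-1795", False),
    Subtask("ACCUMULO-1796", False),
]

if __name__ == "__main__":
    print(ready_to_merge(umbrella_1790))
    print(missing(umbrella_1790))
```

With this snapshot the gate stays closed, which is the case Christopher was worried about: 1792 alone is not enough to merge.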
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Nope, I think we're on the same page now. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Thu, Nov 14, 2013 at 7:39 PM, Sean Busbey busbey...@clouderagovt.com wrote: On Thu, Nov 14, 2013 at 6:27 PM, Christopher ctubb...@apache.org wrote: The main thing is that I would not want to see an ACCUMULO-1790 *without* ACCUMULO-1795. Having 1792 alone would be insufficient for me. That is precisely the intention of ACCUMULO-1790. All of the subtasks (including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for things to get into the 1.4 branch. Until that time the work would just go into a feature branch for ACCUMULO-1790 (to make working and testing easier for those implementing the subtasks). If you wanted to see the full implementation you would just wait until all of the subtasks were committed to the feature branch. Am I missing something? -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to resurrect this thread to make sure everyone's concerns are addressed. For context, here's a link to the start of the last thread: http://bit.ly/1aPqKuH From ACCUMULO-1792, ctubbsii: I'd be reluctant to support any Hadoop 2.x support in the 1.4 release line that breaks compatibility with 0.20. I don't think breaking 0.20 and then possibly fixing it again as a second step is acceptable (because that subsequent work may not ever be done, and I don't think we should break the compatibility contract that we've established with 1.4.0). Chris, I believe keeping all of the work in a branch under the umbrella jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4 release that doesn't have proper support for 0.20.203. Is there something beyond making sure the branch passes a full set of release tests on 0.20.203 that you'd like to see? In the event that the branch only ever contains the work for adding Hadoop 2, it's a simple matter to abandon without rolling into the 1.4 development line. From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii): I'm very uncomfortable with risking breaking continuity in such an old release, and I don't think managing two lines of 1.4 releases is worth the effort. Though we have no official EOL policy, 1.3 was practically dead in the water once 1.4 was around, and I hope we start encouraging more adoption of 1.5 (and soon 1.6) versus continually propping up 1.4. I'd love to get people to move off of 1.4. However, I think adding Hadoop 2 support to 1.4 encourages this more than leaving it out. I'm not sure I agree that adding Hadoop2 support to 1.4 encourages people to upgrade Accumulo. 
My gut reaction would be that it allows people to completely ignore Accumulo updates (ignoring moving to 1.4.5 which would allow them to do hadoop2 with your proposed changes).

Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not surprised people find relying on 0.20 for the 1.5 WAL intimidating. Upgrading both HDFS and Accumulo across major versions at once is asking them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we allow them to break the risk up into steps: they can upgrade HDFS versions first, get comfortable, then upgrade Accumulo to 1.5.

Personally, maintaining 0.20 compatibility is not a big concern on my radar. If you're still running an 0.20 release, I'd *really* hope that you have an upgrade path to 1.2.x (if not 2.2.x) scheduled.

I think claiming that 1.5 has a higher burden than 1.4 is a bit of a fallacy. There were many problems and pains regarding WALs in <=1.4 that are very difficult to work with in a large environment (try finding WALs in server failure cases). I think the increased I/O on HDFS is a much smaller cost than the completely different I/O path that the old loggers have. I also think upgrading Accumulo is much less scary than upgrading HDFS, but that's just me.

To me, it seems like the argument may be coming down to whether or not we break 0.20 hadoop compatibility on a bug-fix release and how concerned we are about letting users lag behind the upstream development.

I think the existing tickets under the umbrella of ACCUMULO-1790 should ensure that we end up with a single 1.4 line that can work with either the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or Josh or Chris), is there stronger language you'd like to see around docs / packaging (area #3 in the original plan and currently ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for 0.20.203.0?
Are you looking for something beyond a full release suite to ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203? Again, my biggest concern here is not following our own guidelines of breaking changes across minor releases, but I'd hope 0.20 users have an upgrade path outlined for themselves.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
A user of 1.4.a should be able to move to 1.4.b without any major infrastructure changes, such as swapping out HDFS or installing extra add-ons. I don't find much merit in debating local WAL vs HDFS WAL cost/benefit since the only quantifiable evidence we have supported the move. I should note, Sean, that if you see merit in the work, you don't need community approval for forking and sharing. However, I do not think it is in the community's best interest to continue to upgrade 1.4. On Tue, Nov 12, 2013 at 2:12 PM, Josh Elser josh.el...@gmail.com wrote: Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to resurrect this thread to make sure everyone's concerns are addressed. For context, here's a link to the start of the last thread: http://bit.ly/1aPqKuH From ACCUMULO-1792, ctubbsii: I'd be reluctant to support any Hadoop 2.x support in the 1.4 release line that breaks compatibility with 0.20. I don't think breaking 0.20 and then possibly fixing it again as a second step is acceptable (because that subsequent work may not ever be done, and I don't think we should break the compatibility contract that we've established with 1.4.0). Chris, I believe keeping all of the work in a branch under the umbrella jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4 release that doesn't have proper support for 0.20.203. Is there something beyond making sure the branch passes a full set of release tests on 0.20.203 that you'd like to see? In the event that the branch only ever contains the work for adding Hadoop 2, it's a simple matter to abandon without rolling into the 1.4 development line. From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii): I'm very uncomfortable with risking breaking continuity in such an old release, and I don't think managing two lines of 1.4 releases is worth the effort. 
Though we have no official EOL policy, 1.3 was practically dead in the water once 1.4 was around, and I hope we start encouraging more adoption of 1.5 (and soon 1.6) versus continually propping up 1.4.

I'd love to get people to move off of 1.4. However, I think adding Hadoop 2 support to 1.4 encourages this more than leaving it out.

I'm not sure I agree that adding Hadoop2 support to 1.4 encourages people to upgrade Accumulo. My gut reaction would be that it allows people to completely ignore Accumulo updates (ignoring moving to 1.4.5 which would allow them to do hadoop2 with your proposed changes).

Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not surprised people find relying on 0.20 for the 1.5 WAL intimidating. Upgrading both HDFS and Accumulo across major versions at once is asking them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we allow them to break the risk up into steps: they can upgrade HDFS versions first, get comfortable, then upgrade Accumulo to 1.5.

Personally, maintaining 0.20 compatibility is not a big concern on my radar. If you're still running an 0.20 release, I'd *really* hope that you have an upgrade path to 1.2.x (if not 2.2.x) scheduled.

I think claiming that 1.5 has a higher burden than 1.4 is a bit of a fallacy. There were many problems and pains regarding WALs in <=1.4 that are very difficult to work with in a large environment (try finding WALs in server failure cases). I think the increased I/O on HDFS is a much smaller cost than the completely different I/O path that the old loggers have. I also think upgrading Accumulo is much less scary than upgrading HDFS, but that's just me.

To me, it seems like the argument may be coming down to whether or not we break 0.20 hadoop compatibility on a bug-fix release and how concerned we are about letting users lag behind the upstream development.
I think the existing tickets under the umbrella of ACCUMULO-1790 should ensure that we end up with a single 1.4 line that can work with either the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or Josh or Chris), is there stronger language you'd like to see around docs / packaging (area #3 in the original plan and currently ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for 0.20.203.0? Are you looking for something beyond a full release suite to ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203? Again, my biggest concern here is not following our own guidelines of breaking changes across minor releases, but I'd hope 0.20 users have an upgrade path outlined for themselves.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 1:12 PM, Josh Elser josh.el...@gmail.com wrote: To me, it seems like the argument may be coming down to whether or not we break 0.20 hadoop compatibility on a bug-fix release and how concerned we are about letting users lag behind the upstream development. I think the existing tickets under the umbrella of ACCUMULO-1790 should ensure that we end up with a single 1.4 line that can work with either the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or Josh or Chris), is there stronger language you'd like to see around docs / packaging (area #3 in the original plan and currently ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for 0.20.203.0? Are you looking for something beyond a full release suite to ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203? Again, my biggest concern here is not following our own guidelines of breaking changes across minor releases, but I'd hope 0.20 users have an upgrade path outlined for themselves. The plan outlined in the original thread, and in the subtasks under ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in the 1.4 bugfix line. If there's anything we can do besides running through the release test suite on a 0.20 cluster to help ensure that, I am interested in adding it to the existing plan. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 1:28 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: A user of 1.4.a should be able to move to 1.4.b without any major infrastructure changes, such as swapping out HDFS or installing extra add-ons. Right, exactly. Hopefully no part of the original plan contradicts this. Is there something that appears to? -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On 11/12/13, 12:24 PM, Sean Busbey wrote: On Tue, Nov 12, 2013 at 1:12 PM, Josh Elser josh.el...@gmail.com wrote: To me, it seems like the argument may be coming down to whether or not we break 0.20 hadoop compatibility on a bug-fix release and how concerned we are about letting users lag behind the upstream development. I think the existing tickets under the umbrella of ACCUMULO-1790 should ensure that we end up with a single 1.4 line that can work with either the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or Josh or Chris), is there stronger language you'd like to see around docs / packaging (area #3 in the original plan and currently ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for 0.20.203.0? Are you looking for something beyond a full release suite to ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203? Again, my biggest concern here is not following our own guidelines of breaking changes across minor releases, but I'd hope 0.20 users have an upgrade path outlined for themselves. The plan outlined in the original thread, and in the subtasks under ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in the 1.4 bugfix line. If there's anything we can do besides running through the release test suite on a 0.20 cluster to help ensure that, I am interested in adding it to the existing plan. What about the other half: encouraging users to lag (soon to be) two major releases behind?
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
The language of ACCUMULO-1795 indicated that an acceptable state was something that wasn't binary compatible. That's my #1 thing to avoid.

Maybe expressly only doing a binary convenience package for 0.20.203.0?

If we need an extra package, doesn't that mean a user can't just upgrade Accumulo?

As a side note, 0.20.203.0 is 1.4,

On Tue, Nov 12, 2013 at 3:28 PM, Sean Busbey busbey...@clouderagovt.com wrote:

On Tue, Nov 12, 2013 at 1:28 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote:

A user of 1.4.a should be able to move to 1.4.b without any major infrastructure changes, such as swapping out HDFS or installing extra add-ons.

Right, exactly. Hopefully no part of the original plan contradicts this. Is there something that appears to?

-- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 2:48 PM, Josh Elser josh.el...@gmail.com wrote: What about the other half: encouraging users to lag (soon to be) two major releases behind? I don't think our current user base needs to be encouraged strongly to upgrade. And as I said previously I think this change provides them with an upgrade path that's easier to stomach, but I suspect this is a point we disagree on. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Monday, October 14, 2013 11:57:40 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch Thanks for the note, Ted. That vote is for 2.2.0, not -beta. On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: w.r.t. hadoop-2 release, see this thread: http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 Looks like 2.2.0-beta would pass votes. Cheers On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote: Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither. 
3) Test for correctness on given versions, with = 5 node cluster * Unit Tests * Functional Tests * 24hr continuous + verification * 24hr continuous + verification + agitation * 24hr random walk * 24hr random walk + agitation Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think it might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS Clients switch correctly. This is in addition to the standard release suite that we run. [1] [1]: http://accumulo.apache.org/governance/releasing.html#testing 4) Binary packaging 4a) Either source produces a single binary for all accepted versions or 4b) Instructions for building from source for each versions and somehow flag what (if any) convenience binaries are made for the release. Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be back ported, I suppose. 4b seems easier. =application There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. 
For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) [1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we?

for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203.

I'm not sure about the change to 1.5.1-SNAPSHOT. I believe we're talking about changing the hadoop.profile for 2.0 to use the 2.2.0 release. I don't think it makes sense to change the default off of the version in the hadoop.profile for 1.0. Presumably this change would also happen in master.

Now that Hadoop 2.x is going to have a GA release, I think it makes sense to have a discussion about changing the default to be the hadoop 2.0 profile for master, but this is not that discussion.

-- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:

On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we?

for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203.

I mean 0.20.203.0. Ugh, Hadoop versions.

-- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
I think you meant: Ugh, Hadoop versions.[1]

[1] http://blog.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/

On Tue, Oct 15, 2013 at 11:20 AM, Sean Busbey bus...@cloudera.com wrote:

On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:

On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we?

for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203.

I mean 0.20.203.0. Ugh, Hadoop versions.

-- Sean

-- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Hadoop 2.0 Support for Accumulo 1.4 Branch
Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. 3) Test for correctness on given versions, with = 5 node cluster * Unit Tests * Functional Tests * 24hr continuous + verification * 24hr continuous + verification + agitation * 24hr random walk * 24hr random walk + agitation Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. 4) Binary packaging 4a) Either source produces a single binary for all accepted versions or 4b) Instructions for building from source for each versions and somehow flag what (if any) convenience binaries are made for the release. =application There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) 
[1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
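Requirement 3 above amounts to running each listed release test against each claimed Hadoop version. A minimal Python sketch of that matrix follows, assuming the two versions and six test names exactly as written in the message. The `build_matrix` helper is hypothetical and only enumerates the runs; it does not invoke any real test harness.

```python
from itertools import product

# Versions and test names are copied from the message above.
HADOOP_VERSIONS = ["0.20.203.0", "2.0.4-alpha"]
RELEASE_TESTS = [
    "Unit Tests",
    "Functional Tests",
    "24hr continuous + verification",
    "24hr continuous + verification + agitation",
    "24hr random walk",
    "24hr random walk + agitation",
]


def build_matrix(versions, tests):
    """Pair every Hadoop version with every release test (hypothetical helper)."""
    return list(product(versions, tests))


if __name__ == "__main__":
    for version, test in build_matrix(HADOOP_VERSIONS, RELEASE_TESTS):
        print(f"Hadoop {version}: {test}")
```

Swapping the Hadoop 2 entry for a different release only changes one list element, so the number of runs to schedule stays at versions times tests.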
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither. 3) Test for correctness on given versions, with = 5 node cluster * Unit Tests * Functional Tests * 24hr continuous + verification * 24hr continuous + verification + agitation * 24hr random walk * 24hr random walk + agitation Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think it might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS Clients switch correctly. This is in addition to the standard release suite that we run. 
[1] [1]: http://accumulo.apache.org/governance/releasing.html#testing 4) Binary packaging 4a) Either source produces a single binary for all accepted versions or 4b) Instructions for building from source for each versions and somehow flag what (if any) convenience binaries are made for the release. Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be back ported, I suppose. 4b seems easier. =application There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) [1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
For #2, from what I've read, we should definitely bump up the dependency on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 2.2.0-beta for that hadoop-2 profile.

I probably stated this before, but I'd much rather see more effort in testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever did testing of Accumulo with the hadoop-2 features -- I seem to recall that the testing was more about whether Accumulo runs on both hadoop 1 and 2.

If we can maintain a single artifact, that would definitely be easiest for users, but falling back to user-built artifacts or convenience releases isn't the end of the world.

As far as commits, I'd like to see as much separation as possible, but it's understandable if the changes overlap and don't make sense to split out.

On 10/14/13 12:55 PM, Sean Busbey wrote:

Hey All,

I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application.

=requirements

Here's the requirements I have from the last thread:

1) Maintain existing 1.4 compatibility

The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported.

2) Gain Hadoop 2 support

At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2.

3) Test for correctness on given versions, with >= 5 node cluster

* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation

Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.

4) Binary packaging

4a) Either source produces a single binary for all accepted versions or
4b) Instructions for building from source for each version and somehow flag what (if any) convenience binaries are made for the release.

=application

There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch?

Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out?

For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.)

[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Mon, Oct 14, 2013 at 9:24 PM, Mike Drob md...@mdrob.com wrote:

3) Test for correctness on given versions, with >= 5 node cluster

* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation

Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.

Hadoop 2 introduces some neat new things like NN HA, which I think it might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS Clients switch correctly. This is in addition to the standard release suite that we run. [1]

[1]: http://accumulo.apache.org/governance/releasing.html#testing

Just to confirm, the change from Keith's request is

* 72hr continuous + agitation + cluster running
* Something to test that HA NN failover doesn't take out Accumulo

Would the latter be addressed by an additional functional test? Or would it need to be some kind of addition to the agitation?

Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be back ported, I suppose. 4b seems easier.

I think this means you're +0 on 4b?

=application

There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch?

Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out?

Not sure what you mean by this.

It's the difference between the 1.4.x branch having all the commits that are backported from 1.5.x vs just having squashed ones. The former maintains more of the original authorship and ties to original jiras. The latter has less noise.

-- Sean
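For readers weighing the two options in the message above, here is a minimal sketch that only prints the git commands each approach would roughly involve. It is an illustration, not the project's agreed procedure: the function name `backport_plan`, the commit placeholders, and the branch name are all hypothetical, and the thread does not say which commands were actually used.

```shell
#!/bin/sh
# Print (do not run) the git commands for two ways of landing backports.
# ASSUMPTION: commit ids and the branch name are placeholders for illustration.
backport_plan() {
  mode="$1"
  shift
  case "$mode" in
    broken-out)
      # One cherry-pick per original commit; -x appends the source
      # commit id to the message, keeping the tie to the original change.
      for commit in "$@"; do
        echo "git cherry-pick -x $commit"
      done
      ;;
    squashed)
      # Stage the whole feature branch as a single change, then commit once.
      echo "git merge --squash $1"
      echo "git commit"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

backport_plan broken-out COMMIT_A COMMIT_B
backport_plan squashed hypothetical-feature-branch
```

The broken-out form preserves per-commit authorship and jira references at the cost of a longer history; the squashed form yields one commit per merge and loses that detail.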
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Mon, Oct 14, 2013 at 10:02 PM, Josh Elser josh.el...@gmail.com wrote:

For #2, from what I've read, we should definitely bump up the dependency on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 2.2.0-beta for that hadoop-2 profile.

So 1.5.1-SNAPSHOT and this proposed change to 1.4.5-SNAPSHOT should both target 2.2.0-beta, presuming the RC passes (and 2.1.0-beta prior). This sounds in line with Mike's comment re: alpha v beta. Anyone have an objection?

I probably stated this before, but I'd much rather see more effort in testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever did testing of Accumulo with the hadoop-2 features -- I seem to recall that it was more testing does Accumulo run on both hadoop 1 and 2.

I figured whatever bar I end up passing for Hadoop 2 support on 1.4.x should help with testing the same for 1.5.x and 1.6.x.

-- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Thanks for the note, Ted. That vote is for 2.2.0, not -beta.

On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: w.r.t. hadoop-2 release, see this thread: http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 Looks like 2.2.0-beta would pass votes. Cheers
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey

On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update?

-- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
I don't think that's a good idea unless you can come up with a very clear version number change. -Joey

On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote: Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

-- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Which version of 0.20 are you testing against? Vanilla, or cdh3 flavored?

On Fri, Aug 2, 2013 at 2:37 PM, Joey Echeverria j...@cloudera.com wrote: I don't think that's a good idea unless you can come up with a very clear version number change. -Joey
RE: Hadoop 2.0 Support for Accumulo 1.4 Branch
Any update?

-Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
We're testing this today. I'll report back what we find. -Joey

On Fri, Jul 26, 2013 at 3:34 PM, Dave Marion dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie.
Hadoop 2.0 Support for Accumulo 1.4 Branch
Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches in whole or in part from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts? -Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
My question is if the community would be interested in us pulling those back ports upstream? Yes, please.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
We have both the unit tests and the full system test suite hooked up to a Jenkins build server. There are still a couple of tests that fail periodically with the full system test due to timeouts. We're working on those which is why our current release is just a beta. There are no API changes or Accumulo behavior changes. You can use unmodified 1.4.x clients with our release of the server daemons. -Joey On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote: On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com wrote: Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches in whole or in part from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream? What testing has been done? It would be nice to run accumulo's full test suite against 1.4.3+CDH4. Are there any Accumulo API changes or Accumulo behavior changes? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts? -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 12:24 PM, Joey Echeverria j...@cloudera.com wrote: We have both the unit tests and the full system test suite hooked up to a Jenkins build server. If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. There are still a couple of tests that fail periodically with the full system test due to timeouts. We're working on those which is why our current release is just a beta. There are no API changes or Accumulo behavior changes. You can use unmodified 1.4.x clients with our release of the server daemons. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? -Joey On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote: On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com wrote: Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches in whole or in part from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream? What testing has been done? It would be nice to run accumulo's full test suite against 1.4.3+CDH4. Are there any Accumulo API changes or Accumulo behavior changes? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts? 
-Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. -Joey
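To make the profile-based approach above a little more concrete, the sketch below prints a Maven command line for each Hadoop line discussed in this thread. It assumes the `hadoop.profile` and `hadoop.version` property names used by the 1.5 build; whether the 1.4 backport uses the same names is not stated here, and the function name `mvn_command` is made up for illustration.

```shell
#!/bin/sh
# Print (do not run) a Maven command line for a given Hadoop line.
# ASSUMPTION: the property names hadoop.profile and hadoop.version mirror
# the 1.5 build; the 1.4 backport may name them differently.
mvn_command() {
  hadoop_line="$1"      # "1" for the default build, "2" for the Hadoop 2 profile
  hadoop_version="$2"   # Hadoop release to compile against
  case "$hadoop_line" in
    1)
      echo "mvn clean package -Dhadoop.version=$hadoop_version"
      ;;
    2)
      echo "mvn clean package -Dhadoop.profile=2.0 -Dhadoop.version=$hadoop_version"
      ;;
    *)
      echo "unknown Hadoop line: $hadoop_line" >&2
      return 1
      ;;
  esac
}

# Versions mentioned elsewhere in this thread.
mvn_command 1 0.20.203.0
mvn_command 2 2.2.0
```

Keeping every Hadoop 2 dependency change behind one profile means the default build stays unchanged for users who remain on the existing Hadoop line.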
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 2:33 PM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

There are some instructions.

  test/system/continuous/README
  test/system/randomwalk/README

Continuous ingest has a lot of options. For release testing we do something like the following.

  #configure may need to adjust max mappers and max reducers to make map reduce job run faster
  start-ingest.sh
  start-walker.sh
  #sleep 24hr
  stop-ingest.sh
  stop-walker.sh
  run-verify.sh

The continuous dir has scripts for starting and stopping the agitator. We also use these scripts to agitate while running the random walk test.

For random walk we use the All.xml graph, configure it to log errors to NFS, and run a walker on each node. We look in NFS for walkers that died or got stuck. The random walk framework will log a message if a node in the graph gets stuck. It will also log a message when it gets unstuck.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile.

-Joey
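The 24 hour continuous ingest cycle listed above could be wrapped in a small driver along these lines. This is a sketch, not the project's release procedure: the script names come from the message above, but `CONTINUOUS_DIR`, `RUN_SECONDS`, `DRY_RUN`, and the `run_step` / `continuous_cycle` functions are hypothetical names introduced here, and it assumes the scripts have already been configured as described in test/system/continuous/README. By default it only prints the steps.

```shell
#!/bin/sh
# Sketch of the continuous ingest cycle: start, wait 24 hr, stop, verify.
# Assumptions (not from the thread): the scripts live under CONTINUOUS_DIR
# and are already configured; DRY_RUN and RUN_SECONDS are illustrative knobs.
CONTINUOUS_DIR="${CONTINUOUS_DIR:-test/system/continuous}"
RUN_SECONDS="${RUN_SECONDS:-86400}"   # 24 hours
DRY_RUN="${DRY_RUN:-1}"               # 1 = print the steps instead of running them

# Run one step, or just describe it when DRY_RUN=1.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

continuous_cycle() {
  run_step "$CONTINUOUS_DIR/start-ingest.sh"
  run_step "$CONTINUOUS_DIR/start-walker.sh"
  run_step sleep "$RUN_SECONDS"
  run_step "$CONTINUOUS_DIR/stop-ingest.sh"
  run_step "$CONTINUOUS_DIR/stop-walker.sh"
  run_step "$CONTINUOUS_DIR/run-verify.sh"
}

continuous_cycle
```

Setting `DRY_RUN=0` would execute the steps for real; the agitation variant would additionally start and stop the agitator using the scripts in the same directory.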
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Will 1.4 still work with 0.20 with these patches?

Great point Billie.

- Original Message -
From: Billie Rinaldi billie.rina...@gmail.com
To: dev@accumulo.apache.org
Sent: Friday, July 26, 2013 3:02:41 PM
Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey