Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-14 Thread Christopher
The main thing is that I would not want to see an ACCUMULO-1790
*without* ACCUMULO-1795. Having 1792 alone would be insufficient for
me.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, Nov 12, 2013 at 9:22 AM, Sean Busbey bus...@clouderagovt.com wrote:
 On Fri, Oct 18, 2013 at 12:29 AM, Sean Busbey bus...@cloudera.com wrote:

 On Tue, Oct 15, 2013 at 10:20 AM, Sean Busbey bus...@cloudera.com wrote:


 On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:


 On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

 Just to be clear, we are talking about adding profile support to the
 pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not
 talking about changing the default build profile for these branches are we?



 for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0.
 I am not suggesting we change the default from building against Hadoop
 0.23.203.



 I mean 0.20.203.0. Ugh, Hadoop versions.



 Okay, barring additional suggestions, tomorrow afternoon I'll break things
 down into an umbrella and 3 sub tasks:

 1) addition of hadoop 2 support

  - to include backports of commits
  - to include making the target hadoop 2 version 2.2.0
  - to include test changes that flex hadoop 2 features like fail over

 2) ensuring compatibility for 0.20.203

 - presuming some subset of the commits in 1) will break it since 0.20
 support was left behind in 1.5

 3) doc / packaging updates

 - the issue of binary releases per distro
 - doc patch for what version(s) the release tests are expected to run
 against

 Once work is put against those tickets, I'd expect things to go into a
 branch based on the umbrella ticket until such time as the complete work
 can pass the test suite that we'll use at the next release. Then it can get
 rebased onto the 1.4.x dev branch.

 --
 Sean


 Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to
 resurrect this thread to make sure everyone's concerns are addressed.

 For context, here's a link to the start of the last thread:

 http://bit.ly/1aPqKuH

 From ACCUMULO-1792, ctubbsii:

 I'd be reluctant to support any Hadoop 2.x support in the 1.4 release
 line that breaks compatibility with 0.20. I don't think breaking 0.20
 and then possibly fixing it again as a second step is acceptable (because
 that subsequent work may not ever be done, and I don't think
 we should break the compatibility contract that we've established with
 1.4.0).

 Chris, I believe keeping all of the work in a branch under the umbrella
 jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4 release
 that doesn't have proper support for 0.20.203.

 Is there something beyond making sure the branch passes a full set of
 release tests on 0.20.203 that you'd like to see? In the event that the
 branch only ever contains the work for adding Hadoop 2, it's a simple
 matter to abandon without rolling into the 1.4 development line.

 From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii):

 I'm very uncomfortable with risking breaking continuity in such an old
 release, and I don't think managing two lines of 1.4 releases is
 worth the effort. Though we have no official EOL policy, 1.3 was
 practically dead in the water once 1.4 was around, and I hope we start
 encouraging more adoption of 1.5 (and soon 1.6) versus continually
 propping up 1.4.

 I'd love to get people to move off of 1.4. However, I think adding Hadoop 2
 support to 1.4 encourages this more than leaving it out.

 Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
 surprised people find relying on 0.20 for the 1.5 WAL intimidating.
 Upgrading both HDFS and Accumulo across major versions at once is asking
 them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
 allow them to break the risk up into steps: they can upgrade HDFS versions
 first, get comfortable, then upgrade Accumulo to 1.5.

 I think the existing tickets under the umbrella of ACCUMULO-1790 should
 ensure that we end up with a single 1.4 line that can work with either the
 existing 0.20.203.0 claimed in releases or against 2.2.0.

 Bill (or Josh or Chris), is there stronger language you'd like to see
 around docs / packaging (area #3 in the original plan and currently
 ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for
 0.20.203.0? Are you looking for something beyond a full release suite to
 ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203?


 -Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-14 Thread Christopher
On Tue, Nov 12, 2013 at 4:49 PM, Sean Busbey busbey...@clouderagovt.com wrote:
 On Tue, Nov 12, 2013 at 3:14 PM, William Slacum 
 wilhelm.von.cl...@accumulo.net wrote:

 The language of ACCUMULO-1795 indicated that an acceptable state was
 something that wasn't binary compatible. That's my #1 thing to avoid.


 Ah. So I see, not sure why I phrased that that way. Since the default build
 should still be 0.20.203.0, I'm not sure how it'd end up not being binary
 compatible. I can update the ticket to clarify the language. Any need to
 compile should be limited to running Hadoop 2.2.0.

 Sound good?

+1
(The confusing wording was the basis for my concerns also.)

  Maybe expressly only doing a binary convenience package for
  0.20.203.0?

 If we need an extra package, doesn't that mean a user can't just upgrade
 Accumulo?


 By binary convenience package I mean the binary distribution tarball (or
 rpms, or whatevs) that we make as a part of the release process. For users
 of Hadoop 0.20.203.0, upgrading should be unchanged from how they would
 normally get their Accumulo 1.4.x distribution.

 ACCUMULO-1796 has some leeway about the convenience packages for people who
 want Hadoop 2 support. On the extreme end, they'd have to build from source
 and then run a normal upgrade process.

I'd prefer binary compatibility with a single build, but if that's too
hard to achieve, I have no objection to providing a mechanism to
perform an alternate build against 2.x (whether or not we provide a
pre-built binary package for it), so long as the default build is
0.20.x

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-14 Thread Sean Busbey
On Thu, Nov 14, 2013 at 6:27 PM, Christopher ctubb...@apache.org wrote:

 The main thing is that I would not want to see an ACCUMULO-1790
 *without* ACCUMULO-1795. Having 1792 alone would be insufficient for
 me.


That is precisely the intention of ACCUMULO-1790. All of the subtasks
(including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for things
to get into the 1.4 branch. Until that time the work would just go into a
feature branch for ACCUMULO-1790 (to make working and testing easier for
those implementing the subtasks). If you wanted to see the full
implementation you would just wait until all of the subtasks were committed
to the feature branch.

Am I missing something?


-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-14 Thread Christopher
Nope, I think we're on the same page now.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Thu, Nov 14, 2013 at 7:39 PM, Sean Busbey busbey...@clouderagovt.com wrote:
 On Thu, Nov 14, 2013 at 6:27 PM, Christopher ctubb...@apache.org wrote:

 The main thing is that I would not want to see an ACCUMULO-1790
 *without* ACCUMULO-1795. Having 1792 alone would be insufficient for
 me.


 That is precisely the intention of ACCUMULO-1790. All of the subtasks
 (including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for things
 to get into the 1.4 branch. Until that time the work would just go into a
 feature branch for ACCUMULO-1790 (to make working and testing easier for
 those implementing the subtasks). If you wanted to see the full
 implementation you would just wait until all of the subtasks were committed
 to the feature branch.

 Am I missing something?


 --
 Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread Josh Elser


Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to
resurrect this thread to make sure everyone's concerns are addressed.

For context, here's a link to the start of the last thread:

http://bit.ly/1aPqKuH

 From ACCUMULO-1792, ctubbsii:


I'd be reluctant to support any Hadoop 2.x support in the 1.4 release
line that breaks compatibility with 0.20. I don't think breaking 0.20
and then possibly fixing it again as a second step is acceptable (because
that subsequent work may not ever be done, and I don't think
we should break the compatibility contract that we've established with
1.4.0).

Chris, I believe keeping all of the work in a branch under the umbrella
jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4 release
that doesn't have proper support for 0.20.203.

Is there something beyond making sure the branch passes a full set of
release tests on 0.20.203 that you'd like to see? In the event that the
branch only ever contains the work for adding Hadoop 2, it's a simple
matter to abandon without rolling into the 1.4 development line.

 From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii):


I'm very uncomfortable with risking breaking continuity in such an old
release, and I don't think managing two lines of 1.4 releases is
worth the effort. Though we have no official EOL policy, 1.3 was
practically dead in the water once 1.4 was around, and I hope we start
encouraging more adoption of 1.5 (and soon 1.6) versus continually
propping up 1.4.

I'd love to get people to move off of 1.4. However, I think adding Hadoop 2
support to 1.4 encourages this more than leaving it out.


I'm not sure I agree that adding Hadoop2 support to 1.4 encourages 
people to upgrade Accumulo. My gut reaction would be that it allows 
people to completely ignore Accumulo updates (ignoring moving to 1.4.5 
which would allow them to do hadoop2 with your proposed changes)



Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
surprised people find relying on 0.20 for the 1.5 WAL intimidating.
Upgrading both HDFS and Accumulo across major versions at once is asking
them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
allow them to break the risk up into steps: they can upgrade HDFS versions
first, get comfortable, then upgrade Accumulo to 1.5.


Personally, maintaining 0.20 compatibility is not a big concern on my 
radar. If you're still running an 0.20 release, I'd *really* hope that 
you have an upgrade path to 1.2.x (if not 2.2.x) scheduled.


I think claiming that 1.5 has a higher burden on 1.4 is a bit of a 
fallacy. There were many problems and pains regarding WALs in <=1.4 that 
are very difficult to work with in a large environment (try finding WALs 
in server failure cases). I think the increased I/O on HDFS is a much 
smaller cost than the completely different I/O path that the old loggers 
have.


I also think upgrading Accumulo is much less scary than upgrading HDFS, 
but that's just me.


To me, it seems like the argument may be coming down to whether or not 
we break 0.20 hadoop compatibility on a bug-fix release and how 
concerned we are about letting users lag behind the upstream development.



I think the existing tickets under the umbrella of ACCUMULO-1790 should
ensure that we end up with a single 1.4 line that can work with either the
existing 0.20.203.0 claimed in releases or against 2.2.0.

Bill (or Josh or Chris), is there stronger language you'd like to see
around docs / packaging (area #3 in the original plan and currently
ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for
0.20.203.0? Are you looking for something beyond a full release suite to
ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203?



Again, my biggest concern here is not following our own guidelines of 
breaking changes across minor releases, but I'd hope 0.20 users have an 
upgrade path outlined for themselves.


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread William Slacum
A user of 1.4.a should be able to move to 1.4.b without any major
infrastructure changes, such as swapping out HDFS or installing extra
add-ons.

I don't find much merit in debating local WAL vs HDFS WAL cost/benefit
since the only quantifiable evidence we have supported the move.

I should note, Sean, that if you see merit in the work, you don't need
community approval for forking and sharing. However, I do not think it is
in the community's best interest to continue to upgrade 1.4.



On Tue, Nov 12, 2013 at 2:12 PM, Josh Elser josh.el...@gmail.com wrote:


 Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to
 resurrect this thread to make sure everyone's concerns are addressed.

 For context, here's a link to the start of the last thread:

 http://bit.ly/1aPqKuH

  From ACCUMULO-1792, ctubbsii:

 I'd be reluctant to support any Hadoop 2.x support in the 1.4 release
 line that breaks compatibility with 0.20. I don't think breaking 0.20
 and then possibly fixing it again as a second step is acceptable (because
 that subsequent work may not ever be done, and I don't think
 we should break the compatibility contract that we've established with
 1.4.0).

 Chris, I believe keeping all of the work in a branch under the umbrella
 jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4 release
 that doesn't have proper support for 0.20.203.

 Is there something beyond making sure the branch passes a full set of
 release tests on 0.20.203 that you'd like to see? In the event that the
 branch only ever contains the work for adding Hadoop 2, it's a simple
 matter to abandon without rolling into the 1.4 development line.

  From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii):

 I'm very uncomfortable with risking breaking continuity in such an old
 release, and I don't think managing two lines of 1.4 releases is
 worth the effort. Though we have no official EOL policy, 1.3 was
 practically dead in the water once 1.4 was around, and I hope we start
 encouraging more adoption of 1.5 (and soon 1.6) versus continually
 propping up 1.4.

 I'd love to get people to move off of 1.4. However, I think adding Hadoop 2
 support to 1.4 encourages this more than leaving it out.


 I'm not sure I agree that adding Hadoop2 support to 1.4 encourages people
 to upgrade Accumulo. My gut reaction would be that it allows people to
 completely ignore Accumulo updates (ignoring moving to 1.4.5 which would
 allow them to do hadoop2 with your proposed changes)


  Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
 surprised people find relying on 0.20 for the 1.5 WAL intimidating.
 Upgrading both HDFS and Accumulo across major versions at once is asking
 them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
 allow them to break the risk up into steps: they can upgrade HDFS versions
 first, get comfortable, then upgrade Accumulo to 1.5.


 Personally, maintaining 0.20 compatibility is not a big concern on my
 radar. If you're still running an 0.20 release, I'd *really* hope that you
 have an upgrade path to 1.2.x (if not 2.2.x) scheduled.

 I think claiming that 1.5 has a higher burden on 1.4 is a bit of a
 fallacy. There were many problems and pains regarding WALs in <=1.4 that
 are very difficult to work with in a large environment (try finding WALs in
 server failure cases). I think the increased I/O on HDFS is a much smaller
 cost than the completely different I/O path that the old loggers have.

 I also think upgrading Accumulo is much less scary than upgrading HDFS,
 but that's just me.

 To me, it seems like the argument may be coming down to whether or not we
 break 0.20 hadoop compatibility on a bug-fix release and how concerned we
 are about letting users lag behind the upstream development.


  I think the existing tickets under the umbrella of ACCUMULO-1790 should
 ensure that we end up with a single 1.4 line that can work with either the
 existing 0.20.203.0 claimed in releases or against 2.2.0.

 Bill (or Josh or Chris), is there stronger language you'd like to see
 around docs / packaging (area #3 in the original plan and currently
 ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for
 0.20.203.0? Are you looking for something beyond a full release suite to
 ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203?


 Again, my biggest concern here is not following our own guidelines of
 breaking changes across minor releases, but I'd hope 0.20 users have an
 upgrade path outlined for themselves.



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread Sean Busbey
On Tue, Nov 12, 2013 at 1:12 PM, Josh Elser josh.el...@gmail.com wrote:



 To me, it seems like the argument may be coming down to whether or not we
 break 0.20 hadoop compatibility on a bug-fix release and how concerned we
 are about letting users lag behind the upstream development.


  I think the existing tickets under the umbrella of ACCUMULO-1790 should
 ensure that we end up with a single 1.4 line that can work with either the
 existing 0.20.203.0 claimed in releases or against 2.2.0.

 Bill (or Josh or Chris), is there stronger language you'd like to see
 around docs / packaging (area #3 in the original plan and currently
 ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for
 0.20.203.0? Are you looking for something beyond a full release suite to
 ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203?


 Again, my biggest concern here is not following our own guidelines of
 breaking changes across minor releases, but I'd hope 0.20 users have an
 upgrade path outlined for themselves.



The plan outlined in the original thread, and in the subtasks under
ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in the
1.4 bugfix line. If there's anything we can do besides running through the
release test suite on a 0.20 cluster to help ensure that, I am interested
in adding it to the existing plan.


-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread Sean Busbey
On Tue, Nov 12, 2013 at 1:28 PM, William Slacum 
wilhelm.von.cl...@accumulo.net wrote:

 A user of 1.4.a should be able to move to 1.4.b without any major
 infrastructure changes, such as swapping out HDFS or installing extra
 add-ons.



Right, exactly. Hopefully no part of the original plan contradicts this. Is
there something that appears to?


-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread Josh Elser

On 11/12/13, 12:24 PM, Sean Busbey wrote:

On Tue, Nov 12, 2013 at 1:12 PM, Josh Elser josh.el...@gmail.com wrote:






To me, it seems like the argument may be coming down to whether or not we
break 0.20 hadoop compatibility on a bug-fix release and how concerned we
are about letting users lag behind the upstream development.


  I think the existing tickets under the umbrella of ACCUMULO-1790 should

ensure that we end up with a single 1.4 line that can work with either the
existing 0.20.203.0 claimed in releases or against 2.2.0.

Bill (or Josh or Chris), is there stronger language you'd like to see
around docs / packaging (area #3 in the original plan and currently
ACCUMULO-1796)? Maybe expressly only doing a binary convenience package for
0.20.203.0? Are you looking for something beyond a full release suite to
ensure 1.4 is still maintaining compatibility on Hadoop 0.20.203?



Again, my biggest concern here is not following our own guidelines of
breaking changes across minor releases, but I'd hope 0.20 users have an
upgrade path outlined for themselves.




The plan outlined in the original thread, and in the subtasks under
ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in the
1.4 bugfix line. If there's anything we can do besides running through the
release test suite on a 0.20 cluster to help ensure that, I am interested
in adding it to the existing plan.




What about the other half: encouraging users to lag (soon to be) two 
major releases behind?


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread William Slacum
The language of ACCUMULO-1795 indicated that an acceptable state was
something that wasn't binary compatible. That's my #1 thing to avoid.

 Maybe expressly only doing a binary convenience package for
 0.20.203.0?

If we need an extra package, doesn't that mean a user can't just upgrade
Accumulo?

As a side note, 0.20.203.0 is 1.4,

On Tue, Nov 12, 2013 at 3:28 PM, Sean Busbey busbey...@clouderagovt.com wrote:

 On Tue, Nov 12, 2013 at 1:28 PM, William Slacum 
 wilhelm.von.cl...@accumulo.net wrote:

  A user of 1.4.a should be able to move to 1.4.b without any major
  infrastructure changes, such as swapping out HDFS or installing extra
  add-ons.
 
 

 Right, exactly. Hopefully no part of the original plan contradicts this. Is
 there something that appears to?


 --
 Sean



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-11-12 Thread Sean Busbey
On Tue, Nov 12, 2013 at 2:48 PM, Josh Elser josh.el...@gmail.com wrote:



 What about the other half: encouraging users to lag (soon to be) two
 major releases behind?



I don't think our current user base needs to be encouraged strongly to
upgrade. And as I said previously I think this change provides them with an
upgrade path that's easier to stomach, but I suspect this is a point we
disagree on.

-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-15 Thread dlmarion
Just to be clear, we are talking about adding profile support to the pom's for 
Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about 
changing the default build profile for these branches are we? 

- Original Message -

From: Billie Rinaldi billie.rina...@gmail.com 
To: dev@accumulo.apache.org 
Sent: Monday, October 14, 2013 11:57:40 PM 
Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch 

Thanks for the note, Ted. That vote is for 2.2.0, not -beta. 
On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: 

 w.r.t. hadoop-2 release, see this thread: 
 
 http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 
 
 Looks like 2.2.0-beta would pass votes. 
 
 Cheers 
 
 
 On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote: 
 
  Responses Inline. 
  
  - Mike 
  
  On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com 
 wrote: 
  
   Hey All, 
   
   I'd like to restart the conversation from end July / start August about 
   Hadoop 2 support on the 1.4 branch. 
   
   Specifically, I'd like to get some requirements ironed out so I can file
   one or more jiras. I'd also like to get a plan for application.
   
   =requirements 
   
   Here's the requirements I have from the last thread: 
   
   1)  Maintain existing 1.4 compatibility 
   
   The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4
   tag)[1]
   
   I don't see anything in the README[2] nor the user manual[3] on other 
   versions being supported. 
   
   Yep. 
  
  
   2) Gain Hadoop 2 support 
   
   At the moment, I'm presuming this means Apache release 2.0.4-alpha since
   that's what 1.5.0 builds against for Hadoop 2.
   
  I haven't been following the Hadoop 2 release schedule that closely, but I
  think the latest is a 2.1.0-beta? Pretty sure it was released after we
  finished Accumulo 1.5, so there's no reason not to support it in my mind.
  Depending on an alpha of something strikes me as either unstable or lazy,
  although I fully understand that it may be neither.
  
  
   3) Test for correctness on given versions, with >= 5 node cluster
   
   * Unit Tests 
   * Functional Tests 
   * 24hr continuous + verification 
   * 24hr continuous + verification + agitation 
   * 24hr random walk 
   * 24hr random walk + agitation 
   
   Keith mentioned running these against a CDH4 cluster, but I presume that
   since Apache Releases are our stated compatibilities it would actually be
   against whatever versions we list. Based on #1 and #2 above, I would expect
   that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
   
   Hadoop 2 introduces some neat new things like NN HA, which I think it 
  might be worthwhile to test with. At that level it might be more of a 
  verification of the Hadoop code, but I'd like to be comfortable that our 
  DFS Clients switch correctly. This is in addition to the standard release 
  suite that we run. [1] 
  
  [1]: http://accumulo.apache.org/governance/releasing.html#testing 
  
  
   4) Binary packaging 
   4a) Either source produces a single binary for all accepted versions 
   
   or 
   
   4b) Instructions for building from source for each versions and somehow 
   flag what (if any) convenience binaries are made for the release. 
   
   
  Having run the binary packaging for 1.4.4, I can tell you that it is not in
  great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so
  I didn't bother spending a ton of time on them here, but I think RPM and
  DEB are both broken. It would be nice to be able to specify a Hadoop 2
  version for compilation, similar to what happens in the newer code base,
  which could be back ported, I suppose. 4b seems easier.
  
  =application 
   
   There will be many back-ported patches. Not much active development happens
   on 1.4.x now, but I presume this should still all go onto a feature branch?
   
   Is the community preference that eventually all the changes become a single
   commit (or one-per-subtask if there are multiple jiras) on the active 1.4
   development branch, or that the original patches remain broken out?
   
   Not sure what you mean by this. 
  
  
   For what it's worth, I'd recommend keeping them broken out. (And that's how
   the initial development against CDH4 has been done.)
   
   
   [1] http://bit.ly/1fxucMe 
   [2] http://bit.ly/192zUAJ 
   [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
   
   -- 
   Sean 
   
  
 



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-15 Thread Sean Busbey
On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

 Just to be clear, we are talking about adding profile support to the pom's
 for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking
 about changing the default build profile for these branches are we?



for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I
am not suggesting we change the default from building against Hadoop
0.23.203.


I'm not sure about the change to 1.5.1-SNAPSHOT. I believe we're talking
about changing the hadoop.profile for 2.0 to use the 2.2.0 release. I don't
think it makes sense to change the default off of the version in the
hadoop.profile for 1.0.

Presumably this change would also happen in master. Now that Hadoop 2.x is
going to have a GA release, I think it makes sense to have a discussion
about changing the default to be the hadoop 2.0 profile for master, but
this is not that discussion.


-- 
Sean
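
A rough sketch of the profile selection discussed in this message, in
Python. The profile names 1.0 and 2.0 come from the thread; the helper
names, the hadoop.version property, and the rule that 0.23.x goes with the
2.0 profile are assumptions made for illustration, not something the 1.4
poms are confirmed to do.

```python
# Hypothetical helpers: pick a hadoop.profile value from a Hadoop version string.
# The mapping rule below is an assumption, not taken from any Accumulo pom.

def hadoop_profile(version: str) -> str:
    """Return the hadoop.profile value assumed to match a Hadoop version."""
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # 0.23.x is treated as part of the Hadoop 2 line; other 0.x and 1.x
    # releases fall back to the 1.0 profile.
    if major >= 2 or (major == 0 and minor >= 23):
        return "2.0"
    return "1.0"


def maven_command(version: str) -> str:
    """Build an illustrative mvn invocation for the given Hadoop version."""
    profile = hadoop_profile(version)
    return f"mvn clean package -Dhadoop.profile={profile} -Dhadoop.version={version}"


if __name__ == "__main__":
    for v in ("0.20.203.0", "2.2.0"):
        print(maven_command(v))
```

Keeping the default on the older profile and requiring an explicit switch
for Hadoop 2 matches the intent stated above that the default build does
not change.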


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-15 Thread Sean Busbey
On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:


 On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:

 Just to be clear, we are talking about adding profile support to the
 pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not
 talking about changing the default build profile for these branches are we?



 for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I
 am not suggesting we change the default from building against Hadoop
 0.23.203.



I mean 0.20.203.0. Ugh, Hadoop versions.

-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-15 Thread Joey Echeverria
I think you meant:

Ugh, Hadoop versions.[1]

[1]
http://blog.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/


On Tue, Oct 15, 2013 at 11:20 AM, Sean Busbey bus...@cloudera.com wrote:

 On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:

 
  On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:
 
  Just to be clear, we are talking about adding profile support to the
  pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not
  talking about changing the default build profile for these branches are we?
 
 
 
  for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I
  am not suggesting we change the default from building against Hadoop
  0.23.203.
 
 
 
 I mean 0.20.203.0. Ugh, Hadoop versions.

 --
 Sean




-- 
Joey Echeverria
Director, Federal FTS
Cloudera, Inc.


Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-14 Thread Sean Busbey
Hey All,

I'd like to restart the conversation from end July / start August about
Hadoop 2 support on the 1.4 branch.

Specifically, I'd like to get some requirements ironed out so I can file
one or more jiras. I'd also like to get a plan for application.

=requirements

Here's the requirements I have from the last thread:

1)  Maintain existing 1.4 compatibility

The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4
tag)[1]

I don't see anything in the README[2] nor the user manual[3] on other
versions being supported.


2) Gain Hadoop 2 support

At the moment, I'm presuming this means Apache release 2.0.4-alpha since
that's what 1.5.0 builds against for Hadoop 2.

3) Test for correctness on given versions, with >= 5 node cluster

* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation

Keith mentioned running these against a CDH4 cluster, but I presume that
since Apache Releases are our stated compatibilities it would actually be
against whatever versions we list. Based on #1 and #2 above, I would expect
that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.

4) Binary packaging
4a) Either source produces a single binary for all accepted versions

or

4b) Instructions for building from source for each versions and somehow
flag what (if any) convenience binaries are made for the release.

=application

There will be many back-ported patches. Not much active development happens
on 1.4.x now, but I presume this should still all go onto a feature branch?

Is the community preference that eventually all the changes become a single
commit (or one-per-subtask if there are multiple jiras) on the active 1.4
development branch, or that the original patches remain broken out?

For what it's worth, I'd recommend keeping them broken out. (And that's how
the initial development against CDH4 has been done.)


[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3]
http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies

-- 
Sean
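
Requirement 3 above amounts to a matrix of Hadoop versions by test suites.
A minimal Python sketch of that bookkeeping follows. The version list and
suite names are taken from this message (the thread later moves the Hadoop
2 target to 2.2.0); the function names and the idea of tracking passed runs
as pairs are assumptions for illustration only.

```python
from itertools import product

# Hadoop versions named in this message as compatibility targets.
HADOOP_VERSIONS = ["0.20.203.0", "2.0.4-alpha"]

# Test suites listed under requirement 3.
TEST_SUITES = [
    "unit tests",
    "functional tests",
    "24hr continuous + verification",
    "24hr continuous + verification + agitation",
    "24hr random walk",
    "24hr random walk + agitation",
]


def release_test_matrix(versions=HADOOP_VERSIONS, suites=TEST_SUITES):
    """Return every (hadoop_version, suite) pair that would need a passing run."""
    return list(product(versions, suites))


def missing_runs(passed, versions=HADOOP_VERSIONS, suites=TEST_SUITES):
    """Return the matrix pairs that have no recorded passing run."""
    done = set(passed)
    return [pair for pair in release_test_matrix(versions, suites) if pair not in done]


if __name__ == "__main__":
    # Hypothetical state: only the unit tests on 0.20.203.0 have passed so far.
    for version, suite in missing_runs([("0.20.203.0", "unit tests")]):
        print(f"still needed: {suite} on Hadoop {version}")
```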


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-14 Thread Mike Drob
Responses Inline.

- Mike

On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote:

 Hey All,

 I'd like to restart the conversation from end July / start August about
 Hadoop 2 support on the 1.4 branch.

 Specifically, I'd like to get some requirements ironed out so I can file
 one or more jiras. I'd also like to get a plan for application.

 =requirements

 Here's the requirements I have from the last thread:

 1)  Maintain existing 1.4 compatibility

 The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4
 tag)[1]

 I don't see anything in the README[2] nor the user manual[3] on other
 versions being supported.

 Yep.


 2) Gain Hadoop 2 support

 At the moment, I'm presuming this means Apache release 2.0.4-alpha since
 that's what 1.5.0 builds against for Hadoop 2.

I haven't been following the Hadoop 2 release schedule that closely, but I
think the latest is a 2.1.0-beta? Pretty sure it was released after we
finished Accumulo 1.5, so there's no reason not to support it in my mind.
Depending on an alpha of something strikes me as either unstable or lazy,
although I fully understand that it may be neither.
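
A minimal sketch of the alpha / beta / release distinction for the version strings mentioned in this thread. The helper name `classify` is hypothetical and is not part of Accumulo or Hadoop; it only assumes versions look like dotted numbers with an optional "-alpha" or "-beta" suffix.

```python
import re


# Hypothetical helper, not part of Accumulo or Hadoop: splits a version
# string such as "2.0.4-alpha" into its numeric parts and a stability label.
def classify(version):
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:-(alpha|beta))?", version)
    if match is None:
        raise ValueError("unrecognized version: " + version)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    # No suffix is treated as a regular release (an assumption of this sketch).
    stability = match.group(2) or "release"
    return numbers, stability


# Versions named in this thread.
for v in ["0.20.203.0", "2.0.4-alpha", "2.1.0-beta", "2.2.0"]:
    numbers, stability = classify(v)
    print(v, "-> major line", numbers[0], "/", stability)
```

Something along these lines could back a check that a build only targets non-alpha Hadoop versions, if that ends up being the requirement.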


 3) Test for correctness on given versions, with >= 5 node cluster

 * Unit Tests
 * Functional Tests
 * 24hr continuous + verification
 * 24hr continuous + verification + agitation
 * 24hr random walk
 * 24hr random walk + agitation

 Keith mentioned running these against a CDH4 cluster, but I presume that
 since Apache Releases are our stated compatibilities it would actually be
 against whatever versions we list. Based on #1 and #2 above, I would expect
 that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.

Hadoop 2 introduces some neat new things like NN HA, which I think it
might be worthwhile to test with. At that level it might be more of a
verification of the Hadoop code, but I'd like to be comfortable that our
DFS Clients switch correctly. This is in addition to the standard release
suite that we run. [1]

[1]: http://accumulo.apache.org/governance/releasing.html#testing


 4) Binary packaging
 4a) Either source produces a single binary for all accepted versions

 or

 4b) Instructions for building from source for each version and somehow
 flag what (if any) convenience binaries are made for the release.


Having run the binary packaging for 1.4.4, I can tell you that it is not in
great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so
I didn't bother spending a ton of time on them here, but I think RPM and
DEB are both broken. It would be nice to be able to specify a Hadoop 2
version for compilation, similar to what happens in the newer code base,
which could be back ported, I suppose. 4b seems easier.

=application

 There will be many back-ported patches. Not much active development happens
 on 1.4.x now, but I presume this should still all go onto a feature branch?

 Is the community preference that eventually all the changes become a single
 commit (or one-per-subtask if there are multiple jiras) on the active 1.4
 development branch, or that the original patches remain broken out?

Not sure what you mean by this.


 For what it's worth, I'd recommend keeping them broken out. (And that's how
 the initial development against CDH4 has been done.)


 [1] http://bit.ly/1fxucMe
 [2] http://bit.ly/192zUAJ
 [3]
 http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies

 --
 Sean



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-14 Thread Josh Elser
For #2, from what I've read, we should definitely bump up the dependency 
on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 
2.2.0-beta for that hadoop-2 profile.


I probably stated this before, but I'd much rather see more effort in 
testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) 
against hadoop-2 (like Mike's point about HA). I'm not sure if anyone 
ever did testing of Accumulo with the hadoop-2 features -- I seem to 
recall that it was more testing "does Accumulo run on both hadoop 1 and 2?"


If we can maintain a single artifact, that would definitely be easiest 
for users, but falling back to user-built artifacts or convenience 
releases isn't the end of the world.


As far as commits, I'd like to see as much separation as possible, but 
it's understandable if the changes overlap and don't make sense to split 
out.


On 10/14/13 12:55 PM, Sean Busbey wrote:

Hey All,

I'd like to restart the conversation from end July / start August about
Hadoop 2 support on the 1.4 branch.

Specifically, I'd like to get some requirements ironed out so I can file
one or more jiras. I'd also like to get a plan for application.

=requirements

Here's the requirements I have from the last thread:

1)  Maintain existing 1.4 compatibility

The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4
tag)[1]

I don't see anything in the README[2] nor the user manual[3] on other
versions being supported.


2) Gain Hadoop 2 support

At the moment, I'm presuming this means Apache release 2.0.4-alpha since
that's what 1.5.0 builds against for Hadoop 2.

3) Test for correctness on given versions, with >= 5 node cluster

* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation

Keith mentioned running these against a CDH4 cluster, but I presume that
since Apache Releases are our stated compatibilities it would actually be
against whatever versions we list. Based on #1 and #2 above, I would expect
that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.

4) Binary packaging
4a) Either source produces a single binary for all accepted versions

or

4b) Instructions for building from source for each version and somehow
flag what (if any) convenience binaries are made for the release.

=application

There will be many back-ported patches. Not much active development happens
on 1.4.x now, but I presume this should still all go onto a feature branch?

Is the community preference that eventually all the changes become a single
commit (or one-per-subtask if there are multiple jiras) on the active 1.4
development branch, or that the original patches remain broken out?

For what it's worth, I'd recommend keeping them broken out. (And that's how
the initial development against CDH4 has been done.)


[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-14 Thread Sean Busbey
On Mon, Oct 14, 2013 at 9:24 PM, Mike Drob md...@mdrob.com wrote:


  3) Test for correctness on given versions, with >= 5 node cluster
 
  * Unit Tests
  * Functional Tests
  * 24hr continuous + verification
  * 24hr continuous + verification + agitation
  * 24hr random walk
  * 24hr random walk + agitation
 
  Keith mentioned running these against a CDH4 cluster, but I presume that
  since Apache Releases are our stated compatibilities it would actually be
  against whatever versions we list. Based on #1 and #2 above, I would
 expect
  that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
 
 Hadoop 2 introduces some neat new things like NN HA, which I think it
 might be worthwhile to test with. At that level it might be more of a
 verification of the Hadoop code, but I'd like to be comfortable that our
 DFS Clients switch correctly. This is in addition to the standard release
 suite that we run. [1]

 [1]: http://accumulo.apache.org/governance/releasing.html#testing




Just to confirm, the change from Keith's request is

* 72hr continuous + agitation + cluster running
* Something to test that HA NN failover doesn't take out Accumulo

Would the latter be addressed by an additional functional test, or would it
need to be some kind of addition to the agitation?




 Having run the binary packaging for 1.4.4, I can tell you that it is not in
 great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so
 I didn't bother spending a ton of time on them here, but I think RPM and
 DEB are both broken. It would be nice to be able to specify a Hadoop 2
 version for compilation, similar to what happens in the newer code base,
 which could be back ported, I suppose. 4b seems easier.



I think this means you're +0 on 4b?



  =application
 
  There will be many back-ported patches. Not much active development
 happens
  on 1.4.x now, but I presume this should still all go onto a feature
 branch?
 
  Is the community preference that eventually all the changes become a
 single
  commit (or one-per-subtask if there are multiple jiras) on the active 1.4
  development branch, or that the original patches remain broken out?
 
 Not sure what you mean by this.


It's the difference between the 1.4.x branch having all the commits that
are backported from 1.5.x vs just having squashed ones. The former
maintains more of the original authorship and ties to original jiras. The
latter has less noise.
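
A rough sketch of the two options as git command sequences, built in Python so that nothing is actually run. The function name, the branch name, the commit ids, and the commit message are all placeholders for illustration; the only git features assumed are `git cherry-pick -x` (which appends the source commit id to each message) and `git merge --squash`.

```python
# Hypothetical helper: returns git command lines (as argument lists) for the
# two options. It only builds the commands and never executes them.
def backport_commands(commits, feature_branch, squash):
    if squash:
        # One combined commit on the current branch: less noise, but the
        # per-commit authorship and jira ties are not kept in history.
        return [
            ["git", "merge", "--squash", feature_branch],
            ["git", "commit", "-m", "Backport Hadoop 2 support"],
        ]
    # One commit per original change: -x records the id of the source
    # commit in each new commit message.
    return [["git", "cherry-pick", "-x", sha] for sha in commits]


# Placeholder commit ids and branch name, for illustration only.
for command in backport_commands(["abc123", "def456"], "hadoop2-backport", False):
    print(" ".join(command))
```

Passing `squash=True` instead yields the two-step squash sequence, which is the "less noise" end of the trade-off.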

-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-14 Thread Sean Busbey
On Mon, Oct 14, 2013 at 10:02 PM, Josh Elser josh.el...@gmail.com wrote:

 For #2, from what I've read, we should definitely bump up the dependency
 on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to
 2.2.0-beta for that hadoop-2 profile.


so 1.5.1-SNAPSHOT and this proposed change to 1.4.5-SNAPSHOT should both
target 2.2.0-beta, presuming the RC passes (and 2.1.0-beta prior). This
sounds in line with Mike's comment re: alpha v beta.

anyone have an objection?



 I probably stated this before, but I'd much rather see more effort in
 testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon)
 against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever
 did testing of Accumulo with the hadoop-2 features -- I seem to recall that
 it was more testing does Accumulo run on both hadoop 1 and 2.



I figured whatever bar I end up passing for Hadoop 2 support on 1.4.x
should help with testing the same for 1.5.x and 1.6.x.


-- 
Sean


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-10-14 Thread Billie Rinaldi
Thanks for the note, Ted. That vote is for 2.2.0, not -beta.
On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote:

 w.r.t. hadoop-2 release, see this thread:

 http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0

 Looks like 2.2.0-beta would pass votes.

 Cheers


 On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote:

  Responses Inline.
 
  - Mike
 
  On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com
 wrote:
 
   Hey All,
  
   I'd like to restart the conversation from end July / start August about
   Hadoop 2 support on the 1.4 branch.
  
   Specifically, I'd like to get some requirements ironed out so I can
 file
   one or more jiras. I'd also like to get a plan for application.
  
   =requirements
  
   Here's the requirements I have from the last thread:
  
   1)  Maintain existing 1.4 compatibility
  
   The only thing I see listed in the pom is Apache release 0.20.203.0.
  (1.4.4
   tag)[1]
  
   I don't see anything in the README[2] nor the user manual[3] on other
   versions being supported.
  
   Yep.
 
 
   2) Gain Hadoop 2 support
  
   At the moment, I'm presuming this means Apache release 2.0.4-alpha
 since
   that's what 1.5.0 builds against for Hadoop 2.
  
   I haven't been following the Hadoop 2 release schedule that closely,
 but
  I
  think the latest is a 2.1.0-beta? Pretty sure it was released after we
  finished Accumulo 1.5, so there's no reason not to support it in my mind.
  Depending on an alpha of something strikes me as either unstable or
 lazy,
  although I fully understand that it may be neither.
 
 
   3) Test for correctness on given versions, with >= 5 node cluster
  
   * Unit Tests
   * Functional Tests
   * 24hr continuous + verification
   * 24hr continuous + verification + agitation
   * 24hr random walk
   * 24hr random walk + agitation
  
   Keith mentioned running these against a CDH4 cluster, but I presume
 that
   since Apache Releases are our stated compatibilities it would actually
 be
   against whatever versions we list. Based on #1 and #2 above, I would
  expect
   that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
  
   Hadoop 2 introduces some neat new things like NN HA, which I think it
  might be worthwhile to test with. At that level it might be more of a
  verification of the Hadoop code, but I'd like to be comfortable that our
  DFS Clients switch correctly. This is in addition to the standard release
  suite that we run. [1]
 
  [1]: http://accumulo.apache.org/governance/releasing.html#testing
 
 
   4) Binary packaging
   4a) Either source produces a single binary for all accepted versions
  
   or
  
   4b) Instructions for building from source for each version and somehow
   flag what (if any) convenience binaries are made for the release.
  
  
  Having run the binary packaging for 1.4.4, I can tell you that it is not
 in
  great shape. Christopher cleaned up a lot of the issues in the 1.5 line,
 so
  I didn't bother spending a ton of time on them here, but I think RPM and
  DEB are both broken. It would be nice to be able to specify a Hadoop 2
  version for compilation, similar to what happens in the newer code base,
  which could be back ported, I suppose. 4b seems easier.
 
  =application
  
   There will be many back-ported patches. Not much active development
  happens
   on 1.4.x now, but I presume this should still all go onto a feature
  branch?
  
   Is the community preference that eventually all the changes become a
  single
   commit (or one-per-subtask if there are multiple jiras) on the active
 1.4
   development branch, or that the original patches remain broken out?
  
   Not sure what you mean by this.
 
 
   For what it's worth, I'd recommend keeping them broken out. (And that's
  how
   the initial development against CDH4 has been done.)
  
  
   [1] http://bit.ly/1fxucMe
   [2] http://bit.ly/192zUAJ
   [3]
  
 
 http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
  
   --
   Sean
  
 



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-08-02 Thread Joey Echeverria
Sorry for the delay, it's been one of those weeks.

The current version would probably not be backwards compatible to
0.20.2 just based on changes in dependencies. We're looking right now
to see how hard it is to have three way compatibility (0.20, 1.0,
2.0).

-Joey

On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote:
 Any update?

 -----Original Message-----
 From: Joey Echeverria [mailto:j...@cloudera.com]
 Sent: Monday, July 29, 2013 1:24 PM
 To: dev@accumulo.apache.org
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

 We're testing this today. I'll report back what we find.


 -Joey
 —
 Sent from Mailbox for iPhone

 On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote:

 Will 1.4 still work with 0.20 with these patches?
 Great point Billie.
 - Original Message -
 From: Billie Rinaldi billie.rina...@gmail.com
 To: dev@accumulo.apache.org
 Sent: Friday, July 26, 2013 3:02:41 PM
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
 On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:
  If these patches are going to be included with 1.4.4 or 1.4.5, I
  would
 like
  to see the following test run using CDH4 on at least a 5 node cluster.
   More nodes would be better.
 
* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation
 
  I may be able to assist with this, but I can not make any promises.

 Sure thing. Is there already a write-up on running this full battery
 of tests? I have a 10 node cluster that I can use for this.


  Great.  I think this would be a good patch for 1.4.   I assume that
  if a user stays with Hadoop 1 there are no dependency changes?

 Yup. It works the same way as 1.5 where all of the dependency changes
 are in a Hadoop 2.0 profile.

 In 1.5.0, we gave up on compatibility with 0.20 (and early versions of
 1.0) to make the compatibility requirements simpler; we ended up
 without dependency changes in the hadoop version profiles.  Will 1.4
 still work with 0.20 with these patches?  If there are dependency
 changes in the profiles, 1.4 would have to be compiled against a
 hadoop version compatible with the running version of hadoop, correct?
 We had some trouble in the
 1.5 release process with figuring out how to provide multiple binary
 artifacts (each compiled against a different version of hadoop) for
 the same release.  Just something we should consider before we are in
 the midst of releasing 1.4.4.
 Billie
 -Joey





-- 
Joey Echeverria
Director, Federal FTS
Cloudera, Inc.


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-08-02 Thread Christopher
Would it be reasonable to consider a version of 1.4 that breaks
compatibility with 0.20? I'm not really a fan of this, personally, but
am curious what others think.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote:
 Sorry for the delay, it's been one of those weeks.

 The current version would probably not be backwards compatible to
 0.20.2 just based on changes in dependencies. We're looking right now
 to see how hard it is to have three way compatibility (0.20, 1.0,
 2.0).

 -Joey

 On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote:
 Any update?

 -Original Message-
 From: Joey Echeverria [mailto:j...@cloudera.com]
 Sent: Monday, July 29, 2013 1:24 PM
 To: dev@accumulo.apache.org
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

 We're testing this today. I'll report back what we find.


 -Joey
 —
 Sent from Mailbox for iPhone

 On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote:

 Will 1.4 still work with 0.20 with these patches?
 Great point Billie.
 - Original Message -
 From: Billie Rinaldi billie.rina...@gmail.com
 To: dev@accumulo.apache.org
 Sent: Friday, July 26, 2013 3:02:41 PM
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul
 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:
  If these patches are going to be included with 1.4.4 or 1.4.5, I
  would
 like
  to see the following test run using CDH4 on at least a 5 node cluster.
   More nodes would be better.
 
* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation
 
  I may be able to assist with this, but I can not make any promises.

 Sure thing. Is there already a write-up on running this full battery
 of tests? I have a 10 node cluster that I can use for this.


  Great.  I think this would be a good patch for 1.4.   I assume that
  if a user stays with Hadoop 1 there are no dependency changes?

 Yup. It works the same way as 1.5 where all of the dependency changes
 are in a Hadoop 2.0 profile.

 In 1.5.0, we gave up on compatibility with 0.20 (and early versions of
 1.0) to make the compatibility requirements simpler; we ended up
 without dependency changes in the hadoop version profiles.  Will 1.4
 still work with 0.20 with these patches?  If there are dependency
 changes in the profiles, 1.4 would have to be compiled against a
 hadoop version compatible with the running version of hadoop, correct?
 We had some trouble in the
 1.5 release process with figuring out how to provide multiple binary
 artifacts (each compiled against a different version of hadoop) for
 the same release.  Just something we should consider before we are in
 the midst of releasing 1.4.4.
 Billie
 -Joey





 --
 Joey Echeverria
 Director, Federal FTS
 Cloudera, Inc.


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-08-02 Thread Joey Echeverria
I don't think that's a good idea unless you can come up with a very
clear version number change.

-Joey

On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote:
 Would it be reasonable to consider a version of 1.4 that breaks
 compatibility with 0.20? I'm not really a fan of this, personally, but
 am curious what others think.

 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii


 On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote:
 Sorry for the delay, it's been one of those weeks.

 The current version would probably not be backwards compatible to
 0.20.2 just based on changes in dependencies. We're looking right now
 to see how hard it is to have three way compatibility (0.20, 1.0,
 2.0).

 -Joey

 On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote:
 Any update?

 -Original Message-
 From: Joey Echeverria [mailto:j...@cloudera.com]
 Sent: Monday, July 29, 2013 1:24 PM
 To: dev@accumulo.apache.org
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

 We're testing this today. I'll report back what we find.


 -Joey
 —
 Sent from Mailbox for iPhone

 On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote:

 Will 1.4 still work with 0.20 with these patches?
 Great point Billie.
 - Original Message -
 From: Billie Rinaldi billie.rina...@gmail.com
 To: dev@accumulo.apache.org
 Sent: Friday, July 26, 2013 3:02:41 PM
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul
 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:
  If these patches are going to be included with 1.4.4 or 1.4.5, I
  would
 like
  to see the following test run using CDH4 on at least a 5 node cluster.
   More nodes would be better.
 
* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation
 
  I may be able to assist with this, but I can not make any promises.

 Sure thing. Is there already a write-up on running this full battery
 of tests? I have a 10 node cluster that I can use for this.


  Great.  I think this would be a good patch for 1.4.   I assume that
  if a user stays with Hadoop 1 there are no dependency changes?

 Yup. It works the same way as 1.5 where all of the dependency changes
 are in a Hadoop 2.0 profile.

 In 1.5.0, we gave up on compatibility with 0.20 (and early versions of
 1.0) to make the compatibility requirements simpler; we ended up
 without dependency changes in the hadoop version profiles.  Will 1.4
 still work with 0.20 with these patches?  If there are dependency
 changes in the profiles, 1.4 would have to be compiled against a
 hadoop version compatible with the running version of hadoop, correct?
 We had some trouble in the
 1.5 release process with figuring out how to provide multiple binary
 artifacts (each compiled against a different version of hadoop) for
 the same release.  Just something we should consider before we are in
 the midst of releasing 1.4.4.
 Billie
 -Joey





 --
 Joey Echeverria
 Director, Federal FTS
 Cloudera, Inc.



-- 
Joey Echeverria
Director, Federal FTS
Cloudera, Inc.


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-08-02 Thread Mike Drob
Which version of 0.20 are you testing against? Vanilla, or cdh3 flavored?


On Fri, Aug 2, 2013 at 2:37 PM, Joey Echeverria j...@cloudera.com wrote:

 I don't think that's a good idea unless you can come up with very
 clear version number change.

 -Joey

 On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote:
  Would it be reasonable to consider a version of 1.4 that breaks
  compatibility with 0.20? I'm not really a fan of this, personally, but
  am curious what others think.
 
  --
  Christopher L Tubbs II
  http://gravatar.com/ctubbsii
 
 
  On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com
 wrote:
  Sorry for the delay, it's been one of those weeks.
 
  The current version would probably not be backwards compatible to
  0.20.2 just based on changes in dependencies. We're looking right now
  to see how hard it is to have three way compatibility (0.20, 1.0,
  2.0).
 
  -Joey
 
  On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net
 wrote:
  Any update?
 
  -Original Message-
  From: Joey Echeverria [mailto:j...@cloudera.com]
  Sent: Monday, July 29, 2013 1:24 PM
  To: dev@accumulo.apache.org
  Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
 
  We're testing this today. I'll report back what we find.
 
 
  -Joey
  —
  Sent from Mailbox for iPhone
 
  On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote:
 
  Will 1.4 still work with 0.20 with these patches?
  Great point Billie.
  - Original Message -
  From: Billie Rinaldi billie.rina...@gmail.com
  To: dev@accumulo.apache.org
  Sent: Friday, July 26, 2013 3:02:41 PM
  Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul
  26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:
   If these patches are going to be included with 1.4.4 or 1.4.5, I
   would
  like
   to see the following test run using CDH4 on at least a 5 node
 cluster.
More nodes would be better.
  
 * unit test
 * Functional test
 * 24 hr Continuous ingest + verification
 * 24 hr Continuous ingest + verification + agitation
 * 24 hr Random walk
 * 24 hr Random walk + agitation
  
   I may be able to assist with this, but I can not make any promises.
 
  Sure thing. Is there already a write-up on running this full battery
  of tests? I have a 10 node cluster that I can use for this.
 
 
   Great.  I think this would be a good patch for 1.4.   I assume that
   if a user stays with Hadoop 1 there are no dependency changes?
 
  Yup. It works the same way as 1.5 where all of the dependency changes
  are in a Hadoop 2.0 profile.
 
  In 1.5.0, we gave up on compatibility with 0.20 (and early versions of
  1.0) to make the compatibility requirements simpler; we ended up
  without dependency changes in the hadoop version profiles.  Will 1.4
  still work with 0.20 with these patches?  If there are dependency
  changes in the profiles, 1.4 would have to be compiled against a
  hadoop version compatible with the running version of hadoop, correct?
  We had some trouble in the
  1.5 release process with figuring out how to provide multiple binary
  artifacts (each compiled against a different version of hadoop) for
  the same release.  Just something we should consider before we are in
  the midst of releasing 1.4.4.
  Billie
  -Joey
 
 
 
 
 
  --
  Joey Echeverria
  Director, Federal FTS
  Cloudera, Inc.



 --
 Joey Echeverria
 Director, Federal FTS
 Cloudera, Inc.



RE: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-08-01 Thread Dave Marion
Any update?

-----Original Message-----
From: Joey Echeverria [mailto:j...@cloudera.com] 
Sent: Monday, July 29, 2013 1:24 PM
To: dev@accumulo.apache.org
Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

We're testing this today. I'll report back what we find. 


-Joey
—
Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote:

 Will 1.4 still work with 0.20 with these patches? 
 Great point Billie. 
 - Original Message -
 From: Billie Rinaldi billie.rina...@gmail.com
 To: dev@accumulo.apache.org
 Sent: Friday, July 26, 2013 3:02:41 PM
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
 On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:
  If these patches are going to be included with 1.4.4 or 1.4.5, I 
  would
 like
  to see the following test run using CDH4 on at least a 5 node cluster. 
   More nodes would be better. 
  
* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation
  
  I may be able to assist with this, but I can not make any promises. 
 
 Sure thing. Is there already a write-up on running this full battery 
 of tests? I have a 10 node cluster that I can use for this.
 
 
  Great.  I think this would be a good patch for 1.4.   I assume that 
  if a user stays with Hadoop 1 there are no dependency changes?
 
 Yup. It works the same way as 1.5 where all of the dependency changes 
 are in a Hadoop 2.0 profile.
 
 In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 
 1.0) to make the compatibility requirements simpler; we ended up 
 without dependency changes in the hadoop version profiles.  Will 1.4 
 still work with 0.20 with these patches?  If there are dependency 
 changes in the profiles, 1.4 would have to be compiled against a 
 hadoop version compatible with the running version of hadoop, correct?  
 We had some trouble in the
 1.5 release process with figuring out how to provide multiple binary 
 artifacts (each compiled against a different version of hadoop) for 
 the same release.  Just something we should consider before we are in 
 the midst of releasing 1.4.4.
 Billie
 -Joey
 



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-29 Thread Joey Echeverria
We're testing this today. I'll report back what we find. 


-Joey
—
Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote:

 Will 1.4 still work with 0.20 with these patches? 
 Great point Billie. 
 - Original Message -
 From: Billie Rinaldi billie.rina...@gmail.com 
 To: dev@accumulo.apache.org 
 Sent: Friday, July 26, 2013 3:02:41 PM 
 Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch 
 On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: 
  If these patches are going to be included with 1.4.4 or 1.4.5, I would 
 like 
  to see the following test run using CDH4 on at least a 5 node cluster. 
   More nodes would be better. 
  
    * unit test 
    * Functional test 
    * 24 hr Continuous ingest + verification 
    * 24 hr Continuous ingest + verification + agitation 
    * 24 hr Random walk 
    * 24 hr Random walk + agitation 
  
  I may be able to assist with this, but I can not make any promises. 
 
 Sure thing. Is there already a write-up on running this full battery 
 of tests? I have a 10 node cluster that I can use for this. 
 
 
  Great.  I think this would be a good patch for 1.4.   I assume that if a 
  user stays with Hadoop 1 there are no dependency changes? 
 
 Yup. It works the same way as 1.5 where all of the dependency changes 
 are in a Hadoop 2.0 profile. 
 
 In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) 
 to make the compatibility requirements simpler; we ended up without 
 dependency changes in the hadoop version profiles.  Will 1.4 still work 
 with 0.20 with these patches?  If there are dependency changes in the 
 profiles, 1.4 would have to be compiled against a hadoop version compatible 
 with the running version of hadoop, correct?  We had some trouble in the 
 1.5 release process with figuring out how to provide multiple binary 
 artifacts (each compiled against a different version of hadoop) for the 
 same release.  Just something we should consider before we are in the midst 
 of releasing 1.4.4. 
 Billie 
 -Joey 
 

Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Joey Echeverria
Cloudera announced last night our support for Accumulo 1.4.3 on CDH4:

http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera

This required back porting about 11 patches in whole or in part from the
1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but
when it's fully public it will be downloadable along with all of the extra
patches that we committed.

My question is if the community would be interested in us pulling those
back ports upstream?

I believe this would violate the previously agreed upon rule of no feature
back ports to 1.4.3, depending on how we label support for Hadoop 2.0.

Thoughts?

-Joey


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Eric Newton
My question is if the community would be interested in us pulling those
back ports upstream?

Yes, please.


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Joey Echeverria
We have both the unit tests and the full system test suite hooked up to a
Jenkins build server.

There are still a couple of tests that fail periodically with the full
system test due to timeouts. We're working on those which is why our
current release is just a beta.

There are no API changes or Accumulo behavior changes. You can use
unmodified 1.4.x clients with our release of the server daemons.

-Joey


On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote:

 On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com
 wrote:

  Cloudera announced last night our support for Accumulo 1.4.3 on CDH4:
 
  http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera
 
  This required back porting about 11 patches in whole or in part from the
  1.5 line on top of 1.4.3. Our release is still in a semi-private beta,
 but
  when it's fully public it will be downloadable along with all of the
 extra
  patches that we committed.
 
  My question is if the community would be interested in us pulling those
  back ports upstream?
 

 What testing has been done?  It would be nice to run accumulo's full test
 suite against 1.4.3+CDH4.

 Are there any Accumulo API changes or Accumulo behavior changes?


  I believe this would violate the previously agreed upon rule of no
 feature
  back ports to 1.4.3, depending on how we label support for Hadoop 2.0.


  Thoughts?
 
  -Joey
 




-- 
Joey Echeverria
Director, Federal FTS
Cloudera, Inc.


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Keith Turner
On Fri, Jul 26, 2013 at 12:24 PM, Joey Echeverria j...@cloudera.com wrote:

 We have both the unit tests and the full system test suite hooked up to a
 Jenkins build server.


If these patches are going to be included with 1.4.4 or 1.4.5, I would like
to see the following tests run using CDH4 on at least a 5 node cluster.
More nodes would be better.

  * unit test
  * Functional test
  * 24 hr Continuous ingest + verification
  * 24 hr Continuous ingest + verification + agitation
  * 24 hr Random walk
  * 24 hr Random walk + agitation

I may be able to assist with this, but I cannot make any promises.



 There are still a couple of tests that fail periodically with the full
 system test due to timeouts. We're working on those which is why our
 current release is just a beta.

 There are no API changes or Accumulo behavior changes. You can use
 unmodified 1.4.x clients with our release of the server daemons.


Great.  I think this would be a good patch for 1.4.   I assume that if a
user stays with Hadoop 1 there are no dependency changes?



 -Joey




Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Joey Echeverria
 If these patches are going to be included with 1.4.4 or 1.4.5, I would like
 to see the following test run using CDH4 on at least a 5 node cluster.
  More nodes would be better.

   * unit test
   * Functional test
   * 24 hr Continuous ingest + verification
   * 24 hr Continuous ingest + verification + agitation
   * 24 hr Random walk
   * 24 hr Random walk + agitation

 I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery
of tests? I have a 10 node cluster that I can use for this.


 Great.  I think this would be a good patch for 1.4.   I assume that if a
 user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5 where all of the dependency changes
are in a Hadoop 2.0 profile.

-Joey


Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Billie Rinaldi
On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

  If these patches are going to be included with 1.4.4 or 1.4.5, I would like
  to see the following test run using CDH4 on at least a 5 node cluster.
  More nodes would be better.

    * unit test
    * Functional test
    * 24 hr Continuous ingest + verification
    * 24 hr Continuous ingest + verification + agitation
    * 24 hr Random walk
    * 24 hr Random walk + agitation

  I may be able to assist with this, but I can not make any promises.

 Sure thing. Is there already a write-up on running this full battery
 of tests? I have a 10 node cluster that I can use for this.


  Great.  I think this would be a good patch for 1.4.   I assume that if a
  user stays with Hadoop 1 there are no dependency changes?

 Yup. It works the same way as 1.5 where all of the dependency changes
 are in a Hadoop 2.0 profile.


In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0)
to make the compatibility requirements simpler; we ended up without
dependency changes in the hadoop version profiles.  Will 1.4 still work
with 0.20 with these patches?  If there are dependency changes in the
profiles, 1.4 would have to be compiled against a hadoop version compatible
with the running version of hadoop, correct?  We had some trouble in the
1.5 release process with figuring out how to provide multiple binary
artifacts (each compiled against a different version of hadoop) for the
same release.  Just something we should consider before we are in the midst
of releasing 1.4.4.

Billie
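
A minimal sketch of the concern above: if each binary artifact has to be
compiled against its own Hadoop version, a release needs one build invocation
per target. The property names hadoop.version and hadoop.profile are assumed
to mirror the 1.5 build and may not match the 1.4 poms; the function names and
the version:profile argument format are hypothetical. The functions only print
commands, they do not run Maven.

```shell
#!/usr/bin/env bash
# build_cmd VERSION [PROFILE]: print the Maven command for one Hadoop target.
# hadoop.version / hadoop.profile are assumed property names (check pom.xml).
build_cmd() {
  local version="$1" profile="${2:-}"
  local cmd="mvn clean package -Dhadoop.version=${version}"
  if [ -n "$profile" ]; then
    cmd="${cmd} -Dhadoop.profile=${profile}"
  fi
  echo "$cmd"
}

# print_release_builds TARGET...: each TARGET is "version" (default profile)
# or "version:profile" (hypothetical format used only by this sketch).
print_release_builds() {
  local target version profile
  for target in "$@"; do
    version="${target%%:*}"
    profile=""
    case "$target" in
      *:*) profile="${target#*:}" ;;
    esac
    build_cmd "$version" "$profile"
  done
}
```

For example, print_release_builds 0.20.203.0 2.2.0:2.0 prints one command for
the default profile and one for a Hadoop 2.0 profile; the version numbers here
are illustrative only.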


 -Joey



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread Keith Turner
On Fri, Jul 26, 2013 at 2:33 PM, Joey Echeverria j...@cloudera.com wrote:

  If these patches are going to be included with 1.4.4 or 1.4.5, I would like
  to see the following test run using CDH4 on at least a 5 node cluster.
  More nodes would be better.

    * unit test
    * Functional test
    * 24 hr Continuous ingest + verification
    * 24 hr Continuous ingest + verification + agitation
    * 24 hr Random walk
    * 24 hr Random walk + agitation

  I may be able to assist with this, but I can not make any promises.

 Sure thing. Is there already a write-up on running this full battery
 of tests? I have a 10 node cluster that I can use for this.


There are some instructions.

test/system/continuous/README
test/system/randomwalk/README

Continuous ingest has a lot of options.  For release testing we do
something like the following.

  # configure (may need to adjust max mappers and max reducers to make
  # the map reduce job run faster)
  start-ingest.sh
  start-walker.sh
  # sleep 24hr
  stop-ingest.sh
  stop-walker.sh
  run-verify.sh

The continuous dir has scripts for starting and stopping the agitator.
We also use these scripts to agitate while running the random walk test.
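
The sequence above could be wrapped roughly as follows. This is a sketch
under assumptions, not the project's release tooling: the ingest, walker and
verify script names are taken from the steps above, while start-agitator.sh
and stop-agitator.sh, the DRY_RUN variable and both function names are
assumptions made for illustration.

```shell
#!/usr/bin/env bash
# run_step DIR SCRIPT: run DIR/SCRIPT, or only print it when DRY_RUN=1.
run_step() {
  local dir="$1" script="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: ${dir}/${script}"
  else
    (cd "$dir" && "./${script}")
  fi
}

# soak_test DIR SECONDS [agitate]: start ingest and walkers, wait, stop them,
# then verify. Pass "agitate" as the third argument to run the agitator too.
soak_test() {
  local dir="$1" seconds="$2" agitate="${3:-no}"
  run_step "$dir" start-ingest.sh || return 1
  run_step "$dir" start-walker.sh || return 1
  if [ "$agitate" = "agitate" ]; then
    run_step "$dir" start-agitator.sh || return 1   # assumed script name
  fi
  sleep "$seconds"
  if [ "$agitate" = "agitate" ]; then
    run_step "$dir" stop-agitator.sh || return 1    # assumed script name
  fi
  run_step "$dir" stop-ingest.sh || return 1
  run_step "$dir" stop-walker.sh || return 1
  run_step "$dir" run-verify.sh
}
```

A dry run such as DRY_RUN=1 soak_test test/system/continuous 0 agitate prints
the steps without touching the cluster; a 24 hr run would pass 86400 as the
number of seconds.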

For random walk we use the All.xml graph, configure it to log errors to
NFS, and run a walker on each node.  We look in NFS for walkers that died
or got stuck.  The random walk framework will log a message if a node in
the graph gets stuck.  It will also log a message when it gets unstuck.
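
One way to check the shared log directory for walkers that are still stuck is
sketched below. The exact wording of the framework's log messages is not given
here, so the sketch simply assumes that "stuck" and "unstuck" appear as whole
words in the respective messages; the function name and the one-file-per-walker
layout are also assumptions. It does not detect walkers that died.

```shell
#!/usr/bin/env bash
# still_stuck_logs LOG_DIR: print each log file in LOG_DIR that has more
# "stuck" lines than "unstuck" lines (keyword match is an assumption).
still_stuck_logs() {
  local log_dir="$1" f stuck unstuck
  for f in "$log_dir"/*; do
    [ -f "$f" ] || continue
    # -w matches whole words, so "unstuck" lines are not counted as "stuck".
    stuck=$(grep -c -i -w 'stuck' "$f" || true)
    unstuck=$(grep -c -i -w 'unstuck' "$f" || true)
    if [ "$stuck" -gt "$unstuck" ]; then
      echo "$f"
    fi
  done
}
```

Calling still_stuck_logs on the NFS directory the walkers log to would list
the files worth inspecting by hand.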



  Great.  I think this would be a good patch for 1.4.   I assume that if a
  user stays with Hadoop 1 there are no dependency changes?

 Yup. It works the same way as 1.5 where all of the dependency changes
 are in a Hadoop 2.0 profile.

 -Joey



Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-07-26 Thread dlmarion


Will 1.4 still work with 0.20 with these patches?

Great point, Billie.


