Re: Review Request 26188: ACCUMULO-3176 Add ability to create a table with user specified initial properties

2014-10-06 Thread keith


On Oct. 6, 2014, 2:19 p.m., Jenna Huston wrote:
  Did you plan to change the proxy API and/or create table command in shell?
 
 Jenna Huston wrote:
 I am in the process of testing the new command option in the shell.  Can 
 you suggest a test that would be good to look at for testing the new option?

test/src/test/java/org/apache/accumulo/test/ShellServerIT.java


- kturner


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26188/#review55498
---


On Oct. 3, 2014, 5:30 p.m., Jenna Huston wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26188/
 ---
 
 (Updated Oct. 3, 2014, 5:30 p.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3176
 https://issues.apache.org/jira/browse/ACCUMULO-3176
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 Gives the ability to add properties to tables before they are initialized.  
 Therefore these properties will take effect before the default tablet is 
 created.  We create a NewTableConfiguration class and send that in the create 
 method as opposed to adding another method.  
 
 
 Diffs
 ---
 
   core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 97f538d
   core/src/main/java/org/apache/accumulo/core/client/impl/NewTableConfiguration.java PRE-CREATION
   core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java e46b9c9
   core/src/main/java/org/apache/accumulo/core/client/mock/MockAccumulo.java 32dbb28
   core/src/main/java/org/apache/accumulo/core/client/mock/MockTable.java 35cbdd2
   core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperations.java 08750fe
   core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java 02838ed
   proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java a778add
   shell/src/main/java/org/apache/accumulo/shell/commands/CreateTableCommand.java 81b39d2
   test/src/test/java/org/apache/accumulo/test/CreateTableWithNewTableConfigIT.java PRE-CREATION
 
 Diff: https://reviews.apache.org/r/26188/diff/
 
 
 Testing
 ---
 
 New IT, ran unit test and integration tests
 
 
 Thanks,
 
 Jenna Huston
 




Re: Review Request 26188: ACCUMULO-3176 Add ability to create a table with user specified initial properties

2014-10-06 Thread Christopher Tubbs


 On Oct. 6, 2014, 10:19 a.m., kturner wrote:
  core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java,
   line 200
  https://reviews.apache.org/r/26188/diff/1/?file=713519#file713519line200
 
  Is there a benefit to deprecating here if it's deprecated in the parent 
  class?  I am not sure if it's needed; does the deprecated annotation inherit?

It's good practice to deprecate implementing sub-class methods for deprecated 
interface methods, unless there's a good reason to expect the sub-class to be 
referenced directly and it still needs the method. Annotations on methods are 
not inherited, and it can lead to API confusion if a method is deprecated in an 
interface but not in the implementing class.
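
The point about inheritance can be checked with reflection. The following is 
a minimal sketch, not code from the patch: TableOps, UnannotatedImpl and 
AnnotatedImpl are hypothetical stand-ins for an interface and two implementing 
classes. It only relies on java.lang.Deprecated having runtime retention.

```java
import java.lang.reflect.Method;

public class DeprecationDemo {

    /** Hypothetical stand-in for an API interface with a deprecated method. */
    public interface TableOps {
        @Deprecated
        void create(String tableName, boolean limitVersion);
    }

    /** Overrides the deprecated method without repeating the annotation. */
    public static class UnannotatedImpl implements TableOps {
        @Override
        public void create(String tableName, boolean limitVersion) {
            // no-op: only the annotations matter for this demo
        }
    }

    /** Overrides the deprecated method and repeats @Deprecated on the override. */
    public static class AnnotatedImpl implements TableOps {
        @Deprecated
        @Override
        public void create(String tableName, boolean limitVersion) {
            // no-op: only the annotations matter for this demo
        }
    }

    /** Reports whether the create method declared directly on the given type carries @Deprecated. */
    public static boolean isCreateDeprecated(Class<?> type) {
        try {
            Method m = type.getDeclaredMethod("create", String.class, boolean.class);
            return m.isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("interface:       " + isCreateDeprecated(TableOps.class));
        System.out.println("UnannotatedImpl: " + isCreateDeprecated(UnannotatedImpl.class));
        System.out.println("AnnotatedImpl:   " + isCreateDeprecated(AnnotatedImpl.class));
    }
}
```

The override in UnannotatedImpl does not pick up the annotation from the 
interface, so code that references the implementing class directly would see 
no deprecation marker unless it is repeated as in AnnotatedImpl.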


- Christopher


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26188/#review55498
---


On Oct. 3, 2014, 1:30 p.m., Jenna Huston wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26188/
 ---
 
 (Updated Oct. 3, 2014, 1:30 p.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3176
 https://issues.apache.org/jira/browse/ACCUMULO-3176
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 Gives the ability to add properties to tables before they are initialized.  
 Therefore these properties will take effect before the default tablet is 
 created.  We create a NewTableConfiguration class and send that in the create 
 method as opposed to adding another method.  
 
 
 Diffs
 ---
 
   core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 97f538d
   core/src/main/java/org/apache/accumulo/core/client/impl/NewTableConfiguration.java PRE-CREATION
   core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java e46b9c9
   core/src/main/java/org/apache/accumulo/core/client/mock/MockAccumulo.java 32dbb28
   core/src/main/java/org/apache/accumulo/core/client/mock/MockTable.java 35cbdd2
   core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperations.java 08750fe
   core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java 02838ed
   proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java a778add
   shell/src/main/java/org/apache/accumulo/shell/commands/CreateTableCommand.java 81b39d2
   test/src/test/java/org/apache/accumulo/test/CreateTableWithNewTableConfigIT.java PRE-CREATION
 
 Diff: https://reviews.apache.org/r/26188/diff/
 
 
 Testing
 ---
 
 New IT, ran unit test and integration tests
 
 
 Thanks,
 
 Jenna Huston
 




Re: C++ accumulo client -- native clients for Python, Go, Ruby etc

2014-10-06 Thread Josh Elser

It'd be really cool to see a C++ client -- fully implemented or not. The 
increased performance via other languages like you said would be really nice, 
but I'd also be curious to see how the server characteristics change when the 
client might be sending data at a much faster rate.

My C++ is super rusty these days, but I'd be happy to help out any devs who can 
spearhead the effort :)

John R. Frank wrote:

Accumulo Developers,

We're trying to boost throughput of non-Java tools with Accumulo.  It seems 
that the lowest hanging fruit is to stop using the thrift proxy. Per discussion 
about Python and thrift proxy in the users list [1], I'm wondering if anyone is 
interested in helping with a native C++ client?  There is a start on one here 
[2]. We could offer a bounty or maybe make a consulting project depending who 
is interested in it.

We also looked at trying to run a separate thrift proxy for every worker thread 
or process.  With many cores on a box, eg 32, it just doesn't seem practical to 
run that many proxies, even if they all run on a single JVM. We'd be glad to 
hear ideas on that front too.

A potentially big benefit of making a proper C++ accumulo client is that it is 
straightforward to expose native interfaces in Python (via pyObject), Go [3], 
Ruby [4], and other languages.

Thanks for any advice, pointers, interest.

John


1-- http://www.mail-archive.com/user@accumulo.apache.org/msg03999.html

2--
https://github.com/phrocker/apeirogon

3-- http://golang.org/cmd/cgo/

4-- https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/


Sent from +1-617-899-2066


Re: Review Request 26188: ACCUMULO-3176 Add ability to create a table with user specified initial properties

2014-10-06 Thread Jenna Huston

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26188/
---

(Updated Oct. 6, 2014, 6:40 p.m.)


Review request for accumulo.


Bugs: ACCUMULO-3176
https://issues.apache.org/jira/browse/ACCUMULO-3176


Repository: accumulo


Description
---

Gives the ability to add properties to tables before they are initialized.  
Therefore these properties will take effect before the default tablet is 
created.  We create a NewTableConfiguration class and send that in the create 
method as opposed to adding another method.  
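
The idea in the description can be illustrated with a small, self-contained 
sketch. This is not the code under review: TableConfig and InMemoryTableOps 
are hypothetical stand-ins for NewTableConfiguration and the table operations 
implementation, and the in-memory map only models "properties are in place at 
creation time". The property name table.split.threshold is used purely as an 
example value.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NewTableConfigSketch {

    /** Hypothetical stand-in for the configuration object passed to create(). */
    public static final class TableConfig {
        private final Map<String, String> properties = new HashMap<>();

        /** Replaces the initial properties and returns this object for chaining. */
        public TableConfig setProperties(Map<String, String> props) {
            properties.clear();
            properties.putAll(props);
            return this;
        }

        public Map<String, String> getProperties() {
            return Collections.unmodifiableMap(properties);
        }
    }

    /** Toy in-memory model of table creation; assumption for illustration only. */
    public static final class InMemoryTableOps {
        private final Map<String, Map<String, String>> tables = new HashMap<>();

        /** Creates the table with its initial properties in a single step. */
        public void create(String name, TableConfig config) {
            if (tables.containsKey(name)) {
                throw new IllegalStateException("table exists: " + name);
            }
            // The properties are stored as part of creation, so there is no
            // window in which the table exists without them.
            tables.put(name, new HashMap<>(config.getProperties()));
        }

        public Map<String, String> getProperties(String name) {
            Map<String, String> props = tables.get(name);
            if (props == null) {
                throw new IllegalStateException("no such table: " + name);
            }
            return Collections.unmodifiableMap(props);
        }
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("table.split.threshold", "512M");

        InMemoryTableOps ops = new InMemoryTableOps();
        ops.create("mytable", new TableConfig().setProperties(initial));
        System.out.println(ops.getProperties("mytable"));
    }
}
```

Passing one configuration object keeps the create method's signature stable 
if more creation-time options are needed later, which is the trade-off the 
description refers to when it says "as opposed to adding another method".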


Diffs (updated)
---

  core/src/main/java/org/apache/accumulo/core/client/NewTableConfiguration.java PRE-CREATION
  core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 97f538d
  core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java e46b9c9
  core/src/main/java/org/apache/accumulo/core/client/mock/MockAccumulo.java 32dbb28
  core/src/main/java/org/apache/accumulo/core/client/mock/MockTable.java 35cbdd2
  core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperations.java 08750fe
  core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java 02838ed
  proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java a778add
  shell/src/main/java/org/apache/accumulo/shell/commands/CreateTableCommand.java 81b39d2
  test/src/test/java/org/apache/accumulo/test/CreateTableWithNewTableConfigIT.java PRE-CREATION
  test/src/test/java/org/apache/accumulo/test/ShellServerIT.java 5a068af

Diff: https://reviews.apache.org/r/26188/diff/


Testing
---

New IT, ran unit test and integration tests


Thanks,

Jenna Huston



Re: C++ accumulo client -- native clients for Python, Go, Ruby etc

2014-10-06 Thread Corey Nolet
I'm all for this, though I'm curious to know the thoughts about maintenance
and the design. Are we going to use thrift to tie the C++ client calls into
the server-side components? Is that going to be maintained through a
separate effort or is the plan to have the Accumulo community officially
support it?

On Mon, Oct 6, 2014 at 2:34 PM, Josh Elser josh.el...@gmail.com wrote:

 It'd be really cool to see a C++ client -- fully implemented or not. The
 increased performance via other languages like you said would be really
 nice, but I'd also be curious to see how the server characteristics change
 when the client might be sending data at a much faster rate.

 My C++ is super rusty these days, but I'd be happy to help out any devs
 who can spearhead the effort :)


 John R. Frank wrote:

 Accumulo Developers,

 We're trying to boost throughput of non-Java tools with Accumulo.  It
 seems that the lowest hanging fruit is to stop using the thrift proxy. Per
 discussion about Python and thrift proxy in the users list [1], I'm
 wondering if anyone is interested in helping with a native C++ client?
 There is a start on one here [2]. We could offer a bounty or maybe make a
 consulting project depending who is interested in it.

 We also looked at trying to run a separate thrift proxy for every worker
 thread or process.  With many cores on a box, eg 32, it just doesn't seem
 practical to run that many proxies, even if they all run on a single JVM.
 We'd be glad to hear ideas on that front too.

 A potentially big benefit of making a proper C++ accumulo client is that
 it is straightforward to expose native interfaces in Python (via pyObject),
 Go [3], Ruby [4], and other languages.

 Thanks for any advice, pointers, interest.

 John


 1-- http://www.mail-archive.com/user@accumulo.apache.org/msg03999.html

 2--
 https://github.com/phrocker/apeirogon

 3-- http://golang.org/cmd/cgo/

 4-- https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/


 Sent from +1-617-899-2066




Re: Deprecation removal for 1.7.0

2014-10-06 Thread Sean Busbey
No objection to removing aggregators.

If anything first deprecated in 1.5 has managed to live this long in 1.7
I'd like to keep it so folks have an easier time getting off of 1.5 when we
EOL it. But I realize some things have probably already been removed.

On Mon, Oct 6, 2014 at 3:00 PM, Christopher ctubb...@apache.org wrote:

 Re: ACCUMULO-3197

 First:
 Any objections to finally removing Aggregators in 1.7.0?
 They've been deprecated in favor of Combiners since 1.4.

 Second:
 Is there any API deprecated in 1.6.x or earlier that you really want
 preserved in 1.7.0?
 (I know we need to keep INSTANCE_DFS_{URI,DIR} properties for volume
 upgrades, at least.)

 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii




-- 
Sean


Re: Deprecation removal for 1.7.0

2014-10-06 Thread Christopher
The main thing I'm looking at which is causing problems for me is the
instance.getConfiguration() stuff. It was never well defined, usually
didn't work or do what was expected of it, and is still being leveraged
(incorrectly) by new code (replication, for instance, and I've already
informed Josh), because of
ServerConfigurationUtil.getConfiguration(Instance instance). It wasn't
formally deprecated until 1.6.0, though.

Aside from that, everything else is just a nice cleanup. A somewhat
exhaustive list of what I was looking at was:

Scanner timeout options
extra batchwriter/batchdeleter factory methods
some junk in MutationsRejectedException
extra ZooKeeperInstance constructors
securityOperations stuff from 1.5
extra getSplits and flush in tableOperations
Constants.NO_AUTHS
KeyExtents.getKeyExtentsForRange
an extra Value constructor which copies from a ByteBuffer
iterators that moved packages in 1.4
some protected getters in the mapred stuff
unused RangeInputSplit in InputFormatBase
LogFileKey/LogFileValue (old version)


You can review the expected changes at
https://github.com/ctubbsii/accumulo/tree/ACCUMULO-3197 (in two commits,
one for instance stuff, the other for aggregators and everything else).


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Mon, Oct 6, 2014 at 4:11 PM, Sean Busbey bus...@cloudera.com wrote:

 No objection to removing aggregators.

 If anything first deprecated in 1.5 has managed to live this long in 1.7
 I'd like to keep it so folks have an easier time getting off of 1.5 when we
 EOL it. But I realize some things have probably already been removed.

 On Mon, Oct 6, 2014 at 3:00 PM, Christopher ctubb...@apache.org wrote:

  Re: ACCUMULO-3197
 
  First:
  Any objections to finally removing Aggregators in 1.7.0?
  They've been deprecated in favor of Combiners since 1.4.
 
  Second:
  Is there any API deprecated in 1.6.x or earlier that you really want
  preserved in 1.7.0?
  (I know we need to keep INSTANCE_DFS_{URI,DIR} properties for volume
  upgrades, at least.)
 
  --
  Christopher L Tubbs II
  http://gravatar.com/ctubbsii
 



 --
 Sean



Re: Deprecation removal for 1.7.0

2014-10-06 Thread Mike Drob
Do we still have mapred(uce) stuff?

On Mon, Oct 6, 2014 at 3:54 PM, Christopher ctubb...@apache.org wrote:

 The main thing I'm looking at which is causing problems for me is the
 instance.getConfiguration() stuff. It was never well defined, usually
 didn't work or do what was expected of it, and is still being leveraged
 (incorrectly) by new code (replication, for instance, and I've already
 informed Josh), because of
 ServerConfigurationUtil.getConfiguration(Instance instance). It wasn't
 formally deprecated until 1.6.0, though.

 Aside from that, everything else is just a nice cleanup. A somewhat
 exhaustive list of what I was looking at was:

 Scanner timeout options
 extra batchwriter/batchdeleter factory methods
 some junk in MutationsRejectedException
 extra ZooKeeperInstance constructors
 securityOperations stuff from 1.5
 extra getSplits and flush in tableOperations
 Constants.NO_AUTHS
 KeyExtents.getKeyExtentsForRange
 an extra Value constructor which copies from a ByteBuffer
 iterators that moved packages in 1.4
 some protected getters in the mapred stuff
 unused RangeInputSplit in InputFormatBase
 LogFileKey/LogFileValue (old version)


 You can review the expected changes at
 https://github.com/ctubbsii/accumulo/tree/ACCUMULO-3197 (in two commits,
 one for instance stuff, the other for aggregators and everything else).


 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii

 On Mon, Oct 6, 2014 at 4:11 PM, Sean Busbey bus...@cloudera.com wrote:

  No objection to removing aggregators.
 
  If anything first deprecated in 1.5 has managed to live this long in 1.7
  I'd like to keep it so folks have an easier time getting off of 1.5 when we
  EOL it. But I realize some things have probably already been removed.
 
  On Mon, Oct 6, 2014 at 3:00 PM, Christopher ctubb...@apache.org wrote:
 
   Re: ACCUMULO-3197
  
   First:
   Any objections to finally removing Aggregators in 1.7.0?
   They've been deprecated in favor of Combiners since 1.4.
  
   Second:
   Is there any API deprecated in 1.6.x or earlier that you really want
   preserved in 1.7.0?
   (I know we need to keep INSTANCE_DFS_{URI,DIR} properties for volume
   upgrades, at least.)
  
   --
   Christopher L Tubbs II
   http://gravatar.com/ctubbsii
  
 
 
 
  --
  Sean
 



Re: C++ accumulo client -- native clients for Python, Go, Ruby etc

2014-10-06 Thread John R. Frank
Two kinds of gains:

1) single client throughput:  the extra RPC hop through the proxy deserializes 
and then reserializes the messages.  With the proxy running locally the extra 
network hop is less of an issue.  This was discussed on the user list (see link 
earlier in this thread), and 5x slow down was suggested as a possible swag 
estimate. 

2) cluster management complexity: it's clearly best to have the proxy local to 
the workers, but if you have a worker on every core of a large box (eg 32), 
then having a single proxy on each worker machine becomes a bottleneck. Running 
many proxies on a single JVM is the next thing we could try to improve this --- 
having a native client seems preferable. 


Comments?

jrf


 On Oct 6, 2014, at 4:15 PM, David Medinets david.medin...@gmail.com wrote:
 
 How far away from the theoretical maximum rate is the thrift protocol?
 What kind of gain is expected from the native C++ approach?
 
 On Sat, Oct 4, 2014 at 12:56 PM, John R. Frank j...@diffeo.com wrote:
 Accumulo Developers,
 
 We're trying to boost throughput of non-Java tools with Accumulo.  It seems 
 that the lowest hanging fruit is to stop using the thrift proxy. Per 
 discussion about Python and thrift proxy in the users list [1], I'm 
 wondering if anyone is interested in helping with a native C++ client?  
 There is a start on one here [2]. We could offer a bounty or maybe make a 
 consulting project depending who is interested in it.
 
 We also looked at trying to run a separate thrift proxy for every worker 
 thread or process.  With many cores on a box, eg 32, it just doesn't seem 
 practical to run that many proxies, even if they all run on a single JVM. 
 We'd be glad to hear ideas on that front too.
 
 A potentially big benefit of making a proper C++ accumulo client is that it 
 is straightforward to expose native interfaces in Python (via pyObject), Go 
 [3], Ruby [4], and other languages.
 
 Thanks for any advice, pointers, interest.
 
 John
 
 
 1-- http://www.mail-archive.com/user@accumulo.apache.org/msg03999.html
 
 2--
 https://github.com/phrocker/apeirogon
 
 3-- http://golang.org/cmd/cgo/
 
 4-- https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/
 
 
 Sent from +1-617-899-2066


Re: Deprecation removal for 1.7.0

2014-10-06 Thread Josh Elser
Christopher, would it make sense to get a patch of the actual things 
you're looking at potentially removing, or would that be a waste of time 
this early?


Mike Drob wrote:

I think before we can agree on a deprecation strategy, we need to firm up
the scope for this release plan.


What are the intentions for 1.7.0? Is it a minor release in the sense of
our previous minor releases, where we add a bunch of new features and
maintain some compatibility promises? Or are we going to try and make it a
truer minor release, where we cut down on the number of features and have
more conservative stakes in the ground?


Personally, I think 1.7.0 is shaping up to be a full-featured release 
given the amount of time since 1.6.0. I wanted to do a scrape of JIRA 
and collect the stuff that I know is done/in-progress.



Is this the same 1.7.0 that was going to be renamed to 2.0.0? Or an
intermediate release?


Intermediate -- the revised client API that Christopher is working on 
would be punted to a 1.8/2.0.



When do we need to deprecate the mapred API if we plan to drop Hadoop 1
support in Accumulo 2? (as has been discussed, but I'm not sure it was ever
formally decided.)

In general, I'm inclined to leave as much in as possible, and then if we
must remove things then do so in 2.0.0. I know that our compatibility
statement only promises one minor version, but that doesn't mean we have to
be strict at every opportunity.

Mike

On Mon, Oct 6, 2014 at 4:03 PM, Billie Rinaldi billie.rina...@gmail.com wrote:


Yes, we have both.  Neither is deprecated.

On Mon, Oct 6, 2014 at 1:56 PM, Mike Drob mad...@cloudera.com wrote:


Do we still have mapred(uce) stuff?

On Mon, Oct 6, 2014 at 3:54 PM, Christopher ctubb...@apache.org wrote:


The main thing I'm looking at which is causing problems for me is the
instance.getConfiguration() stuff. It was never well defined, usually
didn't work or do what was expected of it, and is still being leveraged
(incorrectly) by new code (replication, for instance, and I've already
informed Josh), because of
ServerConfigurationUtil.getConfiguration(Instance instance). It wasn't
formally deprecated until 1.6.0, though.

Aside from that, everything else is just a nice cleanup. A somewhat
exhaustive list of what I was looking at was:

Scanner timeout options
extra batchwriter/batchdeleter factory methods
some junk in MutationsRejectedException
extra ZooKeeperInstance constructors
securityOperations stuff from 1.5
extra getSplits and flush in tableOperations
Constants.NO_AUTHS
KeyExtents.getKeyExtentsForRange
an extra Value constructor which copies from a ByteBuffer
iterators that moved packages in 1.4
some protected getters in the mapred stuff
unused RangeInputSplit in InputFormatBase
LogFileKey/LogFileValue (old version)


You can review the expected changes at
https://github.com/ctubbsii/accumulo/tree/ACCUMULO-3197 (in two commits,
one for instance stuff, the other for aggregators and everything else).


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Mon, Oct 6, 2014 at 4:11 PM, Sean Busbey bus...@cloudera.com wrote:

No objection to removing aggregators.

If anything first deprecated in 1.5 has managed to live this long in 1.7
I'd like to keep it so folks have an easier time getting off of 1.5 when we
EOL it. But I realize some things have probably already been removed.

On Mon, Oct 6, 2014 at 3:00 PM, Christopher ctubb...@apache.org wrote:

Re: ACCUMULO-3197

First:
Any objections to finally removing Aggregators in 1.7.0?
They've been deprecated in favor of Combiners since 1.4.

Second:
Is there any API deprecated in 1.6.x or earlier that you really want
preserved in 1.7.0?
(I know we need to keep INSTANCE_DFS_{URI,DIR} properties for volume
upgrades, at least.)

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii




--
Sean





Re: Deprecation removal for 1.7.0

2014-10-06 Thread Sean Busbey
On Mon, Oct 6, 2014 at 4:12 PM, Mike Drob mad...@cloudera.com wrote:



 In general, I'm inclined to leave as much in as possible, and then if we
 must remove things then do so in 2.0.0. I know that our compatibility
 statement only promises one minor version, but that doesn't mean we have to
 be strict at every opportunity.

 Mike



Related, I'd like to EOL 1.5 shortly after 1.7 gets released. I don't want
to derail this thread with that discussion, but my guess is it's a much
easier sell if we're conservative about removing things. Just so everyone
knows where I'm coming from.


-- 
Sean


Re: Deprecation removal for 1.7.0

2014-10-06 Thread Christopher
See https://github.com/ctubbsii/accumulo/tree/ACCUMULO-3197 for the two
commits proposed for removing deprecated stuffs. One removes the
instance.getConfiguration nightmare that I'd really like to proceed with.
The other removes aggregators and other cleanup, which I don't feel
strongly about.


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Mon, Oct 6, 2014 at 5:20 PM, Josh Elser josh.el...@gmail.com wrote:

 Christopher, would it make sense to get a patch of the actual things
 you're looking at potentially removing, or would that be a waste of time
 this early?

 Mike Drob wrote:

 I think before we can agree on a deprecation strategy, we need to firm up
 the scope for this release plan.


 What are the intentions for 1.7.0? Is it a minor release in the sense of
 our previous minor releases, where we add a bunch of new features and
 maintain some compatibility promises? Or are we going to try and make it a
 truer minor release, where we cut down on the number of features and have
 more conservative stakes in the ground?


 Personally, I think 1.7.0 is shaping up to be a full-featured release
 given the amount of time since 1.6.0. I wanted to do a scrape of JIRA and
 collect the stuff that I know is done/in-progress.

  Is this the same 1.7.0 that was going to be renamed to 2.0.0? Or an
 intermediate release?


 Intermediate -- the revised client API that Christopher is working on
 would be punted to a 1.8/2.0.


  When do we need to deprecate the mapred API if we plan to drop Hadoop 1
 support in Accumulo 2? (as has been discussed, but I'm not sure it was ever
 formally decided.)

 In general, I'm inclined to leave as much in as possible, and then if we
 must remove things then do so in 2.0.0. I know that our compatibility
 statement only promises one minor version, but that doesn't mean we have to
 be strict at every opportunity.

 Mike

 On Mon, Oct 6, 2014 at 4:03 PM, Billie Rinaldi billie.rina...@gmail.com wrote:

  Yes, we have both.  Neither is deprecated.

 On Mon, Oct 6, 2014 at 1:56 PM, Mike Drob mad...@cloudera.com wrote:

  Do we still have mapred(uce) stuff?

 On Mon, Oct 6, 2014 at 3:54 PM, Christopher ctubb...@apache.org wrote:

  The main thing I'm looking at which is causing problems for me is the
 instance.getConfiguration() stuff. It was never well defined, usually
 didn't work or do what was expected of it, and is still being leveraged
 (incorrectly) by new code (replication, for instance, and I've already
 informed Josh), because of
 ServerConfigurationUtil.getConfiguration(Instance instance). It wasn't
 formally deprecated until 1.6.0, though.

 Aside from that, everything else is just a nice cleanup. A somewhat
 exhaustive list of what I was looking at was:

 Scanner timeout options
 extra batchwriter/batchdeleter factory methods
 some junk in MutationsRejectedException
 extra ZooKeeperInstance constructors
 securityOperations stuff from 1.5
 extra getSplits and flush in tableOperations
 Constants.NO_AUTHS
 KeyExtents.getKeyExtentsForRange
 an extra Value constructor which copies from a ByteBuffer
 iterators that moved packages in 1.4
 some protected getters in the mapred stuff
 unused RangeInputSplit in InputFormatBase
 LogFileKey/LogFileValue (old version)


 You can review the expected changes at
 https://github.com/ctubbsii/accumulo/tree/ACCUMULO-3197 (in two commits,
 one for instance stuff, the other for aggregators and everything else).


 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii

 On Mon, Oct 6, 2014 at 4:11 PM, Sean Busbey bus...@cloudera.com wrote:

 No objection to removing aggregators.

 If anything first deprecated in 1.5 has managed to live this long in 1.7
 I'd like to keep it so folks have an easier time getting off of 1.5 when we
 EOL it. But I realize some things have probably already been removed.

 On Mon, Oct 6, 2014 at 3:00 PM, Christopher ctubb...@apache.org wrote:

 Re: ACCUMULO-3197

 First:
 Any objections to finally removing Aggregators in 1.7.0?
 They've been deprecated in favor of Combiners since 1.4.

 Second:
 Is there any API deprecated in 1.6.x or earlier that you really want
 preserved in 1.7.0?
 (I know we need to keep INSTANCE_DFS_{URI,DIR} properties for volume
 upgrades, at least.)

 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii



 --
 Sean





Re: Deprecation removal for 1.7.0

2014-10-06 Thread Sean Busbey
On Mon, Oct 6, 2014 at 4:51 PM, Christopher ctubb...@apache.org wrote:

 On Mon, Oct 6, 2014 at 5:20 PM, Sean Busbey bus...@cloudera.com wrote:

  On Mon, Oct 6, 2014 at 4:12 PM, Mike Drob mad...@cloudera.com wrote:
 
  
  
   In general, I'm inclined to leave as much in as possible, and then if we
   must remove things then do so in 2.0.0. I know that our compatibility
   statement only promises one minor version, but that doesn't mean we have to
   be strict at every opportunity.
  
   Mike
  
  
 
  Related, I'd like to EOL 1.5 shortly after 1.7 gets released. I don't want
  to derail this thread with that discussion, but my guess is it's a much
  easier sell if we're conservative about removing things. Just so everyone
  knows where I'm coming from.
 
 
 
 (+1 for EOL 1.5 after)

 In general, does this mean that you're okay with removing stuff deprecated
 prior to 1.5? With the exception of the instance.getConfiguration stuff,
 which was deprecated in 1.6.0 and I'd like to remove in 1.7.0, due to its
 problematic nature (requires further discussion), I could restrict the
 remaining cleanup to only stuff deprecated prior to 1.5.


For me, yeah that's the cut point I'd prefer to use. I'm hoping anyone who
did the move to 1.5 didn't move from a removed api to a deprecated API.

Maybe we should send a ping to user@ asking if any 1.5 users want to pipe
up about APIs they're using that were deprecated prior to 1.5?

-- 
Sean


Re: 1.7 release timeline

2014-10-06 Thread Josh Elser

Thanks, John.

I was thinking about trying to gun for January time-frame for a release. 
I'd love to say before 2014 is over, but that probably just won't happen 
for a major release with the holidays.


For 1.7 right now, I see the following bigger items (correct me where 
I'm wrong):


* Replication (done)
* Upgrade rules/guarantees (proposed)
* Replace cloudtrace (in-progress)
* Rewrite monitor, include REST service (in-progress)
* Drop Hadoop 1 support (proposed)
* Decouple MiniAccumulo from ITs (in-progress)
* Other minicluster types: in-process, shim to real instance (in-progress)
* Support Hadoop metrics2 (proposed)
* A few WAL/metadata related performance improvements (in-progress)

Also, would be good to check the In-Progress state issues on JIRA. What 
do people think?


John Vines wrote:

Moving this to its own thread...

On Mon, Oct 6, 2014 at 5:54 PM, Mike Drob mad...@cloudera.com wrote:


Related: Do we have a release timeline for 1.7?



Re: 1.7 release timeline

2014-10-06 Thread Josh Elser
Yes, of course. We definitely need to see some code here before it gets 
officially slated for 1.7. I just know that efforts are being put 
towards it, so I wanted to list it.


Christopher wrote:

Would replacing cloudtrace be part of 1.7?  I'm not sure about that. I'd
like to see where that's headed before we decide on that. Personally, I'd
prefer Zipkin, since htrace is basically a copy of
cloudtrace/accumulo-trace, and it has some of the same issues (millis time,
for instance, instead of relative nanos, which is independent of the system
clock and actually intended for time spans).
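
The distinction between wall-clock millis and relative nanos can be sketched 
with plain JDK calls. This is only an illustration of the two timing styles, 
not code from cloudtrace, htrace or Zipkin; the class and method names 
(SpanTiming, timeNanos, timeWallClockMillis, sleepQuietly) are made up for the 
example.

```java
import java.util.concurrent.TimeUnit;

public class SpanTiming {

    /** Measures a span with System.nanoTime(), which is not tied to wall-clock time. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /**
     * Measures a span with System.currentTimeMillis(). The result depends on
     * the system clock, so a clock adjustment during the span can skew it.
     */
    public static long timeWallClockMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    /** Sleeps for the given number of milliseconds, restoring the interrupt flag if interrupted. */
    public static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long nanos = timeNanos(() -> sleepQuietly(20));
        long millis = timeWallClockMillis(() -> sleepQuietly(20));
        System.out.println("nanoTime span:          " + TimeUnit.NANOSECONDS.toMicros(nanos) + " us");
        System.out.println("currentTimeMillis span: " + millis + " ms");
    }
}
```

Both calls usually agree on a quiet machine; the difference shows up when the 
system clock is stepped (for example by NTP) while a span is open, which only 
affects the currentTimeMillis-based measurement.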


Re: 1.7 release timeline

2014-10-06 Thread Billie Rinaldi
Zipkin is a possible replacement for our trace collection system.  It does
not provide instrumentation like cloudtrace or htrace, so even if we make
zipkin the default collection system we will still need instrumentation.
Anyway, we can discuss the details and approach elsewhere.  I'd certainly
want the trace work to be in 2.0, but if we decide not to put it in 1.7
that would be okay.

On Mon, Oct 6, 2014 at 5:59 PM, Christopher ctubb...@apache.org wrote:

 Would replacing cloudtrace be part of 1.7? I'm not sure about that. I'd
 like to see where that's headed before we decide on that. Personally, I'd
 prefer Zipkin, since htrace is basically a copy of
 cloudtrace/accumulo-trace, and it has some of the same issues (millis time,
 for instance, instead of relative nanos, which is independent of the system
 clock and actually intended for time spans).

 I think the upgrade guarantees are more a 2.0.0 thing, but I think we can
 be a bit more conservative in 1.x to move towards that. I wouldn't mind
 dropping Hadoop 1 support in 1.7.0. (I guess we should just vote on that).

 I'd really like to include the VolumeChooser improvements (in particular
 ACCUMULO-3177, which depends on ACCUMULO-3176).


 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii

 On Mon, Oct 6, 2014 at 8:38 PM, Josh Elser josh.el...@gmail.com wrote:

  Thanks, John.
 
  I was thinking about trying to gun for January time-frame for a release.
  I'd love to say before 2014 is over, but that probably just won't happen
  for a major release with the holidays.
 
  For 1.7 right now, I see the following bigger items (correct me where
  I'm wrong):
 
  * Replication (done)
  * Upgrade rules/guarantees (proposed)
  * Replace cloudtrace (in-progress)
  * Rewrite monitor, include REST service (in-progress)
  * Drop Hadoop 1 support (proposed)
  * Decouple MiniAccumulo from ITs (in-progress)
  * Other minicluster types: in-process, shim to real instance (in-progress)
  * Support Hadoop metrics2 (proposed)
  * A few WAL/metadata related performance improvements (in-progress)
 
  Also, would be good to check the In-Progress state issues on JIRA. What do
  people think?
 
 
  John Vines wrote:
 
  Moving this to its own thread...
 
  On Mon, Oct 6, 2014 at 5:54 PM, Mike Drob mad...@cloudera.com wrote:
 
   Related: Do we have a release timeline for 1.7?
 
 



[GitHub] accumulo pull request: ACCUMULO-2826 Allow single CF for Intersect...

2014-10-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/accumulo/pull/8


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Deprecation removal for 1.7.0

2014-10-06 Thread Adam Fuchs
So, I think we can make a general argument to set policy, and when removing
a specific method we should make a specific argument. Personally, I would
set the bar at identifying the specific harm caused by the retention of the
method, as well as polling the community and considering objections.

Christopher, you made an argument about people misunderstanding the
semantics of the method and using it incorrectly. Is that not solved by
just deprecating the method?

It would be nice to have a more structured way of polling the community for
continuing use of deprecated code. Can anyone propose a way of doing this?
Maybe a call-back system where people can register the deprecated methods
that they care about? Maybe some scripts that people can use to determine
which deprecated methods they depend on and submit those to us?
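
One possible shape for such a script, as a rough sketch only: the Java below uses reflection to list the methods of a class that carry the @Deprecated annotation, which could produce the list that users are polled about. The DeprecatedLister class and its nested Example class are hypothetical stand-ins, not Accumulo code. On the user side, compiling against the library with javac -Xlint:deprecation already reports each use of a deprecated API.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeprecatedLister {

    /** Returns the sorted names of methods declared on type that are marked @Deprecated. */
    public static List<String> deprecatedMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            // @Deprecated has runtime retention, so it is visible to reflection.
            if (m.isAnnotationPresent(Deprecated.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Hypothetical stand-in for a client API class with one deprecated method. */
    public static class Example {
        @Deprecated
        public void oldCreate() {}

        public void create() {}
    }

    public static void main(String[] args) {
        System.out.println(deprecatedMethods(Example.class));
    }
}
```

This only enumerates what the library has deprecated; matching that list against what a given user actually calls would still need the compiler warnings or a bytecode scan.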

Adam

On Mon, Oct 6, 2014 at 4:42 PM, Jeremy Kepner kep...@ll.mit.edu wrote:

 -1

 Need a good reason why the current deprecated code is causing harm to
 Accumulo.


In general, keeping around deprecated code restricts how much we can
optimize behind the scenes (for both performance and maintainability). It
also keeps our test burden higher.

I'll let Christopher speak to the specifics of what he wants to remove, but
it sounds like at least one of them is something that commonly results in
incorrect usage, even internally.

--
Sean


Re: [PROPOSAL] 1.7/2.0 branches and git workflow change

2014-10-06 Thread Christopher
True. Everything I'm thinking of would work with no master, but that might
be confusing, and might break some tooling without extra effort (which
branch is default when cloning?). We also kind of assume that the master
branch is forward-moving only, but other branches are disposable and can be
rebase'd, deleted, re-created, etc.

Alternatively, if people understood that a 2.0 branch is a future
branch when 1.7 (master) is the current, that'd work, too... I just worry
that people will merge it poorly.

I suppose the best option, then, is probably to keep the status quo, and
use a branch name like ACCUMULO- which represents the overall work
for a particular future release plan, instead of a name which looks like a
maintenance branch.


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Mon, Oct 6, 2014 at 10:59 PM, William Slacum 
wilhelm.von.cl...@accumulo.net wrote:

 It seems to me you can get everything you want by merely getting rid of
 master or making master just be the 1.7 branch. I'm not really concerned
 about the name, because it's easy enough to figure out. Master duplicating
 a tag doesn't really seem useful to me, save for "here's the highest
 version we have released," which is of limited utility when a user can just
 check the tags. I don't see the point in having master be something for the
 sake of having master.



 On Mon, Oct 6, 2014 at 9:19 PM, Josh Elser josh.el...@gmail.com wrote:

  Christopher wrote:
 
  What purpose does the master branch serve if it's just the same as the
  last major release tag?
  
  
 
  I think Josh had some specific opinions on this, but the general idea from
  what I understood was that master is supposed to be stable...
  representative of the latest, most modern release, because it's what a new
  contributor would expect to fork to create a patch. That's hard to do if
  the goalpost is moving a lot, and it makes feature merges more complicated,
  since contributors have to rebase or merge themselves in order to create a
  patch that merges cleanly. Having a stable master makes it very easy to
  contribute to the most recent release.
 
 
  No, I don't really care for a stable-only master (I think I diverge from
  the git-flow model in that regard). I like master to just be a
  commits-go-here area more than anything.