Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-06 Thread Dave Marion
It's 2.0, remove mock and deprecate it in 1.8 if it's not already.
On May 6, 2016 10:25 AM, "Josh Elser"  wrote:

We can't disable modernizer just for mock? Or really, any code which we
intentionally don't want to modernize?
On May 5, 2016 11:43 PM, "Christopher"  wrote:

> Another interesting point... didn't realize until actually doing it:
> bumping to JDK8 *requires* a bump in the major version, because modernizer
> will block on some incompatible API changes in Mock, which is already
> deprecated. (Unless we're okay with disabling modernizer... which I guess
> is an acceptable solution... but it makes me unhappy :) )
>
> On Thu, May 5, 2016 at 11:39 PM Josh Elser  wrote:
>
> > Thanks boss. I figured you'd have my back :)
> > On May 5, 2016 9:43 PM, "Christopher"  wrote:
> >
> > > Already pushed. Initially forgot about modernizer, but I'm working
> > > through it now.
> > >
> > > On Thu, May 5, 2016 at 7:25 PM Josh Elser wrote:
> > >
> > > > Sounds good!
> > > >
> > > > I had tried to switch master to jdk8 as well, but ran into modernizer
> > > > plugin issues. I've since been on a call, so I haven't been able to
> > > > push that update. I'll get to it when I can, but perhaps someone has
> > > > beaten me to it already.
> > > >
> > > > Christopher wrote:
> > > > > Okay, so if we're okay treating the master branch as a 2.0
> > > > > development branch, then I'm going to go ahead and start focusing
> > > > > on some 2.0 tickets that may involve refactoring which have breaking
> > > > > changes that I've been reluctant to do before without an explicit
> > > > > 2.0 development branch. Of course, none of this says we have to stop
> > > > > development on 1.x stuffs, or says anything about when we'll release
> > > > > a 2.0, but it'd be nice to have a place to start putting in stuff
> > > > > for an eventual 2.0.
> > > > >
> > > > > On Thu, May 5, 2016 at 11:07 AM Josh Elser wrote:
> > > > >
> > > > > >> Ok, looks to me that we are in agreement now and don't need a
> > > > > >> vote.
> > > > > >>
> > > > > >> I will create a 1.8 branch today (updating Jenkins appropriately)
> > > > > >> so we can get master in a state that would be ready for the
> > > > > >> changes in 4177.
> > > > >>
> > > > >> Keith Turner wrote:
> > > > > >>> On Tue, May 3, 2016 at 4:54 PM, Christopher wrote:
> > > > > >>>
> > > > > >>>> I think I'd prefer leaving 1.8 as it stands, with the expectation
> > > > > >>>> to have a release line of 1.8 which only requires Java 7.
> > > > > >>>>
> > > > > >>> +1
> > > > > >>>
> > > > > >>> I can not see any reason to switch to JDK8 before releasing 1.8...
> > > > > >>> assuming that's going to happen soonish
> > > > > >>>
> > > > > >>>
> > > > > >>>> We can create a 2.0 branch, which bumps the Java version, and can
> > > > > >>>> accept changes which require Java 8 or API-breaking changes (as
> > > > > >>>> per semver) for the next major release line after 1.8.
> > > > > >>>>
> > > > > >>>> That would put us on a solid roadmap for 2.0 without disrupting
> > > > > >>>> 1.8 development, which is probably already nearing release
> > > > > >>>> readiness.
> > > > > >>>>
> > > > > >>>> On Tue, May 3, 2016 at 4:33 PM Josh Elser wrote:
> > > > 
> > > > > > Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree with
> > > > > > you. I can't think of a reason why we would upgrade to Java8 and
> > > > > > not make use of it in some way (publicly or privately).
> > > > > >
> > > > > > That being said, I don't think I see consensus. How about we
> > > > > > regroup in the form of a vote? (normal semver rules are an
> > > > > > invariant -- no changes to our public API compatibility rules are
> > > > > > implied by the below)
> > > > > >
> > > > > > * Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT" and
> > > > > >   move to jdk8
> > > > > > * Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0 goes
> > > > > >   jdk8
> > > > > >
> > > > > > Please chime in if I missed another option or am calling
> > > > > > discussion too soon. It just seems like we might have veered
> > > > > > off-track and I don't want this to fall to the wayside (again)
> > > > > > without decision.
> > > > >
> > > > > Mike Drob wrote:
> > > > > >> If our code ends up using java 8 bytecode in any classes
> > > > > >> required by a consumer, then I think they will get compilation
> > > > > >> (linking?) errors, regardless of java 8 types in our method
> > > > > >> signatures.
> > > > > >>
> > > > > >> On Tue, May 3, 2016 at 3:09 PM, Josh Elser <josh.el...@gmail.com> wrote:
> > > > > >>> That's a new assertion ("we can't actually use Java 8 features
> > > > > >>> until Accumulo-2"), isn't it? We could use new Java 8 features
> > > > > >>> internally which would require a minimum of Java 8 and no

Re: [DISCUSS] Java 8

2016-08-18 Thread Dave Marion
I think we discussed this previously. If I remember correctly, I suggested,
in this case, releasing 1.x as is, closely followed by a 2.0 with newer
dependencies and deprecated items removed and much the same features.

Found it: http://mail-archives.apache.org/mod_mbox/accumulo-dev/201605.mbox/
<2113920695.26610898.1462308522327.javamail.zim...@comcast.net>

On Aug 18, 2016 7:13 PM, "Christopher"  wrote:

> Oh, master is in a terrible state (test instabilities). I wouldn't think
> it's even close. Trying to support 1.6, 1.7, and working towards 1.8,
> there's nothing left for working on master.
>
> If we wanted to do a quick 2.0 release and a Java 7 1.8 release, we can
> fork a 2.0 from 1.8 for JDK 8.
>
> My main concern with this suggestion, though, is the need to continue to
> support 4x branches. 1.7, 1.8, 2.0, and whatever master becomes (probably
> 3.0). I think it'll spread us far too thin (I think we're already too
> thin), and I don't think we can afford to drop 1.7, like we can for 1.6,
> because 1.8 hasn't been "in the wild" long enough yet, and we should
> continue to support 1.7.
>
> On Thu, Aug 18, 2016 at 6:50 PM Sean Busbey  wrote:
>
> > I'm all for moving us towards java 8+ only, but I'm still -1 on
> > dropping java 7 in a minor release. Plenty of folks still run Java 7
> > in production. I'm sure a non-zero number of them will want to update
> > versions and a major version is how we communicate that level of
> > expected disruption.
> >
> > How about we get 1.8 out the door with Java 7 + Java 8, then try to
> > get master out the door with Java 8 as the minimum version? What's the
> > blocker on a release from master now?
> >
> > On Thu, Aug 18, 2016 at 5:46 PM, Christopher wrote:
> > > We need to make sure this release works with Java 8 anyway... but this
> > > change would tighten things up a bit, so we don't have to worry about
> > > supporting Java 7. It narrows our testing and allows us to focus on
> > > just the non-EOL, modern Java versions that we should be realistically
> > > expecting users of Accumulo 1.8 to be using anyway.
> > >
> > > On Thu, Aug 18, 2016 at 6:37 PM Josh Elser wrote:
> > >
> > >> Err, I am not a big fan of making this change after two rc's and all
> > >> of the testing I've been babysitting this week.
> > >>
> > >> I have no problem with you spinning a 2.0 which is 99% similar to 1.8
> > >> with whatever else you'd like to do (in fact, I'd encourage anyone to
> > >> step up and drive 2.0 to release).
> > >>
> > >> Sean Busbey wrote:
> > >> > Why don't we just make the 1.8 branch 2.0 then? I really don't want
> > >> > to drop support for JDKs on non-major releases; it's super disruptive.
> > >> >
> > >> > On Thu, Aug 18, 2016 at 4:01 PM, Christopher wrote:
> > >> >> I know we've talked about this before, but I kind of want to just
> > >> >> use Java 8 for Accumulo 1.8. It'd help clean up some things in the
> > >> >> build (can make use of newer versions of build plugins, and make it
> > >> >> easier for new development against the latest release).
> > >> >>
> > >> >> I just don't know how reasonable it is to keep making new,
> > >> >> non-bugfix releases on EOL JDKs (even though I may have previously
> > >> >> argued that it'd be safer to just wait until a major version bump).
> > >> >
> > >> >
> > >> >
> > >>
> >
> >
> >
> > --
> > busbey
> >
>


RE: Running Accumulo on a standard file system, without Hadoop

2017-01-16 Thread Dave Marion
IIRC, Accumulo *only* uses the HDFS client, so it needs something on the other 
side that can respond to that protocol. MiniAccumulo starts up MiniHDFS for 
this. You could run some other type of service locally that is HDFS client 
compatible (something like Quantcast QFS[1], setting up client [2]). If 
Accumulo is using something in Hadoop outside of the public client API, this 
may not work.

[1] https://github.com/quantcast/qfs
[2] https://github.com/quantcast/qfs/wiki/Migration-Guide


> -Original Message-
> From: Dylan Hutchison [mailto:dhutc...@cs.washington.edu]
> Sent: Monday, January 16, 2017 3:17 PM
> To: dev@accumulo.apache.org
> Subject: Running Accumulo on a standard file system, without Hadoop
> 
> Hi folks,
> 
> A friend of mine asked about running Accumulo on a normal file system in
> place of Hadoop, similar to the way MiniAccumulo runs.  How possible is this,
> or how much work would it take to do so?
> 
> I think my friend is just interested in running on a single node, but I am
> curious about both the single-node and distributed (via parallel file system
> like Lustre) cases.
> 
> Thanks, Dylan



Qonduit - secure web socket proxy for Accumulo

2017-02-22 Thread Dave Marion
I extracted the Netty web socket pipeline from Timely into its own server
process, and modified it to support custom and pluggable server-side logic and
request / response objects. The readme has a little more information,
including how it differs from the current proxy; it's located at
https://github.com/NationalSecurityAgency/timely/tree/qonduit.


For now the code is located in an orphan branch in the Timely GitHub project. 
Someone suggested it might make sense as an Accumulo sub-project - let me know 
if there is interest in that.


- Dave


Re: Qonduit - secure web socket proxy for Accumulo

2017-02-22 Thread Dave Marion
There is a WebSocketClientIT test in the test module to test access using the
Java client. I have implemented an operation that gets the Qonduit server
version and an operation that runs an Accumulo Scanner. Both of these
implementations are in the operations modules. An example of creating/using a
web socket from a web page exists in the Timely codebase[1]. For doing this
with Qonduit, the request and response objects need to be encoded/decoded with
a CBOR library (of which there are several).

Regarding security, authentication is performed on the server side using 
whatever pluggable modules are configured (basic auth, x509, etc). The 
transport is encrypted from the browser to the Qonduit server using SSL (HTTPS 
/ WSS).

[1] 
https://github.com/NationalSecurityAgency/timely/blob/master/server/src/main/resources/webapp/index.html
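
To give a rough sense of what "encoded with CBOR" means on the wire, here is a
minimal sketch of a CBOR encoder covering only a handful of types (RFC 8949).
It is illustrative only: the request fields used below (`operation` with the
value `version`) are hypothetical and are not taken from Qonduit's actual
message schema, and a real client would normally rely on an existing CBOR
library rather than hand-rolled code.

```python
import struct


def _head(major, n):
    """Encode a CBOR initial byte plus its length/value argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("integer too large for CBOR")


def cbor_encode(value):
    """Encode bools, non-negative ints, text strings, lists and dicts."""
    if isinstance(value, bool):  # must be checked before int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int) and value >= 0:
        return _head(0, value)  # major type 0: unsigned integer
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data  # major type 3: text string
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))  # major type 5: map
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")


# Hypothetical request object; the field names are assumptions for illustration.
request = {"operation": "version"}
print(cbor_encode(request).hex())
```

The resulting bytes would be sent as a binary web socket frame; the response
would be decoded the same way in reverse.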

> On February 22, 2017 at 1:55 PM Josh Elser  wrote:
> 
> 
> Neat. Thanks for sharing!
> 
> Any examples to show how a client would use it?
> 
> Regarding the security, does it encompass authentication and privacy 
> (encryption)? Any experience with certain implementations for the Spring 
> security modules (e.g. which ones you've tested to work)?
> 
> Dave Marion wrote:
> > I extracted the Netty web socket pipeline from Timely into its own server
> > process, and modified it to support custom and pluggable server-side logic
> > and request / response objects. The readme has a little more information,
> > including how it differs from the current proxy; it's located at
> > https://github.com/NationalSecurityAgency/timely/tree/qonduit.
> >
> >
> > For now the code is located in an orphan branch in the Timely GitHub 
> > project. Someone suggested it might make sense as an Accumulo sub-project - 
> > let me know if there is interest in that.
> >
> >
> > - Dave
> >


Re: Qonduit - secure web socket proxy for Accumulo

2017-04-14 Thread Dave Marion
Qonduit is now located at https://github.com/NationalSecurityAgency/qonduit.

On Wed, Feb 22, 2017 at 2:16 PM, Josh Elser  wrote:

> Thanks, Dave!
>
>
> Dave Marion wrote:
>
>> There is a WebSocketClientIT test in the test module to test access using
>> the Java client. I have implemented operations that get the Qonduit server
>> version and an operation to run an Accumulo Scanner. Both of these
>> implementations are in the operations modules. An example of creating/using
>> a web socket from a web page exists in the Timely codebase[1]. For doing
>> this with Qonduit, the request and response objects need to be
>> encoded/decoded with a CBOR library (for which there are several).
>>
>> Regarding security, authentication is performed on the server side using
>> whatever pluggable modules are configured (basic auth, x509, etc). The
>> transport is encrypted from the browser to the Qonduit server using SSL
>> (HTTPS / WSS).
>>
>> [1] https://github.com/NationalSecurityAgency/timely/blob/master/server/src/main/resources/webapp/index.html
>>
>> On February 22, 2017 at 1:55 PM Josh Elser  wrote:
>>>
>>>
>>> Neat. Thanks for sharing!
>>>
>>> Any examples to show how a client would use it?
>>>
>>> Regarding the security, does it encompass authentication and privacy
>>> (encryption)? Any experience with certain implementations for the Spring
>>> security modules (e.g. which ones you've tested to work)?
>>>
>>> Dave Marion wrote:
>>>
>>>> I extracted the Netty web socket pipeline from Timely into its own
>>>> server process, and modified it to support custom and pluggable server-side
>>>> logic and request / response objects. The readme has a little more
>>>> information, including how it differs from the current proxy; it's
>>>> located at https://github.com/NationalSecurityAgency/timely/tree/qonduit.
>>>>
>>>>
>>>> For now the code is located in an orphan branch in the Timely GitHub
>>>> project. Someone suggested it might make sense as an Accumulo sub-project -
>>>> let me know if there is interest in that.
>>>>
>>>>
>>>> - Dave
>>>>
>>>>


Re: Fwd: Accumulo Table Scanning Taking Time!!!

2017-04-27 Thread Dave Marion
You could add more tablet servers and add splits to the table.
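
As a rough illustration of why splits help (a toy model, not Accumulo code):
a table with no splits is a single tablet, so one tablet server does all the
work; each split point adds a tablet that can be hosted elsewhere, and scans
covering many tablets (for example through a batch scanner) can then be served
by several servers at once. The sketch assumes each split point is the
inclusive end row of a tablet and uses round-robin assignment as a stand-in
for a real balancer; the row names and server count are made up. In Accumulo
itself, splits can be added with the shell's `addsplits` command or with
`TableOperations.addSplits`.

```python
from bisect import bisect_left
from collections import Counter


def tablet_index(row, splits):
    """Return the index of the tablet holding `row`.

    `splits` is a sorted list of split points; each one is treated as the
    inclusive end row of a tablet, so N splits give N + 1 tablets.
    """
    return bisect_left(splits, row)


def rows_per_server(rows, splits, num_servers):
    """Count rows per server, assigning tablets to servers round-robin.

    Round-robin is an illustrative assumption, not the real balancer.
    """
    counts = Counter()
    for row in rows:
        counts[tablet_index(row, splits) % num_servers] += 1
    return dict(counts)


rows = [f"row_{i:04d}" for i in range(1000)]

# No splits: one tablet, so a single server handles every row.
single = rows_per_server(rows, [], 4)

# Three evenly spaced splits: four tablets spread over four servers.
spread = rows_per_server(rows, ["row_0249", "row_0499", "row_0749"], 4)

print(single)
print(spread)
```

Split points only help if they match the distribution of the row keys; evenly
spaced splits over skewed keys would still leave most rows in a few tablets.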

> On April 27, 2017 at 7:17 AM Suresh Prajapati  
> wrote:
>
>
> -- Forwarded message --
> From: Suresh Prajapati 
> Date: Thu, Apr 27, 2017 at 4:39 PM
> Subject: Accumulo Table Scanning Taking Time!!!
> To: dev@accumulo.apache.org
>
>
> Hello Team
>
> I am developing a client in Accumulo to store geo-spatial information and
> using GeoMesa for indexing on top of it. However, I found that scanning *~1
> million* records takes *2-3 sec*. I looked at the indexes and query plan of
> GeoMesa but was not able to find the cause of the problem. I am running
> Accumulo as a single tablet server (including master). I want to know:
> what factors can affect an Accumulo scanning operation? How can I
> optimise this time?
>
> Thank You
> Suresh Prajapati


Pull request guidelines

2017-06-05 Thread Dave Marion
Do we have some basic set of guidelines to use when reviewing pull requests? I 
don't see anything on the contributor[1] page.


[1] http://accumulo.apache.org/contributor/



Re: Pull request guidelines

2017-06-05 Thread Dave Marion
Thanks. I missed that and only saw the GitHub pull requests link. Reading 
through that document, it's not really what I was looking for. I was looking 
for a document that describes a standard for pull request authors and reviewers 
to provide some level of expectations and consistency for both parties. If we 
don't have one, I suggest that we discuss creating one.

> On June 5, 2017 at 9:33 AM Mike Walch  wrote:
>
>
> There is some documentation on reviews in the "Review Board" page of the
> contributor guide.
>
> https://accumulo.apache.org/contributor/rb
>
> This documentation should be cleaned up and generalized for reviews that
> are done using Review Board or GitHub.
>
> On Mon, Jun 5, 2017 at 9:02 AM Dave Marion  wrote:
>
> > Do we have some basic set of guidelines to use when reviewing pull
> > requests? I don't see anything on the contributor[1] page.
> >
> >
> > [1] http://accumulo.apache.org/contributor/
> >
> >


[DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
I propose that we define a set of guidelines to use when reviewing pull
requests. In doing so, contributors will be able to determine potential issues
in their code, possibly reducing the number of changes that occur before
acceptance. Here's an example to start the discussion:


Items a reviewer should look for:

1. Adherence to code formatting rules (link to formatting rules)

2. Unit tests required

3. Threading issues

4. Performance implications


Items that should not block acceptance:

1. Stylistic changes that have no performance benefit

2. Addition of features outside the scope of the ticket (moving the goal post, 
discussion should lead to ticket creation)


Re: [DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
I think that changes to the public API would be under more scrutiny and 
hopefully a consensus could be reached. I don't think we have hard and fast 
rules for our API conventions.

> On June 5, 2017 at 11:19 AM Tony Kurc  wrote:
>
>
> Dave, on your not block acceptance #1 - where would something like ensuring
> consistent "look and feel" of APIs fit? I recently had a PR for another
> project and recommended a class name change to something more consistent
> with the rest of the project.
>
>
> On Mon, Jun 5, 2017 at 11:08 AM, Dave Marion  wrote:
>
> > I propose that we define a set of guidelines to use when reviewing pull
> > requests. In doing so, contributors will be able to determine potential
> > issues in their code possibly reducing the number of changes that occur
> > before acceptance. Here's an example to start the discussion:
> >
> >
> > Items a reviewer should look for:
> >
> > 1. Adherence to code formatting rules (link to formatting rules)
> >
> > 2. Unit tests required
> >
> > 3. Threading issues
> >
> > 4. Performance implications
> >
> >
> > Items that should not block acceptance:
> >
> > 1. Stylistic changes that have no performance benefit
> >
> > 2. Addition of features outside the scope of the ticket (moving the goal
> > post, discussion should lead to ticket creation)
> >


Re: [DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
I'm not suggesting that stylistic changes should be ignored; in the example I 
was suggesting they should not be a blocker. The reviewer certainly should ask 
questions to understand the code, and suggest changes to make things clearer. 
Regarding #2, I agree that some PR's may have to wait to be merged until 
another issue is resolved.

I'm not trying to handcuff reviewers here, I'm just proposing that we handle 
PR's with some consistency. I fully agree that the reviewer should be able to 
voice his/her opinions. However, it would be good if the acceptance bar was 
close to the same for all contributions. I personally have been burned and
totally put off by a contribution I was trying to make to another open source
project. I think that if there is a (somewhat) loose, but defined set of
expectations on both sides of the contribution, it might be a better experience.


> On June 5, 2017 at 11:25 AM "Marc P."  wrote:
> 
> Dave,
>   I don't agree that stylistic changes are something to ignore. There may 
> be cases where something is confusing to others and thus should be called 
> out. This is difficult to blatantly avoid.
> 
>   I can't agree with number two either, since a PR can be a form of
> requirements elicitation, and as such there are cases in which there are new
> preconditions on the ticket. While your "should not block acceptance" list
> may sometimes apply, I don't think it fits a community of developers,
> where you can discuss your differences. In the case of numbers one and two,
> developers reviewing will pick their battles, and perhaps other reviewers can
> chime in on the importance of said feature. What is the purpose of limiting
> this discussion by claiming it cannot impact acceptance?
>
>   Bad code begets bad code, and if a developer wants to take issue with
> code, they should be allowed to discuss this within the PR. Further,
> inconsistency begets inconsistency, so wild departures from the norm should
> be something a reviewer has the latitude to discuss.
> 
>   While discussion should lead to ticket creation we should avoid 
> creating features that need a portion completed to be used in production 
> successfully.
> 
>
> 
> 
> On Mon, Jun 5, 2017 at 11:08 AM, Dave Marion <dlmar...@comcast.net> wrote:
>
> > I propose that we define a set of guidelines to use when reviewing pull
> > requests. In doing so, contributors will be able to determine potential
> > issues in their code possibly reducing the number of changes that occur
> > before acceptance. Here's an example to start the discussion:
> > 
> > 
> > Items a reviewer should look for:
> > 
> > 1. Adherence to code formatting rules (link to formatting rules)
> > 
> > 2. Unit tests required
> > 
> > 3. Threading issues
> > 
> > 4. Performance implications
> > 
> > 
> > Items that should not block acceptance:
> > 
> > 1. Stylistic changes that have no performance benefit
> > 
> > 2. Addition of features outside the scope of the ticket (moving the 
> > goal post, discussion should lead to ticket creation)
> > 
> > > 
> 
 


Re: [DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
That's the intent. If we have some guidelines, and they are read first by a new
contributor, then they will know that there is a formatter that they should be
using.

> On June 5, 2017 at 11:35 AM Mike Drob  wrote:
>
>
> > 1. Adherence to code formatting rules (link to formatting rules)
>
> Can we let checkstyle handle this instead of humans worrying about it?
>
> On Mon, Jun 5, 2017 at 10:25 AM, Marc P.  wrote:
>
> > Dave,
> > I don't agree that stylistic changes are something to ignore. There may
> > be cases where something is confusing to others and thus should be called
> > out. This is difficult to blatantly avoid.
> >
> > I can't agree with number two either, since a PR can be a form of
> > requirements elicitation, and as such there are cases in which there are
> > new preconditions on the ticket. While your "should not block acceptance"
> > list may sometimes apply, I don't think it fits a community of developers,
> > where you can discuss your differences. In the case of numbers one and two,
> > developers reviewing will pick their battles, and perhaps other reviewers
> > can chime in on the importance of said feature. What is the purpose of
> > limiting this discussion by claiming it cannot impact acceptance?
> >
> > Bad code begets bad code, and if a developer wants to take issue with
> > code, they should be allowed to discuss this within the PR. Further,
> > inconsistency begets inconsistency, so wild departures from the norm should
> > be something a reviewer has the latitude to discuss.
> >
> > While discussion should lead to ticket creation we should avoid creating
> > features that need a portion completed to be used in production
> > successfully.
> >
> >
> >
> >
> > On Mon, Jun 5, 2017 at 11:08 AM, Dave Marion  wrote:
> >
> > > I propose that we define a set of guidelines to use when reviewing pull
> > > requests. In doing so, contributors will be able to determine potential
> > > issues in their code possibly reducing the number of changes that occur
> > > before acceptance. Here's an example to start the discussion:
> > >
> > >
> > > Items a reviewer should look for:
> > >
> > > 1. Adherence to code formatting rules (link to formatting rules)
> > >
> > > 2. Unit tests required
> > >
> > > 3. Threading issues
> > >
> > > 4. Performance implications
> > >
> > >
> > > Items that should not block acceptance:
> > >
> > > 1. Stylistic changes that have no performance benefit
> > >
> > > 2. Addition of features outside the scope of the ticket (moving the goal
> > > post, discussion should lead to ticket creation)
> > >
> >


Re: [DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
The main entrance to the community for new contributors is through pull 
requests. I have seen PR's approved in an inconsistent manner. My intent was to 
make known the expectations for new contributions so that newcomers don't get 
discouraged by the amount of feedback and/or changes requested while providing 
some guidelines to make it more consistent. It seems that there is not a desire 
to do this for various reasons. That's fine by me and I'm willing to drop the 
discussion here.


> On June 5, 2017 at 12:14 PM "Marc P."  wrote:
> 
> Turner and Tubbs,
>   You both piqued my interest and I agree. There's something important in 
> what both said regarding the discussion and importance of a particular 
> change. Style changes most likely aren't deal breakers unless it is terribly 
> confusing, but I would leave that up to the reviewer and developer to 
> discuss. 
> 
> Dave,
>   I'm sure your intent is good and your goal isn't to handcuff reviewers.
> Is your concern over a stalemate on something such as a code style? Would a
> discussion not be the remedy for this?
> 
> On Mon, Jun 5, 2017 at 12:07 PM, Keith Turner <ke...@deenlo.com> wrote:
>
> > Sometimes I use review comments to just ask questions about things I
> > don't understand.  Sometimes when looking at a code review, I have a
> > thought about the change that I know is a subjective opinion.  In this
> > case I want to share my thought, in case they find it useful.
> > However, I don't care if a change is made or not.  Sometimes I think a
> > change must be made.  I try to communicate my intentions, but it's
> > wordy, slow, and I don't think I always succeed.
> >
> > Given there are so many ways the comments on a review can be used, I
> > think it can be difficult to quickly know the intentions of the
> > reviewer.  I liked review board's issues, I think they helped with
> > this problem.  A reviewer could make comments and issues.  The issues
> > made it clear what the reviewer thought must be done vs discussion.
> > Issues made reviews more efficient by making the intentions clear AND
> > separating important concerns from lots of discussion.
> >
> > When I submit a PR and it has lots of comments, towards the end I go
> > back and look through all of the comments to make sure I didn't miss
> > anything important.  It's annoying to have to do this.  Is there
> > anything we could do in GH to replicate this and help separate the
> > signal from the noise?
> > 
> > 
> > > On Mon, Jun 5, 2017 at 11:08 AM, Dave Marion <dlmar...@comcast.net> wrote:
> > > > I propose that we define a set of guidelines to use when reviewing
> > > > pull requests. In doing so, contributors will be able to determine
> > > > potential issues in their code possibly reducing the number of
> > > > changes that occur before acceptance. Here's an example to start the
> > > > discussion:
> > >
> > >
> > > Items a reviewer should look for:
> > >
> > > 1. Adherence to code formatting rules (link to formatting rules)
> > >
> > > 2. Unit tests required
> > >
> > > 3. Threading issues
> > >
> > > 4. Performance implications
> > >
> > >
> > > Items that should not block acceptance:
> > >
> > > 1. Stylistic changes that have no performance benefit
> > >
> > > 2. Addition of features outside the scope of the ticket (moving 
> > the goal post, discussion should lead to ticket creation)
> > 
> > > 
> 
 


Re: [DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
I think things can be improved when it comes to handling pull requests. The 
point of this thread was to try and come up with something to set expectations 
for the contributor. I figured the discussion would lead to the modification of 
the existing example or to a new example. Christopher provided a different 
example, but most of the feedback seemed to indicate that this was not 
warranted. I'm not sure what else I can say on the matter. If the majority
thinks that it's not a problem, then it's not a problem.

> On June 5, 2017 at 12:39 PM Josh Elser  wrote:
>
>
> Perhaps this discussion would be better served if you gave some concrete
> suggestions on how you think things can/should be improved.
>
> e.g. Mike's suggestion of using the maven-checkstyle-plugin earlier, why
> not focus on that? Does this (still) work with the build? If so, how do
> we get that run automagically via travis or jenkins?
>
> To me, it seems like you either wanted to throw some shade or you are
> genuinely concerned about a problem that others are not (yet?) concerned
> about. I doubt re-focusing contribution processes for efficiency would
> be met with disapproval.
>
> On 6/5/17 12:32 PM, Dave Marion wrote:
> > The main entrance to the community for new contributors is through pull 
> > requests. I have seen PR's approved in an inconsistent manner. My intent 
> > was to make known the expectations for new contributions so that newcomers 
> > don't get discouraged by the amount of feedback and/or changes requested 
> > while providing some guidelines to make it more consistent. It seems that 
> > there is not a desire to do this for various reasons. That's fine by me and 
> > I'm willing to drop the discussion here.
> >
> >
> >> On June 5, 2017 at 12:14 PM "Marc P."  wrote:
> >>
> >> Turner and Tubbs,
> >> You both piqued my interest and I agree. There's something important in 
> >> what both said regarding the discussion and importance of a particular 
> >> change. Style changes most likely aren't deal breakers unless it is 
> >> terribly confusing, but I would leave that up to the reviewer and 
> >> developer to discuss.
> >>
> >> Dave,
> >> I'm sure your intent is good and your goal isn't to handcuff reviewers. Is
> >> your concern over a stalemate on something such as a code style? Would a
> >> discussion not be the remedy for this?
> >>
> >> On Mon, Jun 5, 2017 at 12:07 PM, Keith Turner <ke...@deenlo.com> wrote:
> >>
> >>> Sometimes I use review comments to just ask questions about things I
> >>> don't understand. Sometimes when looking at a code review, I have a
> >>> thought about the change that I know is a subjective opinion. In this
> >>> case I want to share my thought, in case they find it useful.
> >>> However, I don't care if a change is made or not. Sometimes I think a
> >>> change must be made. I try to communicate my intentions, but its
> >>> wordy, slow, and I don't think I always succeed.
> >>>
> >>> Given there are so many ways the comments on a review can be used, I
> >>> think it can be difficult to quickly know the intentions of the
> >>> reviewer. I liked review board's issues, I think they helped with
> >>> this problem. A reviewer could make comments and issues. The issues
> >>> made it clear what the reviewer thought must be done vs discussion.
> >>> Issues made reviews more efficient by making the intentions clear AND
> >>> separating important concerns from lots of discussion.
> >>>
> >>> When I submit a PR and it has lots of comments, towards the end I go
> >>> back and look through all of the comments to make sure I didn't miss
> >>> anything important. Its annoying to have to do this. Is there
> >>> anything we could do in GH to replicate this and help separate the
> >>> signal from the noise?
> >>>
> >>>
> >>> On Mon, Jun 5, 2017 at 11:08 AM, Dave Marion <dlmar...@comcast.net> wrote:
> >>> > I propose that we define a set of guidelines to use when reviewing pull 
> >>> > requests. In doing so, contributors will be able to determine potential 
> >>> > issues in their code possibly reducing the number of changes that occur 
> >>> > before acceptance. Here's an example to start the discussion:
> >>> >
> >>> >
> >>> > Items a reviewer should look for:
> >>> >
> >>> > 1. Adherence to code formatting rules (link to formatting rules)
> >>> >
> >>> > 2. Unit tests required
> >>> >
> >>> > 3. Threading issues
> >>> >
> >>> > 4. Performance implications
> >>> >
> >>> >
> >>> > Items that should not block acceptance:
> >>> >
> >>> > 1. Stylistic changes that have no performance benefit
> >>> >
> >>> > 2. Addition of features outside the scope of the ticket (moving the 
> >>> > goal post, discussion should lead to ticket creation)
> >>>
> >>> >
> >>
> >
> >


Re: [DISCUSS] Pull Request Guidelines

2017-06-05 Thread Dave Marion
I have used Hadoop's documentation on this subject for submitting patches. I'm 
not suggesting that we go to this level of detail, but as a new contributor I 
know how to set up my IDE, what commands to run to create my patch, and I know 
the items that are going to be checked at the start.


[1] https://wiki.apache.org/hadoop/HowToContribute

[2] https://wiki.apache.org/hadoop/CodeReviewChecklist


> 
> On June 5, 2017 at 1:19 PM Mike Miller  wrote:
> 
> I could be wrong, but it sounds like there are two different
> perspectives being discussed here and it may be helpful to try and
> separate the two. On one hand there are discussions of guidelines for
> reviewers (Dave's initial list, Keith's ideas) to follow and on the
> other hand, suggestions for contributors, which Christopher's list
> sounds more geared towards. Since everyone on this list has to wear
> both hats, I think each different point of view could benefit from
> some loose guidelines.
> 
> For example, General Pull Request Guidelines for the Accumulo community:
> When submitting a PR... please run these commands [...] before
> submitting to ensure code adheres to checkstyle and passes findbugs,
> etc
> When reviewing a PR... ensure dialog portrays how strongly the
> reviewer feels about the comment [Could = optional suggestion, Should
> = would be helpful but not blocking, Must = required]
> 
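The Could/Should/Must convention suggested above can be sketched in a few lines. This is only an illustration, assuming reviewers begin each comment with one of the three keywords; `Strength`, `classify_comment`, and `blocking_comments` are hypothetical names and not part of any Accumulo or GitHub tooling.

```python
from enum import Enum


class Strength(Enum):
    """How strongly a reviewer feels, using the wording from the thread."""
    COULD = "optional suggestion"
    SHOULD = "would be helpful but not blocking"
    MUST = "required"


def classify_comment(comment):
    """Return the Strength named by the comment's first word, or None."""
    words = comment.strip().split(maxsplit=1)
    if not words:
        return None
    keyword = words[0].rstrip(":,").upper()
    return Strength.__members__.get(keyword)


def blocking_comments(comments):
    """Keep only the comments a contributor has to address before merge."""
    return [c for c in comments if classify_comment(c) is Strength.MUST]
```

A contributor facing a long review could run `blocking_comments` over the comment text to find the required changes first; questions and untagged remarks classify as `None` and are treated as discussion.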
> On Mon, Jun 5, 2017 at 12:57 PM, Dave Marion  wrote:
> 
> > > 
> > I think things can be improved when it comes to handling pull 
> > requests. The point of this thread was to try and come up with something to 
> > set expectations for the contributor. I figured the discussion would lead 
> > to the modification of the existing example or to a new example. 
> > Christopher provided a different example, but most of the feedback seemed 
> > to indicate that this was not warranted. I'm not sure what else I can say 
> > on the matter. If the majority thinks that it's not a problem, then it's not 
> > a problem.
> > 
> > > > > 
> > > On June 5, 2017 at 12:39 PM Josh Elser  
> > > wrote:
> > > 
> > > Perhaps this discussion would be better served if you gave 
> > > some concrete
> > > suggestions on how you think things can/should be improved.
> > > 
> > > e.g. Mike's suggestion of using the maven-checkstyle-plugin 
> > > earlier, why
> > > not focus on that? Does this (still) work with the build? If 
> > > so, how do
> > > we get that run automagically via travis or jenkins?
> > > 
> > > To me, it seems like you either wanted to throw some shade or 
> > > you are
> > > genuinely concerned about a problem that others are not 
> > > (yet?) concerned
> > > about. I doubt re-focusing contribution processes for 
> > > efficiency would
> > > be met with disapproval.
> > > 
> > > On 6/5/17 12:32 PM, Dave Marion wrote:
> > > 
> > > > > > > 
> > > > The main entrance to the community for new contributors 
> > > > > is through pull requests. I have seen PRs approved in an inconsistent 
> > > > manner. My intent was to make known the expectations for new 
> > > > contributions so that newcomers don't get discouraged by the amount of 
> > > > feedback and/or changes requested while providing some guidelines to 
> > > > make it more consistent. It seems that there is not a desire to do this 
> > > > for various reasons. That's fine by me and I'm willing to drop the 
> > > > discussion here.
> > > > 
> > > > > > > > > 
> > > > > > On June 5, 2017 at 12:14 PM "Marc P." wrote:
> > > > > 
> > > > > Turner and Tubbs,
> > > > > You both piqued my interest and I agree. There's 
> > > > > something important in what both said regarding the discussion and 
> > > > > importance of a particular change. Style changes most likely aren't 
> > > > > deal breakers unless it is terribly confusing, but I would leave that 
> > > > > up to the reviewer and developer to discuss.
> > > > > 
> > > > > 

RE: [DISCUSS] Question about 1.7 bugfix releases

2017-06-06 Thread Dave Marion
Looks like there are 128 issues in 1.8.x that are not in 1.7.x. I looked through 
them and didn't see anything that stood out as destabilizing.

https://issues.apache.org/jira/browse/ACCUMULO-4572?jql=fixVersion IN (1.8.0%2C 1.8.1) AND fixVersion NOT IN (1.7.0%2C 1.7.1%2C 1.7.2%2C 1.7.3) AND project %3D ACCUMULO
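The link above carries a JQL query that is only partly percent-encoded, so it breaks when mail clients wrap it. A minimal sketch of building a fully encoded link follows. The `/jira/issues/` navigator path and the helper names `fix_version_diff_jql` and `search_url` are assumptions for illustration; the original message used a `/browse/` URL.

```python
from urllib.parse import urlencode

# Assumed issue-navigator endpoint (not taken from the message above).
JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def fix_version_diff_jql(project, in_versions, not_in_versions):
    """JQL for issues fixed in in_versions but in none of not_in_versions."""
    return (
        f"project = {project}"
        f" AND fixVersion IN ({', '.join(in_versions)})"
        f" AND fixVersion NOT IN ({', '.join(not_in_versions)})"
    )


def search_url(jql):
    """Percent-encode the JQL so the link contains no spaces or raw commas."""
    return JIRA_SEARCH + "?" + urlencode({"jql": jql})


jql = fix_version_diff_jql(
    "ACCUMULO",
    ["1.8.0", "1.8.1"],
    ["1.7.0", "1.7.1", "1.7.2", "1.7.3"],
)
url = search_url(jql)
```

Because `urlencode` escapes spaces, commas, parentheses and `=`, the resulting `url` survives line wrapping as a single token.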


> -----Original Message-----
> From: md...@cloudera.com [mailto:md...@cloudera.com] On Behalf Of
> Mike Drob
> Sent: Tuesday, June 06, 2017 3:14 PM
> To: Accumulo Dev List
> Subject: Re: [DISCUSS] Question about 1.7 bugfix releases
> 
> Are there potentially destabilizing new features in 1.8 that are not present
> in 1.7.x?
> 
> On Tue, Jun 6, 2017 at 2:09 PM, Christopher  wrote:
> 
> > On Tue, Jun 6, 2017 at 12:39 PM Sean Busbey wrote:
> >
> > > Why do we consider 1.8.1 stable?
> > >
> > >
> >
> >
> > I would consider 1.8.1 stable (or, at least as stable as 1.7.3),
> > because it includes all the bugfixes that we've identified in 1.7,
> > plus fixes to all the known issues which were identified shortly after the
> > rollout of 1.8.0.
> > And, because I've seen users use it successfully.
> >



Re: Draft Board Report for Jul 2017

2017-07-10 Thread Dave Marion
+1 LGTM

> 
> On July 10, 2017 at 8:38 AM Michael Wall  wrote:
> 
> The Apache Accumulo PMC decided to draft its quarterly board
> reports on the dev list. Here is a draft of our report which is due by
> Wednesday, Jul 12. Please let me know if you have any suggestions;
> I plan to submit on the 12th.
> 
> Mike
> 
> --
> 
> ## Description:
> 
> * The Apache Accumulo sorted, distributed key/value store is a robust,
>   scalable, high performance data storage system that features cell-based
>   access control and customizable server-side processing. It is based on
>   Google's BigTable design and is built on top of Apache Hadoop,
>   Zookeeper, and Thrift.
> 
> ## Issues:
> 
> * There are no issues requiring board attention at this time.
> 
> ## Activity:
> 
> * There were no new releases during the current reporting period.
> * Since the last report, there has been a focus on documentation clean up
>   and paying down some technical debt in our integration test suite.
> 
> ## Health report:
> 
> * The project remains healthy. Activity levels on mailing lists, git and
>   JIRA remain constant.
> 
> ## PMC changes:
> 
> * Currently 29 PMC members.
> * No new PMC members added in the last 3 months. We have invited a long
>   time contributor to become both a committer and PMC member.
> * Last PMC addition was Mike Walch on Wed Nov 02 2016.
> 
> ## Committer base changes:
> 
> * Currently 29 committers.
> * No new committers added in the last 3 months.
> * Last committer addition was Mike Walch on Thu Nov 03 2016.
> 
> ## Releases:
> 
> * Last release was 1.7.3 on Sat Mar 25 2017
> 
> ## Mailing list activity:
> 
> * dev@accumulo.apache.org:
> 
>   o 232 subscribers (up 3 in the last 3 months):
>   o 997 emails sent to list (1012 in previous quarter)
> * notificati...@accumulo.apache.org:
> 
>   o 63 subscribers (down 2 in the last 3 months):
>   o 573 emails sent to list (589 in previous quarter)
> * u...@accumulo.apache.org:
> 
>   o 398 subscribers (up 1 in the last 3 months):
>   o 107 emails sent to list (120 in previous quarter)
> 
> ## JIRA activity:
> 
> * 57 JIRA tickets created in the last 3 months
> * 52 JIRA tickets closed/resolved in the last 3 months
> 


RE: accumulo.metadata table online but scans hang

2017-08-31 Thread Dave Marion
e considered to be an invalid format and break things more 
> (I'm not sure that's possible), or might they be accepted as needing no 
> further resolution?
>
> Any other thoughts (anyone) on how we might save ourselves, besides starting 
> from scratch? (When we first loaded our 16TB of data it took 6 weeks using 
> the map/reduce method!)
>
> Thank you again!
>
> Nick
>
>
>
> From: Dave Marion [mailto:dlmar...@comcast.net]
> Sent: 30 August 2017 20:13
> To: u...@accumulo.apache.org; Nick Wise 
> Subject: Re: accumulo.metadata table online but scans hang
>
> Some immediate thoughts:
>
> 1. Regarding node08 having so many files, maybe it was the last DN that had 
> free space?
> 2. Look in the trash folder for the missing referenced WAL files.
> 3. For your OOME using the HDFS CLI, I think you can increase the amount of 
> memory that the client will use with: export HADOOP_CLIENT_OPTS="-Xmx1G" 
> (or something like that).
>
> Still digesting the rest
>
>
> On August 30, 2017 at 2:45 PM Nick Wise 
> <mailto:nicholas.w...@sa.catapult.org.uk> wrote:
>
> Disclaimer: I don’t have much experience with Accumulo or Hadoop, I’m 
> standing in because our resident expert is away on honeymoon! We’ve done a 
> great deal of reading and do not know if our situation is recoverable, so any 
> and all advice would be very welcome.
>
> Background:
> We are running:
> (a) Accumulo version: 1.7.0
> (b) Hadoop version: 2.7.1
> (c) Geomesa version: 1.2.1
> We have 31 nodes, 2 masters and 3 zookeepers (obviously named in the log 
> excerpts below). Nodes are both data nodes and tablet servers, masters are 
> also name nodes. Nodes have 16GB RAM, Intel Core i5 dual core CPUs, and 500GB 
> or 1TB SSD each.
> This is a production deployment where we are analysing 16TB (and growing) 
> geospatial data, with the outcomes being used daily. We have customers 
> relying on our results.
>
> Initial Issue:
> The non-DFS storage used in our HDFS system was falsely reporting that it was 
> using all of the free space we had available, resulting in HDFS rejecting 
> writes from a variety of places across our cluster. After research it 
> appeared that this may be as a result of a bug, and that restarting HDFS 
> services would resolve it. After restarting the HDFS services the non-DFS 
> storage used immediately returned to expected levels, but accumulo didn’t 
> seem to be responding to queries so we ran stop-all.sh and start-all.sh. When 
> running stop-all.sh it timed out trying to contact the master, and did a 
> forced shutdown.
>
> After restarting, Accumulo listed all the tables as being online (except for 
> accumulo.replication which is offline) but none of the tables have their 
> tablets associated except for:
> (a) accumulo.metadata
> (b) accumulo.root
> All Geomesa tables are showing as online though the tablets, table sizes and 
> record counts are not showing in the web UI.
>
> In the logs (which are very large) there are a range of issues showing, the 
> following seeming important from our Googling.
>
> Log excerpts:
> 2017-08-30 14:45:06,195 [master.EventCoordinator] INFO : Marked 1 tablets as 
> unassigned because they don't have current servers
> 2017-08-30 14:45:06,195 [master.EventCoordinator] INFO : [Metadata Tablets]: 
> 1 tablets are ASSIGNED_TO_DEAD_SERVER
> 2017-08-30 14:45:13,425 [master.Master] INFO : Assigning 1 tablets
> 2017-08-30 14:45:13,441 [master.EventCoordinator] INFO : [Metadata Tablets]: 
> 1 tablets are UNASSIGNED
> 2017-08-30 14:45:13,975 [master.EventCoordinator] INFO : tablet !0<;~ was 
> loaded on node03:9997
>
> An Accumulo meta data node is offline. In the accumulo master log file we see 
> that there are 1101 WALs associated with a node (node08) that are linked to 
> tablet !0<~. Below are 2 instances of the message we get in the logs, which 
> repeat over and over, and there are 1101 of them per repeat. We’re not sure 
> why there are 1101 WALs for the one node, but we assume that this is the main 
> cause of our problem.
>
> 2017-08-30 15:20:29,094 [conf.AccumuloConfiguration] INFO : Loaded class : 
> org.apache.accumulo.server.master.recovery.HadoopLogCloser
> 2017-08-30 15:20:29,094 [recovery.RecoveryManager] INFO : Starting recovery 
> of 
> hdfs://master01:9000/user/accumulo/accumulo/wal/node08+9997/fed84709-3d3b-45b0-8b77-020a71762b09
>  (in : 300s), tablet !0;~< holds a reference
> 2017-08-30 15:20:29,142 [conf.AccumuloConfiguration] INFO : Loaded class : 
> org.apache.accumulo.server.master.recovery.HadoopLogCloser
> 2017-08-30 15:20:29,142 [recovery.RecoveryManager] INFO : Starting recovery 
> of 
> hdfs://master01:9000/user/accumulo/accumulo/wal/node0

Re: [DISCUSS] Guava Dependencies

2017-09-18 Thread Dave Marion
We still have to use a Hadoop-compatible version of Guava on the server-side 
though, right? I believe the DFSClient has Guava dependencies.


> 
> On September 18, 2017 at 2:12 PM Mike Miller  wrote:
> 
> Recently tickets have been opened dealing with Guava in Accumulo (see
> ACCUMULO-4701 through 4704), in particular the use of Beta classes and
> methods. Use of Guava comes with a few warnings...
> 
> From the Guava README:
> 
> *1. APIs marked with the @Beta annotation at the class or method level are
> subject to change. They can be modified in any way, or even removed, at any
> time. If your code is a library itself (i.e. it is used on the CLASSPATH of
> users outside your own control), you should not use beta APIs, unless you
> repackage them (e.g. using ProGuard).
>
> 2. Deprecated non-beta APIs will be removed two years after the release in
> which they are first deprecated. You must fix your references before this
> time. If you don't, any manner of breakage could result (you are not
> guaranteed a compilation error).*
> 
> I think it is worth a discussion on how to handle Guava dependencies going
> forward across the different versions of Accumulo. The goal would be to
> allow use of a newer version of Guava in client applications with
> the current supported versions of Accumulo.
> 
> Ideally, we could just eliminate any use of Beta Guava code. But there are
> Beta classes that are very useful and some which we already have integrated
> into released Accumulo versions.
> 
> There seem to be 3 ways to handle Guava dependencies:
> 1 - jar shading
> 2 - copy Guava code into Accumulo
> 3 - replace Guava code with standard Java
> 
> We may have to handle it differently with each version of Accumulo. For
> example, 1.8 has more widespread use of Beta annotated code than 1.7.
> 


RE: [DISCUSS] Hadoop3 support target?

2017-12-04 Thread Dave Marion
There is no reason that you can't mark the offending API methods as deprecated 
in a 1.8.x release, then immediately branch off of that to create a 2.0 and 
remove the method. Alternatively, we could decide to forego the semver rules 
for a specific release and make sure to point it out in the release notes.
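The deprecate-then-remove path described above can be sketched as a simple version check. This is only an illustration under the usual semver reading, where public API may be removed only in a new major version. `parse_version` and `removal_follows_semver` are hypothetical helper names, and the version strings are examples rather than statements about actual Accumulo releases.

```python
def parse_version(version):
    """Turn a dotted version such as '1.8.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def removal_follows_semver(deprecated_in, removed_in):
    """True when the removal lands in a later major version than the deprecation."""
    return parse_version(removed_in)[0] > parse_version(deprecated_in)[0]


# Deprecate in a 1.8.x release, then branch and remove in 2.0: allowed.
ok = removal_follows_semver("1.8.2", "2.0.0")
# Removing in another 1.x release would need the release-notes exception.
needs_exception = not removal_follows_semver("1.8.2", "1.9.0")
```

The second case corresponds to the alternative in the message: deliberately forgoing the semver rules for one release and documenting it.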

-----Original Message-----
From: Josh Elser [mailto:els...@apache.org] 
Sent: Monday, December 4, 2017 6:19 PM
To: dev@accumulo.apache.org
Subject: Re: [DISCUSS] Hadoop3 support target?

Also, just to be clear for everyone else:

This means that we have *no roadmap* at all for Hadoop 3 support because 
Accumulo 2.0 is in a state of languish.

This is a severe enough problem to me that I would consider breaking API 
compatibility and fixing the API leak in 1.7/1.8. I'm curious what people other 
than Christopher think (assuming from his comments/JIRA work that he disagrees 
with me).

On 12/4/17 6:12 PM, Christopher wrote:
> Agreed.
> 
> On Mon, Dec 4, 2017 at 6:01 PM Josh Elser  wrote:
> 
>> Ah, I'm seeing now -- didn't check my inbox appropriately.
>>
>> I think the fact that code that we don't own has somehow been allowed 
>> to be public API is the smell. That's something that needs to be 
>> rectified sooner than later. By that measure, it can *only* land on 
>> Accumulo 2.0 (which is going to be a major issue for the project).
>>
>> On 12/4/17 5:58 PM, Josh Elser wrote:
>>> Sorry, I don't follow. Why do you think 4611/4753 is a show-stopper?
>>> Cuz, uh... I made it work already :)
>>>
>>> Thanks for the JIRA cleanup. Forgot about that one.
>>>
>>> On 12/4/17 5:55 PM, Christopher wrote:
>>>> I don't think we can support it with 1.8 or earlier, because of
>>>> some serious incompatibilities (namely, ACCUMULO-4611/4753). I think
>>>> people are still patching 1.7, so I don't think we've "officially"
>>>> EOL'd it.
>>>> I think 2.0 could require Hadoop 3, if Hadoop 3 is sufficiently stable.
>>>>
>>>> On Mon, Dec 4, 2017 at 1:14 PM Josh Elser wrote:
>>>>
>>>>> What branch do we want to consider Hadoop3 support?
>>>>>
>>>>> There is a 3.0.0-beta1 release that's been out for a while, and
>>>>> Hadoop PMC has already done a 3.0.0 RC0. I think it's the right
>>>>> time to start considering this.
>>>>>
>>>>> In my poking so far, I've filed ACCUMULO-4753 which I'm working
>>>>> through now. This does raise the question: where do we want to say
>>>>> we support Hadoop3? 1.8 or 2.0? (have we "officially" deprecated
>>>>> 1.7?)
>>>>>
>>>>> - Josh
>>>>>
>>>>> https://issues.apache.org/jira/browse/ACCUMULO-4753
>>>>>

>>
> 



Re: commons-vfs2.jar 2.2 buggy

2018-10-24 Thread Dave Marion
I have talked with Christopher about the VFS class loader in general and I
think he has a good approach. He can elaborate further if needed, but the
approach is to move it out of the core project and allow users to configure
it at runtime using the java.system.class.loader system property. There are
organizations using the VFSClassloader successfully; maybe it just needs to
be reimplemented.

On Wed, Oct 24, 2018 at 2:58 PM Sean Busbey wrote:

> sounds like a good DISCUSS thread for 2.0?
> On Wed, Oct 24, 2018 at 1:43 PM Josh Elser  wrote:
> >
> > It seems like commons-vfs2 is just a pile of crap.
> >
> > It's been known to have bugs for years and we've seen zero progress from
> > them on making them better.
> >
> > IMO, rip the whole damn thing out.
> >
> > On 10/24/18 12:42 PM, Matthew Peterson wrote:
> > > Hello Accumulo,
> > >
> > > Summary: commons-vfs2 version 2.2 seems to have problems and it may be
> > > worth rolling back to version 2.1 of commons-vfs2.
> > >
> > > > My project upgraded a system from Accumulo 1.8.1 to 1.9.2.  Immediately
> > > > after switching vfs contexts we saw problems.  The tservers would error in
> > > > iterators about missing classes that were clearly on the classpath.  The
> > > > problems were persistent until we replaced the commons-vfs2.jar with
> > > > version 2.1 (Accumulo 1.9.2 uses version 2.2).  Until we rolled vfs back,
> > > > we received errors particularly with Spring code trying to access various
> > > > classes and files within the jars.  It looks like in 2.2, commons-vfs
> > > > implemented a doDetach method which closed the zip files.  We suspect that
> > > > code is the problem but haven't tested that theory.
> > >
> > > I suspect that most users don't use this feature.
> > >
> > > Thanks!
> > > Matt
> > >
>
>
>
> --
> busbey
>


Re: commons-vfs2.jar 2.2 buggy

2018-10-26 Thread Dave Marion
Based on the comments in https://issues.apache.org/jira/browse/ACCUMULO-4828, 
the update to 2.2 did not solve the issues. Seems like reverting back to 2.1 
might be in order for the short term.

> On October 26, 2018 at 1:32 PM Andrew Hulbert <andrew.hulb...@ccri.com> wrote:
> 
> 
> Matt,
> 
> We are running into similar issues with the 2.2 VFS jar running on
> Accumulo 1.9.2 after upgrading from 1.8.1 but have been restarting
> tservers to work around it and other issues with putting the iterators
> in /tmp on certain systems.
> 
> In general though we love it because we can run multiple versions of
> iterators on the same cluster and we have it deployed on several systems
> with our clients for that specific use case.
> 
> Sean/Chris, if we rip it out would you imagine iterators being more like
> HBase where you are basically bound to the startup classpath as the
> baseline mechanism (with user-enabled specific class loaders). Or do you
> imagine another upgrade/configuration mechanism? FYI we do VFS and the
> general accumulo mechanism for configuring iterators and the iterator
> api design because it's pretty user/developer friendly.
> 
> Thanks,
> 
> Andrew
> 
> 
> On 10/24/2018 10:55 PM, Christopher wrote:
> 
> > The idea that Dave is talking about is that I don't think we should be
> > doing any classloader special sauce in accumulo-start at all, and we
> > might even be able to remove accumulo-start as a module entirely,
> > since this is its primary (sole?) purpose.
> >
> > It's just a rough idea that I've tossed around with a few people, but
> > haven't really spent any time materializing it into a proposal, PR, or
> > experiment. Basically, I think we should rip out all classloader
> > special sauce. If a user still wishes to use a custom classloader for
> > any reason, using vfs2 or anything else, they can set a system class
> > loader with -Djava.system.class.loader=my.custom.CustomClassLoader
> > when they run java. This is an advanced Java option supported by Java
> > itself, and really shouldn't be a problem to punt this downstream.
> > Classloading is way outside the scope of what Accumulo does anyway,
> > and Accumulo should have its complexity centered around what it does,
> > and not "bells and whistles" on top of basic Java language functions.
> >
> > If we wanted to, we could use our current classloading code to create
> > a classloader which could be used this way... and maybe provide it as
> > an example or explain it in a blog post. But, Accumulo shouldn't be
> > doing special sauce class loading... there are other projects that are
> > better suited to specializing that for any Java application... and
> > there's no reason we need it so tightly coupled to Accumulo.
> >
> > Of course, there's still some utility in the per-table context
> > classloaders for pluggable components like iterators... and there's
> > probably room for improvement in the configuration of those... but the
> > main startup classloading is probably best to rip out.
> >
> > I'm not sure if it should be done for 2.0 or not... maybe yes. I'd be
> > willing to rip it out... I enjoy ripping things out and reducing code
> > complexity. But, I don't really have a desire to do the work of
> > implementing or blogging about alternatives, if that's even necessary.
> > I'd hope that somebody else would do that, if they felt it was really
> > necessary once the built-in stuff was ripped out. For me, I'd be happy
> > mentioning the feature in the release notes, maybe linking to the docs
> > on the feature, and leaving implementation as an exercise for
> > downstream, with an open invitation for a guest blog on our website
> > about how it could be done.
> >
> > I've been thinking we're probably going to want a second alpha... or a
> > beta, before 2.0 final... and if we did this for 2.0, I'd definitely
> >

Re: [VOTE] Proposal to release version 1.10

2019-11-01 Thread Dave Marion
Ed,

  At the point that 1.10 is released, would there be any Java 8 language
features used in the codebase? What exactly are you changing for the 1.10
release, the compiler version in the pom, or more?

On Fri, Nov 1, 2019 at 2:10 PM Michael Wall  wrote:

> I am +1 on moving 1.10 to Java 8.
>
> However, Sean's -1 vote is a veto [1] and we cannot proceed down this path
> unless it is withdrawn.  I can only take the veto to mean there are
> customers who would upgrade to Accumulo 1.10 but would not upgrade to Java
> 1.8.  Is there anything that would change your mind, Sean?
>
> Thanks
>
> Mike
>
> 1 - https://www.apache.org/foundation/voting.html#Veto
>
>
> On Fri, Nov 1, 2019 at 12:46 PM Sean Busbey wrote:
>
> > Correct, it is up to every user of SemVer to define the public API and
> > AFAIK we have chosen not to include things like the Java version
> > needed to run Accumulo in ours[1].
> >
> > That doesn't mean it's not crappy to our downstream users to do things
> > that have a major operational impact upon minor releases. Updating a
> > JDK version is a major undertaking. It takes a long time to do in an
> > environment with strict change control policies and it sucks. There
> > are still shops that run JDK7. There are multiple options for
> > purchasing commercial support with security updates for it still. Just
> > picking two vendors out of the air[2], Oracle will still provide
> > support for almost 2 more years and Azul for almost 3.
> >
> > That doesn't mean we have to keep supporting JDK7, but be aware that
> > we are trading for a gain in developer convenience at the expense of
> > operator difficulty. We will probably drive folks into the arms of
> > forks that bother to maintain JDK compatibility for these release
> > lines. It does inhibit our ability to draw new folks into the
> > community, but that's not a fundamental problem I guess.
> >
> > As an aside, this comment from your cited FAQ is inaccurate on its
> > face for practical considerations in the Java ecosystem as cause for
> > not needing to worry about the downstream impact of changing a
> > dependency.
> >
> > > Software that explicitly depends on the same dependencies as your
> > > package should have their own dependency specifications and the author will
> > > notice any conflicts.
> >
> > We've discussed this a bunch of times. We clearly have disagreement in
> > the community about the priority on the tradeoff between developer
> > work and operational work. That's okay.
> >
> > [1]: https://accumulo.apache.org/api/
> > [2]: https://www.azul.com/products/azul-support-roadmap/
> >
> > On Fri, Nov 1, 2019 at 7:16 AM Ed Coleman  wrote:
> > >
> > > > If I am reading semver correctly
> > > > (https://semver.org/#what-should-i-do-if-i-update-my-own-dependencies-without-changing-the-public-api)
> > > > this proposal has no changes to the Accumulo public API, it is an update to
> > > > our dependencies - and would not require a major version change.
> > >
> > > > -----Original Message-----
> > > > From: Sean Busbey [mailto:bus...@cloudera.com.INVALID]
> > > > Sent: Friday, November 01, 2019 3:52 AM
> > > > To: dev@accumulo.apache.org
> > > Subject: Re: [VOTE] Proposal to release version 1.10
> > >
> > > > -1 no dropping supported java versions in a minor release. if we want
> > > > folks to move to java 8 then we should make it easier to upgrade to
> > > > Accumulo 2.y
> > >
> > > > On Thu, Oct 31, 2019 at 7:37 PM Ed Coleman wrote:
> > > >
> > > > > As suggested in the LTS discussion ([LAZY][VOTE] A basic, but
> > > > > concrete, LTS proposal), I'm breaking this out as a separate thread
> > > > > to keep the topic distinct.
> > > > >
> > > > > The proposal - I would like to start the formal release process for a
> > > > > 1.10 version that would change the java language level to java 8. The
> > > > > release would be based on the current 1.9 branch and would be released
> > > > > instead of a 1.9.4.  The 1.10 release would not contain additional
> > > > > feature changes that are not present in the current 1.9 branch.
> > > > > Currently, this would be based on the commit SHA:
> > > > >
> > > > > 328ffa0849981e0f113dfbf539c832b447e06902 - committed Thu Oct 10.
> > > > >
> > > > > (I am unaware of any bug-fixes or issues in the pipeline that would /
> > > > > should be included - but hopefully this makes the intention clear.)
> > > > >
> > > > > The goal is to provide a candidate for LTS nomination that is based on
> > > > > the current 1.9.x code, but unifies our currently supported branches
> > > > > to all use java 8 as the supported language level. While this had been
> > > > > discussed in the past, enough time has passed that a java 8
> > > > > requirement now, seems to me, to be unlikely to impact any customers
> > > > > that would look to upgrade Accumulo past a 1.9.3 version and have them
> > > > > not running at least java 8.
> > > > > Having our code base with a modern, unified java language support
> > > > > level would greatly benefit our develop

Re: [LAZY][VOTE] change default branch to 'main'

2020-08-03 Thread Dave Marion
+1

On Mon, Aug 3, 2020 at 8:48 AM Owens, Mark  wrote:

> +1
>
> -----Original Message-----
> From: Christopher 
> Sent: Monday, August 3, 2020 7:59 AM
> To: accumulo-dev 
> Subject: [LAZY][VOTE] change default branch to 'main'
>
> As a follow-up from our previous conversation on this issue, I have
> already started a new branch named 'main' for my own future contributions
> (that name because it appears to be the trending alternative to 'master'),
> and for others who wish to use it as an alternative to the current 'master'
> branch. I intend to discontinue merging contributions (my own or others) to
> any 'master' branch owned by the Accumulo PMC.
>
> In conjunction with that practice that I intend to follow, I would like to
> propose the following as a lazy vote (by that, I mean I'm just going to do
> it if nobody votes; if somebody objects, this turns into a majority vote):
>
> 1. Submit an INFRA ticket to change the default branch to 'main' for all
>    our repos (I'll make sure one is created in each before then), and
> 2. Update all our PRs against 'master' to be based on 'main' instead, and
> 3. Delete the branch currently named 'master' (after the first two steps and
>    ensuring 'main' contains all of its commits)
>
> This effectively results in a rename of our primary development branches.
> Upon completion of these steps, I would send an email reminder to update
> any forks or local clones.
>
> This vote will end after Thu 06 Aug 2020 12:00:00 PM UTC (Thu 06 Aug 2020
> 08:00:00 AM EDT / Thu 06 Aug 2020 05:00:00 AM PDT)
>
> Thanks,
> Christopher
>


Re: [VOTE] "Manager" as new name for "master" service

2020-08-03 Thread Dave Marion
+1

On Mon, Aug 3, 2020 at 10:15 AM Adam Lerman  wrote:

> +1
>
> On Mon, Aug 3, 2020 at 10:14 AM Owens, Mark  wrote:
>
> > +1
> >
> > -----Original Message-----
> > From: Christopher 
> > Sent: Monday, August 3, 2020 9:54 AM
> > To: accumulo-dev 
> > Subject: [VOTE] "Manager" as new name for "master" service
> >
> > Based on the feedback on
> > https://github.com/apache/accumulo/issues/1638 , the following two names
> > have taken a clear lead in popularity for the new name for the service
> > currently known as "master": Manager and Coordinator. Of the two, "Manager"
> > is more popular by a very narrow margin.
> >
> > Please vote on whether to accept "Manager" as the new name for the service
> > currently known as "master" to work towards. Remember, we're only voting on
> > a target name at this time, not a migration path, release plan, or any
> > specific code changes. Those details can be worked out in future actions.
> >
> > My vote is +1
> >
> > This vote will end after Thu 06 Aug 2020 02:00:00 PM UTC (Thu 06 Aug 2020
> > 10:00:00 AM EDT / Thu 06 Aug 2020 07:00:00 AM PDT)
> >
> > Thanks,
> >
> > Christopher
> >
>


Re: [VOTE] Apache Accumulo 1.10.0-rc1 (attempt 2)

2020-08-27 Thread Dave Marion
-1, same reason.

On Wed, Aug 26, 2020 at 5:36 PM Christopher  wrote:

> -1 because of https://github.com/apache/accumulo/pull/1692
>
> On Wed, Aug 26, 2020 at 1:02 PM Mike Miller  wrote:
> >
> > Here is the fingerprint of the gpg key I used for signing:
> > 1914AF6FE2C53672C87CE1DADC8FFDC342894E89
> >
> > On Wed, Aug 26, 2020 at 11:52 AM Mike Miller  wrote:
> >
> > > Accumulo Developers,
> > >
> > > Please consider the following candidate for Apache Accumulo 1.10.0.
> > >
> > > Git Commit:
> > > 30d2d35a71bc50aac91b43c86f51c349adbede0f
> > > Branch:
> > > 1.10.0-rc1
> > >
> > > If this vote passes, a gpg-signed tag will be created using:
> > > git tag -f -m 'Apache Accumulo 1.10.0' -s rel/1.10.0 \
> > > 30d2d35a71bc50aac91b43c86f51c349adbede0f
> > >
> > > Staging repo:
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1085
> > > Source (official release artifact):
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1085/org/apache/accumulo/accumulo/1.10.0/accumulo-1.10.0-src.tar.gz
> > > Binary:
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1085/org/apache/accumulo/accumulo/1.10.0/accumulo-1.10.0-bin.tar.gz
> > >
> > > Append ".asc" to download the cryptographic signature for a given
> > > artifact. (You can also append ".sha1" or ".md5" instead in order to
> > > verify the checksums generated by Maven to verify the integrity of the
> > > Nexus repository staging area.)
> > >
> > > Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
> > > (Expected fingerprint: 1914AF6FE2C53672C87CE1DADC8FFDC342894E89
> > > 73FA18AB40B246B2CAB6C3929247F00F838DFA21
> > > 10CF505ADD8A393C0F43FCB79012BEAF250F433E
> > > 3EFE21FEEFD4D7C601F2111242F77927DD546F7A)
> > >
> > > In addition to the tarballs and their signatures, the following checksum
> > > files will be added to the dist/release SVN area after release:
> > > accumulo-1.10.0-src.tar.gz.sha512 will contain:
> > > SHA512 (accumulo-1.10.0-src.tar.gz) =
> > > 962acc0c75edc5270f00f63321994aa2e142433f5a0edb8e03747662841689390946a1363c22e18b503602f1b03d12f8b750972276bf3a0ffbd93d56d2d55034
> > > accumulo-1.10.0-bin.tar.gz.sha512 will contain:
> > > SHA512 (accumulo-1.10.0-bin.tar.gz) =
> > > 92a06b89f3e6434f5e900d4f8c0fbf9e9b85485d9e4e67641bf2fffec9664b3da5bebeb90892470d16d84619ce6241191a9c32063b740db0087d46cf00998323
> > >
> > > Release notes (in progress) can be found at:
> > > https://accumulo.apache.org/release/accumulo-1.10.0/
> > >
> > > Release testing instructions:
> > > https://accumulo.apache.org/contributor/verifying-release
> > >
> > > Please vote one of:
> > > [ ] +1 - I have verified and accept...
> > > [ ] +0 - I have reservations, but not strong enough to vote against...
> > > [ ] -1 - Because..., I do not accept...
> > > ... these artifacts as the 1.10.0 release of Apache Accumulo.
> > >
> > > This vote will remain open until at least Sat Aug 29 15:30:00 UTC 2020.
> > > (Sat Aug 29 11:30:00 EDT 2020 / Sat Aug 29 08:30:00 PDT 2020)
> > > Voting can continue after this deadline until the release manager
> > > sends an email ending the vote.
> > >
> > > Thanks!
> > >
> > > P.S. Hint: download the whole staging repo with
> > > wget -erobots=off -r -l inf -np -nH \
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1085/
> > > # note the trailing slash is needed
> > >
>


Re: [VOTE] Apache Accumulo 1.10.0-rc2

2020-08-28 Thread Dave Marion
+1
- Verified checksums and signatures for accumulo-1.10.0-bin.tar.gz
- Verified checksums and signatures for accumulo-1.10.0-src.tar.gz
- BUILD SUCCESS running "mvn verify -Dfailsafe.groups=org.apache.accumulo.test.categories.SunnyDayTests" on source bundle
- 99.7% BUILD SUCCESS running "mvn verify -Dfailsafe.groups=org.apache.accumulo.test.categories.MiniClusterOnlyTests -Dtimeout.factor=2" on source bundle, errors:
  - DurabilityIT.testWriteSpeed, which has subsequently been removed in main
  - SuspendedTabletsIT.crashAndResumeTserver, which timed out
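
The checksum part of the verification above can be repeated with a few lines of Java. This is only a minimal sketch, not part of the release tooling: the class name Sha512Check and its command-line arguments are made up for illustration. It compares a file's SHA-512 digest with the hex string published in the vote email and does not check the GPG signature.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Accumulo or its release scripts.
public class Sha512Check {

  /** Returns the lowercase hex SHA-512 digest of the given bytes. */
  public static String sha512Hex(byte[] data) {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
      // SHA-512 is a required algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest(data)) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    // args[0]: path to the tarball, args[1]: expected digest from the email.
    // Reads the whole file into memory, which is fine for a sketch.
    byte[] contents = Files.readAllBytes(Paths.get(args[0]));
    String actual = sha512Hex(contents);
    String expected = args[1].trim().toLowerCase();
    System.out.println(actual.equals(expected) ? "OK" : "MISMATCH: " + actual);
  }
}
```

It would be run as `java Sha512Check accumulo-1.10.0-src.tar.gz <expected-hex>`, with the expected value copied from the vote email below.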

On Thu, Aug 27, 2020 at 12:37 PM Mike Miller  wrote:

> Accumulo Developers,
>
> Please consider the following candidate for Apache Accumulo 1.10.0.
>
> Git Commit:
> 4d261254c3ac43a3bd13ce974e91ce4303a83998
> Branch:
> 1.10.0-rc2
>
> If this vote passes, a gpg-signed tag will be created using:
> git tag -f -m 'Apache Accumulo 1.10.0' -s rel/1.10.0 \
> 4d261254c3ac43a3bd13ce974e91ce4303a83998
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheaccumulo-1086
> Source (official release artifact):
> https://repository.apache.org/content/repositories/orgapacheaccumulo-1086/org/apache/accumulo/accumulo/1.10.0/accumulo-1.10.0-src.tar.gz
> Binary:
> https://repository.apache.org/content/repositories/orgapacheaccumulo-1086/org/apache/accumulo/accumulo/1.10.0/accumulo-1.10.0-bin.tar.gz
>
> Append ".asc" to download the cryptographic signature for a given artifact.
> (You can also append ".sha1" or ".md5" instead in order to verify the
> checksums
> generated by Maven to verify the integrity of the Nexus repository staging
> area.)
>
> Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
> (Expected fingerprint: 1914AF6FE2C53672C87CE1DADC8FFDC342894E89)
>
> In addition to the tarballs and their signatures, the following checksum
> files will be added to the dist/release SVN area after release:
> accumulo-1.10.0-src.tar.gz.sha512 will contain:
> SHA512 (accumulo-1.10.0-src.tar.gz) =
> 81f2a8f8273e2bdfe46d6a807dc38276ee2937ced648829648b7750bfc22816c13d43461d1b08c50a6957d78a999ae3109c93d2f31c7d8be116e91e0ea25f5c2
> accumulo-1.10.0-bin.tar.gz.sha512 will contain:
> SHA512 (accumulo-1.10.0-bin.tar.gz) =
> 9d3023c8724069282035ed6dcb047f737c1c53dc05f7b15da2cfd941f51d1d7720892496475430eb639f3a36c83f4eecc1942c0317c67d38dcf2061d06beb648
>
> Release notes (in progress) can be found at:
> https://accumulo.apache.org/release/accumulo-1.10.0/
>
> Release testing instructions:
> https://accumulo.apache.org/contributor/verifying-release
>
> Please vote one of:
> [ ] +1 - I have verified and accept...
> [ ] +0 - I have reservations, but not strong enough to vote against...
> [ ] -1 - Because..., I do not accept...
> ... these artifacts as the 1.10.0 release of Apache Accumulo.
>
> This vote will remain open until at least Sun Aug 30 16:30:00 UTC 2020.
> (Sun Aug 30 12:30:00 EDT 2020 / Sun Aug 30 09:30:00 PDT 2020)
> Voting can continue after this deadline until the release manager
> sends an email ending the vote.
>
> Thanks!
>
> P.S. Hint: download the whole staging repo with
> wget -erobots=off -r -l inf -np -nH \
> https://repository.apache.org/content/repositories/orgapacheaccumulo-1086/
> # note the trailing slash is needed
>


Re: Review Request 23988: Use Dropwizard to create a proper REST monitor service

2014-07-29 Thread Dave Marion


> On July 28, 2014, 5:35 p.m., Christopher Tubbs wrote:
> > server/monitor-rest/pom.xml, lines 98-131
> > <https://reviews.apache.org/r/23988/diff/1/?file=643622#file643622line98>
> >
> > Please don't shade by default in the build. It creates a nightmare for 
> > pom dependency resolution. We should not be shipping shaded binary 
> > artifacts in a release, or deploying them to maven central in a release.
> 
> Josh Elser wrote:
> That's the whole point of Dropwizard. I'd recommend you read into it - 
> https://dropwizard.github.io/dropwizard/getting-started.html, specifically 
> https://dropwizard.github.io/dropwizard/getting-started.html#building-fat-jars.

Consumers of the REST service may need the client classes, depending on what 
they are doing in their application. In that case they will have to use the 
shaded jar, since it contains the client classes, and it will also pull in the 
JAX-RS, JAXB, and Jackson jars, which may conflict with what their application 
is doing. If we have a shaded jar on the server side, for ease of classpath or 
whatever, then I think we want to create a client jar that only contains the 
client classes.


- Dave


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23988/#review48869
---


On July 28, 2014, 4:55 p.m., Josh Elser wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23988/
> ---
> 
> (Updated July 28, 2014, 4:55 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-3005
> https://issues.apache.org/jira/browse/ACCUMULO-3005
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> Creates a proper REST service using Dropwizard, with the intent to eventually 
> replace the existing monitor's "data" component. Copies most of the 
> functionality (sans the log-forwarding) into a standalone application. 
> Returns data as JSON and tries to separate logic into consumable pieces. 
> Still uses the Monitor class for most Thrift interactions.
> 
> 
> Diffs
> -
> 
>   assemble/bin/accumulo 727a4c8 
>   assemble/bin/start-all.sh cebbd8c 
>   assemble/bin/stop-all.sh 4bf06c0 
>   assemble/bin/stop-server.sh 52696af 
>   assemble/conf/templates/accumulo-site.xml 08c905b 
>   assemble/pom.xml 89a3747 
>   pom.xml ba6693d 
>   server/monitor-rest/.gitignore PRE-CREATION 
>   server/monitor-rest/pom.xml PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/MonitorApplication.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/MonitorConfiguration.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/GarbageCollection.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/GarbageCollectorCycle.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/GarbageCollectorStatus.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/LogEvent.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/RecoveryStatusInformation.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/ReplicationInformation.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/TableInformation.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/TabletServerInformation.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/TabletServerTableInformation.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/api/TabletServerWithTableInformation.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/health/AccumuloHealthCheck.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/resources/GarbageCollectorResource.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/resources/LogResource.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/resources/MasterResource.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/resources/ProblemsResource.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/resources/ReplicationResource.java PRE-CREATION 
>   server/monitor-rest/src/main/java/org/apache/accumulo/monitor/rest/resources/StatisticsOverTimeResource.java PRE-CREATION 
>   server/monitor-re

Re: Review Request 23988: Use Dropwizard to create a proper REST monitor service

2014-07-29 Thread Dave Marion


> On July 28, 2014, 5:35 p.m., Christopher Tubbs wrote:
> > server/monitor-rest/pom.xml, lines 98-131
> > <https://reviews.apache.org/r/23988/diff/1/?file=643622#file643622line98>
> >
> > Please don't shade by default in the build. It creates a nightmare for 
> > pom dependency resolution. We should not be shipping shaded binary 
> > artifacts in a release, or deploying them to maven central in a release.
> 
> Josh Elser wrote:
> That's the whole point of Dropwizard. I'd recommend you read into it - 
> https://dropwizard.github.io/dropwizard/getting-started.html, specifically 
> https://dropwizard.github.io/dropwizard/getting-started.html#building-fat-jars.
> 
> Dave Marion wrote:
> Consumers of the REST service may need the client classes, depending on 
> what they are doing in their application. In that case they will have to use 
> the shaded jar, since it contains the client classes, and it will also pull in 
> the JAX-RS, JAXB, and Jackson jars, which may conflict with what their 
> application is doing. If we have a shaded jar on the server side, for ease of 
> classpath or whatever, then I think we want to create a client jar that only 
> contains the client classes.
> 
> Josh Elser wrote:
> That's valid. The pojos could be split into their own artifact (long 
> term, probably rolled into the long talked about 'accumulo-client' jar or 
> similar).
> 
> Christopher Tubbs wrote:
> Shading comes with a whole host of problems, some of which I mentioned 
> above, but also because anybody having a dependency on some POJO inside the 
> jar is going to have a huge nightmare with additional bundled stuff.
> 
> If that's the only way to reasonably work with dropwizard, then I think 
> we should look at alternative REST frameworks, for something that can be more 
> easily baked in to the existing monitor packaging. Alternatively, we could do 
> a full separation and provide that service as a separate project or contrib, 
> which might actually help us focus on providing a standard API for 
> metrics/stats that any monitoring tool might be able to use, rather than 
> enriching the existing one.
> 
> On the other hand, that link you provide doesn't really explain that we 
> need to build shaded jars... it just explains what shaded jars are and why 
> you might want to create them. In our case, I think non-shaded is preferred. 
> It fits better into our existing build and scripts, and doesn't come with all 
> the problems that shaded jars have.
> 
> Josh Elser wrote:
> Won't restate the POJO resolution as I already addressed that.
> 
> The reason I like the recommendation of the shaded jar is because no one 
> on this project is a real "expert" on setting up Java web services. It gets 
> out of the way to provide some easy standards of just writing code. I do like 
> the technology as well (Jackson, Jetty, and Jersey), but boy do I hate trying 
> to actually set it up by hand for numerous reasons (primarily death by 1000 
> configuration parameters).
> 
> Removing the shade might not be *too* bad because we could hide the 
> dependencies necessary via the assembly pom and the shell scripts (which is 
> the argument that Dropwizard typically makes - you just need a single jar 
> that can be deployed with a simple `java -jar foo.jar server` and not having 
> to futz with the classpath). Having our own collection of scripts will 
> mitigate some of the pains that shading solves. I'll see if I can get 
> something working without shade some evening.
> 
> Integration with metrics systems is outside of the scope of what this is 
> providing, but there is already integration with JMX as well as Ganglia. Both 
> of those are also unrelated to these changes. While yes, it may be nice if we 
> could integrate with some magical metrics library that gives us everything we 
> want, I've yet to find such a thing. Until then, it's pointless to keep an 
> unmaintainable collection of code just because it "would be nice" if such a 
> magical library existed. This diff is making improvements to Accumulo 
> providing metrics data in a consumable format.
> 
> By splitting out the existing servlet classes into data producers 
> (REST endpoints), we already move one step closer to easing integration with 
> other systems. Additionally, creating POJOs for the data being returned also 
> allows these integrations to more reliably use the data we produce.
> 
> Sean Busbey wrote:
> The primary reason to build a shaded jar is to provide a simpler 
> deployment model. From what I read, the point of dropwizard i

Re: [DISCUSS] Semantic Versioning

2014-12-08 Thread Dave Marion
+1
On Dec 8, 2014 6:22 PM, "Christopher"  wrote:

> Short Summary:
>
> I see 6 informal +1s (including my own) for adopting Semver, and no -1s.
> Other points differ.
>
> Longer Summary:
>
> Including additional strictness for deprecation documented in a major
> release does not have significant consensus and, in hindsight, probably
> doesn't really add much value. Semver does not bind us to a particular
> release cycle for major/minor/bugfix, only what we call it when we make
> certain changes. The basic Semver rules are sufficient.
>
> Including additional strictness for forward compatibility isn't necessary.
> Semver requires a minor version bump if new features are added to the API.
> So, this is redundant and not needed.
>
> Including the wire version is tough without a test framework, and maybe
> unnecessary, since the main concern about compatibility seems to be with
> applications needing to be modified to function with a newer client
> library, which contains the RPC code. If we ensure compatibility at the
> API, then users simply need to drop in the appropriate client jars for wire
> compatibility. This is probably sufficient.
>
> There seems to be some confusion about when and where these rules are
> applied. However, I believe we can go ahead and start adopting these rules
> from here on, without any issues. This doesn't hurt users, and only *adds*
> to the stability of the API, which we've already been striving for. It also
> doesn't bind us to a particular release cycle or deprecation duration. It
> only helps us determine what minimum version we should call something, when
> we do release. Upon adoption, the "master" branch version can be computed
> from the rules. If that computation requires a bump higher than what we are
> comfortable with, we can always ensure a greater level of compatibility
> than what currently exists, in order to avoid that bump, if we so choose.
> Adoption of these rules should help inform such discussions.
>
> Now, to be clear, it may be the case that the 1.5 and 1.6 maintenance
> branches already have introduced additional APIs that under Semver would
> have required a minor version bump. I'm not suggesting that we revert those
> changes, but by adopting the Semver, we can agree to avoid doing that from
> here on. Since 1.7 already adds additional features, by adopting Semver, we
> simply agree that the master branch should be called 2.0 if it is not
> backwards-compatible with 1.6.x, and 1.7.0 if it is. Adopting these rules
> helps inform that decision, but does not make that decision for us. Either
> way, that decision would be independent of adopting Semver today for all
> future releases. Incidentally, this answers the question of whether 2.0 can
> introduce "breaking" (removal of deprecations) changes, but it does not say
> that we must stop support for 1.x or release 2.0 on any particular
> timeline.
>
> Action:
>
> In the absence of further discussion, I think we should call a majority
> vote (tomorrow) to adopt Semver, so we can immediately start communicating
> better versioning semantics, and we can make progress with a concrete
> decision to help with release planning. The specific wording of the
> proposition I would suggest (please propose amendments here if you think it
> is unclear) would be:
>
> "Vote to adopt Semantic Versioning 2.0.0 (as described at
> https://semver.org)
> from this point forward, for all future releases, with the public API
> documented in the README."
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>


Re: [DISCUSS] Semantic Versioning

2014-12-08 Thread Dave Marion
Yes. http://semver.org #7 and #8 apply here. Minor version bump when
methods are deprecated. Major version bump when backwards compatibility is
broken.
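
A rough sketch of how those two rules turn into a version number. The class SemverBump, the Change categories, and nextVersion are hypothetical names used only for illustration; nothing like this exists in Accumulo. It assumes a plain MAJOR.MINOR.PATCH string and ignores pre-release tags and the special rules for 0.y.z versions.

```java
import java.util.Collection;
import java.util.EnumSet;

// Illustrative only: computes the minimum next version under Semver.
public class SemverBump {

  /** Kinds of change made since the last release (made-up categories). */
  public enum Change { BUG_FIX, NEW_API, DEPRECATION, BREAKING }

  /** Returns the smallest version that Semver allows for the given changes. */
  public static String nextVersion(String current, Collection<Change> changes) {
    String[] parts = current.split("\\.");
    int major = Integer.parseInt(parts[0]);
    int minor = Integer.parseInt(parts[1]);
    int patch = Integer.parseInt(parts[2]);

    // Rule #8: any backwards-incompatible change forces a major bump.
    if (changes.contains(Change.BREAKING)) {
      return (major + 1) + ".0.0";
    }
    // Rule #7: new compatible API or a deprecation forces a minor bump.
    if (changes.contains(Change.NEW_API) || changes.contains(Change.DEPRECATION)) {
      return major + "." + (minor + 1) + ".0";
    }
    // Otherwise only the patch number moves.
    return major + "." + minor + "." + (patch + 1);
  }

  public static void main(String[] args) {
    System.out.println(nextVersion("1.6.2", EnumSet.of(Change.DEPRECATION))); // 1.7.0
    System.out.println(nextVersion("1.7.0", EnumSet.of(Change.BREAKING)));    // 2.0.0
  }
}
```

Under these rules, deprecating the old client API costs a minor bump and removing it later costs a major bump, which matches the 1.7.0 then 2.0.0 sequence asked about below.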
On Dec 8, 2014 6:55 PM, "John Vines"  wrote:

> Just to make sure I'm understanding this before we get into another vote
> thread kerfuffle, if we adopt semver in 1.7.0, include a new client api in
> 1.7.0, deprecate the old api in 1.7.0, then semver would allow (but not
> require) removing the deprecated api in 2.0.0, correct?
>
> On Mon, Dec 8, 2014 at 6:21 PM, Christopher  wrote:
>
> > Short Summary:
> >
> > I see 6 informal +1s (including my own) for adopting Semver, and no -1s.
> > Other points differ.
> >
> > Longer Summary:
> >
> > Including additional strictness for deprecation documented in a major
> > release does not have significant consensus and, in hindsight, probably
> > doesn't really add much value. Semver does not bind us to a particular
> > release cycle for major/minor/bugfix, only what we call it when we make
> > certain changes. The basic Semver rules are sufficient.
> >
> > Including additional strictness for forward compatibility isn't necessary.
> > Semver requires a minor version bump if new features are added to the API.
> > So, this is redundant and not needed.
> >
> > Including the wire version is tough without a test framework, and maybe
> > unnecessary, since the main concern about compatibility seems to be with
> > applications needing to be modified to function with a newer client
> > library, which contains the RPC code. If we ensure compatibility at the
> > API, then users simply need to drop in the appropriate client jars for wire
> > compatibility. This is probably sufficient.
> >
> > There seems to be some confusion about when and where these rules are
> > applied. However, I believe we can go ahead and start adopting these rules
> > from here on, without any issues. This doesn't hurt users, and only *adds*
> > to the stability of the API, which we've already been striving for. It also
> > doesn't bind us to a particular release cycle or deprecation duration. It
> > only helps us determine what minimum version we should call something, when
> > we do release. Upon adoption, the "master" branch version can be computed
> > from the rules. If that computation requires a bump higher than what we are
> > comfortable with, we can always ensure a greater level of compatibility
> > than what currently exists, in order to avoid that bump, if we so choose.
> > Adoption of these rules should help inform such discussions.
> >
> > Now, to be clear, it may be the case that the 1.5 and 1.6 maintenance
> > branches already have introduced additional APIs that under Semver would
> > have required a minor version bump. I'm not suggesting that we revert those
> > changes, but by adopting the Semver, we can agree to avoid doing that from
> > here on. Since 1.7 already adds additional features, by adopting Semver, we
> > simply agree that the master branch should be called 2.0 if it is not
> > backwards-compatible with 1.6.x, and 1.7.0 if it is. Adopting these rules
> > helps inform that decision, but does not make that decision for us. Either
> > way, that decision would be independent of adopting Semver today for all
> > future releases. Incidentally, this answers the question of whether 2.0 can
> > introduce "breaking" (removal of deprecations) changes, but it does not say
> > that we must stop support for 1.x or release 2.0 on any particular
> > timeline.
> >
> > Action:
> >
> > In the absence of further discussion, I think we should call a majority
> > vote (tomorrow) to adopt Semver, so we can immediately start communicating
> > better versioning semantics, and we can make progress with a concrete
> > decision to help with release planning. The specific wording of the
> > proposition I would suggest (please propose amendments here if you think it
> > is unclear) would be:
> >
> > "Vote to adopt Semantic Versioning 2.0.0 (as described at
> > https://semver.org)
> > from this point forward, for all future releases, with the public API
> > documented in the README."
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
>


Re: State of 1.6 branches

2014-12-26 Thread Dave Marion
That didn't clear it up for me. BTW, the release management section needs to
be changed now that we have adopted semver. I will make a ticket for that.

I will change the question. We are working on releasing 1.6.2. Where should
changes for 1.6.3 be placed?
Dave, consult http://accumulo.apache.org/git.html

tl;dr No, the 1.6 branch is for the lifetime of all 1.6 versions.

WRT the other branches you are seeing, I'm guessing it's Corey doing
something in preparation for making a 1.6.2.

dlmar...@comcast.net wrote:

> We have 1.6, 1.6.2-SNAPSHOT, and 1.6.2-rc0 branches. Should we change the
> version in the 1.6 branch to 1.6.3-SNAPSHOT? If not, what's the plan?
>
>


Re: State of 1.6 branches

2014-12-26 Thread Dave Marion
Thanks for the clarification. Doesn't this process make the release manager's
job harder by having to wade through all the commits on the development
branch and cherry-pick them? I think I will hold my commits until the
release to make it easier.
On Dec 26, 2014 1:00 PM, "Christopher"  wrote:

> Changes for 1.6.3 should go in the 1.6 branch. However, it may be the case
> that branch is released from an earlier commit. The release manager should
> be responsible for ensuring any additional fixes included in the 1.6.2
> release from the 1.6 branch are properly marked for 1.6.2.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Fri, Dec 26, 2014 at 12:46 PM, Dave Marion  wrote:
>
> > That didn't clear it up for me. BTW, the release management section needs
> > to be changed now that we have adopted semver. I will make a ticket for
> > that.
> >
> > I will change the question. We are working on releasing 1.6.2. Where
> > should changes for 1.6.3 be placed?
> > Dave, consult http://accumulo.apache.org/git.html
> >
> > tl;dr No, the 1.6 branch is for the lifetime of all 1.6 versions.
> >
> > WRT the other branches you are seeing, I'm guessing it's Corey doing
> > something in preparation for making a 1.6.2.
> >
> > dlmar...@comcast.net wrote:
> >
> > > > We have 1.6, 1.6.2-SNAPSHOT, and 1.6.2-rc0 branches. Should we change
> > > > the version in the 1.6 branch to 1.6.3-SNAPSHOT? If not, what's the plan?
> > >
> > >
> >
>


Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriter.startProcessing()

2016-02-24 Thread Dave Marion

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/
---

(Updated Feb. 24, 2016, 6:54 p.m.)


Review request for accumulo and Josh Elser.


Repository: accumulo


Description
---

ACCUMULO-1755: removed synchronized modifier from 
TabletServerBatchWriter.startProcessing()


Diffs
-

  core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java bc90d00 

Diff: https://reviews.apache.org/r/43957/diff/


Testing
---

unit tests in core pass


Thanks,

Dave Marion



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriter.startProcessing()

2016-02-24 Thread Dave Marion

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/#review120544
---




core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
 (line 228)
<https://reviews.apache.org/r/43957/#comment182015>

Looking at this more closely, by removing the synchronized modifier I think it 
might be possible to add the same set of mutations to the writer twice (dupes).


- Dave Marion


On Feb. 24, 2016, 6:54 p.m., Dave Marion wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43957/
> ---
> 
> (Updated Feb. 24, 2016, 6:54 p.m.)
> 
> 
> Review request for accumulo and Josh Elser.
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> ACCUMULO-1755: removed synchronized modifier from 
> TabletServerBatchWriter.startProcessing()
> 
> 
> Diffs
> -
> 
>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java bc90d00 
> 
> Diff: https://reviews.apache.org/r/43957/diff/
> 
> 
> Testing
> ---
> 
> unit tests in core pass
> 
> 
> Thanks,
> 
> Dave Marion
> 
>



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriter.startProcessing()

2016-02-24 Thread Dave Marion

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/
---

(Updated Feb. 24, 2016, 7:29 p.m.)


Review request for accumulo and Josh Elser.


Changes
---

Modified startProcessing() so that duplicates would not be added to the writer. 
Inside startProcessing(), the MutationSet could have been swapped out by 
another thread between successive calls to mutations.get().
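
The hazard described above can be sketched in isolation. This is not the patch itself, which is not shown in this thread: BatchHandoff is a hypothetical class, a list of strings stands in for the MutationSet, and an AtomicReference is only assumed to be the holder behind mutations.get(). The point is that reading the shared reference more than once lets another thread swap it in between, while a single atomic swap hands each buffer to exactly one caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the writer's pending-mutations holder.
public class BatchHandoff {

  private final AtomicReference<List<String>> pending =
      new AtomicReference<>(new ArrayList<>());

  /** Adds to the current buffer; adds are assumed to be serialized by the caller. */
  public void add(String mutation) {
    pending.get().add(mutation);
  }

  /**
   * Racy pattern: two separate reads of the reference. Another thread may
   * swap the buffer between them, so the emptiness check and the returned
   * list can refer to different buffers, and two callers can both end up
   * returning the same list.
   */
  public List<String> takeRacy() {
    if (pending.get().isEmpty()) {          // read #1
      return new ArrayList<>();
    }
    List<String> batch = pending.get();     // read #2, possibly another list
    pending.set(new ArrayList<>());
    return batch;
  }

  /** One atomic swap: installs a fresh buffer and returns the old one. */
  public List<String> take() {
    return pending.getAndSet(new ArrayList<>());
  }
}
```

The sketch does not cover a producer that is still adding to a buffer while it is being taken; that needs separate coordination.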


Repository: accumulo


Description
---

ACCUMULO-1755: removed synchronized modifier from 
TabletServerBatchWriter.startProcessing()


Diffs (updated)
-

  core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java bc90d00 

Diff: https://reviews.apache.org/r/43957/diff/


Testing
---

unit tests in core pass


Thanks,

Dave Marion



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriter.startProcessing()

2016-02-24 Thread Dave Marion


> On Feb. 24, 2016, 7:15 p.m., Josh Elser wrote:
> > core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java,
> >  line 967
> > <https://reviews.apache.org/r/43957/diff/1/?file=1268376#file1268376line967>
> >
> > Can be `final` now. Also, why the upgrade to Long?

mutation.estimateMemoryUsed() returns a long and we were truncating to an int.
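
A small sketch of why that matters. NarrowingDemo and the sample values are made up; only the fact that estimateMemoryUsed() returns a long comes from the review. A plain cast to int keeps the low 32 bits and silently wraps, so a running total is safer kept in a long.

```java
// Illustrative only: shows what a long-to-int cast does to large values.
public class NarrowingDemo {

  /** Plain narrowing cast: silently keeps only the low 32 bits. */
  public static int truncate(long value) {
    return (int) value;
  }

  /** Sums size estimates in a long so that large batches do not wrap. */
  public static long total(long[] estimates) {
    long sum = 0L;
    for (long e : estimates) {
      sum += e;
    }
    return sum;
  }

  public static void main(String[] args) {
    long big = Integer.MAX_VALUE + 1L;
    System.out.println(truncate(big)); // -2147483648, the value wrapped

    long[] estimates = {Integer.MAX_VALUE, Integer.MAX_VALUE};
    System.out.println(total(estimates)); // 4294967294, kept as a long

    try {
      Math.toIntExact(big); // checked narrowing throws instead of wrapping
    } catch (ArithmeticException e) {
      System.out.println("overflow detected");
    }
  }
}
```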


On Feb. 24, 2016, 7:15 p.m., Dave Marion wrote:
> > I'm trying to look at the big picture here as well, but I'm not sure that 
> > this change actually helps the synchronization. It seems like every call to 
> > `startProcessing` is made by a `synchronized` method (flush, addMutation, 
> > close, addFailedMutations) or while holding 
> > `TabletServerBatchWriter.class`'s lock. Am I missing something?

Ah, yes. I believe that I can remove the synchronized block in the jtimer now. 
I'm trying to hold off on removing synchronized modifiers from the other 
methods, but I think I can do it for addMutation now.


- Dave


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/#review120541
-------


On Feb. 24, 2016, 7:29 p.m., Dave Marion wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43957/
> ---
> 
> (Updated Feb. 24, 2016, 7:29 p.m.)
> 
> 
> Review request for accumulo and Josh Elser.
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> ACCUMULO-1755: removed synchronized modifier from 
> TabletServerBatchWriter.startProcessing()
> 
> 
> Diffs
> -
> 
>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java bc90d00 
> 
> Diff: https://reviews.apache.org/r/43957/diff/
> 
> 
> Testing
> ---
> 
> unit tests in core pass
> 
> 
> Thanks,
> 
> Dave Marion
> 
>



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriter.startProcessing()

2016-02-24 Thread Dave Marion


> On Feb. 24, 2016, 8:43 p.m., Christopher Tubbs wrote:
> > core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java,
> >  line 227
> > <https://reviews.apache.org/r/43957/diff/1/?file=1268376#file1268376line227>
> >
> > Isn't there a race condition here without the synchronized block? Does 
> > it matter?

Hang on. I ripped the band-aid off and started removing synchronized modifiers. 
Should have an updated patch here soon.


- Dave


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/#review120575
-------


On Feb. 24, 2016, 7:29 p.m., Dave Marion wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43957/
> ---
> 
> (Updated Feb. 24, 2016, 7:29 p.m.)
> 
> 
> Review request for accumulo and Josh Elser.
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> ACCUMULO-1755: removed synchronized modifier from 
> TabletServerBatchWriter.startProcessing()
> 
> 
> Diffs
> -
> 
>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java bc90d00 
> 
> Diff: https://reviews.apache.org/r/43957/diff/
> 
> 
> Testing
> ---
> 
> unit tests in core pass
> 
> 
> Thanks,
> 
> Dave Marion
> 
>



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriter.startProcessing()

2016-02-24 Thread Dave Marion

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/
---

(Updated Feb. 24, 2016, 8:53 p.m.)


Review request for accumulo and Josh Elser.


Changes
---

Removed synchronized modifier from addMutation (danger Will Robinson)


Repository: accumulo


Description
---

ACCUMULO-1755: removed synchronized modifier from 
TabletServerBatchWriterstartProcessing()


Diffs (updated)
-

  
core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
 bc90d00 

Diff: https://reviews.apache.org/r/43957/diff/


Testing
---

unit tests in core pass


Thanks,

Dave Marion



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriterstartProcessing()

2016-02-24 Thread Dave Marion


> On Feb. 24, 2016, 7:15 p.m., Josh Elser wrote:
> > core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java,
> >  line 974
> > <https://reviews.apache.org/r/43957/diff/1/?file=1268376#file1268376line974>
> >
> > Don't need to specify the parameterization of the HashMap

Why wouldn't you set the size on the map to avoid the overhead of growing it?


- Dave


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/#review120541
-------


On Feb. 24, 2016, 8:53 p.m., Dave Marion wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43957/
> ---
> 
> (Updated Feb. 24, 2016, 8:53 p.m.)
> 
> 
> Review request for accumulo and Josh Elser.
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> ACCUMULO-1755: removed synchronized modifier from 
> TabletServerBatchWriterstartProcessing()
> 
> 
> Diffs
> -
> 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
>  bc90d00 
> 
> Diff: https://reviews.apache.org/r/43957/diff/
> 
> 
> Testing
> ---
> 
> unit tests in core pass
> 
> 
> Thanks,
> 
> Dave Marion
> 
>



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriterstartProcessing()

2016-02-24 Thread Dave Marion


> On Feb. 24, 2016, 7:15 p.m., Josh Elser wrote:
> > core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java,
> >  line 223
> > <https://reviews.apache.org/r/43957/diff/1/?file=1268376#file1268376line223>
> >
> > Can be `final`.

Dropped issue, code changed.


- Dave


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/#review120541
---


On Feb. 24, 2016, 8:53 p.m., Dave Marion wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43957/
> ---
> 
> (Updated Feb. 24, 2016, 8:53 p.m.)
> 
> 
> Review request for accumulo and Josh Elser.
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> ACCUMULO-1755: removed synchronized modifier from 
> TabletServerBatchWriterstartProcessing()
> 
> 
> Diffs
> -
> 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
>  bc90d00 
> 
> Diff: https://reviews.apache.org/r/43957/diff/
> 
> 
> Testing
> ---
> 
> unit tests in core pass
> 
> 
> Thanks,
> 
> Dave Marion
> 
>



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriterstartProcessing()

2016-02-24 Thread Dave Marion


> On Feb. 24, 2016, 7:15 p.m., Josh Elser wrote:
> > core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java,
> >  line 974
> > <https://reviews.apache.org/r/43957/diff/1/?file=1268376#file1268376line974>
> >
> > Don't need to specify the parameterization of the HashMap
> 
> Dave Marion wrote:
> Why wouldn't you set the size on the map to avoid the overhead of growing 
> it?
> 
> Josh Elser wrote:
> The parameterization, not the arguments "String,List" is 
> unnecessary.

Gotcha
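
The two points in this exchange are independent, which a short Java sketch can show: the diamond operator drops the repeated type arguments, while the constructor argument that presizes the map stays. PresizedMapExample, groupByServer, and the type arguments used here are illustrative assumptions, not the code under review.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example; the real map in the patch may use different types.
final class PresizedMapExample {

  // Groups entries by server name, using the input size as an initial capacity hint.
  static Map<String, List<String>> groupByServer(List<String> servers) {
    // Diamond operator: the type arguments are inferred from the declaration,
    // but the int argument still sets the initial capacity.
    Map<String, List<String>> byServer = new HashMap<>(servers.size());
    for (String server : servers) {
      List<String> entries = byServer.get(server);
      if (entries == null) {
        entries = new ArrayList<>();
        byServer.put(server, entries);
      }
      entries.add(server + ":mutation");
    }
    return byServer;
  }
}
```

The capacity is only a hint: with the default load factor a HashMap can still resize before it holds that many entries, so this reduces growth rather than ruling it out.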


- Dave


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/#review120541
---


On Feb. 24, 2016, 8:53 p.m., Dave Marion wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43957/
> ---
> 
> (Updated Feb. 24, 2016, 8:53 p.m.)
> 
> 
> Review request for accumulo and Josh Elser.
> 
> 
> Repository: accumulo
> 
> 
> Description
> ---
> 
> ACCUMULO-1755: removed synchronized modifier from 
> TabletServerBatchWriterstartProcessing()
> 
> 
> Diffs
> -
> 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
>  bc90d00 
> 
> Diff: https://reviews.apache.org/r/43957/diff/
> 
> 
> Testing
> ---
> 
> unit tests in core pass
> 
> 
> Thanks,
> 
> Dave Marion
> 
>



Re: Review Request 43957: ACCUMULO-1755: removed synchronized modifier from TabletServerBatchWriterstartProcessing()

2016-02-24 Thread Dave Marion

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43957/
---

(Updated Feb. 24, 2016, 6:10 p.m.)


Review request for accumulo and Josh Elser.


Changes
---

Added bug to rb


Bugs: ACCUMULO-1755
https://issues.apache.org/jira/browse/ACCUMULO-1755


Repository: accumulo


Description
---

ACCUMULO-1755: removed synchronized modifier from 
TabletServerBatchWriterstartProcessing()


Diffs
-

  
core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
 bc90d00 

Diff: https://reviews.apache.org/r/43957/diff/


Testing
---

unit tests in core pass


Thanks,

Dave Marion



RE: Vote For A) README_UBUNTU B) "Compiling on Ubuntu" In README

2012-07-26 Thread Dave Marion

  Neither. Looking over the README, it contains information for developers
and users. So, we have a README, user and developer manuals, and a wiki.
This information is clearly for developers, maybe it should go in the
developer manual or on the web site in the developer guide section[1].  

  It almost seems as if there are too many documents. Maybe get rid of the
README, and instead have a CHANGES document that details the differences
between releases. Then, in addition, have a user and developer manual in the
distribution, and make those documents also available from the web site.
Three documents instead of four, each having their own distinct purpose.

[1] http://accumulo.apache.org/source.html

Dave

-Original Message-
From: David Medinets [mailto:david.medin...@gmail.com] 
Sent: Thursday, July 26, 2012 7:17 PM
To: accumulo-dev
Subject: Vote For A) README_UBUNTU B) "Compiling on Ubuntu" In README

I'd like to get a vote by the contributors on which approach to use.
I'll start the voting:

A) +1 because I don't know how much information might be needed and the
information might change for different versions of Accumulo. Also the README
might wind up with many OS-specific sections. And, lastly because I've
already done it. :)



RE: new committers!

2012-09-26 Thread Dave Marion

  Hello to all. I'm currently working on ticket #708 and hope to contribute
further improvements to the JMX instrumentation. I also hope to update the
wiki search example soon. Thanks,

Dave

-Original Message-
From: Billie Rinaldi [mailto:bil...@apache.org] 
Sent: Wednesday, September 26, 2012 12:24 PM
To: dev@accumulo.apache.org
Subject: new committers!

I am pleased to announce that Chris Tubbs and Dave Marion have been voted to
become new committers for Apache Accumulo.

Welcome, Dave and Chris!  Feel free to say a few words about your
development interests.

Billie



RE: JIRA Etiquette / Hackathon Projects

2012-10-08 Thread Dave Marion
Regarding 350, I believe that I have accomplished this in ticket 708. I need
to write one more test class and then I will be ready to test locally.

Dave

-Original Message-
From: Adam Fuchs [mailto:afu...@apache.org] 
Sent: Monday, October 08, 2012 1:18 PM
To: dev@accumulo.apache.org
Subject: Re: JIRA Etiquette / Hackathon Projects

We had a discussion about this not long ago -- I think we decided that
people should mark UNASSIGNED any of the tickets they're not working on.
However, this doesn't mean you can assume that all of the tickets that are
assigned are being worked. If you find one you want to work on, you can
always ask the assignee if they're working on it.

ACCUMULO-350 would be a good hackathon project.

Adam



Re: Contributing Organizations

2012-12-19 Thread Dave Marion
+1

Dave Marion


Sent from my Motorola ATRIX™ 4G on AT&T

-Original message-
From: Christopher Tubbs 
To: dev@accumulo.apache.org
Sent: Wed, Dec 19, 2012 22:41:07 GMT+00:00
Subject: Contributing Organizations

All-

Many other projects list the organizations where their developers /
contributors are from.
See, for example:

http://zookeeper.apache.org/credits.html
http://hadoop.apache.org/who.html
http://gora.apache.org/credits.html

We can, and probably should, do the same, if this is agreeable to a
sufficient number of us. (If we do this, it should probably be understood
that it is a voluntary extra column on our page, and that it is the
responsibility of committers to add themselves.)

For reference, our current credits page looks like this:
http://accumulo.apache.org/people.html

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii



RE: Contributing Organizations

2013-01-02 Thread Dave Marion
I see 3 proponents and 0 opponents of this idea. Can we put it to a vote?

Dave

-Original Message-
From: Dave Marion [mailto:dlmar...@comcast.net] 
Sent: Wednesday, December 19, 2012 6:30 PM
To: dev@accumulo.apache.org
Subject: Re: Contributing Organizations

+1

Dave Marion


Sent from my Motorola ATRIX™ 4G on AT&T

-Original message-
From: Christopher Tubbs 
To: dev@accumulo.apache.org
Sent: Wed, Dec 19, 2012 22:41:07 GMT+00:00
Subject: Contributing Organizations

All-

Many other projects list the organizations where their developers / 
contributors are from.
See, for example:

http://zookeeper.apache.org/credits.html
http://hadoop.apache.org/who.html
http://gora.apache.org/credits.html

We can, and probably should, do the same, if this is agreeable to a sufficient 
number of us. (If we do this, it should probably be understood that it is a 
voluntary extra column on our page, and that it is the responsibility of 
committers to add themselves.)

For reference, our current credits page looks like this:
http://accumulo.apache.org/people.html

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii




RE: Accumulo 1.6 and beyond feature summit

2013-01-28 Thread Dave Marion
"- Allowing iterators to launch connections to other services (caching,
other tservers) to retrieve or write data"

  What does "allow" mean in this context? I don't think it's disallowed (I know
of an iterator that does this).

-Original Message-
From: William Slacum [mailto:wilhelm.von.cl...@accumulo.net] 
Sent: Monday, January 28, 2013 7:13 PM
To: dev@accumulo.apache.org
Subject: Re: Accumulo 1.6 and beyond feature summit

I'd like to see:

- Data triggers on insertion
- REST interface for looking up ranges of keys
- A DSL or some other interpreted language for crafting iterators
  - there's the clojure iterator, but something like python (via jython) or
javascript (via rhino) would be more adoptable
- Adding a clean up hook to iterators
- Allowing iterators to launch connections to other services (caching, other
tservers) to retrieve or write data
- Merging of the batch scanner and scanner implementations
  - a batch scanner with 1 thread have the same behavior as a scanner
  - scanners have a close() method on them
- Adding some builder interface for creating and introspecting iterator
stacks
- Clients being able to scan to specific keys using the scan command



RE: Accumulo 1.6 and beyond feature summit

2013-01-28 Thread Dave Marion
Ok. That one that I know of doesn't spin up a batch scanner, it makes a call
to another program.

-Original Message-
From: William Slacum [mailto:wilhelm.von.cl...@accumulo.net] 
Sent: Monday, January 28, 2013 7:33 PM
To: dev@accumulo.apache.org
Subject: Re: Accumulo 1.6 and beyond feature summit

Currently it's not recommended to launch a batch scanner from an iterator
and retrieve new information, due to the possibility of a deadlock. Other
services may alleviate that concern, but due to lifecycle management issues
(related to the "add a clean up method to iterators"), it's not foolproof
to clean up connections from it.

On Mon, Jan 28, 2013 at 7:21 PM, Dave Marion  wrote:

> "- Allowing iterators to launch connections to other services 
> (caching, other tservers) to retrieve or write data"
>
>   What does allow mean in this context? I don't think it's disallowed 
> (I know of an iterator that does this).
>
> -Original Message-
> From: William Slacum [mailto:wilhelm.von.cl...@accumulo.net]
> Sent: Monday, January 28, 2013 7:13 PM
> To: dev@accumulo.apache.org
> Subject: Re: Accumulo 1.6 and beyond feature summit
>
> I'd like to see:
>
> - Data triggers on insertion
> - REST interface for looking up ranges of keys
> - A DSL or some other interpreted language for crafting iterators
>   - there's the clojure iterator, but something like python (via 
> jython) or javascript (via rhino) would be more adoptable
> - Adding a clean up hook to iterators
> - Allowing iterators to launch connections to other services (caching, 
> other
> tservers) to retrieve or write data
> - Merging of the batch scanner and scanner implementations
>   - a batch scanner with 1 thread have the same behavior as a scanner
>   - scanners have a close() method on them
> - Adding some builder interface for creating and introspecting 
> iterator stacks
> - Clients being able to scan to specific keys using the scan command
>
>



RE: Using powermock-api-mockito in tests?

2013-03-21 Thread Dave Marion
Out of curiosity, why do you say that " System.getenv() which breaks the tests 
in AccumuloVFSClassLoaderTest?" It's worked fine for a while. What is different 
now?

-- Dave

-Original Message-
From: dlmar...@comcast.net [mailto:dlmar...@comcast.net] 
Sent: Thursday, March 21, 2013 5:12 PM
To: dev@accumulo.apache.org
Subject: Re: Using powermock-api-mockito in tests?



We can do it with PowerMock, no need to add  Mockito. This should work, going 
from memory here. I should be able to help when I get back to a computer if you 
have problems. 



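  // Mock only the static no-argument System.getenv() method
  PowerMock.mockStatic(System.class, System.class.getMethod("getenv"));

  // Record the expected call and the value it should return
  Map<String, String> mockSystemProperties = new HashMap<String, String>();
  mockSystemProperties.put("ACCUMULO_HOME", System.getenv("HOME"));
  EasyMock.expect(System.getenv()).andReturn(mockSystemProperties);

  // Switch the mock from record mode to replay mode before the test runs
  PowerMock.replay(System.class);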



- Original Message -


From: "David Medinets" 
To: "accumulo-dev" 
Sent: Thursday, March 21, 2013 4:59:40 PM
Subject: Using powermock-api-mockito in tests? 

Is there any reason why I should not add a dependency in start/pom.xml to 
powermock-api-mockito? With this library, we can mock the call to
System.getenv() which breaks the tests in AccumuloVFSClassLoaderTest. 
The two tests need these four lines of setup in order to pass: 

  Map<String, String> mockSystemProperties = new HashMap<String, String>();
  mockSystemProperties.put("ACCUMULO_HOME", System.getenv("HOME")); 

  PowerMockito.mockStatic(System.class);
  Mockito.when(System.getenv()).thenReturn(mockSystemProperties); 

You'll notice that ACCUMULO_HOME is set to the value of HOME to make the 
test cross-platform. 
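
An alternative to static mocking, which is not what this thread proposes, is to pass the environment in as a parameter so that tests can supply their own map. The sketch below is plain Java under that assumption; ExtDirResolver and extDir are hypothetical names, and the lib/ext layout is taken only from the error message quoted elsewhere in this thread.

```java
import java.util.Map;

// Hypothetical helper, not part of Accumulo: resolves the extension directory
// from a supplied environment map instead of calling System.getenv() directly.
final class ExtDirResolver {

  // Testable core: the caller decides which environment to use.
  static String extDir(Map<String, String> env) {
    String home = env.get("ACCUMULO_HOME");
    if (home == null || home.isEmpty()) {
      // Without a base directory the resulting path would be relative.
      throw new IllegalStateException("ACCUMULO_HOME is not set");
    }
    return home + "/lib/ext";
  }

  // Production entry point: delegates to the real process environment.
  static String extDir() {
    return extDir(System.getenv());
  }
}
```

A test can then build a HashMap with ACCUMULO_HOME set to any directory it likes and call extDir(map), with no mocking library involved.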



RE: Using powermock-api-mockito in tests?

2013-03-21 Thread Dave Marion

 The Hadoop MiniDFSCluster won't work on Windoze. He'll have to exclude most of 
the new classloader tests from running on that platform. 
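
One way to exclude tests on a given platform is to check the os.name system property before running them. The following Java sketch shows such a check; PlatformCheck is a hypothetical helper, and how the project actually excludes these tests (for example, through a Maven profile) is not settled in this thread.

```java
import java.util.Locale;

// Hypothetical helper for deciding whether platform-specific tests should run.
final class PlatformCheck {

  // Pure function so the decision can be tested with any OS name.
  static boolean isWindows(String osName) {
    return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
  }

  // Convenience overload that reads the JVM's os.name property.
  static boolean isWindows() {
    return isWindows(System.getProperty("os.name"));
  }
}
```

In a JUnit 4 test, a call such as Assume.assumeFalse(PlatformCheck.isWindows()) at the start of a setup method would mark the test as skipped rather than failed on Windows.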

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com] 
Sent: Thursday, March 21, 2013 6:44 PM
To: dev@accumulo.apache.org
Subject: Re: Using powermock-api-mockito in tests?

David is trying to build on Windows.

On 03/21/2013 06:40 PM, Dave Marion wrote:
> Out of curiosity, why do you say that " System.getenv() which breaks the 
> tests in AccumuloVFSClassLoaderTest?" It's worked fine for a while. What is 
> different now?
>
> -- Dave
>
> -Original Message-
> From: dlmar...@comcast.net [mailto:dlmar...@comcast.net]
> Sent: Thursday, March 21, 2013 5:12 PM
> To: dev@accumulo.apache.org
> Subject: Re: Using powermock-api-mockito in tests?
>
>
>
> We can do it with PowerMock, no need to add  Mockito. This should work, going 
> from memory here. I should be able to help when I get back to a computer if 
> you have problems.
>
>
>
>//Mock the method
>
>PowerMock.mockStatic(System.class, System.class.getMethod("getenv"));
>
>
>
>//Invoke it
>
>Map mockSystemProperties = new HashMap();
>mockSystemProperties.put("ACCUMULO_HOME", System.getenv("HOME"));
>EasyMock.expect(System.getenv()).andReturn(mockSystemProperties);
>
>
>
> - Original Message -
>
>
> From: "David Medinets" 
> To: "accumulo-dev" 
> Sent: Thursday, March 21, 2013 4:59:40 PM
> Subject: Using powermock-api-mockito in tests?
>
> Is there any reason why I should not add a dependency in start/pom.xml to 
> powermock-api-mockito? With this library, we can mock the call to
> System.getenv() which breaks the tests in AccumuloVFSClassLoaderTest.
> The two tests need these four lines of setup in order to pass:
>
>Map<String, String> mockSystemProperties = new HashMap<String, String>();
>mockSystemProperties.put("ACCUMULO_HOME", System.getenv("HOME"));
>
>PowerMockito.mockStatic(System.class);
>Mockito.when(System.getenv()).thenReturn(mockSystemProperties);
>
> You'll notice that ACCUMULO_HOME is set to the value of HOME to make the 
> test cross-platform.
>



RE: Using powermock-api-mockito in tests?

2013-03-22 Thread Dave Marion
David,

  I was not trying to prevent you from being productive, I was trying to
prevent you from wasting your time. I stated several times that
MiniDFSCluster does not work on Windows. I'll try to be more specific.

 After you mock the bash execution, you will likely find that you have
broken the test for everyone else because MiniDFSCluster expects a certain
directory permission, see [1] and [2] for lengthy discussions on the topic.
In addition, when you get to the point of starting MiniDFSCluster, you will
find that it does not work on Windows due to non-portable paths and such,
see [3]. At that point you will be left with removing MiniDFSCluster and
mocking every call to HDFS from the tests.

  I think a better solution is to move the classloader tests that use
AccumuloDFSBase to integration tests. These can be run on Hudson, which I
think is using a *nix flavor of O/S and should run with no problem. Maybe a
future version of MiniDFSCluster will work on Windows.

[1] https://issues.apache.org/jira/browse/ACCUMULO-708
[2] http://www.google.com/#hl=en&sclient=psy-ab&q=MiniDFSCluster+directory+permissions
[3] http://www.google.com/#q=MiniDFSCluster+windows+paths&hl=en


Dave

-Original Message-
From: David Medinets [mailto:david.medin...@gmail.com] 
Sent: Friday, March 22, 2013 5:43 PM
To: dev@accumulo.apache.org
Subject: Re: Using powermock-api-mockito in tests?

How does it hurt the project if I spend time on this? Accumulo already
compiles on Windows. It's the tests that are failing. How does skipping
failing tests help? I suggest that unit tests should not spawn exec
processes in any case because that is a source of slowness.
Mocking the unix-specific stuff on Windows will lead to faster tests.

On Fri, Mar 22, 2013 at 10:42 AM, Jim Klucar  wrote:
> +1 to Dave's comment. I don't think we should be spending effort supporting
> compiling on an unsupported runtime environment. If something fails
> because it is too *nix-y then just skip that test with a local pom.xml
> override or something.
>
>
> On Fri, Mar 22, 2013 at 10:11 AM, Keith Turner  wrote:
>
>> On Thu, Mar 21, 2013 at 10:16 PM,   wrote:
>> >
>> > So we are getting into an area where you want to compile the software on
>> > a platform that is not supported. If you want to compile on an unsupported
>> > platform, then I would suggest just ignoring the tests that won't work on
>> > that system.
>>
>> My thought on this is that if changes to make it work on Windows
>> improve the test and/or build process, then that's good.  On the other
>> hand I would be opposed to making test and/or build more complex
>> in order to support Windows.   I would define increasing complexity as
>> making it more difficult to run, maintain, or improve the test and/or
>> build process.
>>
>> >
>> > I don't think that this needs to be changed now as Hadoop only supports
>> > *nix based systems and we are close to a 1.5.0 release. If you want
>> > to tackle this in 1.6 (trunk) that's a different story.
>> >
>> >
>> > - Original Message -
>> > From: "David Medinets" 
>> > To: dev@accumulo.apache.org
>> > Sent: Thursday, March 21, 2013 10:08:48 PM
>> > Subject: Re: Using powermock-api-mockito in tests?
>> >
>> > I hate ignoring things. It makes me uneasy. I'm looking at the 
>> > other tests as well. For example, the AccumuloDFSBase class depends 
>> > on running /bin/sh to find a umask. No reason that dependency can't 
>> > be mocked out during testing... If nothing else, this research will 
>> > form my own set of Accumulo Zen Koans.
>> >
>> > On Thu, Mar 21, 2013 at 10:03 PM,  wrote:
>> >>
>> >> Take a look at my other email on this subject, it might be better to
>> >> just add the profile that I mentioned and add this to the list of
>> >> ignored tests for now. I know that there is a ticket for removing
>> >> ACCUMULO_HOME in all places.
>> >>
>> >> - Original Message -
>> >> From: "David Medinets" 
>> >> To: dev@accumulo.apache.org
>> >> Sent: Thursday, March 21, 2013 9:58:18 PM
>> >> Subject: Re: Using powermock-api-mockito in tests?
>> >>
>> >> Dave, you were very close. Here is the mocking code that I used.
>> >>
>> >> Map<String, String> mockSystemProperties = new HashMap<String, String>();
>> >> mockSystemProperties.put("ACCUMULO_HOME", System.getenv("HOME"));
>> >>
>> >> PowerMock.mockStaticPartial(System.class, "getenv");
>> >> EasyMock.expect(System.getenv()).andReturn(mockSystemProperties).anyTimes();
>> >> EasyMock.expect(System.getenv("ACCUMULO_XTRAJARS")).andReturn("").anyTimes();
>> >> PowerMock.replayAll();
>> >>
>> >> I'd like to write a JIRA ticket and commit this code. I'll wait until 
>> >> tomorrow for feedback though. No rush for this kind of change.
>> >>
>> >> The message that started this investigation was:
>> >>
>> >>
>> >> testDefaultConfig(org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoaderTest):
>> >> Could not find file with URI "/lib/ext/[^.].*.jar" because it is a 
>> >> relative path, and no base URI

RE: multi-table isolated batch scanner

2013-04-15 Thread Dave Marion
---> I have found that increasing the buffer size also increases the latency
for getting the first results.

  We have found that to be true also, we do the opposite to get to the first
result faster. Of course we are not performing a local sort first.

---> increasing the batch size too much puts significant memory requirements
on the process running the batch scanner

  Pushing the problem from the client to the server increases the
complexity. I would be concerned with multiple concurrent scans that are
saving state. The server side state will compete for tserver application
memory. I would assume that you would have to build some feature to restrict
the amount of memory that the state can consume. 

-Original Message-
From: Adam Fuchs [mailto:afu...@apache.org] 
Sent: Monday, April 15, 2013 6:19 PM
To: dev@accumulo.apache.org
Subject: Re: multi-table isolated batch scanner

Keith,

In this case we're filling the buffer before we can amortize the search
cost. We're using a document-partitioned table design and we have to do a
local sort before we can get the first result.

I have found that increasing the buffer size also increases the latency for
getting the first results. This application is both latency and throughput
sensitive. In addition, increasing the batch size too much puts significant
memory requirements on the process running the batch scanner.

Adam



On Mon, Apr 15, 2013 at 5:33 PM, Keith Turner  wrote:

> On Mon, Apr 15, 2013 at 5:06 PM, Adam Fuchs  wrote:
> > Chris,
> >
> > The desire for isolation stems from the desire to amortize some computation
> > over a number of results. Say it takes 5 seconds to compute an intersection
>
> Would increasing the size of the key/value buffer help in your case?
> The iterator stack is not torn down until that buffer fills up or the 
> end of tablet is reached.  Are you concerned about the cost of 
> reconstructing the iterator stack across tablets?
>
> > of a couple of sets within the iterators, and then streaming back
> > the results takes a minute or so. If I have to redo the 5 second
> > computation many times, as in to support the reconstruction of the
> > iterator tree, then that computation may start to dominate my query
> > performance. Primarily, this means I need to be able to continue a scan
> > without having to rebuild the iterators. Isolation in the scanner has that
> > side effect. Proper isolation would be a "nice-to-have", but I can deal
> > with not having it.
> >
> > Adam
> >
> >
> >
> > On Mon, Apr 15, 2013 at 4:13 PM, Christopher wrote:
> >
> >> Adam-
> >>
> >> It seems like you're talking about two features at once:
> >> 1) Multi-table batch scanner.
> >> 2) Scan Isolation on batch scanners like we have on regular scanners.
> >> Is that correct?
> >>
> >> I can see the utility of a multi-table batch scanner, but I haven't 
> >> seen a compelling need for implementing isolation on the 
> >> batch-scanners. Do you have a use case in mind for that?
> >>
> >> Also, it seems that your use case for isolation is not so much the
> >> isolated reads, but the statefulness of the iterator stack on the
> >> server side. Is this correct? If so, I'm even more curious about
> >> your use case for this, since that statefulness is only guaranteed
> >> per-row.
> >>
> >>
> >> --
> >> Christopher L Tubbs II
> >> http://gravatar.com/ctubbsii
> >>
> >>
> >> On Mon, Apr 15, 2013 at 3:10 PM, Adam Fuchs  wrote:
> >> > Thanks Bill,
> >> >
> >> > I care about latency and throughput. First available result ordering is
> >> > fine, though.
> >> >
> >> > Does Guava just chain through a collection of iterators, completing one
> >> > then moving to the next?
> >> >
> >> > Adam
> >> >
> >> >
> >> >
> >> > On Mon, Apr 15, 2013 at 3:06 PM, William Slacum < 
> >> > wilhelm.von.cl...@accumulo.net> wrote:
> >> >
> >> >> How are you expecting to get results back? Guava's Iterables could concat
> >> >> a bunch of Scanners together, if you didn't care about the throughput
> >> >> aspect of it and simply wanted results from multiple tables.
> >> >>
> >> >> On Mon, Apr 15, 2013 at 3:00 PM, Adam Fuchs wrote:
> >> >>
> >> >> > Is anyone else pining for a multi-table isolated batch scanner, or is it
> >> >> > just me? I like the automatic parallelism and balancing of the batch
> >> >> > scanner, but I'm looking to maintain server-side state in my iterators
> >> >> > over long-running scans. I would also like to scan over multiple tables
> >> >> > concurrently. Has anyone tried hacking something together with a pool of
> >> >> > non-batch scanners?
> >> >> >
> >> >> > Adam
> >> >> >
> >> >>
> >>
>



RE: VFS class reloading?

2013-04-16 Thread Dave Marion
The implementation changed several times, so the pre-1.5 layout may not
work. In 1.5, using the bootstrap script, it should put the accumulo jars
into HDFS and dynamic loading from there should occur. I'll try and test
tonight if I have time.

-Original Message-
From: John Vines [mailto:vi...@apache.org] 
Sent: Tuesday, April 16, 2013 6:30 PM
To: Accumulo Dev List
Subject: VFS class reloading?

Maybe I missed something with the switch to the VFS classloader, but does
dynamic loading out of lib/ext no longer work? I had accumulo 1.5 running,
threw an iterator in there, but had to restart tserver to get the new
iterator picked up. Was that an intentional change?



RE: VFS class reloading?

2013-04-16 Thread Dave Marion
Looking at the code, it should work. Keith and I had several conversations
about what the new classloader should do. I believe that he wanted it to
behave like the old one and what I see in the code supports that. If it is
not working, then I would say create a ticket for it for now. I'll try to
replicate it tonight if I have time.

-Original Message-
From: Dave Marion [mailto:dlmar...@comcast.net] 
Sent: Tuesday, April 16, 2013 6:41 PM
To: dev@accumulo.apache.org
Subject: RE: VFS class reloading?

The implementation changed several times, so the pre-1.5 layout may not
work. In 1.5, using the bootstrap script, it should put the accumulo jars
into HDFS and dynamic loading from there should occur. I'll try and test
tonight if I have time.

-Original Message-
From: John Vines [mailto:vi...@apache.org]
Sent: Tuesday, April 16, 2013 6:30 PM
To: Accumulo Dev List
Subject: VFS class reloading?

Maybe I missed something with the switch to the VFS classloader, but does
dynamic loading out of lib/ext no longer work? I had accumulo 1.5 running,
threw an iterator in there, but had to restart tserver to get the new
iterator picked up. Was that an intentional change?



RE: performance

2013-05-03 Thread Dave Marion
Pressure for adoption?

Sent from my Motorola ATRIX™ 4G on AT&T

-Original message-
From: Drew Pierce 
To: "dev@accumulo.apache.org" 
Sent: Fri, May 3, 2013 19:37:50 GMT+00:00
Subject: RE: performance

problem is that pressure is mounting for adoption and GA of sqrrl is some time 
away. 
thx


> Date: Fri, 3 May 2013 14:36:50 -0400
> Subject: Re: peformance
> From: wilhelm.von.cl...@accumulo.net
> To: dev@accumulo.apache.org
> 
> Does sqrrl provide an example framework to play around with?
> 
> 
> On Fri, May 3, 2013 at 2:20 PM, Adam Fuchs  wrote:
> 
> > Hey Drew,
> >
> > This could be a very broad question, so I'll give a partial answer and
> > encourage you to come back for more details.
> >
> > Impala is a mechanism that sits on top of HBase or HDFS that is designed to
> > filter and process large quantities of data. People generally like Impala
> > because it supports a subset of SQL and because it is optimized to reduce
> > the latency that might be incurred by starting up a job in a bulk
> > synchronous processing framework. Instead, it uses a series of daemon
> > processes and a custom API to reduce overhead.
> >
> > With Accumulo, our approach to low-latency queries is generally to use a
> > table structure that incorporates some type of index. With appropriate
> > indexing techniques, Accumulo can achieve sub-second query latencies even
> > over multi-petabyte sized corpuses. Some of these table designs are
> > described in the manual:
> > http://accumulo.apache.org/1.4/user_manual/Table_Design.html
> >
> > Regarding the SQL piece, Accumulo does not natively support an SQL
> > interface. For that you would need to wrap it in a processing framework,
> > like Hive (https://issues.apache.org/jira/browse/ACCUMULO-143). To make a
> > shameless plug, Sqrrl (www.sqrrl.com) also offers that functionality.
> >
> > Cheers,
> > Adam
> >
> >
> >
> > On Fri, May 3, 2013 at 12:39 PM, Drew Pierce  wrote:
> >
> > > does anyone have any anecdotal results (nothing formal) for queries to
> > > speak to the likes of impala and near low-latency.
> > > Sent from my Android
> > >
> > > Sorry if brief
> > >
> > >
> >
  


Re: Is C++ code still part of 1.5 release?

2013-05-17 Thread Dave Marion
Good link Billie.


Sent from my Motorola ATRIX™ 4G on AT&T

-Original message-
From: Billie Rinaldi 
To: dev@accumulo.apache.org
Sent: Fri, May 17, 2013 21:39:44 GMT+00:00
Subject: Re: Is C++ code still part of 1.5 release?

On May 17, 2013 5:13 PM, "Adam Fuchs"  wrote:
>
> I'm with Michael on this one. We should really only be releasing one
> package that has all of the source and built binaries. IMO the
> interpretation of http://www.apache.org/dev/release.html that we must have
> a source-only release is overly restrictive. "Every ASF release must
> contain a source package, which must be sufficient for a user to build and
> test the release provided they have access to the appropriate platform and
> tools." can also be interpreted such that a single package with source and
> binaries meets the release requirement.

In lieu of ranting myself, I'll point you here: http://s.apache.org/nnN

Billie

>
> I have seen a lot of confusion about people trying to build the accumulo
> code when they really don't need to, and they often run into trouble when
> their environment is not set up for java development. Having multiple
> .tar.gz artifacts adds to this confusion. When we reordered the download
> page so that the -dist.tar.gz came before the -src.tar.gz those types of
> questions dropped dramatically on the mailing list. The existence of the
> -src.tar.gz creates confusion on its own (although our README doesn't
> help).
>
> Adam
>
>
>
> On Fri, May 17, 2013 at 4:00 PM, Michael Berman  wrote:
>
> > As an Accumulo user, the thing I want most is a single package that
> > contains the things I need to set up a running instance.  I don't want to
> > build the whole thing from source, but I am happy to build the native map,
> > unless every possible architecture is going to be distributed.  I really
> > don't care at all whether the tarball name ends in "-bin" or "-package" or
> > "-theStuffYouWant".  If the only reason not to include the native map
> > sources in the binary release is because the filename ends in -bin, why not
> > just call it accumulo-1.5.0.tar.gz?
> >
> >
> > On Fri, May 17, 2013 at 3:51 PM, John Vines  wrote:
> >
> > > If we're going to be making binary releases that have no other mechanism
> > > for creating the native libraries, then we should probably cut a few
> > > different binary releases for x86, amd64, and darwin at the very least.
> > >
> > > Sent from my phone, please pardon the typos and brevity.
> > > On May 17, 2013 12:36 PM, "Josh Elser"  wrote:
> > >
> > > > I'm happy we're stating our opinions here, but there are also two other
> > > > people who believe that the bin should not contain it. That's nice that you
> > > > want source code in a binary release, but your opinion is not the only one.
> > > > I feel like you're telling me that my opinion is sub-par to your opinion
> > > > because it is.
> > > >
> > > > If this is such a sticking point, I move that we completely kill the
> > > > notion of source and binary releases and make one tarball that contains
> > > > both.
> > > >
> > > > On 5/17/13 3:17 PM, John Vines wrote:
> > > >
> > > >> I agree with Adam. It seems like it's a debate of consistency vs.
> > > >> pragmatism. The cost of including these libraries is all of maybe 1kb in
> > > >> the package. The cost of excluding them is potential frustration from end
> > > >> users and a lot of repetitive stress against the Apache Mirrors (let's try
> > > >> and be considerate). I think it's a no brainer, but I have yet to hear a
> > > >> reason that is not 'no source code in a binary release!'
> > > >>
> > > >>
> > > >> On Fri, May 17, 2013 at 12:11 PM, Adam Fuchs  wrote:
> > > >>
> > > >>> Just to solidify the decision that Chris is already leaning towards, let me
> > > >>> try to clarify my position:
> > > >>> 1. The only reason not to add the native library source code in the
> > > >>> -bin.tar.gz distribution is that src != bin. There is no measurable
> > > >>> negative effect of putting the cpp files and Makefile into the
> > > >>> -bin.tar.gz.
> > > >>> 2. At least one person wants the native library source code in the
> > > >>> -bin.tar.gz to make their life easier.
> > > >>>
> > > >>> This is a very simple decision. It really doesn't matter how easy it is to
> > > >>> include prebuilt native code in some other way or build the code and copy
> > > >>> it in using some other method. Those are all tangential arguments.
> > > >>> Adam
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Fri, May 17, 2013 at 2:49 PM, William Slacum <
> > > >>> wilhelm.von.cl...@accumulo.net> wrote:
> > > >>>
> > > >>>> I think of the native maps as an add on and they should probably be
> > > >>>> treated as such. I think we should consider building a different package and
> > > >>>> installing them separately. Personally, for development and testing, I

RE: [VOTE] JDK 1.7 - Switch for Accumulo 1.6.0

2013-06-03 Thread Dave Marion

+1 with reservations. This seems like a big API change. Our dependency
changing is not the issue, but doesn't it force all consumers to change to
Java 7 as well if new language features are used in the client? It looks like
Java 6 compiled iterators should work without change.

I wonder if it would make sense to:

 - move the client to its own module (out of core),
 - build a Java 6 distribution of the client in addition to the Java 7 release, and
 - refrain from using Java 7 features in the client until a 2.0 release

-Original Message-
From: Sean Busbey [mailto:bus...@cloudera.com] 
Sent: Monday, June 03, 2013 7:05 PM
To: dev@accumulo.apache.org
Subject: Re: [VOTE] JDK 1.7 - Switch for Accumulo 1.6.0

On Mon, Jun 3, 2013 at 5:04 PM, Josh Elser  wrote:

> Also, some quick searching leads me to believe that 1.6 bytecode will run
> on a 1.7 JVM, but not vice versa. Does anyone know if this is the
> case? I apologize if I'm bringing up an already-discussed subject.


Just to confirm, the JDK7 compatibility guide says JDK7 compiled code won't 
work on a Java 6 VM[1]:

> The class file version for Java SE 7 is 51, as per the JVM
> Specification, because of the invokedynamic byte code
> introduced by JSR 292. Version 51 class files produced by the Java SE
> 7 compiler cannot be used in Java SE 6.

[1]:
http://www.oracle.com/technetwork/java/javase/compatibility-417013.html#binary

--
Sean
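
The version number in the quoted passage is stored in the header of every class file, so it is easy to check what a given class was compiled for. The following is a minimal Java sketch, assuming a well-formed class file header; the names `ClassFileVersion` and `majorVersion` are made up for this illustration and are not part of Accumulo or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative helper (hypothetical name): reads the major version from a class file header. */
final class ClassFileVersion {

  /** Returns the class file major version, e.g. 50 for Java SE 6 and 51 for Java SE 7. */
  static int majorVersion(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int magic = data.readInt(); // every class file starts with 0xCAFEBABE
    if (magic != 0xCAFEBABE) {
      throw new IOException("not a class file");
    }
    data.readUnsignedShort(); // minor version, ignored here
    return data.readUnsignedShort(); // major version
  }

  /** Convenience overload for a header that is already in memory. */
  static int majorVersion(byte[] header) {
    try {
      return majorVersion(new ByteArrayInputStream(header));
    } catch (IOException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // Hand-built header: magic number, minor version 0, major version 51 (0x33).
    byte[] java7Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x33};
    System.out.println("major version: " + majorVersion(java7Header));
  }
}
```

A Java 6 VM rejects any class whose major version is above 50, which is the one-way incompatibility described above.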



RE: Backporting policy proposal

2013-06-17 Thread Dave Marion
+1 for more structure. I like the idea of not back-porting new features; it
will hopefully allow for more frequent major/minor releases.

  Someone mentioned bylaws for the project; I easily found a few examples (see
below). One even mentions the types of changes that will be in the different
types of releases.

[1] http://hc.apache.org/bylaws.html
[2] http://pig.apache.org/bylaws.html

-Original Message-
From: Billie Rinaldi [mailto:billie.rina...@gmail.com] 
Sent: Monday, June 17, 2013 5:26 PM
To: dev@accumulo.apache.org
Subject: Re: Backporting policy proposal

On Mon, Jun 17, 2013 at 10:07 AM, Christopher  wrote:

> I propose we adopt a more structured policy beyond simple "lazy
> consensus" to apply to backporting features. Some guidelines I'd
> like to see in this policy include:
>
> 1. Back-porting bugfixes to a prior release line that is not EOL
> (end-of-life) is always okay (subject to normal lazy consensus), but 
> it is strongly preferred to fix it first in the older branch and merge 
> forward to the newer one(s).
>
> 2. Back-porting performance improvements to a prior release line that 
> is not EOL (end-of-life) is usually okay (subject to normal lazy 
> consensus), so long as it does not change user-facing behavior or API.
> It is still strongly preferred to add such fixes in the older branch 
> first, and merge forward to the newer one(s).
>
> 3. Back-porting new features and additions is to be avoided as a
> general rule (see arguments for this in previous threads:
> ACCUMULO-1488 and http://s.apache.org/sU5 and probably others).
>
> 4. If it is desired to back-port a new feature, then a vote on the 
> developer mailing list should be called, due to the additional 
> development and support burden the new feature may cause for all 
> developers.
>
> 5. Even when it is agreed that a feature should be back-ported, it 
> should not be done unless/until a feature is first represented in a 
> newer release that has gone through the testing and release process, 
> and can be considered stable enough to back-port. This ensures focus 
> is kept on the main development branch for new features, and 
> significantly reduces the development burden of back-porting. It also 
> gives us a clear idea of the target behavior for the back-ported 
> feature, so that it will behave in the same way as the same feature in 
> the later release line.
>

I'm not sure #5 makes sense.  Certainly it's a sound idea not to do the
back-porting until the feature design has been hammered out very well.
However, in an example such as adding an iterator that we've agreed on
back-porting, whose design is clear, it wouldn't make sense to wait until
1.6.0 comes out to actually add it to the 1.5 line.  I could see an argument
for placing additional testing requirements on back-ported features, like
creating full-coverage unit tests and functional tests for the new code, to
offset the risk of introducing code that has not yet gone through a full
testing cycle and release process.

I'm still deciding what I think about #4.  For one, I'm reluctant to move
the discussion of the feature from the ticket to the dev list.  If we do
decide to require a vote (either on ticket or dev list), we should also
decide what type of approval is appropriate (consensus [1], majority [2], or
a modified version of either, such as requiring fewer +1s but placing the
same restrictions on -1s).

Billie

[1]: http://apache.org/foundation/glossary.html#ConsensusApproval
[2]: http://apache.org/foundation/glossary.html#MajorityApproval


>
> Please discuss these points, or add your own.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>



RE: Schedule for 1.6.0 release?

2013-07-11 Thread Dave Marion
Historically, how long has it taken to complete testing of release candidates? 
Subtract that from 1 November and that should be the target date. Based on 
1.5.0, that means feature complete is tomorrow, right? :-)

-Original Message-
From: Sean Busbey [mailto:bus...@cloudera.com] 
Sent: Thursday, July 11, 2013 5:17 PM
To: dev@accumulo.apache.org
Subject: Schedule for 1.6.0 release?

One of the action items out of the 1.6.0 discussion[1] was that we'd use the 
list to decide on a target release date, feature set, and incremental 
milestones for Accumulo 1.6.0.

I know the initial plan was to aim for November, and right now Jira says as 
much[2].

That's only ~4 months away, so we should lay out some plans. When do we need to 
target feature complete to meet that goal? When does code freeze need to happen?



[1]:
https://docs.google.com/a/cloudera.com/document/d/1FkP2dDE4zzH1ou89_-qpW6-7dtBj9XdMRGjFnnLGrTI/edit
[2]: https://issues.apache.org/jira/browse/ACCUMULO/fixforversion/12322468

--
Sean



RE: Hadoop 2.0 Support for Accumulo 1.4 Branch

2013-08-01 Thread Dave Marion
Any update?

-Original Message-
From: Joey Echeverria [mailto:j...@cloudera.com] 
Sent: Monday, July 29, 2013 1:24 PM
To: dev@accumulo.apache.org
Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

We're testing this today. I'll report back what we find. 


-Joey
—
Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, null  wrote:

> "Will 1.4 still work with 0.20 with these patches?" 
> Great point Billie. 
> - Original Message -
> From: "Billie Rinaldi" 
> To: dev@accumulo.apache.org
> Sent: Friday, July 26, 2013 3:02:41 PM
> Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
> On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria  wrote:
>> > If these patches are going to be included with 1.4.4 or 1.4.5, I would like
>> > to see the following tests run using CDH4 on at least a 5 node cluster.
>> > More nodes would be better.
>> >   * unit test
>> >   * Functional test
>> >   * 24 hr Continuous ingest + verification
>> >   * 24 hr Continuous ingest + verification + agitation
>> >   * 24 hr Random walk
>> >   * 24 hr Random walk + agitation
>> > 
>> > I may be able to assist with this, but I can not make any promises. 
>> 
>> Sure thing. Is there already a write-up on running this full battery 
>> of tests? I have a 10 node cluster that I can use for this.
>> 
>> 
>> > Great.  I think this would be a good patch for 1.4.   I assume that 
>> > if a user stays with Hadoop 1 there are no dependency changes?
>> 
>> Yup. It works the same way as 1.5 where all of the dependency changes 
>> are in a Hadoop 2.0 profile.
>> 
> In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 
> 1.0) to make the compatibility requirements simpler; we ended up 
> without dependency changes in the hadoop version profiles.  Will 1.4 
> still work with 0.20 with these patches?  If there are dependency 
> changes in the profiles, 1.4 would have to be compiled against a 
> hadoop version compatible with the running version of hadoop, correct?  
> We had some trouble in the
> 1.5 release process with figuring out how to provide multiple binary 
> artifacts (each compiled against a different version of hadoop) for 
> the same release.  Just something we should consider before we are in 
> the midst of releasing 1.4.4.
> Billie
>> -Joey
>> 



Re: Updated staging website for 1.10 and "LTM" description

2020-09-08 Thread Dave Marion
reviewed both the staging site and announcement, looks good

On Tue, Sep 8, 2020 at 8:57 AM Christopher  wrote:

> Just a reminder: if you want to review the draft release announcement
> for 1.10 (sent in a separate email thread), or the staged website,
> which now includes the "LTM" description, and the 1.10 release, you
> have only a few hours remaining to do so. I intend to publish the site
> and send the announcement this afternoon (UTC -0400). Thanks.
>
> On Sat, Sep 5, 2020 at 6:04 AM Christopher  wrote:
> >
> > I said Monday... I meant the end of the day on Tuesday.
> > I forgot there's a U.S. holiday Monday.
> > I'll be monitoring responses regardless, but the extra day might help
> > people spend some more time to provide feedback on the changes I made.
> >
> > On Sat, Sep 5, 2020 at 5:58 AM Christopher  wrote:
> > >
> > > Hi Accumulo Devs,
> > >
> > > Please take a look at the updated staging website at
> > > https://accumulo.staged.apache.org/ for the 1.10 release. I've made
> > > many changes, and I'm looking for people to give me some feedback (in
> > > the form of discussion points here, or as PRs, or as direct commits,
> > > if necessary) before we publish the updates.
> > >
> > > Specifically, I want feedback on:
> > >
> > > 1. The term "LTM / Long Term Maintenance" - we have discussed this on
> > > the mailing list before (under the "LTS" name), and a description was
> > > drafted in the 1.10 release notes. I have moved it to another area of
> > > the website, added labels on the downloads page, and provided a
> > > detailed explanation of what it would mean. I would like this to be
> > > reviewed by others in the community before we publish.
> > >
> > > 2. Color schemes for the labels on the downloads, the release notes
> > > page alerts, and the release archive listing page. I tried to make the
> > > color scheme consistent and stuck with the provided bootstrap palette,
> > > and I also tried to provide enough information about what the labels
> > > mean, but I've been staring at it too long to have a "fresh" opinion
> > > about these.
> > >
> > > 3. Release notes navigation to see previous/next release notes. I want
> > > feedback on placement/appearance/utility of those. These are a bit of
> > > a nuisance to move, so if you really think something should be changed
> > > with these, I would appreciate assistance with that.
> > >
> > > 4. If you can, please take a look at the actual diff of the commit as
> > > well (it's already pushed as:
> > > 55cd45f818d0acb020938b105029344fa979e60a), rather than just the
> > > rendered site, just in case you catch something I missed (or broke).
> > >
> > > I wanted to get the website done and the announcement email out by
> > > Friday, but that didn't happen.
> > > I now expect to do it at the end of the day on Monday, so please take
> > > a look at the website if you can. Nothing is set in stone, so we can
> > > always change things later if you can't take a look before then.
> > >
> > > The 1.10.0 checklist now only has a few items remaining:
> > > https://github.com/apache/accumulo/issues/1699
> > > I've been up all night, so I'll try to draft the announcement email
> > > later today, unless somebody else gets to it before I wake up. :)
> > >
> > >
> > > Thanks,
> > > Christopher
>


Re: Uniquely identifying Scans

2020-09-16 Thread Dave Marion
I believe adding a property in the iterator options to identify a scan was
something that we did on a previous project of mine. IIRC list scans showed
the information.

On Wed, Sep 16, 2020 at 12:20 AM Christopher  wrote:

> Hi Andrew,
>
> We currently have the concept of a "scan session", but that is used
> internally only, for continuing a scan after retrieving a batch of
> results from a server. It does contain certain information, like a
> session id, but might not contain all the information you want... and
> in any case it's not a user-facing feature, but used internally.
>
> Your solution to use an iterator option is clever, and could work
> well, particularly if you are already setting an iterator on the
> client, and if listscans shows these options (I don't recall if it
> does). If you aren't setting an iterator on the client already, in
> order to add a superfluous option with the info you want to send, you
> could add an identity-mapping iterator (aka an "allow all" filter),
> but the extra iterator on the stack could have a performance impact.
>
> Another option, since 2.0.0, is to set an execution hint on a scanner
> (see https://accumulo.apache.org/docs/2.x/administration/scan-executors).
> However, querying the hints and emitting them to listscans might
> require some code modification, as I'm not sure if those will show
> there right now. If you find this to be a viable option, and it needs
> additional code to work, feel free to propose a design to the dev
> list, or open a pull request.
>
> Related: we could also consider automatically populating some scan
> hints with information the server side already knows (such as client
> IP address, client user name, etc.), either in a reserved hint
> namespace (accumulo.* or similar) or in a separate, similar store that
> dispatchers would have access to in addition to execution hints.
>
> Christopher
>
> On Tue, Sep 15, 2020 at 8:50 PM Andrew Hulbert  wrote:
> >
> > Hi all,
> >
> > We were looking to uniquely tag a scan with some sort of ID that would
> > allow us to map it back to a client...something like a session id or
> > something else. The first idea we came up with was to simply set an
> > iterator option that could be viewable in listscans in the shell. Basic
> > idea would be to map it back to more than just the IP address of the
> > client (e.g. a user name or something else injected on the client side
> > for a client servicing multiple users).
> >
> > Wondering if anybody else had done this before or if there were any
> > thoughts about having "scan metadata" in the future?
> >
> > Thanks,
> >
> > Andrew
> >
>
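
Both ideas in this thread (a superfluous iterator option, or an execution hint) come down to attaching a small set of key/value pairs to the scan on the client side. The following is a minimal Java sketch of building such pairs, assuming nothing beyond the JDK; the class name `ScanTags` and the option keys are hypothetical and are not reserved or interpreted by Accumulo.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Illustrative helper (hypothetical): builds key/value pairs that tag a scan with client-side identity. */
final class ScanTags {

  // Hypothetical option keys; nothing in Accumulo reserves or interprets these names.
  static final String SCAN_ID = "scan.tag.id";
  static final String SCAN_USER = "scan.tag.user";

  /** Creates a fresh tag set for one scan issued on behalf of the given end user. */
  static Map<String, String> forUser(String endUser) {
    if (endUser == null || endUser.isEmpty()) {
      throw new IllegalArgumentException("endUser must be non-empty");
    }
    Map<String, String> tags = new LinkedHashMap<>();
    tags.put(SCAN_ID, UUID.randomUUID().toString()); // unique per scan
    tags.put(SCAN_USER, endUser); // more specific than the client IP address
    return tags;
  }

  public static void main(String[] args) {
    Map<String, String> tags = forUser("alice");
    // With the Accumulo client on the class path, these pairs could be copied onto an
    // IteratorSetting that is already being added to the scanner (addOption(key, value)),
    // or, since 2.0.0, passed to ScannerBase.setExecutionHints(tags).
    // Whether listscans displays either of them is not confirmed in this thread.
    tags.forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

Generating the id on the client keeps the mapping from scan to end user under the application's control, which matters when one client process serves multiple users.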


Re: [DISCUSS] Classloader change proposals

2020-09-16 Thread Dave Marion
Did anyone join the call? Any notes?

On Wed, Sep 16, 2020 at 12:44 PM Christopher  wrote:

> I just want to remind everybody that I'm available in Slack now to
> discuss this in the ongoing video call I created in the #accumulo
> room, if you want to join.
>
> On Mon, Sep 14, 2020 at 10:41 PM Christopher  wrote:
> >
> > Also, if anybody is interested in a live video conversation to discuss
> > this interactively, I intend to be on Slack on Wednesday afternoon
> > (EDT) starting around noon.
> >
> > On Mon, Sep 14, 2020 at 5:30 PM Christopher  wrote:
> > >
> > > Hi Accumulo Devs,
> > >
> > > Lately, Dave Marion (Apache ID: dlmarion) has been working on
> > > prototyping some new class loader concepts for Accumulo that he and I
> > > have discussed, and I wanted to pitch the idea here for consideration
> > > for the project.
> > >
> > > # Background:
> > >
> > > Accumulo currently has two classloaders that are instantiated at
> > > startup, and which can be used to bootstrap Accumulo dependencies (at
> > > least, those not needed for the classloader code itself). This allows
> > > us to use the `general.classpaths`[1] and
> > > `general.dynamic.classpaths`[2] properties, as well as the per-context
> > > classloaders (`general.vfs.*`[3] and `table.classpath.context`[4]) for
> > > things like iterator class isolation. Since 2.0.0, we have deprecated
> > > `general.classpaths` and `general.dynamic.classpaths`, the former
> > > supplanted by the better use of the `CLASSPATH` environment variable
> > > (along with much improved scripts in 2.0.0), and the latter being
> > > replaceable by a user-provided class loader using the built-in Java
> > > property, `java.system.class.loader`[5], at their discretion.
> > >
> > > # The Problem:
> > >
> > > The main problem with the current code is: complexity. Accumulo is
> > > already complex enough without needing to be in the business of
> > > developing and supporting complex custom class loading features,
> > > especially when users have viable alternatives that can be better
> > > supported by independent, dedicated projects. Furthermore, these
> > > custom class loaders also have a dependency on commons-vfs2, which has
> > > been the source of numerous problems and bugs that we have needed to
> > > deal with, and which affect Accumulo, even though they are not
> > > necessarily bugs in Accumulo itself. This also brings in a lot of
> > > optional dependencies that aren't needed by users who don't rely on
> > > these features.
> > >
> > > # The Requirements:
> > >
> > > In spite of these problems, I believe we still want to enable the use
> > > cases that our classloaders are currently enabling.
> > >
> > > Specifically,
> > > 1) the ability to have separate contexts for iterator class isolation
> > > (A/B testing of iterators, updating iterators in a live system, etc.),
> > > and
> > > 2) the ability for users to bootstrap their class path from some other
> > > distributed storage than local disk.
> > >
> > > # The Proposal:
> > >
> > > 1. Create a new reloading vfs class loader, with similar functionality
> > > as our current two-classloaders that do the reloading and provide vfs
> > > features, that can be easily used as a system class loader, if the
> > > user chooses to, and deprecate (for removal in 3.0) the built-in
> > > implementations. This class loader could not only be used with
> > > Accumulo, but it could also be used by any other project that chooses
> > > to use it, because it will not have many, if any, dependencies beyond
> > > commons-vfs2, and will certainly not depend on Accumulo. Creating this
> > > separate class loader provides us a path forward to simplify Accumulo
> > > by removing these features from Accumulo directly (the properties are
> > > already deprecated), and enabling it to be maintained independently.
> > > 2. Create a new class loader factory property in Accumulo, with
> > > corresponding SPI interface, for users to provide their own
> > > implementation of a class loader factory, that can map a per-table
> > > "context" to a ClassLoader of the implementation's choosing.
> > >
> > > The result of doing these two things will allow us to more flexibly
> > > support user class loading needs, without being directly responsible
> > > 

Re: [DISCUSS] Classloader change proposals

2020-09-17 Thread Dave Marion
  The new classloader will maintain the same functionality that you have
with the current classloader. At the point where the current classloader is
removed from the accumulo core code (version 3.0) Accumulo by default will
use the standard JVM classloading mechanism. If someone wants to use the
new classloader (location TBD) they would set the
`java.system.class.loader` property to the fully qualified class name of
the new classloader. When set, the JVM instantiates the specified
classloader as a child of the app classloader (the one that loads from
java.class.path) and then sets the new classloader as the system
classloader. See [1] for more information.
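
  As a rough illustration of the shape the JVM expects, here is a minimal Java sketch of a class that could be named in `java.system.class.loader`. It is not the proposed reloading classloader; the name `DemoSystemClassLoader` is hypothetical, and the `main` method constructs the loader by hand the way the JVM would, with the app classloader as parent.

```java
/**
 * Illustrative stand-in (hypothetical name) for a loader installed via -Djava.system.class.loader.
 * Declared without "public" so the sketch compiles in any file; a real one should be a public class.
 */
class DemoSystemClassLoader extends ClassLoader {

  /** The JVM looks for a public constructor that takes the delegation parent. */
  public DemoSystemClassLoader(ClassLoader parent) {
    super(parent);
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    // Only reached after the parent chain has failed to find the class.
    // A reloading VFS loader would search its own jars here; this sketch has none.
    throw new ClassNotFoundException(name);
  }

  /** Loads a class by name, returning null instead of throwing when it cannot be found. */
  static Class<?> loadOrNull(ClassLoader loader, String name) {
    try {
      return loader.loadClass(name);
    } catch (ClassNotFoundException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    ClassLoader app = ClassLoader.getSystemClassLoader();
    ClassLoader custom = new DemoSystemClassLoader(app);
    System.out.println("parent is current system loader: " + (custom.getParent() == app));
    System.out.println("String resolved via parent chain: "
        + (loadOrNull(custom, "java.lang.String") == String.class));
  }
}
```

  Because the default `loadClass` delegates to the parent first, classes on `java.class.path` are still found, which is how the fallback to `CLASSPATH` is preserved.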

  The context classloading mechanism functionality is also maintained, but
the configuration has been modified a bit. There has been discussion about
the location for the new classloader with the fallback being that it's an
Accumulo subproject.

  I should be able to put up a PR soon with the proposed changes.

[1]
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ClassLoader.html#getSystemClassLoader()

On Thu, Sep 17, 2020 at 3:18 PM ivan bella  wrote:

>  I like the idea of simplifying the dependencies that Accumulo has.
> As long as we can still have the ability to monitor a file system (e.g.
> hdfs) and reload the jars from a specified classpath when it has changed.
> This is needed for loading jars for our deployed application and managing
> patches, but then a second class loader would load the accumulo jars from
> disk. If the reloading classloader is the system classloader, will it still
> be able to fall back to the java classloader that uses CLASSPATH? Also I
> missed where the reloading vfs classloader would actually live?
>
>  I believe we will still end up having a stack of classloaders as we
> do today so I am not sure this is really simpler. It appears we are simply
> replacing the Vfs classloader with a reloading classloader that can then
> delegate to the vfs classloader. Perhaps I am missing something here.
>
>
> > On September 14, 2020 5:30 PM Christopher  wrote:
> >
> >
> > Hi Accumulo Devs,
> >
> > Lately, Dave Marion (Apache ID: dlmarion) has been working on
> > prototyping some new class loader concepts for Accumulo that he and I
> > have discussed, and I wanted to pitch the idea here for consideration
> > for the project.
> >
> > # Background:
> >
> > Accumulo currently has two classloaders that are instantiated at
> > startup, and which can be used to bootstrap Accumulo dependencies (at
> > least, those not needed for the classloader code itself). This allows
> > us to use the `general.classpaths`[1] and
> > `general.dynamic.classpaths`[2] properties, as well as the per-context
> > classloaders (`general.vfs.*`[3] and `table.classpath.context`[4]) for
> > things like iterator class isolation. Since 2.0.0, we have deprecated
> > `general.classpaths` and `general.dynamic.classpaths`, the former
> > supplanted by the better use of the `CLASSPATH` environment variable
> > (along with much improved scripts in 2.0.0), and the latter being
> > replaceable by a user-provided class loader using the built-in Java
> > property, `java.system.class.loader`[5], at their discretion.
> >
> > # The Problem:
> >
> > The main problem with the current code is: complexity. Accumulo is
> > already complex enough without needing to be in the business of
> > developing and supporting complex custom class loading features,
> > especially when users have viable alternatives that can be better
> > supported by independent, dedicated projects. Furthermore, these
> > custom class loaders also have a dependency on commons-vfs2, which has
> > been the source of numerous problems and bugs that we have needed to
> > deal with, and which affect Accumulo, even though they are not
> > necessarily bugs in Accumulo itself. This also brings in a lot of
> > optional dependencies that aren't needed by users who don't rely on
> > these features.
> >
> > # The Requirements:
> >
> > In spite of these problems, I believe we still want to enable the use
> > cases that our classloaders are currently enabling.
> >
> > Specifically,
> > 1) the ability to have separate contexts for iterator class isolation
> > (A/B testing of iterators, updating iterators in a live system, etc.),
> > and
> > 2) the ability for users to bootstrap their class path from some other
> > distributed storage than local disk.
> >
> > # The Proposal:
> >
> > 1. Create a new reloading vfs class loader, with similar functionality
> > as our current two-classloaders that do the reloading an

Re: [DISCUSS] Classloader change proposals

2020-09-18 Thread Dave Marion
 I tend to agree with Marc on this. If we need to push out a fix for the
new classloader, then we can do it as needed and not have to rely on
another group of people to come to consensus on a release. Of course, we
could maintain a fork of it in that case, but then what's the point? It
appears that VFS does have some recent activity [1], but mostly by one
person. I'm thinking that we should create a subproject for it and notify
the commons-vfs project of its existence. If they want to copy it and
include it in their project, then they can do that.

[1] https://gitbox.apache.org/repos/asf?p=commons-vfs.git;a=shortlog

On Wed, Sep 16, 2020 at 4:48 PM Christopher  wrote:

> Only Marc joined, and we didn't talk about anything that isn't already in
> this proposal, except he did mention how difficult it might be to try to
> maintain the class loader in commons-vfs2, rather than as our own small
> subproject, which is relatively easy.
>
> On Wed, Sep 16, 2020, 16:11 Dave Marion  wrote:
>
> > Did anyone join the call? Any notes?
> >
> > On Wed, Sep 16, 2020 at 12:44 PM Christopher 
> wrote:
> >
> > > I just want to remind everybody that I'm available in Slack now to
> > > discuss this in the ongoing video call I created in the #accumulo
> > > room, if you want to join.
> > >
> > > On Mon, Sep 14, 2020 at 10:41 PM Christopher 
> > wrote:
> > > >
> > > > Also, if anybody is interested in a live video conversation to
> discuss
> > > > this interactively, I intend to be on Slack on Wednesday afternoon
> > > > (EDT) starting around noon.
> > > >
> > > > On Mon, Sep 14, 2020 at 5:30 PM Christopher 
> > wrote:
> > > > >
> > > > > Hi Accumulo Devs,
> > > > >
> > > > > Lately, Dave Marion (Apache ID: dlmarion) has been working on
> > > > > prototyping some new class loader concepts for Accumulo that he
> and I
> > > > > have discussed, and I wanted to pitch the idea here for
> consideration
> > > > > for the project.
> > > > >
> > > > > # Background:
> > > > >
> > > > > Accumulo currently has two classloaders that are instantiated at
> > > > > startup, and which can be used to bootstrap Accumulo dependencies
> (at
> > > > > least, those not needed for the classloader code itself). This
> allows
> > > > > us to use the `general.classpaths`[1] and
> > > > > `general.dynamic.classpaths`[2] properties, as well as the
> > per-context
> > > > > classloaders (`general.vfs.*`[3] and `table.classpath.context`[4])
> > for
> > > > > things like iterator class isolation. Since 2.0.0, we have
> deprecated
> > > > > `general.classpaths` and `general.dynamic.classpaths`, the former
> > > > > supplanted by the better use of the `CLASSPATH` environment
> variable
> > > > > (along with much improved scripts in 2.0.0), and the latter being
> > > > > replaceable by a user-provided class loader using the built-in Java
> > > > > property, `java.system.class.loader`[5], at their discretion.
> > > > >
> > > > > # The Problem:
> > > > >
> > > > > The main problem with the current code is: complexity. Accumulo is
> > > > > already complex enough without needing to be in the business of
> > > > > developing and supporting complex custom class loading features,
> > > > > especially when users have viable alternatives that can be better
> > > > > supported by independent, dedicated projects. Furthermore, these
> > > > > custom class loaders also have a dependency on commons-vfs2, which
> > has
> > > > > been the source of numerous problems and bugs that we have needed
> to
> > > > > deal with, and which affect Accumulo, even though they are not
> > > > > necessarily bugs in Accumulo itself. This also brings in a lot of
> > > > > optional dependencies that aren't needed by users who don't rely on
> > > > > these features.
> > > > >
> > > > > # The Requirements:
> > > > >
> > > > > In spite of these problems, I believe we still want to enable the
> use
> > > > > cases that our classloaders are currently enabling.
> > > > >
> > > > > Specifically,
> > > > > 1) the ability to have separate contexts for iterator class
> isolation
> > > > > >

Re: [DISCUSS] Classloader change proposals

2020-09-22 Thread Dave Marion
[1] contains the initial set of changes to the Accumulo code base that
defines the new context class loader configuration and deprecates the
existing VFS ClassLoader objects. [2] contains the new
ReloadingVFSClassLoader that can be used as the system classloader and a
ClassLoaderFactory implementation for configuring contexts for tables and
scans. Both build successfully and I plan on doing some local testing next.
Feedback on the design and the code is welcome.

[1] https://github.com/dlmarion/accumulo/pull/2
[2] https://github.com/dlmarion/vfs-reloading-classloader
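
For readers who want a picture of what "a ClassLoaderFactory implementation
for configuring contexts" could look like, here is a minimal sketch. It is not
the code in either link above. The interface name, the method name and the
context names ctxA/ctxB are assumptions made for illustration only, and the
classpaths are left empty so the sketch runs anywhere.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContextLoaders {

  /** Hypothetical factory shape: one class loader per named context. */
  interface ContextClassLoaderFactory {
    ClassLoader getClassLoader(String contextName);
  }

  /** Caches one URLClassLoader per context so contexts stay isolated from each other. */
  static class CachingFactory implements ContextClassLoaderFactory {
    private final Map<String, URL[]> classpaths;
    private final Map<String, ClassLoader> loaders = new ConcurrentHashMap<>();

    CachingFactory(Map<String, URL[]> classpaths) {
      this.classpaths = classpaths;
    }

    @Override
    public ClassLoader getClassLoader(String contextName) {
      URL[] urls = classpaths.get(contextName);
      if (urls == null) {
        throw new IllegalArgumentException("Unknown context: " + contextName);
      }
      // computeIfAbsent creates at most one loader per context, even under concurrent calls
      return loaders.computeIfAbsent(contextName,
          name -> new URLClassLoader(urls, ContextLoaders.class.getClassLoader()));
    }
  }

  /** Builds a factory with two illustrative contexts that have empty classpaths. */
  static CachingFactory demoFactory() {
    return new CachingFactory(Map.of("ctxA", new URL[0], "ctxB", new URL[0]));
  }

  public static void main(String[] args) {
    CachingFactory factory = demoFactory();
    System.out.println(factory.getClassLoader("ctxA") == factory.getClassLoader("ctxA"));
    System.out.println(factory.getClassLoader("ctxA") != factory.getClassLoader("ctxB"));
  }
}
```

The point of the sketch is only that a table or scan asks for a loader by
context name, repeated requests for one context share a loader, and different
contexts never share one.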

On Fri, Sep 18, 2020 at 12:19 PM Dave Marion  wrote:

>
>  I tend to agree with Marc on this. If we need to push out a fix for the
> new classloader, then we can do it as needed and not have to rely on
> another group of people to come to consensus on a release. Of course, we
> could maintain a fork of it in that case, but then what's the point? It
> appears that VFS does have some recent activity [1], but mostly by one
> person. I'm thinking that we should create a subproject for it and notify
> the commons-vfs project of its existence. If they want to copy it and
> include it in their project, then they can do that.
>
> [1] https://gitbox.apache.org/repos/asf?p=commons-vfs.git;a=shortlog
>
> On Wed, Sep 16, 2020 at 4:48 PM Christopher  wrote:
>
>> Only Marc joined, and we didn't talk about anything that isn't already in
>> this proposal, except he did mention how difficult it might be to try to
>> maintain the class loader in commons-vfs2, rather than as our own small
>> subproject, which is relatively easy.
>>
>> On Wed, Sep 16, 2020, 16:11 Dave Marion  wrote:
>>
>> > Did anyone join the call? Any notes?
>> >
>> > On Wed, Sep 16, 2020 at 12:44 PM Christopher 
>> wrote:
>> >
>> > > I just want to remind everybody that I'm available in Slack now to
>> > > discuss this in the ongoing video call I created in the #accumulo
>> > > room, if you want to join.
>> > >
>> > > On Mon, Sep 14, 2020 at 10:41 PM Christopher 
>> > wrote:
>> > > >
>> > > > Also, if anybody is interested in a live video conversation to
>> discuss
>> > > > this interactively, I intend to be on Slack on Wednesday afternoon
>> > > > (EDT) starting around noon.
>> > > >
>> > > > On Mon, Sep 14, 2020 at 5:30 PM Christopher 
>> > wrote:
>> > > > >
>> > > > > Hi Accumulo Devs,
>> > > > >
>> > > > > Lately, Dave Marion (Apache ID: dlmarion) has been working on
>> > > > > prototyping some new class loader concepts for Accumulo that he
>> and I
>> > > > > have discussed, and I wanted to pitch the idea here for
>> consideration
>> > > > > for the project.
>> > > > >
>> > > > > # Background:
>> > > > >
>> > > > > Accumulo currently has two classloaders that are instantiated at
>> > > > > startup, and which can be used to bootstrap Accumulo dependencies
>> (at
>> > > > > least, those not needed for the classloader code itself). This
>> allows
>> > > > > us to use the `general.classpaths`[1] and
>> > > > > `general.dynamic.classpaths`[2] properties, as well as the
>> > per-context
>> > > > > classloaders (`general.vfs.*`[3] and `table.classpath.context`[4])
>> > for
>> > > > > things like iterator class isolation. Since 2.0.0, we have
>> deprecated
>> > > > > `general.classpaths` and `general.dynamic.classpaths`, the former
>> > > > > supplanted by the better use of the `CLASSPATH` environment
>> variable
>> > > > > (along with much improved scripts in 2.0.0), and the latter being
>> > > > > replaceable by a user-provided class loader using the built-in
>> Java
>> > > > > property, `java.system.class.loader`[5], at their discretion.
>> > > > >
>> > > > > # The Problem:
>> > > > >
>> > > > > The main problem with the current code is: complexity. Accumulo is
>> > > > > already complex enough without needing to be in the business of
>> > > > > developing and supporting complex custom class loading features,
>> > > > > especially when users have viable alternatives that can be better
>> > > > > supported by independent, dedicated projects. Furthermore, these
>> >

Re: [DISCUSS] Classloader change proposals

2020-09-22 Thread Dave Marion
Sounds good.

On Tue, Sep 22, 2020, 1:37 PM Christopher  wrote:

> Based on the conversation and direction of this, I think it probably
> makes the most sense to just create a new repository under Accumulo
> for it to live.
>
> How about 'accumulo-classloaders' for the repo name? If this is okay,
> I can create it later today or tomorrow.
> This is just a repo name. This is just a place to collaborate on
> classloader code without constraining the scope of the repo's contents
> too much. The actual package names and artifactId can be different, if
> we want.
>
> On Tue, Sep 22, 2020 at 9:44 AM Dave Marion  wrote:
> >
> > [1] contains the initial set of changes to the Accumulo code base that
> > defines the new context class loader configuration and deprecates the
> > existing VFS ClassLoader objects. [2] contains the new
> > ReloadingVFSClassLoader that can be used as the system classloader and a
> > ClassLoaderFactory implementation for configuring contexts for tables and
> > scans. Both build successfully and I plan on doing some local testing
> next.
> > Feedback on the design and the code is welcome.
> >
> > [1] https://github.com/dlmarion/accumulo/pull/2
> > [2] https://github.com/dlmarion/vfs-reloading-classloader
> >

Re: [DISCUSS] Classloader change proposals

2020-10-05 Thread Dave Marion
I have tested the new classloader as the `java.system.class.loader`, and it
works, allowing me to load classes from HDFS. The reloading feature works
too, except that the ClassLoader.loadClass() logic short circuits the
loading of the new classes with the same name by calling findLoadedClass
(which returns a cached Class object). I don't believe that we will be able
to retain the reloading feature that we currently have with
`general.dynamic.classpaths` (lib/ext) in this new classloader
implementation. I have not tested it yet, but I believe that the table
context class loader reloading mechanism will work.
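
The short circuit described above can be reproduced without VFS or HDFS. The
sketch below is an illustration only, not the new classloader: the class named
"P" is a hand-assembled, empty class file standing in for a class in a jar,
and CountingLoader is a hypothetical loader that counts how often findClass is
reached.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ReloadDemo {

  /** Hand-assembled class file for an empty public class named "P" (illustrative stand-in). */
  static byte[] classBytes() {
    byte[] objectName = "java/lang/Object".getBytes(StandardCharsets.US_ASCII);
    byte[] head = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic number
        0, 0, 0, 52,                    // class file version 52 (Java 8)
        0, 5,                           // constant pool count (4 entries + 1)
        7, 0, 2,                        // #1 Class -> #2
        1, 0, 1, 'P',                   // #2 Utf8 "P"
        7, 0, 4,                        // #3 Class -> #4
        1, 0, (byte) objectName.length  // #4 Utf8 header, name bytes follow
    };
    byte[] tail = {
        0, 0x21,                // access flags: public, super
        0, 1,                   // this_class = #1
        0, 3,                   // super_class = #3
        0, 0, 0, 0, 0, 0, 0, 0  // no interfaces, fields, methods or attributes
    };
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(head, 0, head.length);
    out.write(objectName, 0, objectName.length);
    out.write(tail, 0, tail.length);
    return out.toByteArray();
  }

  /** Hypothetical loader that defines "P" itself and counts calls to findClass. */
  static class CountingLoader extends ClassLoader {
    int findClassCalls = 0;

    CountingLoader() {
      super(null); // bootstrap parent only, so "P" is never found by delegation
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
      if (!"P".equals(name)) {
        throw new ClassNotFoundException(name);
      }
      findClassCalls++;
      byte[] b = classBytes();
      return defineClass(name, b, 0, b.length);
    }
  }

  static Class<?> load(ClassLoader loader) {
    try {
      return loader.loadClass("P");
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Second loadClass call is answered by findLoadedClass, so findClass runs only once. */
  static boolean sameLoaderReturnsCachedClass() {
    CountingLoader loader = new CountingLoader();
    Class<?> first = load(loader);
    Class<?> second = load(loader);
    return first == second && loader.findClassCalls == 1;
  }

  /** A fresh loader instance defines the class again and yields a distinct Class object. */
  static boolean newLoaderYieldsNewClass() {
    return load(new CountingLoader()) != load(new CountingLoader());
  }

  public static void main(String[] args) {
    System.out.println("cached within one loader: " + sameLoaderReturnsCachedClass());
    System.out.println("new loader, new class:    " + newLoaderYieldsNewClass());
  }
}
```

This is why reloading generally means replacing the class loader instance
rather than asking the same instance to load the name again, which is easier
to arrange for a per-context loader than for the system class loader.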

I know that the table context classloader feature is being used. Does
anyone know of an instance where `general.dynamic.classpaths` is still in use
and losing the reloading feature would be unacceptable?

On Thu, Sep 24, 2020 at 10:43 AM Christopher  wrote:

> The INFRA ticket is now closed, and the default branch has been updated.
> I also prepopulated the repo with some issue templates and GitHub
> Actions build configurations, as well as LICENSE/NOTICE and a pom.xml
> file to start it off.
>
> On Wed, Sep 23, 2020 at 10:31 AM Christopher  wrote:
> >
> > I created the repo, but waiting on
> > https://issues.apache.org/jira/browse/INFRA-20884 to fix the GitBox
> > default branch so that the GitHub issues can be enabled.
> >
> > It should be okay to add the initial code to it, though.
> >
> > On Tue, Sep 22, 2020 at 1:40 PM Dave Marion  wrote:
> > >
> > > Sounds good.
> > >
> > > On Tue, Sep 22, 2020, 1:37 PM Christopher  wrote:
> > >
> > > > Based on the conversation and direction of this, I think it probably
> > > > makes the most sense to just create a new repository under Accumulo
> > > > for it to live.
> > > >
> > > > How about 'accumulo-classloaders' for the repo name? If this is okay,
> > > > I can create it later today or tomorrow.
> > > > This is just a repo name. This is just a place to collaborate on
> > > > classloader code without constraining the scope of the repo's
> contents
> > > > too much. The actual package names and artifactId can be different,
> if
> > > > we want.
> > > >
> > > > On Tue, Sep 22, 2020 at 9:44 AM Dave Marion 
> wrote:
> > > > >
> > > > > [1] contains the initial set of changes to the Accumulo code base
> that
> > > > > defines the new context class loader configuration and deprecates
> the
> > > > > existing VFS ClassLoader objects. [2] contains the new
> > > > > ReloadingVFSClassLoader that can be used as the system classloader
> and a
> > > > > ClassLoaderFactory implementation for configuring contexts for
> tables and
> > > > > scans. Both build successfully and I plan on doing some local
> testing
> > > > next.
> > > > > Feedback on the design and the code is welcome.
> > > > >
> > > > > [1] https://github.com/dlmarion/accumulo/pull/2
> > > > > [2] https://github.com/dlmarion/vfs-reloading-classloader
> > > > >

Re: Which String deduplication option?

2021-02-08 Thread Dave Marion
String.intern() would seem to provide better coverage considering that some
users may not use the G1 collector.
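
To make the comparison concrete, here is a minimal sketch of the two explicit
approaches: a WeakHashMap-based cache and String.intern(). The class and
method names are hypothetical, and the map-based method is only assumed to
resemble what TabletLocator does today. It is not copied from it.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class DedupDemo {

  // Weak keys and weak values: an entry disappears once nobody else holds the string.
  private static final Map<String, WeakReference<String>> CACHE = new WeakHashMap<>();

  /** Returns a canonical instance for strings with equal content, using a weak cache. */
  static synchronized String dedupWithMap(String s) {
    WeakReference<String> ref = CACHE.get(s);
    String canonical = (ref == null) ? null : ref.get();
    if (canonical == null) {
      CACHE.put(s, new WeakReference<>(s));
      canonical = s;
    }
    return canonical;
  }

  /** Returns the JVM-wide canonical instance from the string table. */
  static String dedupWithIntern(String s) {
    return s.intern();
  }

  public static void main(String[] args) {
    String a = new String("host1:9997"); // two distinct objects with equal content
    String b = new String("host1:9997");
    System.out.println("distinct before dedup: " + (a != b));
    System.out.println("same after map dedup:  " + (dedupWithMap(a) == dedupWithMap(b)));
    System.out.println("same after intern():   " + (dedupWithIntern(a) == dedupWithIntern(b)));
  }
}
```

Both methods hand back one shared instance per distinct value and work with
any collector. The G1 flag leaves the String objects themselves distinct and
only shares their backing arrays.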

On Mon, Feb 8, 2021 at 3:19 PM Keith Turner  wrote:

> Recently while running some large map reduce jobs I learned that
> Hadoop uses String.intern() in its RPC code (below is a link to an
> example of one place where Hadoop does this).  I learned this because
> when I ran jstack on NN, RM, and/or AM that were under distress
> sometimes I kept seeing RPC server threads that were in
> String.intern().  I never was quite sure if it was a problem though.
> Not saying String.intern() is bad or good, just sharing something I
> observed that I was uncertain about.
>
> May make sense to create some sort of stress test that could simulate
> the usage pattern of the TabletLocator and try the different options
> and see what happens.  If any long pauses or problems happen in the
> simulation, they may happen in the real environment.
>
>
> https://github.com/apache/hadoop/blob/ba631c436b806728f8ec2f54ab1e289526c90579/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/TaskStatus.java#L481
>
> https://github.com/apache/hadoop/blob/ba631c436b806728f8ec2f54ab1e289526c90579/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java#L67
>
> On Mon, Feb 1, 2021 at 9:55 PM Christopher  wrote:
> >
> > While code reviewing, I saw that
> > core/src/main/java/org/apache/accumulo/core/clientImpl/TabletLocator.java
> > was using a WeakHashMap to deduplicate some strings.
> >
> > This code can probably be removed in favor of one of the following two
> options:
> >
> > 1. Just explicitly use String.intern() - As of Java 7, there is no
> > longer a separate, fixed-size PermGen space, so intern'd strings will
> > be in the main heap, no longer constrained to a limited size pool.
> > These strings are still subject to garbage collection. It is
> > implemented as a HashMap internally (native implementation), with a
> > default bucket size of more than 60K, plenty big enough for the
> > interning that TabletLocator is doing... but this is configurable by
> > the user with JVM flags if it's not. Interning will use less memory than
> > WeakHashMap, with similar performance, as long as the bucket size is big
> > enough.
> >
> > 2. Just use -XX:+UseStringDeduplication JVM flag - as of Java 9, G1 is
> > the new default Java garbage collector. This garbage collector has the
> > option to automatically attempt to deduplicate all strings behind the
> > scenes, by swapping out their underlying char arrays (so, it likely
> > won't affect == equality because the String object references
> > themselves won't change, unlike option 1). This is more passive than
> > option 1, but would apply to the entire JVM. G1GC also implements some
> > heuristics to prevent too much overhead.
> >
> > With both options, it's possible to output statistics.
> >
> > If I remove the WeakHashMap for the string deduplication in
> > TabletLocator, does anybody have an opinion on which option I should
> > replace it with? I'm leaning towards option 2 (adding it to
> > assemble/conf/accumulo-env.sh as one of the default flags).
>


External Compactions

2021-05-11 Thread Dave Marion
Keith and I have been working on a solution for issue #1451 - being able to
run major compactions outside the tablet server. This would enable
compactions to run when tables are offline, tablet servers die, and tablets
are balancing. We have created two pull requests, one for the code[1]
changes and another for the documentation[2] changes.

This change introduces two new optional components in the architecture. The
CompactionCoordinator is much like the Manager in that it is a singleton in
the system and it manages the state of external compactions across the
system. The CompactionCoordinator is started with the command:

bin/accumulo compaction-coordinator

The Compactor is the other optional component. There can be many
Compactors running in the system and each Compactor runs one compaction at
a time. It communicates with the CompactionCoordinator to get information
about the next compaction that it needs to complete and to relay the status
of the compaction. The Compactor is started with the command:

bin/accumulo compactor -q <queueName>

The queueName parameter should match the name of the external queue set in
the compaction service options. This allows an administrator to define
different compaction services for tables, each with their own queue, and to
scale the number of Compactors differently. For example we can define a
compaction service named cs1, then create a table and configure it to use
the compaction service:

config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"Q1"}]'
createtable test
config -t test -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
config -t test -s table.compaction.dispatcher.opts.service=cs1

Compactions on table "test" will occur externally by starting the
CompactionCoordinator and Compactor with queueName "Q1".
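
To show how the service name and queue name thread through those properties,
here is a small sketch that builds the same key/value pairs for any service
and queue. The helper class and its method names are hypothetical. Only the
property names and values come from the shell commands above, and nothing here
talks to a running instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionServiceProps {

  static final String SERVICE_PREFIX = "tserver.compaction.major.service.";

  /** System-level properties that define a compaction service backed by one external queue. */
  static Map<String, String> systemProps(String service, String queue) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put(SERVICE_PREFIX + service + ".planner",
        "org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner");
    props.put(SERVICE_PREFIX + service + ".planner.opts.executors",
        "[{\"name\":\"all\",\"externalQueue\":\"" + queue + "\"}]");
    return props;
  }

  /** Table-level properties that route a table's compactions to the named service. */
  static Map<String, String> tableProps(String service) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("table.compaction.dispatcher",
        "org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher");
    props.put("table.compaction.dispatcher.opts.service", service);
    return props;
  }

  public static void main(String[] args) {
    // Print the equivalent shell commands for service "cs1", queue "Q1", table "test".
    systemProps("cs1", "Q1").forEach((k, v) -> System.out.println("config -s '" + k + "=" + v + "'"));
    tableProps("cs1").forEach((k, v) -> System.out.println("config -t test -s " + k + "=" + v));
  }
}
```

The queue name passed here is the one a Compactor would be started with, so a
second service with its own queue can be scaled independently.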

With regard to testing, we have unit and integration tests.
ExternalCompactionIT has pretty decent coverage. We have also tested
locally with multiple Compactors using uno. We are hoping to perform a
cluster test soon, potentially deploying the Compactors using k8s and its
Horizontal Pod Autoscaler, for a follow-on blog post. Please let us know if you
are interested in helping out with testing.


[1] https://github.com/apache/accumulo/pull/2096
[2] https://github.com/apache/accumulo-website/pull/282


Re: Accumulo 1.10.1 performance

2021-06-05 Thread Dave Marion
Jeremy,

  Are you able to share any details about the hardware and the Accumulo
configuration? Is the Accumulo/Hadoop configuration the same as the prior
test (no replication, WAL turned off, batch writer configuration, etc.)?

Dave

On Sat, Jun 5, 2021 at 6:12 PM Kepner, Jeremy - LLSC - MITLL <
kep...@ll.mit.edu> wrote:

> Has anyone benchmarked Accumulo 1.10.1? I have been looking into repeating
> the measurements we did in 2014 with Accumulo 1.5 (
> https://arxiv.org/abs/1406.4923) using Accumulo 1.10.1 on a bigger system
> with more modern hardware.  Unfortunately, when I repeat the single node
> measurements, there is no performance improvement from having multiple
> ingestors inserting into different presplits of a table.  I get 120K
> inserts/sec with one ingestor and 2x60K inserts/sec with two ingestors.  In
> 2014 we got linear speedup to ~6 ingestors, providing ~600K inserts/sec on
> a single node.
>
> Regards.  -Jeremy
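
The numbers quoted above can be turned into a speedup and a parallel
efficiency. This is a minimal sketch of the arithmetic only, using the two
rates reported in the message. The class and method names are made up for
illustration.

```java
public class IngestScaling {

  /** Total inserts/sec when every ingestor sustains the same rate. */
  static double aggregate(int ingestors, double perIngestorRate) {
    return ingestors * perIngestorRate;
  }

  /** Aggregate rate relative to the single-ingestor rate. */
  static double speedup(double singleIngestorRate, double aggregateRate) {
    return aggregateRate / singleIngestorRate;
  }

  /** Fraction of ideal linear scaling achieved: 1.0 is linear, 1/n is no gain at all. */
  static double efficiency(double speedup, int ingestors) {
    return speedup / ingestors;
  }

  public static void main(String[] args) {
    double single = 120_000;                 // one ingestor, as reported
    double two = aggregate(2, 60_000);       // two ingestors at 60K each, as reported
    double s = speedup(single, two);
    System.out.printf("aggregate=%.0f speedup=%.2f efficiency=%.2f%n",
        two, s, efficiency(s, 2));
  }
}
```

A speedup of 1.0 with two ingestors is what a single shared bottleneck would
produce, as opposed to the near-linear scaling reported for 2014.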


Re: Accumulo 1.10.1 performance

2021-06-07 Thread Dave Marion
Jeremy,

  It seems that you have the ability to quickly run your test to determine if a
release is "good". Testing 1.8 ruled out a lot of commits for us to look at.
Would it be possible for you to test a few others so that we can try to narrow
it down even more? The following releases are after 1.8.0:

1.8.1
1.9.0
1.9.1
1.9.2
1.9.3
1.10.0
1.10.1

Could you test with the 1.9.2 release?


> On 06/05/2021 10:04 PM Kepner, Jeremy - LLSC - MITLL  
> wrote:
> 
>  
> I did a quick check with Accumulo 1.8 and I get the expected single node 
> performance scalability.
> So between Accumulo 1.8 and 1.10.1 something changed that significantly 
> slowed the performance.
> 
> > On Jun 5, 2021, at 8:48 PM, Kepner, Jeremy - LLSC - MITLL 
> >  wrote:
> > 
> > Hi Dave,
> >  I am looking into the Accumulo/Hadoop configuration.  Hopefully it is as 
> > simple as getting the settings the same. The hardware configuration is:
> > Dual Xeon Platinum 8260 2.4 GHz 48 cores, DDR4 2.93 GHz 192 GB RAM.  I am 
> > looking into the disk specs, but that shouldn't matter since the writes are 
> > only a few megabytes.  I also just tested on some older hardware that is 
> > closer to what was used in the 2014 paper, and the single process ingest 
> > rate is ~8x slower.
> > 
> > Has anyone done any recent benchmarking of Accumulo 1.10+?
> > 
> > Regards.  -Jeremy
> > 
> > 
> >> On Jun 5, 2021, at 7:08 PM, Dave Marion  wrote:
> >> 
> >> Jeremy,
> >> 
> >> Are you able to share any details about the hardware and the Accumulo
> >> configuration? Is the Accumulo/Hadoop configuration the same as the prior
> >> test (no replication, WAL turned off, batch writer configuration, etc.)
> >> 
> >> Dave
> >> 
> >> On Sat, Jun 5, 2021 at 6:12 PM Kepner, Jeremy - LLSC - MITLL <
> >> kep...@ll.mit.edu> wrote:
> >> 
> >>> Has anyone benchmarked Accumulo 1.10.1? I have been looking into repeating
> >>> the measurements we did in 2014 with Accumulo 1.5 (
> >>> https://arxiv.org/abs/1406.4923) using Accumulo 1.10.1 on a bigger system
> >>> with more modern hardware.  Unfortunately, when I repeat the single node
> >>> measurements, there is no performance improvement from having multiple
> >>> ingestors inserting into different presplits of a table.  I get 120K
> >>> inserts/sec with one ingestor and 2x60K inserts/sec with two ingestors.  
> >>> In
> >>> 2014 we got linear speedup to ~6 ingestors, providing ~600K inserts/sec on
> >>> a single node.
> >>> 
> >>> Regards.  -Jeremy
> >


Re: new committer: Dominic Garguilo

2021-07-29 Thread Dave Marion
Congrats!

On Thu, Jul 29, 2021 at 2:34 PM Harjit Singh  wrote:

> Welcome !!!
>
> > On Jul 29, 2021, at 2:08 PM, Gall, Deeanna  wrote:
> >
> > Congrats!!!
> >
> > -Original Message-
> > From: Faulkner, Tanisha A 
> > Sent: Thursday, July 29, 2021 1:45 PM
> > To: dev@accumulo.apache.org
> > Subject: Re: new committer: Dominic Garguilo
> >
> > Congrats Dom!!
> >
> > From: Christopher 
> > Sent: Thursday, July 29, 2021 1:40:42 PM
> > To: accumulo-dev 
> > Subject: new committer: Dominic Garguilo
> >
> >
> > The Project Management Committee (PMC) for Apache Accumulo has invited
> Dominic Garguilo to become a committer and PMC member and we are pleased to
> announce that they have accepted.
> >
> > Dominic has been contributing various fixes and improvements to Accumulo
> since Fall 2020.
> >
> > Being a committer enables easier contribution to the project since there
> is no need to go via the patch submission process. This should enable
> better productivity. A PMC member helps manage and guide the direction of
> the project.
> >
> > Welcome, Dominic!
> >
>


Metrics Replacement

2021-09-21 Thread Dave Marion
There is a WIP pull request against 2.1.0-SNAPSHOT for replacing the Hadoop
Metrics2 framework with Micrometer[1]. Micrometer suggests using a naming
pattern[2] for the metrics internally where words are all lowercase
separated by a period. Micrometer output formats then rewrite the metric
names to the destination specific format. It's possible that we may not be
able to produce metrics in the same exact way as the Hadoop Metrics2
framework. Metrics are not part of the public API, but we do want to try
and retain as much backwards compatibility as possible. In the event that
we cannot get that compatibility it has been suggested that we document how
things are different. As I have limited knowledge of how the metrics are
being used today, I'm looking for some feedback from the community as to
how painful it would be if metric names changed in a minor release.

[1] https://micrometer.io/
[2] https://micrometer.io/docs/concepts#_naming_meters
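
As a rough illustration of the naming issue, the sketch below rewrites one
dotted, lowercase name into two common destination styles. It is plain Java
that only mimics the idea and does not use Micrometer. The metric name
accumulo.tserver.queries is a made-up example, and a real registry may apply
further rules (such as suffixes) that are not modelled here.

```java
public class MetricNames {

  /** Dotted name to snake_case, the style used by Prometheus-like systems. */
  static String toSnakeCase(String dotted) {
    return dotted.replace('.', '_');
  }

  /** Dotted name to camelCase: the first word stays as-is, later words are capitalized. */
  static String toCamelCase(String dotted) {
    String[] parts = dotted.split("\\.");
    StringBuilder sb = new StringBuilder(parts[0]);
    for (int i = 1; i < parts.length; i++) {
      if (parts[i].isEmpty()) {
        continue; // tolerate accidental double dots
      }
      sb.append(Character.toUpperCase(parts[i].charAt(0))).append(parts[i].substring(1));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String name = "accumulo.tserver.queries"; // hypothetical metric name
    System.out.println(toSnakeCase(name));
    System.out.println(toCamelCase(name));
  }
}
```

Whatever a given destination produces, the name a user sees would come from a
rewrite of this kind rather than from the name used by Hadoop Metrics2 today,
which is why a mapping of old names to new names may be needed.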


Re: Metrics Replacement

2021-09-27 Thread Dave Marion
 uniform names and consistent naming conventions across our codebase as
> primary consideration and allow the reported names fall out from there.
> >
> > The configuration of each monitoring system will depend on the system
> chosen by the user.  We should provide a select set of examples (I advocate
> Prometheus, some flavor of statsd and logging) to guide users if one of
> those do not fit their requirements and they elect to use a different
> micrometer module / collection system.
> >
> > I agree that we should supply documentation mapping current names to
> their micrometer equivalents -  the specific name reported will be
> dependent on the conversions performed by the target system - but those
> should be documented in each module and is not within our scope.
> >
> > -Original Message-
> > From: Keith Turner 
> > Sent: Tuesday, September 21, 2021 5:07 PM
> > To: Accumulo Dev List 
> > Subject: Re: Metrics Replacement
> >
> > On Tue, Sep 21, 2021 at 3:45 PM Dave Marion  wrote:
> > >
> > > There is a WIP pull request against 2.1.0-SNAPSHOT for replacing the
> > > Hadoop
> > > Metrics2 framework with Micrometer[1]. Micrometer suggests using a
> > > naming pattern[2] for the metrics internally where words are all
> > > lowercase separated by a period. Micrometer output formats then
> > > rewrite the metric names to the destination specific format. It's
> > > possible that we may not be able to produce metrics in the same exact
> > > way as the Hadoop Metrics2
> >
> > Is it only the naming pattern that will cause incompatibility, or is it
> more than that?  Like would a timer, gauge, etc. in Micrometer produce
> different information/metrics than a timer, gauge, etc. in Hadoop metrics?  I
> suspect these would differ and that would also impact compat.  Will the way
> in which accumulo is configured to report metrics also change?  I can't
> imagine it would be the same, but I have not looked at the PR.
> >
> > Can you provide an example of a naming incompat where it has to change?
> >
> > > framework. Metrics are not part of the public API, but we do want to
> > > try and retain as much backwards compatibility as possible. In the
> > > event that we cannot get that compatibility it has been suggested that
> > > we document how things are different. As I have limited knowledge of
> > > how the metrics are
> >
> > Is there a reasonable path to achieving compatibility?  If not, it seems
> like documenting what has changed is a good way to go.  Could possibly
> explain it in detail in the 2.1.0 release notes and have a link to that in
> the user manual.
> >
> > > being used today, I'm looking for some feedback from the community as
> > > to how painful it would be if metric names changed in a minor release.
> > >
> > > [1] https://micrometer.io/
> > > [2] https://micrometer.io/docs/concepts#_naming_meters
> >
>


Re: Accumulo quarterly report. Due 10/13/2021

2021-09-28 Thread Dave Marion
Typo in last line, "Jira to Gibhub"

On Tue, Sep 28, 2021 at 8:10 AM dev1  wrote:

> The Accumulo community quarterly report for October is due Wednesday
> 10/13/2021.  The community decided to publicly prepare the report on the
> dev mailing list.  Below is the current draft.
>
> Ed Coleman
>
> --- Draft report ---
>
> ## Description:
> Apache Accumulo is a robust, scalable, distributed key/value store
> with cell-based access control and customizable server-side processing.
>
> ## Issues:
> There are no new issues requiring board attention.
>
> The trademark issue with http://www.accumulodata.com is still open.
> Although the domain owner does not have access to the domain registration,
> the domain appears to have automatically renewed, and the expiration is now
> 2022-06-28.  Emails from the private list discussing this are at [1], [2]
> and [3]. No action has been required and allowing the domain to expire was
> deemed a viable option by Brand Management VP in Jan-2021 (private)[4] to
> minimize volunteer efforts.
>
> ## Membership Data:
> Apache Accumulo was founded 2012-03-21 (10 years ago)
> There are currently 40 committers and 40 PMC members in this project.
> The Committer-to-PMC ratio is 1:1.
>
> Community changes, past quarter:
> - Dominic Garguilo was added to the PMC on 2021-07-29
> - Dominic Garguilo was added as committer on 2021-07-29
>
> ## Project Activity:
> No new releases this reporting period. Last release dates:
> - accumulo-2.0.1 was released on 2020-12-24.
> - accumulo-1.10.1 was released on 2020-12-22.
>
> Project activity on the next release remains active with significant
> improvements to the current baseline. The remaining issues are being
> actively worked.
>
> ## Community Health:
> Overall community health is good and GitHub activity remains consistent.
>
> - Community participation remains healthy with discussions on the mailing
> lists and GitHub issues and pull-requests.
> - Accumulo continues to transition from Jira to GibHub issues. Jira
> activity reflects transition to using GitHub issues as obsolete issues are
> closed and open issues are transitioned to GitHub issues.
>
>
> ## Links
> (private) [1]:
> https://lists.apache.org/thread.html/r8c8ef5575b14accb6fc00d670764a313b91d76033f761c6e5c7eb29d%40%3Cprivate.accumulo.apache.org%3E
> (private) [2]:
> https://lists.apache.org/thread.html/514d3cf9162e72f4aa13be1db5d6685999fc83755695308a529de4d6@%3Cprivate.accumulo.apache.org%3E
> (private) [3]:
> https://lists.apache.org/thread.html/rcc8c07db43222e08b9992fd739b8f24d18569ba9af3decfdb52c4a3e%40%3Cprivate.accumulo.apache.org%3E
> (private) [4]:
> https://lists.apache.org/thread.html/r408e3eed907e3ad24a7c84b5247f51973a4c965c891b01215e45ee17%40%3Cprivate.accumulo.apache.org%3E
> ~
>


Re: Metrics Replacement

2021-10-04 Thread Dave Marion
Thanks for the information Ed. I updated my test[1] to use a different type
of registry and the output seems closer to what Hadoop is putting out.
Here's the Hadoop output again:

1633347999547 ctx.record: Context=ctx, ProcessName=testProcess, counter=1,
gauge=2, QuantileNumI/O=0, Quantile50thPercentileQuantile=0,
Quantile75thPercentileQuantile=0, Quantile90thPercentileQuantile=0,
Quantile95thPercentileQuantile=0, Quantile99thPercentileQuantile=0,
StatNumI/O=10, StatAvgStat=10.0, StatStdevStat=31.622776601683793,
StatIMinStat=3.4028234663852886E38, StatIMaxStat=1.401298464324817E-45,
StatMinStat=3.4028234663852886E38, StatMaxStat=1.401298464324817E-45,
StatINumI/O=10

Here's the output from the new test (which prints to stdout):

gauge
 value = 2.0
quantile
 count = 1
   min = 32
   max = 32
  mean = 32.00
stddev = 0.00
median = 32.00
  75% <= 32.00
  95% <= 32.00
  98% <= 32.00
  99% <= 32.00
99.9% <= 32.00
counter
 count = 1
 mean rate = 0.10 events/second
 1-minute rate = 0.18 events/second
 5-minute rate = 0.20 events/second
15-minute rate = 0.20 events/second
stat
 count = 1
 mean rate = 0.10 calls/second
 1-minute rate = 0.20 calls/second
 5-minute rate = 0.20 calls/second
15-minute rate = 0.20 calls/second
   min = 0.01 seconds
   max = 0.01 seconds
  mean = 0.01 seconds
stddev = 0.00 seconds
median = 0.01 seconds
  75% <= 0.01 seconds
  95% <= 0.01 seconds
  98% <= 0.01 seconds
  99% <= 0.01 seconds
99.9% <= 0.01 seconds

[1] https://gist.github.com/dlmarion/67e0ed8df320633d5af23ae00d965183
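
To illustrate the rate-versus-count difference discussed in this thread, here is a minimal sketch of step normalization in plain Java. It does not use Micrometer; `StepRate` and `ratePerSecond` are hypothetical names, and the 10-second interval is an assumption picked for the example, not a value taken from the test.

```java
import java.time.Duration;

final class StepRate {
    /**
     * Converts a count accumulated during one publishing interval into a
     * per-second rate, which is the kind of value a step-normalizing
     * registry would publish instead of the raw count.
     */
    static double ratePerSecond(long countInStep, Duration step) {
        long millis = step.toMillis();
        if (millis <= 0) {
            throw new IllegalArgumentException("step must be at least 1 ms");
        }
        return countInStep / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // One increment observed during an assumed 10-second step.
        System.out.println(ratePerSecond(1, Duration.ofSeconds(10)));
    }
}
```

A sink that prints the raw value would show 1 for the same counter, so tooling that keys on the number itself would need to know which of the two it is reading.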

On Mon, Sep 27, 2021 at 6:24 PM dev1  wrote:

> The reporting of the rate vs the absolute count is likely because the
> logging registry is currently implemented using a StepMeterRegistry (
> https://javadoc.io/doc/io.micrometer/micrometer-core/latest/io/micrometer/core/instrument/step/StepMeterRegistry.html
> )
>
> "Registry that step-normalizes counts and sums to a rate/second over the
> publishing interval"
>
> Under the covers the meter is just a counter; the registry reports the
> measured value in the form the target metrics system expects.
>
>   1. Based on the Micrometer output, it appears that even if we can get
> the names to match (or document appropriately), users may still have to
> change their tooling based on the values that are being reported.
>
> This unfortunately seems likely to happen - but we should be able to
> explain what is being reported (or even better refer to external docs) -
> the creation of a micrometer instrumentation meter allows for a description
> so we should be able to either automate the description gathering or
> provide a self-describing set of metrics.  We would need to provide a
> manual mapping of old / new names.
>
> Some systems (like Prometheus) will create descriptive statistics from the
> raw measurements.  If a metric has a valid reason to report useful summary
> statistics, then another meter may be a better fit (either a micrometer
> Timer or DistributionSummary). There is a memory cost for accumulating
> summary statistics so it may not always be appropriate for every metric.
>
>   2. It's possible that we could take a different approach, where we
> continue to use Hadoop Metrics2 internally and attempt to write a
> Micrometer sink for the Metrics2 framework for 2.x and move to Micrometer
> for the next major release. Based on the Hadoop JIRA, it does not appear
> that they have plans to move away from this framework.
>
> In my opinion, this would not be worth the effort.
>
> Ed Coleman
>
> 
> From: Dave Marion 
> Sent: Monday, September 27, 2021 4:52 PM
> To: dev@accumulo.apache.org 
> Subject: Re: Metrics Replacement
>
> I created a test[1] to see the differences in the output. In this test I
> create equivalent metric objects and output them via their respective
> logging sink.
>
> For Hadoop Metrics, it created:
>
> 1632775059897 ctx.record: Context=ctx, ProcessName=testProcess, counter=1,
> gauge=2, QuantileNumI/O=0, Quantile50thPercentileLatency=0,
> Quantile75thPercentileLatency=0, Quantile90thPercentileLatency=0,
> Quantile95thPercentileLatency=0, Quantile99thPercentileLatency=0,
> StatNumI/O=10, StatAvgLatency=10.0, StatStdevLatency=31.622776601683793,
> StatIMinLatency=3.4028234663852886E38,
> StatIMaxLatency=1.401298464324817E-45,
> StatMinLatency=3.4028234663852886E38, StatMaxLatency=1.401298464324817E-45,
> StatINumI/O=10
>
> For Micrometer, it 

Re: [DISCUSS] Version number of next release?

2021-10-21 Thread Dave Marion
I'd like to make the case for staying with 2.1. My main motivation for this
is the slow speed at which users upgrade and the perceived risk of the
number 3.0 (vs 2.1). I think users would see "3.0" and think that it would
require a lot more work to upgrade to it than "2.1". Moving to 3.0 has a
consequence in that we have publicized that we version using semver rules,
so bumping the major version implies that there is a breaking change in the
API. I can't speak to the amount of testing that users perform, but it will
give them a sense that their application needs to be updated/fixed to work
with the new version. We have recently seen this[1].

Technically, I think there is one issue[2] in the client API that would
mandate us using 3.0 as the next version, but I think it could be easily
fixed. As for some of the other changes in main currently, some of them
will not affect existing systems as they are new optional features (e.g.
external compactions) and some of them will only affect existing systems
that use the feature (e.g. metrics, tracing, accumulo-cluster script
changes). Regarding cluster admin changes (master -> manager, scripts,
etc), the release notes should highlight the change and point to more
information (user guide, GitHub issue, etc.) on how admins should adjust.

It would be great to get feedback from users and downstream integrators in
general, and specifically on things like upgrading, as it would help make
the "right" choice here.

[1]
https://lists.apache.org/thread.html/rfaab4a8fa2b98df3d41dabae4c63a35a75d9cd293de8c1cd28f83eb3%40%3Cdev.accumulo.apache.org%3E
[2] https://the-asf.slack.com/archives/CERNB8NDC/p1634819762001300
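
The semver rule behind this argument can be sketched in a few lines of Java. This is only an illustration of the general MAJOR.MINOR.PATCH convention; `SemverBump`, `Change`, and `nextVersion` are hypothetical names, and the classification of any real Accumulo change is left to the discussion above.

```java
final class SemverBump {
    /** Kinds of change, judged against the public API only. */
    enum Change { BREAKING_API, NEW_FEATURE, BUG_FIX }

    /** Returns the version that the given kind of change calls for. */
    static String nextVersion(String current, Change change) {
        String[] parts = current.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + current);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        switch (change) {
            case BREAKING_API:
                return (major + 1) + ".0.0"; // incompatible public API change
            case NEW_FEATURE:
                return major + "." + (minor + 1) + ".0"; // backwards-compatible addition
            default:
                return major + "." + minor + "." + (patch + 1); // backwards-compatible fix
        }
    }

    public static void main(String[] args) {
        System.out.println(nextVersion("2.0.1", Change.NEW_FEATURE));
        System.out.println(nextVersion("2.0.1", Change.BREAKING_API));
    }
}
```

Under this reading, a single breaking change in the client API is enough to force the major bump, which is why fixing that one issue matters for staying on 2.1.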


On Thu, Oct 21, 2021 at 10:49 AM Keith Turner  wrote:

> If we were to move to 3.0, it would be nice to reach consensus on the
> specific reasons why we are doing it.  If this were to include
> dropping deprecated APIs it would be nice to identify which APIs would
> be dropped and why before deciding.  This way people can make an
> informed decision about supporting the move to 3.0.  If the reasons to
> move to 3.0 are decided after the fact, it's hard for me to know if I
> support the move.
>
> For example, we could try to obtain consensus on a list like the
> following as the reasons for moving to 3.0 before moving to 3.0.
>
>  * Metrics incompatibility
>  * Tracing incompatibility
>  * accumulo-cluster script incompatibility
>  * Possible incompatibility due to master -> manager renaming
>  * Dropping API x.y.z because ...
>  * Dropping API a.b.c because ...
>
> This would also serve as good documentation for the release notes
> eventually, listing the explicit reasons that 3.0 was chosen.
>
> Also I agree that we do need to work towards getting a 2.1 or 3.0
> release done sooner rather than later.
>
> On Wed, Oct 20, 2021 at 1:36 AM Christopher  wrote:
> >
> > We wouldn't *have* to remove additional deprecations if we did name it
> > 3.0, but it might be a good opportunity to do some cleanup for some
> > stuff that was deprecated prior to 2.0 but left in there to ease the
> > transition to 2.0. Then again, removing anything else might make the
> > transition from 1.10 LTM to 3.0 LTM more challenging.
> >
> > Unless we find a clear compatibility issue in our public API that
> > forces us to bump to 3.0 because of semver, I'd be okay with either
> > version, so long as we make a decision. I do think the substantial
> > metrics/property name/tracing changes are compelling reasons to go to
> > 3.0, because even if they don't cause problems with our public API,
> > the changes may still cause headaches for sysadmins.
> >
> > On Tue, Oct 19, 2021 at 8:16 PM Ed Coleman  wrote:
> > >
> > > I started a general thread concerning topics for the next release. One
> > > major topic raised was what should the next version number be?  I started
> > > this thread so that version discussions can occur in a single thread for
> > > continuity.  From the general email thread:
> > >
> > > Version number:  There have been substantial changes since 2.0 was
> > > released.   The next version was expected to be 2.1, but with the number
> > > and the scope of changes that have been made and some that are in the
> > > pipeline, maybe we should signal this with a major version bump to 3.0?
> > >
> > > -   With semver, we might be able to go either way, depending on
> > >     interpretation.
> > > -   With the adoption of LTM releases, whatever the next version
> > >     is numbered, it will be a LTM release candidate.
> > > -   There have been over 800 changes committed.
> > > -   Notable major changes:
> > >     o Name changes to inclusive language (Manager instead of
> > >       Master,…)
> > >     o Enabling external compactions.
> > >     o Changes in the storage of properties in ZooKeeper to reduce
> > >       watchers (in progress, issues #1225, #1809)
> > >     o Change tracing to use OpenTracing instead of HTrace (PR #2259)
> > >     o Change metrics to use micrometer.io instead of
> > >       Hadoop-metrics2 (PR #2305)
> > >   

Re: 1.10 <-> 2.0 shim

2021-10-22 Thread Dave Marion
I created a simple maven pom file that when run with *mvn clean package* will
generate a report that shows the differences in the public API between
2.0.0 and 1.10.0.

https://gist.github.com/dlmarion/b1063c334d519f637cc78d81ba9e15ef
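
For anyone who wants a list to grep against, the idea behind such a report can be sketched as a set difference over API signatures. This is not what the pom in the gist does; `ApiDiff` and `removed` are hypothetical names, and the signature strings are made-up placeholders rather than real Accumulo methods.

```java
import java.util.Set;
import java.util.TreeSet;

final class ApiDiff {
    /** Returns signatures present in the old API but missing from the new one, sorted. */
    static Set<String> removed(Set<String> oldApi, Set<String> newApi) {
        Set<String> result = new TreeSet<>(oldApi);
        result.removeAll(newApi);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder signatures for illustration only.
        Set<String> v1 = Set.of("com.example.Client#open(String)", "com.example.Client#scan()");
        Set<String> v2 = Set.of("com.example.Client#scan()", "com.example.Client#scan(int)");
        removed(v1, v2).forEach(System.out::println);
    }
}
```

Each line printed is a candidate search term for downstream code; a real comparison tool would also need to account for changed return types, generics, and inherited members.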

On Wed, Oct 20, 2021 at 6:20 PM Jeremy Kepner  wrote:

> Seems like there should be a document that is kept whereby every time a
> breaking change is made it gets documented at the time it is committed.
>
> On Wed, Oct 20, 2021 at 04:55:42PM -0400, Christopher wrote:
> > If somebody were to volunteer to create such a document, they could do so
> > from some of the many 3rd party java API comparison tools. I'm not sure
> > which tool would work best for this purpose, though.
> >
> > If anybody does this, let us know which one worked best for you. We could
> > also amend the release notes with whatever you find. That could be
> useful.
> >
> > On Wed, Oct 20, 2021, 15:14 Jeremy Kepner  wrote:
> >
> > > There should be a document that clearly states which 1.10 functions
> > > will not work in 2.0 so folks can grep their code to check.  Otherwise
> > > you have to install 2.0 and then just work through the errors
> > > one-by-one.
> > >
> > > On Tue, Oct 19, 2021 at 11:17:09AM -0400, Christopher wrote:
> > > > The best reference is the release notes:
> > > > https://accumulo.apache.org/release/accumulo-2.0.0/
> > > >
> > > > On Tue, Oct 19, 2021, 09:15 Jeremy Kepner  wrote:
> > > >
> > > > > Is there a list of things in 1.10 that will no longer work in 2.0?
> > > > >
> > > > > On Tue, Oct 19, 2021 at 08:59:58AM -0400, Christopher wrote:
> > > > > > Hi Vincent,
> > > > > >
> > > > > > To supplement what Mike said, it's possible some stuff that was
> > > > > > deprecated in 1.10 was dropped in 2.0. I don't have a comprehensive
> > > > > > list of what that might include, but anything marked as deprecated in
> > > > > > 1.10 is subject to removal in 2.0. If I recall, we did try to limit it
> > > > > > somewhat. It wouldn't really make sense to create a shim to restore
> > > > > > those APIs, though, because that would just reintroduce code we
> > > > > > explicitly dropped, which defeats the purpose of a major version bump.
> > > > > > In semantic versioning, the entire point of a major version bump is to
> > > > > > declare a break in the backwards compatibility of the public API.
> > > > > >
> > > > > > If you need the code that was dropped, you probably aren't ready to
> > > > > > move to 2.x. 1.10 is an LTM release, so that means we intend to keep
> > > > > > patching important bugs until a year after our next LTM (which hasn't
> > > > > > yet been released). So, if you need to stay on 1.10, you have plenty
> > > > > > of time to update your code to stop using deprecated APIs and avoid
> > > > > > non-public APIs.
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 8:10 AM Mike Miller wrote:
> > > > > > >
> > > > > > > If the library was written using only the public API then it shouldn't
> > > > > > > be a problem. See https://accumulo.apache.org/api/
> > > > > > > Accumulo follows SemVer to maintain compatibility of the public API
> > > > > > > between versions. There are a lot of changes between 1.10 and 2.0 but
> > > > > > > anything in the public API in 1.10 should still exist in 2.0, even if
> > > > > > > deprecated.
> > > > > > > If the library is calling internal methods or extending internal
> > > > > > > classes, then that is a different story. If it uses internals then I
> > > > > > > recommend refactoring to use the public API if possible.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Oct 18, 2021 at 3:38 PM Vincent Russell <
> > > > > vincent.russ...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I am interested in using an accumulo query and storage library that
> > > > > > > > was written against accumulo version 1.10 and I am interested in
> > > > > > > > using it with accumulo 2.0.
> > > > > > > >
> > > > > > > > Is there a shim that exists that would allow the library to be used
> > > > > > > > for both versions that could be activated at compile time via a maven
> > > > > > > > profile or something?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Vincent
> > > > > > > >
> > > > >
> > >
>


Re: [DISCUSS] The current state of replication and the way forward?

2021-10-29 Thread Dave Marion
https://github.com/apache/accumulo/pull/2335 has been created to deprecate
the replication classes, properties, etc. and fix-up the references to them.
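
For readers unfamiliar with what deprecating a class looks like in code, here is a minimal sketch using the JDK's `@Deprecated` annotation (the `since` and `forRemoval` elements need Java 9 or later). It is not taken from the PR; `ExampleReplicationFeature` is a hypothetical stand-in and the "2.1.0" version string is an assumption.

```java
import java.util.Optional;

final class DeprecationSketch {
    /** Hypothetical stand-in for a replication class; not an Accumulo type. */
    @Deprecated(since = "2.1.0", forRemoval = true) // version string is an assumption
    static final class ExampleReplicationFeature {}

    /** Returns the "since" value if the type carries @Deprecated, else empty. */
    static Optional<String> deprecatedSince(Class<?> type) {
        Deprecated marker = type.getAnnotation(Deprecated.class);
        return marker == null ? Optional.empty() : Optional.of(marker.since());
    }

    public static void main(String[] args) {
        System.out.println(deprecatedSince(ExampleReplicationFeature.class));
        System.out.println(deprecatedSince(String.class));
    }
}
```

Callers compiling against such a class get a warning rather than an error, which is what makes the state reversible, as noted in the quoted message below.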

On Wed, Oct 20, 2021 at 1:29 AM Christopher  wrote:

> For reference, our last conversation about the state of replication
> was
> https://lists.apache.org/thread.html/ra65ecbfcdb26af2672b7a064d313c0db0285b7d9f228c09559a14842%40%3Cdev.accumulo.apache.org%3E
> ; in that, I tried to make the community aware of the issues involving
> the long-running and frequently broken ITs that were becoming a burden
> and interfering with progress in other areas of our code. After that
> discussion, we disabled the consistently failing tests, with a call
> for somebody to volunteer to pick up the maintenance burden. Since
> that discussion, nobody has volunteered.
>
> I do think we need to:
> 1. Communicate to users the current state, so they don't have high
> expectations for its reliability when we know differently, and
> 2. Make a plan to deprecate and remove the feature (as it currently
> exists, anyway), from Accumulo, in order to prevent the technical debt
> and tight coupling to critical WAL code from inhibiting other
> development work in Accumulo.
>
> We can do #1 by updating the properties for the feature to
> Experimental and/or Deprecated. Both states are reversible if the
> status quo changes, but I think it's important users aren't misled
> into thinking the feature is more stable and well-maintained than we
> know it to be.
>
> For #2, I think it would be okay to deprecate it in the next minor
> release, and remove it in the next major release after that. Again,
> the deprecated state can be reversed if the status quo substantially
> changes.
>
>
> On Tue, Oct 19, 2021 at 8:19 PM Ed Coleman  wrote:
> >
> > I started a general thread concerning topics for the next release. One
> > major topic raised was the state of replication and trying to determine
> > if there is consensus for a way forward.  I started this thread so that
> > replication discussions can occur in a single thread for continuity.
> > From the general email thread:
> >
> > It is hard to know what the state of replication is and maybe we need to
> > mark it as either experimental or deprecated to convey that to users. The
> > replication tests have been unstable and failing with transient errors
> > and have been removed from the regular build process – this reduced the
> > automated build time by over 2 hours.   A recent example is
> > accumulo-testing issue #164 (
> > https://github.com/apache/accumulo-testing/issues/164) Without the test
> > running regularly, it is hard to state with any confidence that
> > replication works reliably in a production environment.   This should not
> > be interpreted as advocating that we remove replication at this point,
> > but we need a way forward. Maybe someone volunteers to examine the tests
> > and fixes them so that they run reliably and in a reasonable time, or
> > maybe we begin to explore other approaches – for example, maybe some kind
> > of NiFi connector or something else entirely.  I really don’t know, but
> > it seems we need to clearly communicate the current state to any users
> > that may be using or considering using replication in the next release,
> > and to signal possible future intentions.
>


Re: Accumulo 2.0.1 init with hdfs running with SSL

2021-12-15 Thread Dave Marion
Is that datanode configured correctly? I wonder why it's excluded.

On Wed, Dec 15, 2021 at 9:45 AM Vincent Russell 
wrote:

> Thank you Christopher,
>
> I was able to determine that the ssl settings in core-site.xml are being
> picked up and used.   In fact when accumulo init is run, accumulo is able
> to create the /accumulo directory in HDFS. What is weird is that the
> FileSKVWriter that is used when createMetadataFile is called throws an
> exception when close is called.
>
>
> I get:
>
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /accumulo/tables/!0/table_info/0_1.rf could only be written to the 0 of the
> 1 minReplication nodes.  There are 1 datanode(s) running and 1 node(s) are
> excluded in this operation.
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1720)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3389)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
>
> I don't get this error when I disable ssl on hadoop.
>
> Any insight would be greatly appreciated.
>
> Thanks,
>
> On Tue, Dec 14, 2021 at 2:23 PM Vincent Russell wrote:
>
> > Thanks Chris.
> >
> > Yes I do get an error (I can't remember now because it's on a separate
> > computer) during the init and I get a MagicNumber exception on the
> > datanode during this process which says something like maybe encryption
> > isn't turned on.
> >
> > but let me make sure that the core-default.xml and core-site.xml are on
> > the classpath.  They may not be.
> >
> > Thanks again.
> >
> > On Tue, Dec 14, 2021 at 2:13 PM Christopher  wrote:
> >
> >> I have not personally tested HDFS configured for SSL/TLS, but `new
> >> Configuration()` will load the core-default.xml and core-site.xml
> >> files it finds on the class path. So, it looks like it should work.
> >> Have you tried it? Did you get an error?
> >>
> >>
> >> On Tue, Dec 14, 2021 at 1:54 PM Vincent Russell
> >>  wrote:
> >> >
> >> > Thank you Mike,
> >> >
> >> > but it appears that accumulo uses those settings to connect to
> >> > accumulo, but not to connect to hdfs.
> >> >
> >> > For instance the VolumeManagerImpl just does this:
> >> >
> >> > VolumeConfiguration.create(new Path(volumeUriOrDir), hadoopConf));
> >> >
> >> > where the hadoopConf is just instantiated in the Initialize class:
> >> >
> >> > Configuration hadoopConfig = new Configuration();
> >> > VolumeManager fs = VolumeManagerImpl.get(siteConfig, hadoopConfig);
> >> >
> >> > Thanks,
> >> > Vincent
> >> >
> >> > On Tue, Dec 14, 2021 at 12:18 PM Mike Miller 
> >> wrote:
> >> >
> >> > > Check out the accumulo client properties that start with the "ssl"
> >> > > prefix.
> >> > > https://accumulo.apache.org/docs/2.x/configuration/client-properties
> >> > > This blog post from a few years ago may help:
> >> > > https://accumulo.apache.org/blog/2014/09/02/generating-keystores-for-configuring-accumulo-with-ssl.html
> >> > >
> >> > > On Tue, Dec 14, 2021 at 9:58 AM Vincent Russell <
> >> vincent.russ...@gmail.com
> >> > > >
> >> > > wrote:
> >> > >
> >> > > > Hello,
> >> > > >
> >> > > > I am trying to init a test accumulo instance with an hdfs running
> >> > > > with SSL. Is this possible?  I am looking at the code and it
> >> > > > doesn't look like this is possible.
> >> > > >
> >> > > > The Initialize class just instantiates a Hadoop config and passes
> >> > > > that into the VolumeManager without sending over any hadoop configs
> >> > > > from the core.xml file.
> >> > > >
> >> > > > Am I missing something?
> >> > > >
> >> > > > Thanks in advance for your help,
> >> > > > Vincent
> >> > > >
> >> > >
> >>
> >
>


Re: Accumulo 2.0.1 init with hdfs running with SSL

2021-12-15 Thread Dave Marion
I just came across https://github.com/apache/thrift/pull/2482 and thought
of this thread. Not sure if you will run into this or not...

On Wed, Dec 15, 2021 at 11:44 AM Vincent Russell 
wrote:

> Dave,
>
> I'm investigating if the datanode is configured correctly.  I don't see any
> issue in the logs and via the name node web UI the node is live...so it
> looks to be fine.
>
> Mike,
> I have only been remote debugging when accumulo init is run.. not running
> custom code.
>
> Thanks,
>
> On Wed, Dec 15, 2021 at 11:32 AM Mike Miller  wrote:
>
> > When you say "the FileSKVWriter that is used when createMetadataFile is
> > called it" is this code that you have extended or are calling through a
> > client? If you are using the FileSKVWriter directly, then it may not have
> > the configuration properly passed to it. That interface is not in the
> > public API and should be avoided.
> >
> > On Wed, Dec 15, 2021 at 10:45 AM Dave Marion 
> wrote:
> >
> > > Is that datanode configured correctly? I wonder why it's excluded.
> > >
> > > On Wed, Dec 15, 2021 at 9:45 AM Vincent Russell <
> > vincent.russ...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Thank you Christopher,
> > > >
> > > > I was able to determine that the ssl settings in core-site.xml are
> > > > being picked up and used.   In fact when accumulo init is run, accumulo
> > > > is able to create the /accumulo directory in HDFS. What is weird is
> > > > that the FileSKVWriter that is used when createMetadataFile is called
> > > > throws an exception when close is called.
> > > >
> > > >
> > > > I get:
> > > >
> > > > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> > > > /accumulo/tables/!0/table_info/0_1.rf could only be written to the 0 of the
> > > > 1 minReplication nodes.  There are 1 datanode(s) running and 1 node(s) are
> > > > excluded in this operation.
> > > > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1720)
> > > > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3389)
> > > > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
> > > > at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
> > > > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
> > > > at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> > > > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> > > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> > > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
> > > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
> > > > at java.security.AccessController.doPrivileged(Native Method)
> > > > at javax.security.auth.Subject.doAs(Subject.java:422)
> > > > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> > > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
> > > >
> > > > I don't get this error when I disable ssl on hadoop.
> > > >
> > > > Any insight would be greatly appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > On Tue, Dec 14, 2021 at 2:23 PM Vincent Russell <
> > > vincent.russ...@gmail.com
> > > > >
> > > > wrote:
> > >

Re: Accumulo quarterly report. Due 1/12/2022

2022-01-10 Thread Dave Marion
Looks good.

On Mon, Jan 10, 2022 at 9:30 AM dev1  wrote:

> The Accumulo community quarterly report for January is due Wednesday
> 1/12/2022. The community decided to publicly prepare the report on the dev
> mailing list. Below is the current draft.  This is a simple cut'n'paste
> from the report wizard, so actual formatting may be different in the report.
>
> Ed Coleman
>
> --- Draft Report ---
>
> ## Description:
> Apache Accumulo is a robust, scalable, distributed key/value store with
> cell-based access control and customizable server-side processing.
>
> ## Issues:
> There are no new issues requiring board attention.
>
> The trademark issue with http://www.accumulodata.com is still open and there
> has been no change since the last report. The core of the issue is that the
> domain owner does not have access to the domain registration, the domain
> appears to have automatically renewed, and the expiration is now 2022-06-28.
> Emails from the private list discussing this are at [1], [2] and [3]. No
> action has been required and allowing the domain to expire was deemed a
> viable option by Brand Management VP in Jan-2021 (private)[4] to minimize
> volunteer efforts.
>
> ## Membership Data:
> Apache Accumulo was founded 2012-03-21 (10 years ago)
> There are currently 40 committers and 40 PMC members in this project.
> The Committer-to-PMC ratio is 1:1.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Dominic Garguilo on 2021-07-29.
> - No new committers. Last addition was Dominic Garguilo on 2021-07-29.
>
> ## Project Activity:
> - The current development branch is up to date with the latest log4j2
>   release. Previous versions do not use log4j2 and are not impacted by the
>   latest reported vulnerabilities.
> - No new releases this reporting period. Project activity on the next
>   release remains active with significant improvements to the current
>   baseline. The remaining issues are being actively worked. Last release
>   dates:
>    - accumulo-2.0.1 was released on 2020-12-24.
>    - accumulo-1.10.1 was released on 2020-12-22.
>
>
> ## Community Health:
> Overall community health is good and GitHub activity remains consistent.
> - Accumulo participated in Hacktoberfest 2021.
> - Community participation remains healthy with discussions on the mailing
>   lists and GitHub issues and pull-requests.
> - Accumulo continues to transition from Jira to GitHub issues. Jira activity
>   reflects the transition to using GitHub issues as obsolete issues are
>   closed and open issues are transitioned to GitHub issues.
>
> ## Links
> (private) [1]:
> https://lists.apache.org/thread.html/r8c8ef5575b14accb6fc00d670764a313b91d76033f761c6e5c7eb29d%40%3Cprivate.accumulo.apache.org%3E
> (private) [2]:
> https://lists.apache.org/thread.html/514d3cf9162e72f4aa13be1db5d6685999fc83755695308a529de4d6@%3Cprivate.accumulo.apache.org%3E
> (private) [3]:
> https://lists.apache.org/thread.html/rcc8c07db43222e08b9992fd739b8f24d18569ba9af3decfdb52c4a3e%40%3Cprivate.accumulo.apache.org%3E
> (private) [4]:
> https://lists.apache.org/thread.html/r408e3eed907e3ad24a7c84b5247f51973a4c965c891b01215e45ee17%40%3Cprivate.accumulo.apache.org%3E
>


Re: [DISCUSS] 1.10.2 release with reload4j

2022-02-03 Thread Dave Marion
I'd like to try and include https://github.com/apache/accumulo/pull/2221. A
little more testing needs to be done. Do you have a schedule for the 1.10.2
release?

On Thu, Feb 3, 2022 at 1:55 PM Christopher  wrote:

> I'm interested in putting together a 1.10.2 release with the changes in
> https://github.com/apache/accumulo/pull/2458 so that the 1.10 line no
> longer requires log4j1, which has several vulnerabilities. Reload4j was
> created as a fork from log4j1 from Apache by its original author in order
> to provide a transition away from the CVE-riddled log4j1 jars.
>
> I'm sure we have a couple of other small bugfixes and improvements in 1.10
> that could benefit from being released as well.
>
> If there are any objections or last-minute tweaks that should be included
> in 1.10.2, please discuss here.
>
> Thanks,
> Christopher
>


Re: 2.1 Release TODO

2022-04-01 Thread Dave Marion
I think it would be useful to do some release planning so that we know what
features we are working towards and in which release they will be in. This
would be helpful for determining what existing PRs need to make it into
2.1.0. 2.1.0 is the LTM release, so patches for existing features will be
backported (2.1.1, 2.1.2, 2.1.3, etc.). However, as defined in [1], features
that don't make it into 2.1.0 will go into the next non-LTM release (2.2.0)
and any patches to bugs in those features will go into the next non-LTM
release after that (2.3.0).

I'm not trying to hold up the 2.1.0 release by suggesting that we perform
this activity. I'm just asking what the future holds, even if it's just one
feature in the next non-LTM release. My concern is that the next release
will be open-ended and anything not included in 2.1.0 might not get put
into a release for a very long time.

[1] https://accumulo.apache.org/contributor/versioning.html#LTM


On Thu, Mar 31, 2022 at 11:43 AM Mike Miller  wrote:

> Starting an email chain of things that folks want to finish for 2.1. Here
> is what we currently have in the works that are most likely going into 2.1:
> https://github.com/apache/accumulo/pull/2569
> https://github.com/apache/accumulo/pull/2600
> https://github.com/apache/accumulo/pull/2293
>
> Some things that may go into 2.1:
> https://github.com/apache/accumulo/pull/2422
> https://github.com/apache/accumulo/pull/2475
> https://github.com/apache/accumulo/pull/2197
>
> I created a Project for follow on work to the ZK property change. I was
> planning on putting tasks in there that we want to complete for 2.1. But we
> could also use it for post 2.1 work.
> https://github.com/apache/accumulo/projects/24
> https://github.com/apache/accumulo/issues/2469
>
> FYI a draft copy of the release notes has already been on the website:
> https://accumulo.apache.org/release/accumulo-2.1.0/
>
> This may be a good thread to discuss whether or not a task needs to go into
> 2.1 or should wait for the next version. We currently have 32 open pull
> requests so please email me if there is one that you would like prioritized
> for 2.1.
>


Re: 2.1 Release TODO

2022-04-04 Thread Dave Marion
I think [3] is OBE and can be closed.

On Mon, Apr 4, 2022 at 9:11 AM Mike Miller  wrote:

> Yes I agree, that was the goal of this email thread. I found a few more
> tickets that should be addressed for the next release.
>
> Ivan - There was some work done on this PR but it has been some time. Do
> you want to take a look at it? Implement a Thread limit. [1]
> Keith T - I think we should get this one merged to fix that consistency
> check bug I found. It looks like it is finished. [2]
> Dave & Dom - Were you guys able to figure out a fix for the new external
> compaction metrics test? [3]
>
> FYI we have 6 blockers for 2.1:
> https://github.com/apache/accumulo/labels/blocker
>
> This is almost definitely going into 2.1 [4]. Thanks Jeff!
>
> [1] https://github.com/apache/accumulo/pull/1487
> [2] https://github.com/apache/accumulo/pull/2574
> [3] https://github.com/apache/accumulo/issues/2406
> [4] https://github.com/apache/accumulo/pull/2215
>
> On Fri, Apr 1, 2022 at 2:21 PM Dave Marion  wrote:
>
> > I think it would be useful to do some release planning so that we know
> what
> > features we are working towards and in which release they will be in.
> This
> > would be helpful for determining what existing PRs need to make it into
> > 2.1.0. 2.1.0 is the LTM release, so patches for existing features will be
> > backported (2.1.1, 2.1.2, 2.1.3, etc.) However, as defined in [1],
> features
> > that don't make it into 2.1.0 will go into the next non-LTM release
> (2.2.0)
> > and any patches to bugs in those features will go into the next non-LTM
> > release after that (2.3.0).
> >
> > I'm not trying to hold up the 2.1.0 release by suggesting that we perform
> > this activity. I'm just asking what the future holds, even if it's just
> one
> > feature in the next non-LTM release. My concern is that the next release
> > will be open-ended and anything not included in 2.1.0 might not get put
> > into a release for a very long time.
> >
> > [1] https://accumulo.apache.org/contributor/versioning.html#LTM
> >
> >
> > On Thu, Mar 31, 2022 at 11:43 AM Mike Miller  wrote:
> >
> > > Starting an email chain of things that folks want to finish for 2.1.
> Here
> > > is what we currently have in the works that are most likely going into
> > 2.1:
> > > https://github.com/apache/accumulo/pull/2569
> > > https://github.com/apache/accumulo/pull/2600
> > > https://github.com/apache/accumulo/pull/2293
> > >
> > > Some things that may go into 2.1:
> > > https://github.com/apache/accumulo/pull/2422
> > > https://github.com/apache/accumulo/pull/2475
> > > https://github.com/apache/accumulo/pull/2197
> > >
> > > I created a Project for follow on work to the ZK property change. I was
> > > planning on putting tasks in there that we want to complete for 2.1.
> But
> > we
> > > could also use it for post 2.1 work.
> > > https://github.com/apache/accumulo/projects/24
> > > https://github.com/apache/accumulo/issues/2469
> > >
> > > FYI a draft copy of the release notes has already been on the website:
> > > https://accumulo.apache.org/release/accumulo-2.1.0/
> > >
> > > This may be a good thread to discuss whether or not a task needs to go
> > into
> > > 2.1 or should wait for the next version. We currently have 32 open pull
> > > requests so please email me if there is one that you would like
> > prioritized
> > > for 2.1.
> > >
> >
>


Re: Scan Server discussion [WAS: Re: 2.1 Release TODO]

2022-04-04 Thread Dave Marion
I understand the desire to see less coupling for the optional features, but
getting to that point for ScanServers (and less so for ExternalCompactions)
would be a ton of work I think. The concern that I brought up in the "2.1
Release TODOs" thread regarding planning has not been addressed. If there
was a defined path forward, then that might make it easier to see how this
feature gets added in the near-future in whatever form it takes.

Regarding the concern about the readiness of the feature branch, Keith is
doing a last pass review on the draft and then I believe we are ready to
take it out of draft state. I think it will be before the end of this week.
We have added six new integration tests and we have done some local and
cluster testing.

Regarding the concern mentioned above, "availability of time to review/test
such a big feature without delaying 2.1," I didn't realize that we had a
schedule.  Does it matter if it takes 2/4/6/8 weeks to test the 1000+
completed issues in this release? I know that we want to finish up the
2.1.0 release, but is there a target date?

On Mon, Apr 4, 2022 at 12:32 PM Christopher  wrote:

> On Mon, Apr 4, 2022 at 11:50 AM Keith Turner  wrote:
> >
> > On Mon, Apr 4, 2022 at 11:17 AM Christopher  wrote:
> > >
> > > However, I'm reluctant to include #2422, because I don't think it's
> near
> > > ready enough, and by the time it is, it will be very last minute, and I
> > > don't want to delay 2.1 further for it. Even if it's included as an
> > > experimental feature, I think it has huge potential to be disruptive,
> or to
> > > have a lot of churn by the time people actually have a chance to
> review it
> > > thoroughly. Furthermore, I think there are possible alternatives (like
> a
> > > fully client-side implementation, based on offline scanners) that would
> > > avoid the tight coupling of a new service to Accumulo's core code. This
> >
> > There are some advantages to scan servers over direct file access to
> > consider.  One is scalability of computation, if a web server is
> > serving N client queries with scan servers those can potentially go to
> > different scan servers.  With direct file access, all N queries and
> > their iterator stacks would have to run in the web server.  Another is
> > scalability of caching/memory.  When web servers send queries to scan
> > servers using a sticky algorithm for assigning tablets to groups of
> > scan servers, it could lead to good cache utilization and sharing that
> > may not be possible when running scans directly in the web server. So
> > scan servers allow scaling cache and computations for queries
> > independently of web servers in a way that may not be possible with
> > direct file access.
> >
> > Another advantage to consider is isolation.  With direct file access
> > and queries running directly in a web server, a bad query could bring
> > down a web server and lots of unrelated queries.  Having a bad query
> > bring down a scan server may be less disruptive.
> >
>
> I've forked this thread into its own discussion with a new subject
> line, because, as I suggested in my original reply, my intent was not
> to hijack the 2.1 planning thread with a discussion of the ScanServer
> implementation details.
>
> I'm fine with all those benefits (even if all the "could" and "may"
> were turned into concrete "will"). My objection is not an objection to
> the feature. It's an objection to including the feature in 2.1, based
> on:
>
> * readiness of the feature branch,
> * availability of time to review/test such a big feature without delaying
> 2.1,
> * its tight coupling to the core code in the implementation, and
> * the possibility that solutions may exist with the above benefits
> that are less tightly coupled has not yet been explored.
>
> I would be more okay with including it if:
>
> * it is ready,
> * it has been tested and reviewed by the wider community,
> * its coupling to the core Accumulo code is loosened, ideally if it's
> designed to use only API/SPI, and could be released as a separate,
> optional add-on. This might require improvements to API/SPI to expose
> the features needed to help it function. This could also be done by
> sub-classing the AccumuloClient. My concern here is the risk of
> technical debt and the extra maintenance costs of increased complexity
> for optional features that go unmaintained.
>
> We've been hurt by premature inclusion of optional/experimental
> features before that were rushed to release. No matter how awesome the
> feature is... if it's niche and optional, we should consider these
> risks and work to mitigate them. Otherwise, we'll be stuck with the
> technical debt for years to come. With a little bit of caution, we can
> make the feature available, without rushing, to satisfy the use case
> while reducing the risks.
>
> Also, one point of clarification: when I say "fully client side", I
> only mean relative to Accumulo, not necessarily in the client process.
> I'm lacking vocabulary

Re: [DISCUSS] Draft Accumulo quarterly report - due Wednesday 4/13.

2022-04-08 Thread Dave Marion
LGTM.

On Fri, Apr 8, 2022 at 12:57 PM dev1  wrote:

> The Accumulo quarterly report is due Wednesday, April 13, 2022. The
> community decided to publicly prepare the report on the dev mailing list.
> Below is the current draft.
>
> (note: This is a cut-n-paste from the report wizard, so there may be
> formatting differences that will not appear when the report is submitted
> via the apache reporting tool.)
> --- draft report ---
>
> ## Description:
> The Apache Accumulo is a robust, scalable, distributed key/value store with
> cell-based access control and customizable server-side processing.
>
> ## Issues:
> There are no new issues requiring board attention.
>
> The trademark issue with http://www.accumulodata.com is still open.
> Although the domain owner does not have access to the domain registration,
> the domain appears to have automatically renewed, and the expiration is now
> 2022-06-28. Emails from the private list discussing this are at [1], [2],
> and [3]. No action has been required, and allowing the domain to expire was
> deemed a viable option by the Brand Management VP in Jan-2021 (private)[4]
> to minimize volunteer efforts.
>
>
> ## Membership Data:
> Apache Accumulo was founded 2012-03-20 (10 years ago)
> There are currently 40 committers and 40 PMC members in this project.
> The Committer-to-PMC ratio is 1:1.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Dominic Garguilo on 2021-07-28.
> - No new committers. Last addition was Dominic Garguilo on 2021-07-29.
>
> ## Project Activity:
> Project activity on the next release remains active with significant
> improvements to the current baseline. The remaining issues are being
> actively worked. Currently, Accumulo is targeting a June release of
> version 2.1. Current 2.1 progress is discussed in this thread [5] and
> includes:
>
>   - 15 pull requests that are currently in progress.
>   - 32 pull requests that are open as TODO. But a lot of these will get
>     bumped to the next version.
>   - 1,025 pull requests have been merged.
>
> ## Community Health:
> Overall community health is good and GitHub activity remains consistent.
>
> - Community participation remains healthy with discussions on the mailing
>   lists and GitHub issues and pull-requests.
> - Accumulo continues to transition from Jira to GitHub issues. Jira
>   activity reflects transition to using GitHub issues as obsolete issues
>   are closed and open issues are transitioned to GitHub issues.
>
> ## Links
> (private) [1]:
> https://lists.apache.org/thread.html/r8c8ef5575b14accb6fc00d670764a313b91d76033f761c6e5c7eb29d%40%3Cprivate.accumulo.apache.org%3E
> (private) [2]:
> https://lists.apache.org/thread.html/514d3cf9162e72f4aa13be1db5d6685999fc83755695308a529de4d6@%3Cprivate.accumulo.apache.org%3E
> (private) [3]:
> https://lists.apache.org/thread.html/rcc8c07db43222e08b9992fd739b8f24d18569ba9af3decfdb52c4a3e%40%3Cprivate.accumulo.apache.org%3E
> (private) [4]:
> https://lists.apache.org/thread.html/r408e3eed907e3ad24a7c84b5247f51973a4c965c891b01215e45ee17%40%3Cprivate.accumulo.apache.org%3E
> [5]: https://lists.apache.org/thread/0nx7ml312v13chdk6xgcwn0vryr5v0xc
>
>


Re: Major compactions during map reduce

2022-04-18 Thread Dave Marion
Major compactions should not move rows to new tablets, but a tablet split
could. Are you using the new MapReduce API introduced in 2.0? Are you
setting it to use an isolated scan?

On Mon, Apr 18, 2022 at 3:01 PM Vincent Russell 
wrote:

> Hello All,
>
> Could major compactions that occur while a map reduce job is running cause
> the map reduce job to miss records because rows have been moved to a
> different tablet?
>
> How does this work?
>
> I'm using accumulo 2.0.1
>
> Thank you,
> Vincent
>


Re: Major compactions during map reduce

2022-04-19 Thread Dave Marion
I was initially thinking about the case where the splits change between the
job setup and the Map execution, but given more thought I think I went down
the wrong path. Tablet splitting should not affect the overall range of
keys for the MR job. If a Tablet splits after the job computes the splits,
but before the Map is run, then that Map will just scan multiple tablets.
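
The point above can be illustrated with a small, self-contained sketch. It is
a toy model only, assuming a table is described by a sorted set of split
points and that job setup creates one row range per tablet; the class and
method names (SplitRangeSketch, RowRange, rangesFor, tabletsOverlapped) are
hypothetical and are not part of the Accumulo MapReduce API. It shows that a
range computed before a split still covers the same rows afterwards and simply
overlaps more tablets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model (not Accumulo code): tablets are defined by sorted split points. */
final class SplitRangeSketch {

  /** Row range (start, end]; a null bound means unbounded on that side. */
  static final class RowRange {
    final String start;
    final String end;

    RowRange(String start, String end) {
      this.start = start;
      this.end = end;
    }

    boolean contains(String row) {
      boolean afterStart = start == null || row.compareTo(start) > 0;
      boolean atOrBeforeEnd = end == null || row.compareTo(end) <= 0;
      return afterStart && atOrBeforeEnd;
    }
  }

  /** One range per tablet for the given split points (the "job setup" step in this model). */
  static List<RowRange> rangesFor(TreeSet<String> splits) {
    List<RowRange> ranges = new ArrayList<>();
    String prev = null;
    for (String split : splits) {
      ranges.add(new RowRange(prev, split));
      prev = split;
    }
    ranges.add(new RowRange(prev, null)); // last tablet is unbounded above
    return ranges;
  }

  /** Number of tablets, under the given split points, that a range overlaps. */
  static int tabletsOverlapped(RowRange range, TreeSet<String> splits) {
    int count = 0;
    for (RowRange tablet : rangesFor(splits)) {
      boolean tabletStartsBeforeRangeEnds =
          range.end == null || tablet.start == null || tablet.start.compareTo(range.end) < 0;
      boolean tabletEndsAfterRangeStarts =
          range.start == null || tablet.end == null || tablet.end.compareTo(range.start) > 0;
      if (tabletStartsBeforeRangeEnds && tabletEndsAfterRangeStarts) {
        count++;
      }
    }
    return count;
  }
}
```

In this model, a job range (g, p] computed when the splits were {g, p} overlaps
one tablet; after a split at k it overlaps two, but the set of rows it contains
is unchanged.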

On Tue, Apr 19, 2022 at 5:33 AM Christopher  wrote:

> Isolation should only give you consistency within a row, to ensure you're
> not scanning over partial changes from a mutation that is currently being
> written to a row. It shouldn't have anything to do with compactions or
> missing data that has already been written before the MapReduce scan has
> started.
>
> Splits shouldn't cause you to miss data either. It's been awhile since I
> looked, but I believe the MapReduce APIs simply break up a table into
> separate ranges to scan based on current tablet boundaries. If there are
> splits, then all that means is that some of the ranges will span across
> more than one tablet, but that's fine... a scan is a scan... scans don't
> need to be limited to a single tablet.
>
> Compactions could cause missed data if they transform the data in some way,
> but otherwise, I wouldn't expect them to.
>
> Are you seeing any error messages anywhere?
>
> On Mon, Apr 18, 2022, 15:23 Vincent Russell 
> wrote:
>
> > Hi Dave,
> >
> > Yes we are using the new MapReduce API, but we are not setting any
> > settings for isolated scan so we are using whatever the default is.
> >
> > Thanks,
> > Vincent
> >
> > On Mon, Apr 18, 2022 at 3:12 PM Dave Marion  wrote:
> >
> > > Major compactions should not move rows to new tablets, but a tablet
> split
> > > could. Are you using the new MapReduce API introduced in 2.0? Are you
> > > setting it to use an isolated scan?
> > >
> > > On Mon, Apr 18, 2022 at 3:01 PM Vincent Russell <
> > vincent.russ...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hello All,
> > > >
> > > > Could major compactions that occur while a map reduce job is running
> > > cause
> > > > the map reduce job to miss records because rows have been moved to a
> > > > different tablet?
> > > >
> > > > How does this work?
> > > >
> > > > I'm using accumulo 2.0.1
> > > >
> > > > Thank you,
> > > > Vincent
> > > >
> > >
> >
>


Re: Intro

2022-04-27 Thread Dave Marion
Welcome! And +1 for hearing more about how you are using Accumulo. I
visited Ghost's website earlier this morning - awesome stuff.

On Wed, Apr 27, 2022 at 3:59 PM Christopher  wrote:

> Hi Nikita!
>
> Welcome to our community. I'm curious to hear more about how Ghost is
> using Accumulo. Have you considered giving a presentation at the
> upcoming ApacheCon this year? I think they are still accepting
> submissions (https://apachecon.com/acna2022/cfp.html) and that sounds
> like it would make an interesting presentation.
>
> On Wed, Apr 27, 2022 at 3:51 PM Nikita S  wrote:
> >
> > Hi Accumulo Devs,
> >
> > Was suggested I drop a note introducing myself after my last PR. :-)
> >
> > I'm Nikita; I work for Ghost <https://www.driveghost.com/>; we are using
> > Accumulo to store driving data and train models for driving cars. We hope
> > to contribute to the community with bug fixes and extensions as
> > appropriate.
> >
> > It was a great experience getting our first patch in; thanks for the help
> > and looking forward to future collaboration!
>


Re: Intro

2022-05-03 Thread Dave Marion
See https://accumulo.apache.org/contact-us/#slack, there is an invite link.

On Tue, May 3, 2022 at 12:32 PM Nikita S  wrote:

> Tried to log in to the slack channel and got "nik...@thesirohis.com
> doesn’t
> have an account on this workspace." Do I need an invite or something?
>
> On Mon, May 2, 2022 at 4:57 AM Mike Miller  wrote:
>
> > Welcome! Don't hesitate to ask questions on this dev list or on our Slack
> > channel.
> > https://the-asf.slack.com/messages/CERNB8NDC
> >
> >
> > On Sat, Apr 30, 2022 at 6:41 PM Nikita S  wrote:
> >
> > > Thanks for the warm welcome and the link!
> > >
> > > We are still fairly early in our Accumulo journey; still coming up to
> > speed
> > > in many areas and tailoring to our use cases. Probably premature to
> > present
> > > in depth. But we're confident that given what Accumulo is capable of,
> > there
> > > will be good content for us to share with the community in the future.
> We
> > > will continue to keep our eyes out for how we can engage, and excited
> to
> > do
> > > so!
> > >
> > > On Wed, Apr 27, 2022 at 1:04 PM Dave Marion 
> wrote:
> > >
> > > > Welcome! And +1 for hearing more about how you are using Accumulo. I
> > > > visited the Ghost's website earlier this morning - awesome stuff.
> > > >
> > > > On Wed, Apr 27, 2022 at 3:59 PM Christopher 
> > wrote:
> > > >
> > > > > Hi Nikita!
> > > > >
> > > > > Welcome to our community. I'm curious to hear more about how Ghost
> is
> > > > > using Accumulo. Have you considered giving a presentation at the
> > > > > upcoming ApacheCon this year? I think they are still accepting
> > > > > submissions (https://apachecon.com/acna2022/cfp.html) and that
> > sounds
> > > > > like it would make an interesting presentation.
> > > > >
> > > > > On Wed, Apr 27, 2022 at 3:51 PM Nikita S 
> > > wrote:
> > > > > >
> > > > > > Hi Accumulo Devs,
> > > > > >
> > > > > > Was suggested I drop a note introducing myself after my last PR.
> > :-)
> > > > > >
> > > > > > I'm Nikita; I work for Ghost <https://www.driveghost.com/>; we
> are
> > > > using
> > > > > > Accumulo to store driving data and train models for driving cars.
> > We
> > > > hope
> > > > > > to contribute to the community with bug fixes and extensions as
> > > > > > appropriate.
> > > > > >
> > > > > > It was a great experience getting our first patch in; thanks for
> > the
> > > > help
> > > > > > and looking forward to future collaboration!
> > > > >
> > > >
> > >
> >
>


Re: Intro

2022-05-03 Thread Dave Marion
It has been done.

On Tue, May 3, 2022 at 1:28 PM Christopher  wrote:

> The slack invite process changed. We have to explicitly invite people now.
> I can do it.
>
> On Tue, May 3, 2022, 12:33 Dave Marion  wrote:
>
> > See https://accumulo.apache.org/contact-us/#slack, there is an invite
> > link.
> >
> > On Tue, May 3, 2022 at 12:32 PM Nikita S  wrote:
> >
> > > Tried to log in to the slack channel and got "nik...@thesirohis.com
> > > doesn’t
> > > have an account on this workspace." Do I need an invite or something?
> > >
> > > On Mon, May 2, 2022 at 4:57 AM Mike Miller  wrote:
> > >
> > > > Welcome! Don't hesitate to ask questions on this dev list or on our
> > Slack
> > > > channel.
> > > > https://the-asf.slack.com/messages/CERNB8NDC
> > > >
> > > >
> > > > On Sat, Apr 30, 2022 at 6:41 PM Nikita S 
> > wrote:
> > > >
> > > > > Thanks for the warm welcome and the link!
> > > > >
> > > > > We are still fairly early in our Accumulo journey; still coming up
> to
> > > > speed
> > > > > in many areas and tailoring to our use cases. Probably premature to
> > > > present
> > > > > in depth. But we're confident that given what Accumulo is capable
> of,
> > > > there
> > > > > will be good content for us to share with the community in the
> > future.
> > > We
> > > > > will continue to keep our eyes out for how we can engage, and
> excited
> > > to
> > > > do
> > > > > so!
> > > > >
> > > > > On Wed, Apr 27, 2022 at 1:04 PM Dave Marion 
> > > wrote:
> > > > >
> > > > > > Welcome! And +1 for hearing more about how you are using
> Accumulo.
> > I
> > > > > > visited the Ghost's website earlier this morning - awesome stuff.
> > > > > >
> > > > > > > On Wed, Apr 27, 2022 at 3:59 PM Christopher
> > > > wrote:
> > > > > >
> > > > > > > Hi Nikita!
> > > > > > >
> > > > > > > Welcome to our community. I'm curious to hear more about how
> > Ghost
> > > is
> > > > > > > using Accumulo. Have you considered giving a presentation at
> the
> > > > > > > upcoming ApacheCon this year? I think they are still accepting
> > > > > > > submissions (https://apachecon.com/acna2022/cfp.html) and that
> > > > sounds
> > > > > > > like it would make an interesting presentation.
> > > > > > >
> > > > > > > On Wed, Apr 27, 2022 at 3:51 PM Nikita S <
> nik...@thesirohis.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Accumulo Devs,
> > > > > > > >
> > > > > > > > Was suggested I drop a note introducing myself after my last
> > PR.
> > > > :-)
> > > > > > > >
> > > > > > > > I'm Nikita; I work for Ghost <https://www.driveghost.com/>;
> we
> > > are
> > > > > > using
> > > > > > > > Accumulo to store driving data and train models for driving
> > cars.
> > > > We
> > > > > > hope
> > > > > > > > to contribute to the community with bug fixes and extensions
> as
> > > > > > > > appropriate.
> > > > > > > >
> > > > > > > > It was a great experience getting our first patch in; thanks
> > for
> > > > the
> > > > > > help
> > > > > > > > and looking forward to future collaboration!
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Behavior of Fates on Failed Compactions

2022-07-07 Thread Dave Marion
I think FaTE ensures that the transaction is started and it waits for it to
finish. It must be the case that a failure is not being propagated back up
to fail the transaction. Are you seeing FaTE restarting the same compaction
over and over again, or are the multiple IN_PROGRESS transactions from
different compactions (my guess is the latter)? It would be interesting to
see if the Iterator Test Harness[1,2] exposes the issue in your iterator.
You can delete the FaTE transactions, but you will need to shut down the
Manager (Master) to do so.

[1]
https://accumulo.apache.org/1.10/accumulo_user_manual.html#_iterator_testing
[2]
https://accumulo.apache.org/docs/2.x/development/development_tools#iterator-test-harness

On Wed, Jul 6, 2022 at 10:59 PM Christopher  wrote:

> The behavior in case of error is likely undefined, so I'm not entirely
> surprised it's behaving this way. There may be things we can do to try to
> handle errors more gracefully for user initiated compactions when an
> iterator throws an exception, but it's definitely a good idea to write
> custom iterators in a way that tries to handle its own errors as much as
> possible.
>
> On Wed, Jul 6, 2022, 20:42 Logan Jones  wrote:
>
> > Thanks Chris for the quick reply. I'll explain the behavior I'm seeing,
> and
> > then maybe you all could either confirm this is the intended behavior, or
> > decide it's maybe not that great.
> >
> > My understanding of the happy case for running a user-initiated
> compaction
> > is that a fate/transaction gets created in zookeeper, and the Accumulo
> > master node ends up farming off the compactions to the correct tablet
> > servers. Once the tablets have been completed, somehow the
> > fates/transactions in zookeeper get cleaned up.
> >
> > I experienced a problem, however, in the unhappy case for compactions
> which
> > I have since reproduced. We had a custom iterator configured for a table,
> > and that custom iterator was in a bad state (i.e. it was always throwing
> an
> > exception during initialization). What we noticed is that the fates are
> > indefinitely stuck IN_PROGRESS and never go away in this case.
> Effectively
> > we have a poison pill, and if you issue too many compactions against that
> > table, you can cause other bad problems.
> >
> > I created a repo to demonstrate the problem as succinctly as I could
> > manage:
> >
> > https://github.com/loganasherjones/accumulo-iterator-failures
> >
> > I thought initially that maybe it was due to the fact that our iterator
> was
> > throwing an error during initialization, but this appears to be happening
> > for any error on next, seek, or init calls.
> >
> > So my questions are
> >
> > 1. Is it expected that a failure in a seek, next, or init in an iterator
> > during a user-initiated compaction would cause accumulo to non-stop retry
> > the compaction?
> > 2. If so, could you help me understand why?
> >
> > Thanks in advance,
> >
> > - Logan
> >
> >
> >
> > On Wed, Jul 6, 2022 at 6:31 PM Christopher  wrote:
> >
> > > Yes, either here (especially if it's related to a bug or proposed code
> > > change) or at user@ would work, if it's more of a user question. Here
> is
> > > fine if you're not sure.
> > >
> > > On Wed, Jul 6, 2022, 16:35 Logan Jones  wrote:
> > >
> > > > Hello:
> > > >
> > > > I would like to discuss what happens when iterators cause
> > user-initiated
> > > > compactions to fail, specifically in relation to the fate
> transactions.
> > > Is
> > > > this the right list for this discussion?
> > > >
> > > > Thanks,
> > > >
> > > > - Logan
> > > >
> > >
> >
>


Re: {DISCUSS} Accumulo Quarterly Report - due Wed 7/13

2022-07-08 Thread Dave Marion
Looks good.

On Fri, Jul 8, 2022 at 10:08 AM dev1  wrote:

> The Accumulo quarterly report is due Wednesday, July 13, 2022. The
> community decided to publicly prepare the report on the dev mailing list.
> Below is the current draft. (note: This is a cut-n-paste from the report
> wizard, so there may be formatting differences that will not appear when
> the report is submitted via the apache reporting tool.)
>
> --- draft report ---
>
> ## Description:
> The Apache Accumulo is a robust, scalable, distributed key/value store with
> cell-based access control and customizable server-side processing.
>
> ## Issues:
> There are no new issues requiring board attention.
>
> The trademark issue with http://www.accumulodata.com is still open and
> there have been no changes since the last few reports. The domain owner
> does not have access to the domain registration and the expiration is now
> 2022-06-28. No action has been required and allowing the domain to expire
> was deemed a viable option by the Brand Management VP in Jan-2021, as
> discussed in (private)[1].
>
> ## Membership Data:
> Apache Accumulo was founded 2012-03-20 (10 years ago)
> There are currently 40 committers and 40 PMC members in this project.
> The Committer-to-PMC ratio is 1:1.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Dominic Garguilo on 2021-07-28.
> - No new committers. Last addition was Dominic Garguilo on 2021-07-29.
>
> ## Project Activity:
> Project activity on the next release remains active with significant
> improvements to the current baseline and the remaining issues are being
> actively worked. Currently, Accumulo is closing in on a release of version
> 2.1 [2], as discussed in thread [3]. There are two major issues currently
> in review that are desired to be included in a 2.1 release; see [4] and
> [5] for progress on those issues.
>
>
> ## Community Health:
> Overall community health is good and GitHub activity remains consistent.
>
> - Community participation remains healthy with discussions on the mailing
>   lists and GitHub issues and pull-requests.
> - Accumulo has transitioned from Jira to GitHub issues. Jira activity
>   reflects clean up of obsolete issues. All new activity uses GitHub issues.
>
> ## Links
> (private)[1]
> https://lists.apache.org/thread/d999tzdwns8mgptfjm8z3o167ngjj899
> [2] https://github.com/apache/accumulo/projects/3
> [3] https://lists.apache.org/thread/0nx7ml312v13chdk6xgcwn0vryr5v0xc
> [4] https://github.com/apache/accumulo/pull/2665
> [5] https://github.com/apache/accumulo/pull/2197
>


Re: Re: Re: Re: Question about Accumulo Tracer

2022-07-29 Thread Dave Marion
https://github.com/apache/accumulo/pull/2119 fixed a bug in 2.1 where the
-a argument was not correctly setting the hostname. It looks like 2.0.1 is
affected by this too.
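
For readers following the address discussion quoted below, here is a minimal
sketch of the fallback Christopher describes: if a server is bound to the
wildcard address `0.0.0.0`, advertise the canonical local hostname instead.
This is not Accumulo's actual code; the class and method names
(AdvertiseAddressSketch, advertiseAddress) and the "localhost" fallback are
assumptions made for illustration. It also does not model the
hostname-to-IP resolution he mentions for an explicit `--address` value.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Illustrative only: choose an address to advertise to other servers. */
final class AdvertiseAddressSketch {

  static final String WILDCARD = "0.0.0.0";

  /**
   * Returns "host:port" to advertise. An explicit bind address is used as
   * given; the wildcard address is replaced with the canonical local hostname,
   * because other hosts cannot connect to 0.0.0.0.
   */
  static String advertiseAddress(String bindAddress, int port) {
    String host = bindAddress;
    if (WILDCARD.equals(bindAddress)) {
      try {
        host = InetAddress.getLocalHost().getCanonicalHostName();
      } catch (UnknownHostException e) {
        // Assumption for this sketch: fall back to a fixed name if the
        // local hostname cannot be resolved.
        host = "localhost";
      }
    }
    return host + ":" + port;
  }
}
```

With this kind of fallback, a tracer bound to `0.0.0.0` on port 12234 would
advertise something other than `0.0.0.0:12234`, which is the value seen in
the ZooTraceClient debug output quoted below.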

On Fri, Jul 29, 2022 at 12:57 PM Christopher  wrote:

> The tracer should be advertising its own address in ZK. By default, the
> server listens on `0.0.0.0`, unless `-a` or `--address` is specified on the
> command-line when it is started. Most server types use a utility class that
> will use `InetAddress.getLocalHost().getCanonicalHostName()` for the
> advertisement address if it sees that it is listening on `0.0.0.0`.
> However, it looks like the tracer doesn't do this.
>
> I would try to verify that `0.0.0.0` actually appears as the advertised
> address in ZK for the tracer service, just to be sure it's not a
> ZooTraceClient bug on the tablet server side (but I'm pretty sure it's not,
> after looking at the code).
>
> As a workaround, I would start up the tracer service using a script that
> sets `--address accumulo_tracer` on the command-line, where
> "accumulo_tracer" is the hostname or IP address that you want the tracer
> listening on and advertising to other servers. This address should be
> bindable where the tracer is running, and reachable from wherever the
> tservers are (so make sure it's not a private address only available inside
> the container running the tracer). The reason for this is that the hostname
> will resolve to an IP address inside the tracer process, and then the
> tracer will advertise the IP address. It won't advertise the original
> hostname, if you specified a hostname.
>
> I hope that helps.
>
>
> On Fri, Jul 29, 2022 at 12:18 PM kma  wrote:
>
> > Nice,
> >
> > Turning on debug really helped.
> >
> > 2022-07-29 12:12:14,235 [tracer.ZooTraceClient] DEBUG: Scanning trace
> > hosts in zookeeper: /tracers
> > 2022-07-29 12:12:14,240 [tracer.ZooTraceClient] DEBUG: Trace hosts:
> > [0.0.0.0:12234, 0.0.0.0:12234]
> > 2022-07-29 12:12:14,240 [tracer.ZooTraceClient] DEBUG: Successfully
> > initialized tracer hosts from ZooKeeper
> >
> > [root@accumulo_gc /]# curl 0.0.0.0:12234
> > curl: (7) Failed connect to 0.0.0.0:12234; Connection refused
> >
> > [root@accumulo_gc /]# curl accumulo_tracer:12234
> > curl: (52) Empty reply from server
> >
> > Is there a way to set Trace hosts to accumulo_tracer:12234?
> >
> > Cheers!...
> > ...Keith
> >
> >
> > On 2022/07/28 23:40:18 Christopher wrote:
> >  > Do all servers have the same configuration?
> >  >
> >  > I would investigate the tablet server debug logs to determine if it's
> >  > having trouble setting up tracing. It should be able to locate the
> > tracer
> >  > service by talking to ZooKeeper and observing its service address
> >  > advertisement, similar to how other servers register themselves in
> >  > ZooKeeper. I'm not a docker network expert, but whatever service
> address
> >  > the tracer service is advertising there should be routable from the
> > tablet
> >  > servers.
> >  >
> >  > On Thu, Jul 28, 2022 at 7:31 PM kma  wrote:
> >  >
> >  > > Thanks again Christopher,
> >  > >
> >  > > Our environment is a little different from usual.
> >  > >
> >  > > We have accumulo tracer running in its own docker container and the
> >  > > other accumulo services (e.g. gc, master, tserver, etc.) are also
> >  > > running in different docker containers. We also have kerberos
> enabled.
> >  > >
> >  > > The following are our trace-related configurations:
> >  > >
> >  > > default | trace.port.client ..... | 12234
> >  > > default | trace.span.receivers .. | org.apache.accumulo.tracer.ZooTraceClient
> >  > > default | trace.table ........... | trace
> >  > > default | trace.token.type ...... | org.apache.accumulo.core.client.security.tokens.PasswordToken
> >  > > site    |    @override .......... | org.apache.accumulo.core.client.security.tokens.KerberosToken
> >  > > default | trace.user ............ | root
> >  > > site    |    @override .......... | accumulo-tra...@dev.phemi.com
> >  > > default | trace.zookeeper.path .. | /tracers
> >  > >
> >  > > And we can confirm that `trace on` and `trace off` work well in the
> >  > > accumulo tracer container where the tracer process is running.
> >  > >
> >  > > However, `trace on` and `trace off` do not work in any other
> >  > > containers. This is probably why we don't see compaction trace
> >  > > messages in the trace table. And it's likely because these containers
> >  > > don't know where the tracer service is running?
> >  > >
> >  > > Question: How do the other accumulo services know where the tracer
> >  > > service is running?
> >  > > Question: Is there a way to configure the tracer host where the tracer
> >  > > service is running?
> >  > >
> >  > > Cheers!...
> >  > > ...Keith
> >  > >
> >  > >
> >  > > On 2022/07/22 14:58:07 Christopher wrote:
> >  > > > I would double check your trace cred

Re: Re: Re: Re: Question about Accumulo Tracer

2022-07-29 Thread Dave Marion
Ok, I wasn't sure what version was being used here. I saw a reference to
2.0.1 in an earlier email.

On Fri, Jul 29, 2022 at 3:03 PM Christopher  wrote:

> That was a bug in the new AbstractServer class in 2.x. I don't think it
> ever affected 1.x.
> I checked the 1.10 code and it wouldn't affect the tracer server. `-a`
> should still work fine there.
>
> On Fri, Jul 29, 2022 at 3:00 PM Dave Marion  wrote:
>
> > https://github.com/apache/accumulo/pull/2119 fixed a bug in 2.1 where the
> > -a argument was not correctly setting the hostname. It looks like 2.0.1 is
> > affected by this too.
> >
> > On Fri, Jul 29, 2022 at 12:57 PM Christopher  wrote:
> >
> > > The tracer should be advertising its own address in ZK. By default, the
> > > server listens on `0.0.0.0`, unless `-a` or `--address` is specified on the
> > > command-line when it is started. Most server types use a utility class that
> > > will use `InetAddress.getLocalHost().getCanonicalHostName()` for the
> > > advertisement address if it sees that it is listening on `0.0.0.0`.
> > > However, it looks like the tracer doesn't do this.
> > >
> > > I would try to verify that `0.0.0.0` actually appears as the advertised
> > > address in ZK for the tracer service, just to be sure it's not a
> > > ZooTraceClient bug on the tablet server side (but I'm pretty sure it's not,
> > > after looking at the code).
> > >
> > > As a workaround, I would start up the tracer service using a script that
> > > sets `--address accumulo_tracer` on the command-line, where
> > > "accumulo_tracer" is the hostname or IP address that you want the tracer
> > > listening on and advertising to other servers. This address should be
> > > bindable where the tracer is running, and reachable from wherever the
> > > tservers are (so make sure it's not a private address only available inside
> > > the container running the tracer). The reason for this is that the hostname
> > > will resolve to an IP address inside the tracer process, and then the
> > > tracer will advertise the IP address. It won't advertise the original
> > > hostname, if you specified a hostname.
> > >
> > > I hope that helps.
> > >
> > >
> > > On Fri, Jul 29, 2022 at 12:18 PM kma  wrote:
> > >
> > > > Nice,
> > > >
> > > > Turning on debug really helped.
> > > >
> > > > 2022-07-29 12:12:14,235 [tracer.ZooTraceClient] DEBUG: Scanning trace hosts in zookeeper: /tracers
> > > > 2022-07-29 12:12:14,240 [tracer.ZooTraceClient] DEBUG: Trace hosts: [0.0.0.0:12234, 0.0.0.0:12234]
> > > > 2022-07-29 12:12:14,240 [tracer.ZooTraceClient] DEBUG: Successfully initialized tracer hosts from ZooKeeper
> > > >
> > > > [root@accumulo_gc /]# curl 0.0.0.0:12234
> > > > curl: (7) Failed connect to 0.0.0.0:12234; Connection refused
> > > >
> > > > [root@accumulo_gc /]# curl accumulo_tracer:12234
> > > > curl: (52) Empty reply from server
> > > >
> > > > Is there a way to set Trace hosts to accumulo_tracer:12234?
> > > >
> > > > Cheers!...
> > > > ...Keith
> > > >
> > > >
> > > > On 2022/07/28 23:40:18 Christopher wrote:
> > > >  > Do all servers have the same configuration?
> > > >  >
> > > >  > I would investigate the tablet server debug logs to determine if it's
> > > >  > having trouble setting up tracing. It should be able to locate the tracer
> > > >  > service by talking to ZooKeeper and observing its service address
> > > >  > advertisement, similar to how other servers register themselves in
> > > >  > ZooKeeper. I'm not a docker network expert, but whatever service address
> > > >  > the tracer service is advertising there should be routable from the tablet
> > > >  > servers.
> > > >  >
> > > >  > On Thu, Jul 28, 2022 at 7:31 PM kma  wrote:
> > > >  >
> > > >  > > Thanks again Christopher,
> > > >  > >
> > > >  > > Our environment is a little different from usual.
> > > >  > >
> > > >  > > We have accumulo tracer running in its own docker
