Re: [ATTN] Cleaning up extra refs in git

2016-03-04 Thread Josh Elser

Thanks for taking the time to clean these up!

Christopher wrote:

Not much change. After doing 'git gc --aggressive --prune=now' on a 'git
clone --mirror', the repo size was 33M before the removal of these refs,
and 27M after. Since they were mostly pointing to existing blobs, I
wouldn't expect it to have dropped much. I'm actually a bit surprised it
dropped as much as 6M.

That said, I don't know how frequently ASF does a 'gc', or if git does that
automatically on the ASF remote, so I don't know if/when the potential for
a slightly smaller size will benefit anybody.

On Fri, Mar 4, 2016 at 4:32 PM William Slacum  wrote:


Any stats on what the repo size is after removing the refs and doing
something like `git gc`?

On Fri, Mar 4, 2016 at 4:25 PM, Christopher  wrote:


I was able to delete 135 duplicate refs of the kind I described. Only one
resulted in a new branch being created (ACCUMULO-722). We probably don't
need that at all, but it might be useful to turn it into patches to attach to
the "Won't Fix" ticket, rather than preserve it as an inactive branch.

Also note that the ACCUMULO-722 branch is not rooted on any other branches
in our git repo. It was essentially just a sandbox in svn where Eric had
been working.

On Wed, Mar 2, 2016 at 6:14 PM Christopher  wrote:


(tl;dr version: I'm going to clean up refs/remotes/** in git, which
contains duplicate history and messes with 'git clone --mirror'; these are
refs which are neither branches nor tags and are left over from git-svn)

So, when we switched from svn to git, there were a lot of leftover refs in
the git repository from old branches/history which have already been merged
into the branches/tags that we've since created. I think these were leftover
from weird git-svn behavior. These can, and should, be cleaned up.

You can see all of them when you do a:
git ls-remote origin

In that output, our current branches are the refs/heads/*, and our tags
are the refs/tags/*.
The extras which need to be cleaned up are the refs/remotes/* (including
refs/remotes/tags/*).

As you can see, these are duplicates of branches which have been merged in
already, or temporary tags which didn't make it to a release (release
candidates) but whose relevant history is already in our normal git
history, or they are branches which were abandoned on purpose
(ACCUMULO-722).

Usually these extra refs don't present a problem, because we don't
normally see them when we clone (they aren't branches which are normally
fetched). However, there are a few cases where this is a problem. In
particular, they show up when you do "git clone --mirror", and if you push
this mirror to another git repository, like a GitLab mirror (git push
--mirror), they show up as extra branches which don't appear to exist in
the original (a very confusing situation for a "mirror").

The interesting thing about these is that even when they have the same
history as the git branches/tags we maintain now, the SHA1s don't match up.
This seems to imply they were left over from a previous invocation of
git-svn.

So, what I'd like to do is go through each of these extra refs one by one,
and figure out if we already have this history in our branches/tags. If we
do, then I'd delete these extras. If we don't (as in the case of
ACCUMULO-722), I'd just convert that to a normal git branch (refs/heads/*)
until we decide what to do with it at some future point in time (for
example, perhaps do a 'git format-patch' on it and attach the files to the
"Won't Fix" ticket so we can delete the dead branch? not sure, but that can
be deferred).





Re: 1.6 Javadoc missing classes

2016-03-04 Thread Josh Elser

Maybe the distributed tracing APIs?

Christopher wrote:

Sure, we can include that. Are there any other classes which would be
good to have javadocs for which aren't public API?

On Fri, Mar 4, 2016 at 4:03 PM Josh Elser <josh.el...@gmail.com> wrote:

Good catch, Dan. Thanks for letting us know. Moving this one over to the
dev list to discuss further.

Christopher, looks like it might also be good to include iterator
javadocs despite not being in public API (interfaces, and
o.a.a.c.i.user?).

 Original Message 
Subject: 1.6 Javadoc missing classes
Date: Fri, 4 Mar 2016 15:59:26 -0500
From: Dan Blum <db...@bbn.com>
Reply-To: u...@accumulo.apache.org
To: u...@accumulo.apache.org

A lot of classes seem to have gone missing from
http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator
would be an
obvious example.



Re: 1.6 Javadoc missing classes

2016-03-04 Thread Josh Elser
Oh, right. I forgot about the htrace shift. Don't we still have some 
public-facing wrappers? I think we should have Javadocs published for 
whatever distributed tracing example we have published.


http://accumulo.apache.org/1.7/accumulo_user_manual.html#_instrumenting_a_client

Christopher wrote:

The tracing APIs vary from version to version significantly. That puts a
lot of extra effort on the person updating the included packages. How
important are those now that we're transitioning to an external dependency?

On Fri, Mar 4, 2016 at 5:17 PM Josh Elser <josh.el...@gmail.com> wrote:

Maybe the distributed tracing APIs?

Christopher wrote:
 > Sure, we can include that. Are there any other classes which would be
 > good to have javadocs for which aren't public API?
 >
 > On Fri, Mar 4, 2016 at 4:03 PM Josh Elser <josh.el...@gmail.com> wrote:
 >
 > Good catch, Dan. Thanks for letting us know. Moving this one
over to the
 > dev list to discuss further.
 >
 > Christopher, looks like it might also be good to include iterator
 > javadocs despite not being in public API (interfaces, and
 > o.a.a.c.i.user?).
 >
 >  Original Message 
 > Subject: 1.6 Javadoc missing classes
 > Date: Fri, 4 Mar 2016 15:59:26 -0500
 > From: Dan Blum <db...@bbn.com>
 > Reply-To: u...@accumulo.apache.org
 > To: u...@accumulo.apache.org
 >
 > A lot of classes seem to have gone missing from
 > http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator
 > would be an
 > obvious example.
 >



Broken website links (talks)

2016-03-07 Thread Josh Elser
./content/papers.mdtext:  href="http://people.apache.org/~kturner/accumulo14_15.pdf">slides
./content/papers.mdtext:  href="http://people.apache.org/~afuchs/slides/morgan_state_talk.pdf">slides


Keith/Adam -- any chance you can relocate your referenced talks off of 
people.a.o (which was recently decommissioned) to the Accumulo CMS or 
home.a.o?


See recent chatter on infra@ if you didn't know this was happening :)

- Josh


Re: Broken website links (talks)

2016-03-07 Thread Josh Elser
I updated one of our releasing pages which also (incorrectly) stated 
that RC artifacts had to be pushed to people.a.o.


While I was at it, I also changed the feather in the footer to the new 
feather logo. If/when I get a chance, I'll rotate/resize it properly to 
fit into the footer well.


Josh Elser wrote:

./content/papers.mdtext: href="http://people.apache.org/~kturner/accumulo14_15.pdf">slides
./content/papers.mdtext: href="http://people.apache.org/~afuchs/slides/morgan_state_talk.pdf">slides


Keith/Adam -- any chance you can relocate your referenced talks off of
people.a.o (which was recently decommissioned) to the Accumulo CMS or
home.a.o?

See recent chatter on infra@ if you didn't know this was happening :)

- Josh


Re: git-based site and jekyll

2016-03-08 Thread Josh Elser

+1 as well. I would be extremely happy moving to Jekyll.

The one concern I had was regarding automatic rendering of what would 
look like "the Apache Accumulo website" on Github (both apache/accumulo 
github account and other forks).


Christopher had said that no one seemed to object in comdev@ when he 
talked about this a while back. I wanted to make sure everyone 
considered this (for example, Christopher's fork of Drill's repository 
now also looks like a canonical host of the Apache Drill project). I'm 
not actively stating that I think it's an issue at this point, only 
suggesting that we give it some thought and maybe ask someone who is 
more knowledgeable (Shane from trademarks?) before moving forward. The 
worst case I envision is that we find some way to "gimp" the 
github-rendered site (redirect back to the canonical accumulo.apache.org 
or similar).


Christopher wrote:

I got some information back from INFRA about how the git-based sites work.
It's just plain old static hosting of a git branch. So, whatever we'd put
in a specified branch would show up directly on the site, no rendering or
generation. This would completely bypass CMS and buildbot staging builds.

Was discussing this with elserj in IRC, and these ideas came out of that:

1. Switch site to use git branch named "site" or "website" or similar.
2. Use jekyll 3 to generate the static site contents in this git branch.
3. Store the unrendered (markdown) jekyll stuff in a gh-pages branch.
4. Possibly set up a post-commit hook on gh-pages branch to render locally
and commit the generated static site to the "site" branch.

This would have the following benefits:

* Canonical rendering of "site" branch at http://accumulo.apache.org
* Identical, automatic rendering of gh-pages branch at
http://apache.github.io/accumulo
* Changes to gh-pages in forks would render in fork's github.io for
preview/testing
* Jekyll can be run locally for preview for non-GitHub users wishing to
contribute updates to site
* Use of jekyll means we can still edit/use markdown to edit pages
* Can still include non-markdown content and raw html

Another project which seems to be doing this (or something close to it) is
Apache Drill:
https://drill.apache.org/
http://apache.github.io/drill/
http://ctubbsii.github.io/drill/ (example fork build)



Re: git-based site and jekyll

2016-03-08 Thread Josh Elser
It's also probably worth mentioning that this concern only comes about 
for point #4 (or if we use the branch name gh-pages in point #1).


Josh Elser wrote:

The one concern I had was regarding automatic rendering of what would
look like "the Apache Accumulo website" on Github (both apache/accumulo
github account and other forks).

Christopher had said that no one seemed to object in comdev@ when he
talked about this a while back. I wanted to make sure everyone
considered this (for example, Christopher's fork of Drill's repository
now also looks like a canonical host of the Apache Drill project). I'm
not actively stating that I think it's an issue at this point, only
suggesting that we give it some thought and maybe ask someone who is
more knowledgeable (Shane from trademarks?) before moving forward. The
worst case I envision is that we find some way to "gimp" the
github-rendered site (redirect back to the canonical accumulo.apache.org
or similar).

Christopher wrote:

I got some information back from INFRA about how the git-based sites
work.
It's just plain old static hosting of a git branch. So, whatever we'd put
in a specified branch would show up directly on the site, no rendering or
generation. This would completely bypass CMS and buildbot staging builds.

Was discussing this with elserj in IRC, and these ideas came out of that:

1. Switch site to use git branch named "site" or "website" or similar.
2. Use jekyll 3 to generate the static site contents in this git branch.
3. Store the unrendered (markdown) jekyll stuff in a gh-pages branch.
4. Possibly set up a post-commit hook on gh-pages branch to render
locally
and commit the generated static site to the "site" branch.


Re: git-based site and jekyll

2016-03-08 Thread Josh Elser
Well, I think the difference is that archive.org (and others -- google 
cached pages come to mind) are devoted/known for that specific purpose. 
Given that Github ends up being a "de-facto" location for software 
projects, I'm just nervous about expecting good faith from the 
denizens of the internet. Maybe I'm just worrying too much. If there's 
sufficient "it'll be ok" opinion coming from the PMC, it's fine by me.


Christopher wrote:

I can't imagine there's a trademark issue since it's really just acting as
a mirror. If there were trademark issues, I imagine sites like
http://archive.org would be in big trouble. But, it certainly couldn't hurt
to find out.

Another option to sabotage the GH-rendered site is to add some javascript
which detects the location and displays an informative link back to the
canonical location for the site. That should be simple enough to do.

On Tue, Mar 8, 2016 at 1:36 PM Josh Elser  wrote:


It's also probably worth mentioning that this concern only comes about
for point #4 (or if we use the branch name gh-pages in point #1).

Josh Elser wrote:

The one concern I had was regarding automatic rendering of what would
look like "the Apache Accumulo website" on Github (both apache/accumulo
github account and other forks).

Christopher had said that no one seemed to object in comdev@ when he
talked about this a while back. I wanted to make sure everyone
considered this (for example, Christopher's fork of Drill's repository
now also looks like a canonical host of the Apache Drill project). I'm
not actively stating that I think it's an issue at this point, only
suggesting that we give it some thought and maybe ask someone who is
more knowledgeable (Shane from trademarks?) before moving forward. The
worst case I envision is that we find some way to "gimp" the
github-rendered site (redirect back to the canonical accumulo.apache.org
or similar).

Christopher wrote:

I got some information back from INFRA about how the git-based sites work.
It's just plain old static hosting of a git branch. So, whatever we'd put
in a specified branch would show up directly on the site, no rendering or
generation. This would completely bypass CMS and buildbot staging builds.

Was discussing this with elserj in IRC, and these ideas came out of that:

1. Switch site to use git branch named "site" or "website" or similar.
2. Use jekyll 3 to generate the static site contents in this git branch.
3. Store the unrendered (markdown) jekyll stuff in a gh-pages branch.
4. Possibly set up a post-commit hook on gh-pages branch to render locally
and commit the generated static site to the "site" branch.




Re: git-based site and jekyll

2016-03-08 Thread Josh Elser
Lazy consensus is fine. If there are no objections, I don't want to hold 
things up. I feel like I've adequately expressed my concerns. Silence 
can and should be treated as acknowledgement for this, IMO.


Christopher wrote:

Another reason we probably shouldn't worry about this: anybody can create a
DNS name at their leisure which transparently redirects to
accumulo.apache.org and serves its contents. This is perfectly legitimate
for a number of reasons, including corporate proxies/mirrors,
URL-shortening services, caching services, archiving services,
vision-impaired accessibility services, foreign-language DNS mappings, and
so-on.

I think when it comes to trademarks and our website, our area of concern
should mostly focus on when people misrepresent our trademark in the course
of their mirroring/archiving. There's no risk of that for a mirror that is
explicitly under our control, but I'm really leaning towards the javascript
to detect and display a message about the canonical location just to
mitigate any possibility for concern.

If you still have concerns, I'd be happy to put it up for a formal vote
from the PMC, or to get feedback from ASF trademarks folks before we
proceed.

On Tue, Mar 8, 2016 at 3:22 PM Josh Elser  wrote:


Well, I think the difference is that archive.org (and others -- google
cached pages come to mind) are devoted/known for that specific purpose.
Given that Github ends up being a "de-facto" location for software
projects, I'm just nervous about expecting good faith from the
denizens of the internet. Maybe I'm just worrying too much. If there's
sufficient "it'll be ok" opinion coming from the PMC, it's fine by me.

Christopher wrote:

I can't imagine there's a trademark issue since it's really just acting as
a mirror. If there were trademark issues, I imagine sites like
http://archive.org would be in big trouble. But, it certainly couldn't hurt
to find out.

Another option to sabotage the GH-rendered site is to add some javascript
which detects the location and displays an informative link back to the
canonical location for the site. That should be simple enough to do.

On Tue, Mar 8, 2016 at 1:36 PM Josh Elser   wrote:


It's also probably worth mentioning that this concern only comes about
for point #4 (or if we use the branch name gh-pages in point #1).

Josh Elser wrote:

The one concern I had was regarding automatic rendering of what would
look like "the Apache Accumulo website" on Github (both apache/accumulo
github account and other forks).

Christopher had said that no one seemed to object in comdev@ when he
talked about this a while back. I wanted to make sure everyone
considered this (for example, Christopher's fork of Drill's repository
now also looks like a canonical host of the Apache Drill project). I'm
not actively stating that I think it's an issue at this point, only
suggesting that we give it some thought and maybe ask someone who is
more knowledgeable (Shane from trademarks?) before moving forward. The
worst case I envision is that we find some way to "gimp" the
github-rendered site (redirect back to the canonical accumulo.apache.org
or similar).

Christopher wrote:

I got some information back from INFRA about how the git-based sites work.
It's just plain old static hosting of a git branch. So, whatever we'd put
in a specified branch would show up directly on the site, no rendering or
generation. This would completely bypass CMS and buildbot staging builds.

Was discussing this with elserj in IRC, and these ideas came out of that:

1. Switch site to use git branch named "site" or "website" or similar.
2. Use jekyll 3 to generate the static site contents in this git branch.
3. Store the unrendered (markdown) jekyll stuff in a gh-pages branch.
4. Possibly set up a post-commit hook on gh-pages branch to render locally
and commit the generated static site to the "site" branch.




Re: git-based site and jekyll

2016-03-10 Thread Josh Elser
* Some companies on http://ctubbsii.github.io/accumulo/people.html are 
goofed as are the timezones.
* Some broken links on http://ctubbsii.github.io/accumulo/source.html. 
Coding practices are also messed up.
* http://ctubbsii.github.io/accumulo/contrib.html contrib project 
entries are a little wacky.
* http://ctubbsii.github.io/accumulo/screenshots.html is weird with the 
monitor screenshot (should be beneath the text?)
* Just noticed that Other and Documentation both have a link to the 
papers/presentations. That might actually be how the site is now, just 
realized it's duplicative.


Thanks again for doing this. It's great!

Christopher wrote:

Actually, I now have it all working (as far as I can tell) with everything
pretty much the same as it looks with CMS today. After people have taken
the time to give it a glance, I'll push it to the ASF repo, and then push
the generated site to a separate branch. Then we can put in the INFRA
ticket to switch from svn to git.

On Thu, Mar 10, 2016 at 6:42 PM Christopher  wrote:


I'm working on converting our current site contents over to jekyll at
https://github.com/ctubbsii/accumulo/tree/gh-pages
(view at http://ctubbsii.github.io/accumulo)

Yes, it's terrible right now... it's in progress. :)

On Tue, Mar 8, 2016 at 4:21 PM Josh Elser  wrote:


Lazy consensus is fine. If there are no objections, I don't want to hold
things up. I feel like I've adequately expressed my concerns. Silence
can and should be treated as acknowledgement for this, IMO.

Christopher wrote:

Another reason we probably shouldn't worry about this: anybody can create a
DNS name at their leisure which transparently redirects to
accumulo.apache.org and serves its contents. This is perfectly legitimate
for a number of reasons, including corporate proxies/mirrors,
URL-shortening services, caching services, archiving services,
vision-impaired accessibility services, foreign-language DNS mappings, and
so-on.

I think when it comes to trademarks and our website, our area of concern
should mostly focus on when people misrepresent our trademark in the course
of their mirroring/archiving. There's no risk of that for a mirror that is
explicitly under our control, but I'm really leaning towards the javascript
to detect and display a message about the canonical location just to
mitigate any possibility for concern.

If you still have concerns, I'd be happy to put it up for a formal vote
from the PMC, or to get feedback from ASF trademarks folks before we
proceed.

On Tue, Mar 8, 2016 at 3:22 PM Josh Elser wrote:

Well, I think the difference is that archive.org (and others -- google
cached pages come to mind) are devoted/known for that specific purpose.
Given that Github ends up being a "de-facto" location for software
projects, I'm just nervous about expecting good faith from the
denizens of the internet. Maybe I'm just worrying too much. If there's
sufficient "it'll be ok" opinion coming from the PMC, it's fine by me.

Christopher wrote:

I can't imagine there's a trademark issue since it's really just acting as
a mirror. If there were trademark issues, I imagine sites like
http://archive.org would be in big trouble. But, it certainly couldn't hurt
to find out.

Another option to sabotage the GH-rendered site is to add some javascript
which detects the location and displays an informative link back to the
canonical location for the site. That should be simple enough to do.

On Tue, Mar 8, 2016 at 1:36 PM Josh Elser wrote:

It's also probably worth mentioning that this concern only comes about
for point #4 (or if we use the branch name gh-pages in point #1).

Josh Elser wrote:

The one concern I had was regarding automatic rendering of what would
look like "the Apache Accumulo website" on Github (both apache/accumulo
github account and other forks).

Christopher had said that no one seemed to object in comdev@ when he
talked about this a while back. I wanted to make sure everyone
considered this (for example, Christopher's fork of Drill's repository
now also looks like a canonical host of the Apache Drill project). I'm
not actively stating that I think it's an issue at this point, only
suggesting that we give it some thought and maybe ask someone who is
more knowledgeable (Shane from trademarks?) before moving forward. The
worst case I envision is that we find some way to "gimp" the
github-rendered site (redirect back to the canonical accumulo.apache.org
or similar).

Christopher wrote:

I got some information back from INFRA about how the git-based sites work.
It's just plain old static hosting of a git branch. So, whatever we'd put
in a specified branch would show up directly on the site, no rendering or
generation. This would completely bypass CMS and buildbot staging builds.

Re: git-based site and jekyll

2016-03-11 Thread Josh Elser

+1

Dylan Hutchison wrote:

Sounds great Chris!

On Fri, Mar 11, 2016 at 9:50 AM, Christopher  wrote:


So, if everybody's happy doing this, I'll go ahead and perform the
following steps:

1. Push gh-pages branch to our repo
2. Perform a jekyll build on the branch and put it in a branch called
"accumulo.apache.org"
3. Push the accumulo.apache.org branch
4. File INFRA ticket to switch our site to git using the
accumulo.apache.org
branch


On Fri, Mar 11, 2016 at 11:46 AM Billie Rinaldi
wrote:


Wow, that's looking great.  Thanks, Christopher!

Billie

On Thu, Mar 10, 2016 at 10:38 PM, Christopher wrote:

Thanks Josh! I fixed all the issues you saw, except the screenshots one,
since that's currently just how our layout is (looks the same at
accumulo.apache.org).

Most of the bugs you saw were existing bugs with either our HTML or our
Markdown... but whatever CMS is doing is a bit more tolerant than Kramdown
is, apparently.

Biggest problem I saw was that people keep forgetting quotes around HTML
attributes: attribute values should be quoted, not left bare.

On Thu, Mar 10, 2016 at 9:57 PM Josh Elser wrote:

* Some companies on http://ctubbsii.github.io/accumulo/people.html are
goofed as are the timezones.
* Some broken links on http://ctubbsii.github.io/accumulo/source.html.
Coding practices are also messed up.
* http://ctubbsii.github.io/accumulo/contrib.html contrib project
entries are a little wacky.
* http://ctubbsii.github.io/accumulo/screenshots.html is weird with the
monitor screenshot (should be beneath the text?)
* Just noticed that Other and Documentation both have a link to the
papers/presentations. That might actually be how the site is now, just
realized it's duplicative.

Thanks again for doing this. It's great!

Christopher wrote:

Actually, I now have it all working (as far as I can tell) with everything
pretty much the same as it looks with CMS today. After people have taken
the time to give it a glance, I'll push it to the ASF repo, and then push
the generated site to a separate branch. Then we can put in the INFRA
ticket to switch from svn to git.

On Thu, Mar 10, 2016 at 6:42 PM Christopher wrote:

I'm working on converting our current site contents over to jekyll at
https://github.com/ctubbsii/accumulo/tree/gh-pages
(view at http://ctubbsii.github.io/accumulo)

Yes, it's terrible right now... it's in progress. :)

On Tue, Mar 8, 2016 at 4:21 PM Josh Elser wrote:

Lazy consensus is fine. If there are no objections, I don't want to hold
things up. I feel like I've adequately expressed my concerns. Silence
can and should be treated as acknowledgement for this, IMO.

Christopher wrote:

Another reason we probably shouldn't worry about this: anybody can create a
DNS name at their leisure which transparently redirects to
accumulo.apache.org and serves its contents. This is perfectly legitimate
for a number of reasons, including corporate proxies/mirrors,
URL-shortening services, caching services, archiving services,
vision-impaired accessibility services, foreign-language DNS mappings, and
so-on.

I think when it comes to trademarks and our website, our area of concern
should mostly focus on when people misrepresent our trademark in the course
of their mirroring/archiving. There's no risk of that for a mirror that is
explicitly under our control, but I'm really leaning towards the javascript
to detect and display a message about the canonical location just to
mitigate any possibility for concern.

If you still have concerns, I'd be happy to put it up for a formal vote
from the PMC, or to get feedback from ASF trademarks folks before we
proceed.

On Tue, Mar 8, 2016 at 3:22 PM Josh Elser wrote:

Well, I think the difference is that archive.org (and others -- google
cached pages come to mind) are devoted/known for that specific purpose.
Given that Github ends up being a "de-facto" location for software
projects, I'm just nervous about expecting good faith from the
denizens of the internet. Maybe I'm just worrying too much. If there's
sufficient "it'll be ok" opinion coming from the PMC, it's fine by me.

Christopher wrote:

I can't imagine there's a trademark issue since it's really just acting as
a mirror. If there were trademark issues, I imagine sites like
http://archive.org would be in big trouble. But, it certainly couldn't hurt
to find out.

Another option to sabotage the GH-rendered site is to add some javascript
which detects the location and displays an informative link back to the
canonical location for the site. That should be simple enough to do.

On Tue, Mar 8, 2016 at 1:36 PM Josh Elser <josh.el...@gmail.com> wrote:

It's also probably worth mentioning that this concern only comes about
for point #4 (or if we use the branch name gh-pages in point #1).

Re: delete + insert case

2016-03-19 Thread Josh Elser

Just clarified with Keith in IRC (because I wasn't positive)

This approach will work if you want Accumulo to assign timestamps (e.g. 
not specify them at all in the client). If you can manage that yourself, 
you can try what I suggested in the other message.
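
For illustration, a minimal sketch of the two-BatchWriter approach Keith
suggests below, assuming server-assigned timestamps (i.e. no explicit
timestamps in the mutations); the connector, table, and column names here
are made up:

  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;
  import org.apache.hadoop.io.Text;

  public class DeleteThenInsertSketch {
    // Replace row "A": delete the old cell first, then write the new one.
    static void replaceA(Connector conn, String table) throws Exception {
      BatchWriter deleter = conn.createBatchWriter(table, new BatchWriterConfig());
      BatchWriter writer = conn.createBatchWriter(table, new BatchWriterConfig());
      try {
        Mutation del = new Mutation(new Text("A"));
        del.putDelete(new Text("cf"), new Text("cq"));
        deleter.addMutation(del);
        deleter.flush(); // delete is sent first, so it gets an earlier server-assigned timestamp

        Mutation add = new Mutation(new Text("A"));
        add.put(new Text("cf"), new Text("cq"), new Value("new-value".getBytes()));
        writer.addMutation(add);
        writer.flush(); // the insert now carries a newer timestamp and is not masked by the delete
      } finally {
        deleter.close();
        writer.close();
      }
    }
  }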


Keith Turner wrote:

There are no order guarantees for two mutations added prior to flush being
called. One possible solution is to have two batch writers: one for
deletes, and flush it first.

On Wed, Mar 16, 2016 at 4:33 PM, z11373  wrote:


Hi,
I have object abstraction class which delete/add operation will eventually
translate to calling Accumulo writer.putDelete and writer.put
To achieve higher throughput, the code will only call writer.flush per
request (my implementation knows when it's end of request), instead of
flushing per each delete or add operation.
In this case we have client request calling my service which for example
would be:
1. delete A
2. add A
3. add B

I'd expect the end result to be both row ids A and B existing in the table,
but apparently it's only B. I already checked from the log: the order the
code is executed is delete first, before the add operation. However, I guess
since I call flush after all putDelete and put calls are made, Accumulo
somehow makes putDelete 'win' (in the same flush cycle), is that correct? If
yes, how can I work around this without sacrificing performance?


Thanks,
Z



--
View this message in context:
http://apache-accumulo.1065345.n5.nabble.com/delete-insert-case-tp16375.html
Sent from the Developers mailing list archive at Nabble.com.





Re: delete + insert case

2016-03-19 Thread Josh Elser
Make sure that your insert has a newer timestamp than the delete does. 
Otherwise, the delete will mask any inserts with smaller timestamps 
until it is compacted away (which is essentially an unknown to you as a 
client).


e.g.

1. delete A ts=5
2. add A ts=6
3. add B ts=whatever
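
In code, managing the timestamps yourself might look roughly like the
following sketch (the BatchWriter is assumed to be set up elsewhere, and the
row/column names are illustrative):

  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;
  import org.apache.hadoop.io.Text;

  public class ExplicitTimestampSketch {
    static void deleteThenReAdd(BatchWriter writer, long ts) throws Exception {
      Mutation a = new Mutation(new Text("A"));
      a.putDelete(new Text("cf"), new Text("cq"), ts);  // delete A at ts
      a.put(new Text("cf"), new Text("cq"), ts + 1,     // re-add A at ts+1 so the delete doesn't mask it
          new Value("value".getBytes()));
      writer.addMutation(a);

      Mutation b = new Mutation(new Text("B"));
      b.put(new Text("cf"), new Text("cq"), ts + 1, new Value("value".getBytes()));
      writer.addMutation(b);

      writer.flush(); // one flush per request is still fine when the timestamps are explicit
    }
  }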

z11373 wrote:

Hi,
I have object abstraction class which delete/add operation will eventually
translate to calling Accumulo writer.putDelete and writer.put
To achieve higher throughput, the code will only call writer.flush per
request (my implementation knows when it's end of request), instead of
flushing per each delete or add operation.
In this case we have client request calling my service which for example
would be:
1. delete A
2. add A
3. add B

I'd expect the end result to be both row ids A and B existing in the table,
but apparently it's only B. I already checked from the log: the order the
code is executed is delete first, before the add operation. However, I guess
since I call flush after all putDelete and put calls are made, Accumulo
somehow makes putDelete 'win' (in the same flush cycle), is that correct? If yes,
how can I work around this without sacrificing performance?


Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/delete-insert-case-tp16375.html
Sent from the Developers mailing list archive at Nabble.com.


Re: delete + insert case

2016-03-19 Thread Josh Elser
Server-assigned timestamps are done per-batch. This is getting back to 
what Keith suggested. It's not that Accumulo isn't "setting the 
timestamp properly" like you suggest, this is just how server-assigned 
time works.


If you submit a delete and an update, without timestamps, in the same 
batch (without a flush in between) they'll get the same timestamp, and 
the delete will override the update.


tl;dr Use Keith's approach if you can't compute an increasing value for 
the timestamp.


z11373 wrote:

Thanks Keith/Josh!

@Josh: the client API I use in my code is the one without passing that long
timestamp, so Accumulo should assign the timestamp in a time-ordered manner,
right?
 From my service app log file, I see the add always being called after the
delete, so it should work if Accumulo sets the timestamp properly, but there
could be a possibility they are assigned the same timestamp, hence the
behavior is not predictable?
I prefer not to assign timestamps from my code if possible.


Thanks,
Z



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/delete-insert-case-tp16375p16379.html
Sent from the Developers mailing list archive at Nabble.com.


Re: git-based site and jekyll

2016-03-19 Thread Josh Elser
One thing I just noticed is that the quick "anchor" links at the end of 
each header (specifically on the release notes page) are missing.


I liked those because it was easy to click the header to get a url and 
use it to reference.


The IDs are still there on each section, just the quick link to get 
there is missing (so I have to look it up). Not sure if there is an easy 
way to get this with Jekyll (or if it'd just be something we have to do 
by hand).


Christopher wrote:

There's plenty of room for improvement to the new git/Jekyll site. For
instance, we can start blogging there, so we have greater control over the
look and feel of our blog posts, and so any committer can blog without
needing to request an extra account. At some point, I think it'd be good to
migrate our existing blogs over to this.

Another thing we can do is put our release notes in an RSS feed, so users
can subscribe to new release announcements/notes. I might put some thought into
that at a later point in time. For now, I'm just happy we're on git for
everything except the dist.apache.org/release mirroring (for which I'm
totally fine using git-svn).



Re: delete + insert case

2016-03-19 Thread Josh Elser
Server-assigned timestamps aren't noticeably slower than user-assigned 
timestamps, if that's what you're referring to WRT throughput.


As for using currentTimeMillis(), probably fine, but not always.

1) NTP updates might cause currentTimeMillis() to jump backwards
2) You need to make sure the delete and update always come from the same 
host (otherwise two hosts might have different values for 
currentTimeMillis())


Time is hard in distributed systems.
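
If you do go the client-assigned route, one common trick is to never hand out
a timestamp smaller than the last one you issued on that host (a sketch only;
it does nothing about skew between hosts):

  import java.util.concurrent.atomic.AtomicLong;

  public class MonotonicClock {
    private final AtomicLong lastIssued = new AtomicLong();

    // Returns currentTimeMillis(), bumped by at least 1 if the clock stalls or steps backwards.
    public long nextTimestamp() {
      return lastIssued.updateAndGet(prev -> Math.max(prev + 1, System.currentTimeMillis()));
    }
  }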

z11373 wrote:

Thanks Josh! For better throughput, I think I'd just assign the timestamp
from my code.
Using this code, System.currentTimeMillis(); for timestamp should be ok,
right?


Thanks,
Z




--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/delete-insert-case-tp16375p16382.html
Sent from the Developers mailing list archive at Nabble.com.


Re: git-based site and jekyll

2016-03-21 Thread Josh Elser

Moved the anchor links to the right side again :)

In your post-commit hook, `mktemp` doesn't work on OSX with the 
`--tmpdir` option.


Have you made a canonical "How to update the website" page yet?

Christopher wrote:

I didn't see anything which indicated we had those in the past. Maybe it
was something CMS was doing special, but I figured out how to get the
section links present.

On Sat, Mar 19, 2016 at 5:21 PM Christopher  wrote:


I don't think I ever noticed those before. There might be a kramdown
option we can turn on.

On Sat, Mar 19, 2016, 16:55 Josh Elser  wrote:


One thing I just noticed is that the quick "anchor" links at the end of
each header (specifically on the release notes page) are missing.

I liked those because it was easy to click the header to get a url and
use it to reference.

The IDs are still there on each section, just the quick link to get
there is missing (so I have to look it up). Not sure if there is an easy
way to get this with Jekyll (or if it'd just be something we have to do
by hand).

Christopher wrote:

There's plenty of room for improvement to the new git/Jekyll site. For
instance, we can start blogging there, so we have greater control over the
look and feel of our blog posts, and so any committer can blog without
needing to request an extra account. At some point, I think it'd be good to
migrate our existing blogs over to this.

Another thing we can do is put our release notes in an RSS feed, so users
can subscribe to new release announcements/notes. I might put some thought
into that at a later point in time. For now, I'm just happy we're on git for
everything except the dist.apache.org/release mirroring (for which I'm
totally fine using git-svn).





Re: git-based site and jekyll

2016-03-22 Thread Josh Elser

Yeah, that menu scrolling has been a long-time irritation :)

You can add some more padding-left if you want to increase the 
spacing (I think I put 5-10px).


Christopher wrote:

I noticed that the arrow on the right of the drop-down menus looked a bit
closer after Josh's last change, but I don't think it's a problem (it's not
on my screen, anyway).
(though, I think I prefer the anchors on the left for the sections)

One annoying thing about the anchor links is that linking directly to a
section puts the section name underneath our top menu... that's kinda
annoying, but not sure how to avoid it. Maybe make the menu auto-hide
unless you're scrolled all the way to the top?


On Tue, Mar 22, 2016 at 2:34 AM Christopher  wrote:


I think I updated all the existing pages covering that, but the whole site
could do with a cleanup and reorganization, along with additional howtos.

On Mon, Mar 21, 2016, 22:17 Josh Elser  wrote:


Moved the anchor links to the right side again :)

In your post-commit hook, `mktemp` doesn't work on OSX with the
`--tmpdir` option.

Have you made a canonical "How to update the website" page yet?

Christopher wrote:

I didn't see anything which indicated we had those in the past. Maybe it
was something CMS was doing special, but I figured out how to get the
section links present.

On Sat, Mar 19, 2016 at 5:21 PM Christopher wrote:

I don't think I ever noticed those before. There might be a kramdown
option we can turn on.

On Sat, Mar 19, 2016, 16:55 Josh Elser wrote:

One thing I just noticed is that the quick "anchor" links at the end of
each header (specifically on the release notes page) are missing.

I liked those because it was easy to click the header to get a url and
use it to reference.

The IDs are still there on each section, just the quick link to get
there is missing (so I have to look it up). Not sure if there is an easy
way to get this with Jekyll (or if it'd just be something we have to do
by hand).

Christopher wrote:

There's plenty of room for improvement to the new git/Jekyll site. For
instance, we can start blogging there, so we have greater control over the
look and feel of our blog posts, and so any committer can blog without
needing to request an extra account. At some point, I think it'd be good to
migrate our existing blogs over to this.

Another thing we can do is put our release notes in an RSS feed, so users
can subscribe to new release announcements/notes. I might put some thought
into that at a later point in time. For now, I'm just happy we're on git for
everything except the dist.apache.org/release mirroring (for which I'm
totally fine using git-svn).





Re: Pros and Cons of moving SKVI to public API

2016-03-24 Thread Josh Elser

Billie Rinaldi wrote:

On Thu, Mar 24, 2016 at 1:15 PM, Christopher  wrote:


>  We do have the opportunity to move to a new improved API, if somebody were
>  to put time into it. I guess that's true whether we put this in the public
>  API officially or not.



Agreed.  Even if we do create a new API, we can't change or drop the
existing API without breaking a lot of people's code.  I feel like SKVI is
in a category of things that we treat as though they're in the public API,
so we might as well say it is.



+1 well put.


Re: Pros and Cons of moving SKVI to public API

2016-03-24 Thread Josh Elser

That was my gut reaction too.

Separating "public API" by artifact would be my preferred way to tackle 
it moving forward. Until then, trying to maintain our current approach 
seems reasonable to me. If there's some reason with how we have things 
structured now which makes this infeasible/difficult, let's by all means 
explore options (I didn't even realize that Yetus had their own audience 
annotations).


Christopher wrote:

That's a good idea, at least for now, until we have a proper API jar at
some hypothetical future point. But, I'd be concerned about adding a
dependency for users on previous versions (1.6, 1.7) since it has a runtime
retention.

We could also make our own annotation, but it'd be nice to take advantage
of an existing javadoc doclet to do the filtering, like the one Yetus
provides.

On Thu, Mar 24, 2016 at 5:49 PM Sean Busbey  wrote:


We could switch from a list of packages to annotations using the Apache
Yetus Audience Annotations.

http://yetus.apache.org/documentation/0.2.0/#yetus-audience-annotations

That would allow us to mark specific classes, and even carve out particular
methods should we choose.
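
For reference, annotating a class with the Yetus audience annotations looks
roughly like the sketch below (assuming the org.apache.yetus.audience package
from the audience-annotations artifact; whether SKVI itself should be Public,
and how stable we'd call it, is exactly the open question here):

  import org.apache.yetus.audience.InterfaceAudience;
  import org.apache.yetus.audience.InterfaceStability;

  // Hypothetical: expose the interface while flagging that it may still evolve.
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  public interface SortedKeyValueIterator<K, V> {  // generic bounds of the real interface elided
    // ... existing SKVI methods ...
  }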

On Thu, Mar 24, 2016 at 3:15 PM, Christopher  wrote:


We do have the opportunity to move to a new improved API, if somebody were
to put time into it. I guess that's true whether we put this in the public
API officially or not. I think maybe the hardest part is that we don't
really want to put just the interface in the API... but it exists in a
package with a bunch of other classes which probably shouldn't be public
API. So, some thought needs to be put into *how* we're going to do it, too.

On Thu, Mar 24, 2016 at 3:27 PM William Slacum wrote:

It should be public API. It's one of the core reasons for choosing Accumulo
over a similar project like HBase or Cassandra. Allegedly, Jeff "Mean Gene"
Dean said we got the concept correct as well :)

Personally I hate the current API from a usability standpoint (ie, the
generic types are useless and already encoded in the name, it needlessly
diverges from the standard java Iterator calling standards), but it's a
strong, identifying feature we have.

On Thu, Mar 24, 2016 at 2:50 PM, Christopher wrote:

Accumulators,

What are the pros and cons that you can see for moving the
SortedKeyValueIterator into the public API?

Right now, I think there's still some need for improvement in the Iterator
API, and many of the iterators may not be stable enough to really recommend
people use without some serious caveats (because we may not be able to keep
their API stable very easily). So, there's a con.

On the pros side, iterators are a core feature of Accumulo, and nearly all
of Accumulo's distributed processing capabilities are dependent upon them.
It is reasonable to expect users to take advantage of them, and we've at
least tried to be cautious about changing the iterators in incompatible
ways, even if they aren't in the public API.

Recently, this came up when we stripped out all the non-public API javadocs
from the website. (reported by Dan Blum on the user list on March 4th:
http://mail-archives.apache.org/mod_mbox/accumulo-user/201603.mbox/%3C066a01d17658%24bc9dc1b0%2435d94510%24%40bbn.com%3E
)

What would it take for us to feel comfortable moving them to the public
API? Do we need a better interface first, or should we isolate the other
iterators into another package (some of that has already been done), or
should we wait for a proper public API package (2.0?) to provide this
interface in?




--
busbey





Re: 1.6 Javadoc missing classes

2016-03-28 Thread Josh Elser
So this just bit me. I went looking for Iterators and was confused why 
they weren't there.


Christopher wrote:

Sure, we can include that. Are there any other classes which would be
good to have javadocs for which aren't public API?

On Fri, Mar 4, 2016 at 4:03 PM Josh Elser <josh.el...@gmail.com> wrote:

Good catch, Dan. Thanks for letting us know. Moving this one over to the
dev list to discuss further.

Christopher, looks like it might also be good to include iterator
javadocs despite not being in public API (interfaces, and
o.a.a.c.i.user?).

 Original Message 
Subject: 1.6 Javadoc missing classes
Date: Fri, 4 Mar 2016 15:59:26 -0500
From: Dan Blum <db...@bbn.com>
Reply-To: u...@accumulo.apache.org
To: u...@accumulo.apache.org

A lot of classes seem to have gone missing from
http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator
would be an
obvious example.



Re: Anyone else have trouble running all integration tests on a Mac?

2016-04-01 Thread Josh Elser
There is a -Dtimeout.factor option you can set on the Maven CLI to scale 
up the timeouts.


e.g. `mvn verify -Dtimeout.factor=2` would double the default test timeouts.

Interesting that both of the failures are from stopping processes, but 
that might be circumstantial? Dunno for sure.


Christopher wrote:

I see periodic failures on Linux, too. Some tests are timing sensitive,
others may get stuck in various places. If we can troubleshoot them and fix
them, we should. But, I don't think there's necessarily anything specific
about the Mac, vs. Linux, which is triggering these failures.

On Thu, Mar 31, 2016 at 2:26 PM Michael Wall  wrote:


I consistently have issues running 'mvn clean verify' on Accumulo on my
Mac. This does not happen when I build on Linux.

Some specs.
Maven 3.2.5
Java 1.7 and 1.8
Accumulo branch 1.6
OSX 10.10.5
16G Ram
MAVEN_OPTS -Xmx1024m -Xms256m -XX:PermSize=128m -XX:MaxPermSize=256m

Josh mentioned making sure DNS can resolve.  But I have the following in my
/etc/hosts
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
#fe80::1%lo0 localhost

You will see the ipv6 line commented out; it was causing issues on another
project.

The first test that fails is Accumulo3047IT.  When run by itself, it works
fine.  When it fails, it doesn't seem to get past the alterConfig method
which has the @Before.  This method tears down the MAC and restarts with
different configs.  The ExampleIT always takes the longest, in this case
1,131.977 sec.

I pasted the full output from maven at http://pastebin.com/raw/YFiqKRNm,
but here are some highlights

---
  T E S T S
---
Running org.apache.accumulo.server.security.SystemCredentialsIT
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.896 sec
- in org.apache.accumulo.server.security.SystemCredentialsIT
Running org.apache.accumulo.test.Accumulo3010IT
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.626 sec
- in org.apache.accumulo.test.Accumulo3010IT
Running org.apache.accumulo.test.Accumulo3047IT
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 60.18 sec
<<<  FAILURE! - in org.apache.accumulo.test.Accumulo3047IT
test(org.apache.accumulo.test.Accumulo3047IT)  Time elapsed: 60.033 sec
  <<<  ERROR!
java.lang.Exception: test timed out after 60000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:426)
at java.util.concurrent.FutureTask.get(FutureTask.java:204)
at

org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stopProcessWithTimeout(MiniAccumuloClusterImpl.java:750)
at

org.apache.accumulo.minicluster.impl.MiniAccumuloClusterControl.stop(MiniAccumuloClusterControl.java:234)
at

org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stop(MiniAccumuloClusterImpl.java:667)
at

org.apache.accumulo.harness.AccumuloClusterIT.teardownCluster(AccumuloClusterIT.java:135)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at

org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at

org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at

org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at

org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
...
Running org.apache.accumulo.test.ConditionalWriterIT
Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 282.206
sec<<<  FAILURE! - in org.apache.accumulo.test.ConditionalWriterIT
testTrace(org.apache.accumulo.test.ConditionalWriterIT)  Time elapsed:
60.008 sec<<<  ERROR!
java.lang.Exception: test timed out after 60000 milliseconds
at java.lang.Thread.sleep(Native Method)
at

org.apache.accumulo.test.ConditionalWriterIT.testTrace(ConditionalWriterIT.java:1433)
...
Running org.apache.accumulo.test.functional.DeleteRowsIT
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 82.884 sec
- in org.apache.accumulo.test.functional.DeleteRowsIT
Running org.apache.accumulo.test.functional.DeleteRowsSplitIT
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 41.212 sec
- in org.apache.accumulo.test.functional.DeleteRowsSplitIT
Running org.apache.accumulo.test.functional.DeleteTableDuringSplitIT
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 225.544 sec
- in org.apache.accumulo.test.functional.DeleteTableDuringSplitIT
Running org.apache.accumulo.test.functional.DynamicThreadPoolsIT
Tests run: 1, Failures: 0, Err

Re: LruBlockCache alternative

2016-04-04 Thread Josh Elser

Cool, thanks for the poke, Ben!

Last I checked, our version of the LRUBlockCache was nearly identical to 
what was in HBase 1.x. I would imagine this would be easy to bring over.


Maybe we can also try to swipe BucketCache while we're at it and get 
some off-heap support for blocks.


Aside: it would be nice if we could somehow find a way to share code 
like this across the projects. HBase was interested in FATE for some 
time, but eventually created their own new solution. I'm not sure what 
else exists that we might want to share between projects.


dlmar...@comcast.net wrote:

Associated issue: https://issues.apache.org/jira/browse/ACCUMULO-4177



-Original Message-
From: Christopher [mailto:ctubb...@apache.org]
Sent: Sunday, April 03, 2016 1:31 PM
To: dev@accumulo.apache.org
Subject: Re: LruBlockCache alternative

Thanks for the pointer!

On Sun, Apr 3, 2016, 12:08 Benjamin Manes  wrote:


Hi,

I noticed that Accumulo's LruBlockCache [1] appears to be based on HBase's.

I currently have a patch being reviewed in HBASE-15560 [2] that
replaces the pseudo Segmented LRU with the TinyLFU eviction policy.
That should allow the cache to make better predictions based on
frequency and recency, such as improved scan resistance. Full details
are in the JIRA ticket. I think it should be easy to port if there is interest.

Cheers,
Ben

[1]
https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java
[2] https://issues.apache.org/jira/browse/HBASE-15560
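
(For anyone curious what TinyLFU looks like in practice: Ben's Caffeine
library uses it as its eviction policy. Below is a purely illustrative sketch
of a weighted block cache built on it; this is not the HBASE-15560 patch and
not Accumulo's BlockCache interface, just the general shape.)

  import com.github.benmanes.caffeine.cache.Cache;
  import com.github.benmanes.caffeine.cache.Caffeine;

  public class TinyLfuBlockCacheSketch {
    private final Cache<String, byte[]> blocks = Caffeine.newBuilder()
        .maximumWeight(512L * 1024 * 1024)                    // cap total cached bytes at ~512MB
        .weigher((String name, byte[] block) -> block.length) // weigh entries by block size
        .recordStats()
        .build();

    public void cacheBlock(String name, byte[] block) {
      blocks.put(name, block);
    }

    public byte[] getBlock(String name) {
      return blocks.getIfPresent(name); // null on a miss; the caller reads from disk and calls cacheBlock
    }
  }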





Re: LruBlockCache alternative

2016-04-04 Thread Josh Elser

Sean Busbey wrote:

On Mon, Apr 4, 2016 at 9:45 AM, Keith Turner  wrote:

On Mon, Apr 4, 2016 at 10:20 AM, Josh Elser  wrote:


Cool, thanks for the poke, Ben!

Last I checked, our version of the LRUBlockCache was nearly identical to
what was in HBase 1.x. I would imagine this would be easy to bring over.

Maybe we can also try to swipe BucketCache while we're at it and get some
off-heap support for blocks.

Aside: it would be nice if we could somehow find a way to share code like
this across the projects. HBase was interested in FATE for some time, but
eventually created their own new solution. I'm not sure what else exists
that we might want to share between projects.



One way to share code is to spin off new projects, like the following :

  https://github.com/snazy/ohc



I'd much rather share code under ASF governance. Some code that
started in Hadoop recently jumped over to Apache Commons, maybe that
could serve as an example?


https://s.apache.org/commons-crypto-vote

https://github.com/intel-hadoop/chimera/blob/master/PROPOSAL.html

https://s.apache.org/commons-crypto-vote-result

Maybe we should start a new thread to go over things we'd like to start sharing?


Ooo, Apache Commons would be a nice home, IMO. Keeping it under the ASF 
governance guidelines would be a plus. I feel like I had a talk about 
this with Billie (and Stack, maybe?) a long time ago. Figuring out the 
proper scope for it will be tricky though. Maybe we should already have 
a list of initial things before taking it to HBase and others? (maybe 
Cassandra would have benefit, too?)


Re: GPG verification in GitHub tags

2016-04-06 Thread Josh Elser
Well, it seems I made Github unhappy using a key for 
josh.el...@gmail.com but having the email on the commit/tag be 
elserj@apache :)


But, some of the releases I made are now listed as "verified".

Christopher wrote:

Devs,

I saw GitHub rolled out a new feature to verify GPG-signed commits and tags
in the UI [1]. If release managers upload their GPG public keys to their
profile in GitHub[2], it shows up as "Verified", which is pretty cool [3].

[1]: https://github.com/blog/2144-gpg-signature-verification
[2]: https://github.com/settings/keys
[3]: https://github.com/apache/accumulo/tags



Re: GPG verification in GitHub tags

2016-04-06 Thread Josh Elser

CHRISTOPHER. What is this voodoo. Looking into this now :)

Christopher wrote:

You could just add the extra identity to your key. :)

On Wed, Apr 6, 2016, 09:49 Josh Elser  wrote:


Well, it seems I made Github unhappy using a key for
josh.el...@gmail.com but having the email on the commit/tag be
elserj@apache :)

But, some of the releases I made are now listed as "verified".

Christopher wrote:

Devs,

I saw GitHub rolled out a new feature to verify GPG-signed commits and tags
in the UI [1]. If release managers upload their GPG public keys to their
profile in GitHub [2], it shows up as "Verified", which is pretty cool [3].

[1]: https://github.com/blog/2144-gpg-signature-verification
[2]: https://github.com/settings/keys
[3]: https://github.com/apache/accumulo/tags





Re: Master Thesis on False Positives in Test Failures

2016-04-08 Thread Josh Elser

Hi Kevin,

Many of those test bugs and fixes were probably my doing.

Most of them were just flakiness in general, but, if you can provide an 
explicit list, I can try to confirm whether or not that was exactly the 
case.


- Josh

Kevin van den Bekerom wrote:

Dear Developers of the Apache Accumulo project,



My name is Kevin van den Bekerom and I am currently doing my Master's
research on the topic of false alarms in test code. I would like to ask the
input of the Accumulo development team categorizing test code bugs.



My research is based on a recent paper by Arash et al. (
http://salt.ece.ubc.ca/publications/docs/icsme15.pdf). They conducted an
empirical study, categorizing "test code bugs" in Apache software projects,
e.g. semantic, flaky, environmental, etc. A "test code bug" is a failing
test, where the System Under Test is correct, but the test code is
incorrect. To identify test code bugs they looked at issues in JIRA, and
checked if the fixing commit was only in the test code. Only fixed issues
were counted and categorised.



My goal is to replicate their results using a different approach, i.e. ask
developers that were involved in the issue and/or fix how they would
categorize it.  For the Accumulo project they counted 187 test code bugs.
Insight into false positives can therefore be very relevant for your
project. Note that they only sampled a number of identified test code bugs
for individual inspection (30 for the Accumulo project).


I would like to ask the Accumulo team’s participation in categorizing the
various test code bugs. I will provide a list of JIRA IDs which are
identified as test code bugs and an initial list of categories to aid in
the categorization process. In my belief, the developers that worked on the
issue are the ones that are most capable of categorizing the issue. Please
let me know if this project looks interesting to you and you are willing to
help me out.



As a next step I will look for common patterns in identified test code bugs
and my aim is to extend static source code analysis techniques to be also
suited to find test code bugs. I am of course very happy to share my
findings with the team.



Hope to hear from you!



With kind regards,



Re: Fwd: Data authorization/visibility limit in Accumulo

2016-04-08 Thread Josh Elser

Hi Fikri,

Welcome! You're the first Accumulo enthusiast I've heard from in 
Indonesia :)


Responses inline:

Fikri Akbar wrote:

Hi Guys,

We're a group of Accumulo enthusiasts from Indonesia. We've been trying to
implement Accumulo for several different types of data processing purposes.
We've got several questions regarding Accumulo, which you might be able to
help us with. We encountered these issues when trying to process heavy amounts
of data; our questions are as follows:

1. Let's say that I have a file in HDFS that's about 300 GB with a total
of 1.6 billion rows, and each line is separated by "^". The question is, what
is the most effective way to move the data to Accumulo (with the assumption
that the structure of each cell is [rowkey cf:cq vis value] => [lineNumber
raw:columnName fileName columnValue])?


For a 300GB file, you likely want to use MapReduce to ingest it into 
Accumulo. You can use the AccumuloOutputFormat to write to Accumulo 
directly from a MapReduce job.


Reading data whose lines are separated by a '^' will likely require some 
custom InputFormat. I'm not sure if one already exists that you can 
build from. If you can convert the '^' to a standard newline character, 
you can probably leverage the existing TextInputFormat or similar.
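
As a rough illustration of the MapReduce approach above, here is a minimal
sketch (not a tested job) that treats '^' as the record delimiter via Hadoop's
textinputformat.record.delimiter setting and writes one Mutation per record
through AccumuloOutputFormat. The instance name, ZooKeeper hosts, credentials,
table name, and the cell layout are placeholders, and the record-delimiter
setting assumes a Hadoop 2.x LineRecordReader.

import java.io.IOException;
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CaretDelimitedIngest {

  // One Mutation per '^'-delimited record: row = byte offset of the record,
  // column = raw:column, value = the raw record text (placeholder layout).
  public static class IngestMapper extends Mapper<LongWritable, Text, Text, Mutation> {
    private static final Text TABLE = new Text("mytable"); // placeholder table

    @Override
    protected void map(LongWritable offset, Text record, Context ctx)
        throws IOException, InterruptedException {
      Mutation m = new Mutation(new Text(Long.toString(offset.get())));
      m.put(new Text("raw"), new Text("column"), new Value(record.copyBytes()));
      ctx.write(TABLE, m);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Treat '^' as the record separator instead of newline.
    conf.set("textinputformat.record.delimiter", "^");

    Job job = Job.getInstance(conf, "caret-delimited-ingest");
    job.setJarByClass(CaretDelimitedIngest.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    job.setMapperClass(IngestMapper.class);
    job.setNumReduceTasks(0); // map-only: mutations go straight to Accumulo
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Mutation.class);

    job.setOutputFormatClass(AccumuloOutputFormat.class);
    AccumuloOutputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
    AccumuloOutputFormat.setZooKeeperInstance(job,
        ClientConfiguration.loadDefault().withInstance("instance").withZkHosts("zk1:2181"));
    AccumuloOutputFormat.setDefaultTableName(job, "mytable");
    AccumuloOutputFormat.setCreateTables(job, true);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}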



2. What is the most effective way to ingest data, if we're receiving data
with the size of >1 TB on a daily basis?


If latency is not a primary concern, creating Accumulo RFiles and
performing bulk ingest/bulk loading is by far the most efficient way to
get data into Accumulo. This is often done by a MapReduce job to
process your incoming data, create Accumulo RFiles and then bulk load
these files into Accumulo. If you need low latency for getting data
into Accumulo, waiting for a MapReduce job to complete may take too long
to meet your required latencies.
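
For the bulk route, a minimal sketch of the final load step is below, assuming
a MapReduce job has already produced sorted RFiles (e.g. with
AccumuloFileOutputFormat) under a placeholder HDFS directory; the instance
name, ZooKeeper hosts, credentials, table name, and paths are all hypothetical.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class BulkLoadExample {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("user", new PasswordToken("pass"));

    // "files" holds the RFiles created by the MapReduce job; "failures" should
    // be an empty directory where Accumulo moves any files it cannot import.
    conn.tableOperations().importDirectory("mytable",
        "/tmp/bulk/files", "/tmp/bulk/failures", false /* setTime */);
  }
}

The load itself is mostly a metadata operation; the heavy lifting happens in
the job that generates and sorts the RFiles.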



3. We're currently testing the ability of Accumulo for its data-level
access control; however, an issue regarding the limit of dataset
authorizations occurred when the number of datasets reached >20,000.

For example, let's say user X has a dataset called one.txt. This gives user
X an authorization for one.txt (let's call it X.one.txt). Now, if X
has more than that (one.txt, two.txt, three.txt...n.txt), this results
in user X having multiple authorizations (as many as there are datasets, or n
authorizations), and when we tried it for >20,000 datasets (so the
user has >20,000 authorizations), we were not able to execute "get
auth". We find that this is a very crucial issue, especially if (in one
case) there are >20,000 datasets being granted authorizations at once.


Accumulo's column visibilities don't directly work well in the situation 
you describe; this is likely why you are having problems. Specifically, 
because the ColumnVisibility is a part of the Accumulo Key, you cannot 
update it without removing the old Key-Value and adding a new one.


As such, ColumnVisibilities work much better as a labelling system than 
a direct authorization mechanism. Does that make sense? They are a 
building block to help you build authorization, not a complete 
authorization system on their own.


Authorizations for users are stored in ZooKeeper by default, which is 
probably why you were having problems with 20k+ authorizations.


Can you go into some detail on what your access control requirements 
are? For example, are documents only visible to one user known at ingest 
time? Do the set of allowed users for a file change over time?


Commonly, some external system that manages the current roles for a user 
is a better approach here. For some $user, you can configure Accumulo to 
query that system to get the set of authorizations that $user currently
has and query that way. With some more specifics, we can try to get you
a better recommendation.
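
To make the labelling-vs-authorization distinction concrete, here is a minimal
sketch: the visibility expression written with each cell is a label, while the
authorizations passed to the scanner are whatever an external system says the
user currently holds. The connection details are placeholders and
fetchRolesFor() stands in for a call to a hypothetical external role service;
also note that the requested scan authorizations still have to be granted to
the connecting Accumulo user (or supplied via a custom security module).

import java.util.Map.Entry;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.security.ColumnVisibility;
import org.apache.hadoop.io.Text;

public class VisibilityLabelExample {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("user", new PasswordToken("pass"));

    // Write: label the cell with coarse roles rather than one label per user.
    BatchWriter bw = conn.createBatchWriter("docs", new BatchWriterConfig());
    Mutation m = new Mutation(new Text("one.txt"));
    m.put(new Text("raw"), new Text("content"),
        new ColumnVisibility("projectA&analyst"), new Value("...".getBytes()));
    bw.addMutation(m);
    bw.close();

    // Read: resolve the user's current roles externally, then scan with them.
    String[] roles = fetchRolesFor("userX"); // hypothetical external lookup
    Scanner scanner = conn.createScanner("docs", new Authorizations(roles));
    for (Entry<Key, Value> e : scanner) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }

  // Stand-in for a query to an external role/attribute management system.
  private static String[] fetchRolesFor(String user) {
    return new String[] {"projectA", "analyst"};
  }
}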



The following are error logs from our system.

*Error log in shell:*

org.apache.accumulo.core.client.AccumuloException:
org.apache.thrift.TApplicationException: Internal error processing
getUserAuthorizations
 at
org.apache.accumulo.core.client.impl.SecurityOperationsImpl.execute(SecurityOperationsImpl.java:83)
 at
org.apache.accumulo.core.client.impl.SecurityOperationsImpl.getUserAuthorizations(SecurityOperationsImpl.java:182)
 at com.msk.auxilium.table.AuxUser.setUserAuth(AuxUser.java:310)
 at
com.msk.auxilium.commons.UserSystem.getAuxUser(UserSystem.java:24)
 at com.msk.auxilium.tester.HDFSTest.main(HDFSTest.java:57)
Caused by: org.apache.thrift.TApplicationException: Internal error
processing getUserAuthorizations
 at
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
 at
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
 at
org.apache.accumulo.core.client.impl.thrift.ClientService$Client.recv_getUserAuthorizations(ClientService.java:580)
 at
org.apache.accumulo.core.client.impl.thrift.ClientService$Client.getUserAuthorizations(ClientService

Re: Fwd: Data authorization/visibility limit in Accumulo

2016-04-10 Thread Josh Elser

Dylan Hutchison wrote:

>  2. What is the most effective way to ingest data, if we're receiving data

>>  with the size of >1 TB on a daily basis?
>>

>
>  If latency is not a primary concern, creating Accumulo RFiles and
>  performing bulk ingest/bulk loading is by far the most efficient way to
>  get data into Accumulo. This is often done by a MapReduce job to
>  process your incoming data, create Accumulo RFiles and then bulk load these
>  files into Accumulo. If you need low latency for getting data into
>  Accumulo, waiting for a MapReduce job to complete may take too long to meet
>  your required latencies.
>
>

If you need a lower latency, you still have the option of parallel ingest
via normal BatchWriters.  Assuming good load balancing and the same number
of ingestors as tablet servers, you should easily obtain ingest rates of
100k entries/sec/node.  With significant effort, some have pushed this to
400k entries/sec/node.

Josh, do we have numbers on bulk ingest rates?  I'm curious what the best
rates ever achieved are.


Hrm. Not that I'm aware of. Generally, a bulk import is some ZooKeeper 
operations (via FATE) and a few metadata updates per file (~3? i'm not 
actually sure). Maybe I'm missing something?


My hunch is that you'd run into HDFS issues in generating the data to 
import before you'd run into Accumulo limits. Eventually, compactions 
might bog you down too (depending on how you generated the data). I'm 
not sure if we even have a bulk-import benchmark (akin to continuous 
ingest).
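
For reference, a minimal sketch of the parallel BatchWriter ingest path Dylan
describes above is below (one such process per ingester node). The instance,
credentials, table, and tuning numbers are placeholders, not recommendations.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class LiveIngestExample {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("user", new PasswordToken("pass"));

    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(64L * 1024 * 1024)  // client-side buffer before sending
        .setMaxWriteThreads(4);           // threads writing to tablet servers

    BatchWriter bw = conn.createBatchWriter("mytable", cfg);
    for (long i = 0; i < 1_000_000; i++) {
      Mutation m = new Mutation(new Text(String.format("row%010d", i)));
      m.put(new Text("cf"), new Text("cq"), new Value(("value" + i).getBytes()));
      bw.addMutation(m);
    }
    bw.close(); // flushes any buffered mutations and surfaces failures
  }
}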


Re: Our API compatibility history

2016-04-16 Thread Josh Elser

Awesome. This is pretty cool.

Thanks for sharing :)

Sean Busbey wrote:

yeah, it was pretty rough the first go around. at first I was like "oh
jeez what have I done."

:)

On Fri, Apr 15, 2016 at 10:38 PM, Christopher  wrote:

That's cool. Thanks for having him update it with the API definition. I
think I saw it before, and it looks much better now.

On Fri, Apr 15, 2016, 19:47 Sean Busbey  wrote:


Since this has now shown up in my Google alerts, I should stop dawdling on
pointing it out.

http://abi-laboratory.pro/java/tracker/timeline/accumulo/

The author of the Java API Compliance Checker has been working on showing
API changes over time. His tooling will eventually be open sourced, but for
now it's just for projects he chooses to include.

He was gracious enough to add Accumulo to the list after I expressed
interest. He was also able to quickly update it according to our API
definition after I let him know how to find it.

It's nice to see reinforcement that we've been doing well since adopting
semver!

--
Sean Busbey







Re: Checking what a BatchWriter is stuck on; failure during split

2016-04-19 Thread Josh Elser

Nice findings. Sorry I haven't had any cycles to dig into this myself.

I look forward to hearing what you find :)

Dylan Hutchison wrote:

I investigated a bit more and I am pretty sure the problem is that the
BatchWriter is not recognizing that the tablet vb<<  split into vb;2436<  and
vb<;2436.  It keeps trying to update the closed tablet vb<<.  Each update
writes 0 mutations and records a failure at the tablet server UpdateSession
because vb<<  is closed.

I'm not sure why this is happening because the BatchWriter should have
invalidated its tablet locator cache upon recognizing a failure.  Then it
would recognize that the entries it wants to write fall into the new
tablets vb;2436<  and vb<;2436.  I think there is a timing bug for this edge
case, when a table split occurs during heavy writes.

I will write this up if I can reproduce it.  Maybe it is too rare to matter.

Cheers, Dylan

On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison
wrote:



Hi devs,

I'd like to ask your help in figuring out what is happening to a
BatchWriter.  The following gives my reasoning so far.

In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING status in
its addMutation method.  I saw that it is stuck by jstack'ing the Accumulo
client.  It's been stuck like this for 16 hours.

The BatchWriter is supposed to wait when a mutation is added if no
failures have been recorded and either (a) the total memory used exceeds the
maximum allowed for the BatchWriter, or (b) the BatchWriter is currently
being flushed. So we conclude that one of (a) or (b) has occurred and no
failures were recorded at the time when addMutation was called. I think
(a) is likely.

The BatchWriter is supposed to notify itself when either (1) a flush
finishes, (2) a constraint violation or authorization failure or server
error or unknown error occurs, (3) memory usage decreases, which happens
when entries are successfully sent to the tablet server. Since the BatchWriter
is stuck on WAITING, none of these conditions are occurring.

The BatchWriter has 3 write threads (the default number).  All three have
status TIMED_WAITING (parked) in jstack.  Their stack traces don't give
useful information.

Here's what I can tell from the tserver logs.  A new table (and tablet)
was created successfully.  The BatchWriter started writing to this tablet
steadily.  The logs show that the tablet (vb<<) flushed every 5 seconds or
so and major compacted at a steady periodic rate.

Everything looks good, until vb<<  grew large enough that it needed
splitting.  This occurred about 42 minutes after the BatchWriter started
writing entries.  The logs show a failure in an UpdateSession that popped
up in the middle of the split operation.  This failure continues to show
for the next 15 hours.

I copied the portion of the tserver logs that look relevant to the split
below.  I highlighted the line reporting the first failure.  It occurs in
between when the split starts and when it finishes.

Any idea what could have caused this?  I don't know if the failure is
related to the BatchWriter being stuck in WAITING.  It seems likely.  I
think it is weird that the 3 write threads are all idle; at least one of
them should be doing something if the thread calling addMutation() is
waiting.

Here is a pastebin of the jstack, though I
think I've captured the useful parts from it above.

2016-04-17 22:38:06,436 [tablet.Tablet] TABLET_HIST: vb<<  closed
2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for low split
vb;2436<
  [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C8lh.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C9iz.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ca08.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ca4t.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ca7m.rf, hdfs:
//localhost:9000/accumulo/tables/vb/default_tablet/Ca8f.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ca8n.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Fa8p.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Fa8q.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ma93.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ma9b.rf,
hdfs://localhost:90
00/accumulo/tables/vb/default_tablet/Ma9g.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Maa7.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Maap.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Mabe.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Mabn.rf]
2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for high split
vb<;2436
  [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C8lh.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C9iz.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ca08.rf,
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/Ca4t.rf,
hdfs://localhos

Re: Checking what a BatchWriter is stuck on; failure during split

2016-04-22 Thread Josh Elser

Awesome hunting and write-up, Dylan!

Dylan Hutchison wrote:

Hi all,

Check out ACCUMULO-4229
<https://issues.apache.org/jira/browse/ACCUMULO-4229>.  I copy the text
below to include it in our discussion.

This issue came up again during a similar long-lasting BatchWrite.  It had
the same circumstances: the failures started happening after a split event.

I turned on TRACE logs and I think I pinned it down: the TabletLocator
cached by a BatchWriter gets out of sync with the static cache of
TabletLocators.

1. The TabletServerBatchWriter caches a TabletLocator from the static
collection of TabletLocators when it starts writing.  Suppose it is writing
to tablet T1.
2. The TabletServerBatchWriter uses its locally cached TabletLocator
inside its `binMutations` method for its entire lifespan; this cache is
never refreshed or updated to sync up with the static collection of
TabletLocators.
3. Every hour, the static collection of TabletLocators clears itself.
The next call to get a TabletLocator from the static collection allocates a
new TabletLocator.  Unfortunately, the TabletServerBatchWriter does not
reflect this change and continues to use the old, locally cached
TabletLocator.
4. Tablet T1 splits into T2 and T3, which closes T1.  As such, it no
longer exists and the tablet server that receives the entries meant to go
to T1 all fail to write because T1 is closed.
5. The TabletServerBatchWriter receives the response from the tablet
server that all entries failed to write.  It invalidates the cache of the
*new* TabletLocator obtained from the static collection of TabletLocators.
The old TabletLocator that is cached locally does not get invalidated.
6. The TabletServerBatchWriter re-queues the failed entries and tries to
write them to the same closed tablet T1, because it is still looking up
tablets using the old TabletLocator.

This behavior subsumes the circumstances William wrote about in the thread
<https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E>
he mentioned.  The problem would occur as a result of either splits or
major compactions.  It would only stop the BatchWriter if its entire memory
filled up with writes to the same tablet that was closed as a result of a
majc or split; otherwise it would just slow down the BatchWriter by failing
to write some number of entries with every RPC.

There are a few solutions we can think of.

1. Not have the MutationWriter inside the TabletServerBatchWriter
locally cache TabletLocators.  I suspect this was done for performance
reasons, so it's probably not a good solution.
2. Have all the MutationWriters clear their cache at the same time the
static TabletLocator cache clears.  I like this one (see the sketch after
this list).  We could store a
reference to the Map that each MutationWriter has inside a static
synchronized WeakHashMap.  The only time the weak map needs to be accessed
is:
   1. When a MutationWriter is constructed (from constructing a
   TabletServerBatchWriter), add its new local TabletLocator cache
to the weak
   map.
   2. When the static TabletLocator cache is cleared, also clear every
   map in the weak map.
3. Another solution is to make the invalidate calls on the local
TabletLocator cache rather than the global static one.  If we go this route
we should double check the idea to make sure it does not impact the
correctness of any other pieces of code that use the cache. I like the
previous idea better.

The TimeoutTabletLocator does not help when no timeout is set on the
BatchWriter (the default behavior).
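
A minimal, standalone sketch of the weak-map registry in option 2 above is
below. The class and method names are illustrative only (they are not the
actual TabletServerBatchWriter/TabletLocator internals): each writer registers
its local cache when it is constructed, and the same code path that clears the
static TabletLocator cache also clears every registered local cache.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

public class LocatorCacheRegistry {
  // Weakly-referenced set of per-writer cache maps: when a writer (and its
  // cache map) is garbage collected, its entry drops out automatically.
  private static final Set<Map<String, Object>> LOCAL_CACHES =
      Collections.newSetFromMap(new WeakHashMap<Map<String, Object>, Boolean>());

  // Called when a writer is constructed, with its local locator cache.
  public static synchronized void register(Map<String, Object> localCache) {
    LOCAL_CACHES.add(localCache);
  }

  // Called wherever the static TabletLocator cache is cleared today, so the
  // per-writer caches are invalidated at the same time.
  public static synchronized void clearAll() {
    for (Map<String, Object> cache : LOCAL_CACHES) {
      cache.clear();
    }
  }

  public static void main(String[] args) {
    Map<String, Object> writerCache = new HashMap<String, Object>(); // extent -> locator
    register(writerCache);
    writerCache.put("someTabletExtent", new Object());
    clearAll();
    System.out.println("entries after clearAll: " + writerCache.size()); // prints 0
  }
}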


On Tue, Apr 19, 2016 at 8:09 PM, William Slacum  wrote:


Good digs, Dylan. I don't think it's too rare to matter. I notice it often
during MR jobs, and there's usually a point where I give up and just start
writing RFiles.

It could possibly be related to what I saw back in the day with:

https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3ccamz+duvmmhegon9ejehr9h_rrpp50l2qz53bbdruvo0pira...@mail.gmail.com%3E

On Tue, Apr 19, 2016 at 6:26 PM, Josh Elser  wrote:


Nice findings. Sorry I haven't had any cycles to dig into this myself.

I look forward to hearing what you find :)


Dylan Hutchison wrote:


I investigated a bit more and I am pretty sure the problem is that the
BatchWriter is not recognizing that the tablet vb<<   split into vb;2436<
and
vb<;2436.  It keeps trying to update the closed tablet vb<<.  Each

update

writes 0 mutations and records a failure at the tablet server
UpdateSession
because vb<<   is closed.

I'm not sure why this is happening because the BatchWriter should have
invalidated its tablet locator cache upon recognizing a failure.  Then

it

would recognize that the entries it wants to write fall into the new
t

Regarding JIRA spam

2016-04-22 Thread Josh Elser
I wanted to take a moment to personally say "thank you" to everyone who 
has been helping out with the recent deluge of bogus JIRA issues. It's 
very much appreciated.


ICYMI, for those who are interested, there's some ongoing chatter on
infrastructure@a.o regarding the corrective measures INFRA is pursuing to
curb this going forward.


- Josh


Re: Accumulo on s3

2016-04-25 Thread Josh Elser
I'm not sure on the guarantees of s3 (much less the s3 or s3a Hadoop 
FileSystem implementations), but, historically, the common issue is 
lacking/incorrect implementations of sync(). For durability (read-as: 
not losing your data), Accumulo *must* know that when it calls sync() on 
a file, the data is persisted.


I don't know definitively what S3 guarantees (or asserts to guarantee), 
but I would be very afraid until I ran some testing (we have one good 
test in Accumulo that can run for days and verify data integrity called 
continuous ingest).


You might have luck reaching out to the Hadoop community to get some 
understanding from them about what can reasonably be expected with the 
current S3 FileSystem implementations, and then run your own tests to 
make sure that data is not lost.


vdelmeglio wrote:

Hi everyone,

I recently got this answer on stackoverflow (link:
http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874):



  Yes, I would expect that running Accumulo with S3 would result in
problems. Even though S3 has a FileSystem implementation, it does not
behave like a normal file system. Some examples of the differences are
that operations we would expect to be atomic are not atomic in S3,
exceptions may mean different things than we expect, and we assume our
view of files and their metadata is consistent rather than the eventual
consistency S3 provides.

It's possible these issues could be mitigated if we made some
modifications to the Accumulo code, but as far as I know no one has tried
running Accumulo on S3 to figure out the problems and whether those could
be fixed or not.


Since we're currently running an Accumulo cluster on AWS with S3 for
evaluation purposes, this answer makes me wonder: could someone explain to me
why running Accumulo on S3 is not a good idea? Specifically, which
operations are expected to be atomic in Accumulo?

Is there perhaps a roadmap for S3 compatibility?

Thanks!
Valerio



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737.html
Sent from the Developers mailing list archive at Nabble.com.


On the future of our commons-vfs2 dynamic classloading implementation

2016-04-25 Thread Josh Elser
I was trying to test out Dylan's patch this weekend and was met with a 
repeated failure of another VFS unit test due to the same race condition 
we've been fighting against for years.


A cursory glance at vfs' website still shows that they haven't made
the 2.1 release which supposedly fixes this issue. In other words, they
have not made a release since 2011.


I'm now under the assumption that we cannot rely on them to make a 
release. As such, if the community wishes to continue to support this 
feature, something needs to happen before 1.8.0 -- fork vfs, use/build 
some other library for this functionality, or remove the 
dynamic-classloader functionality from Accumulo completely.


I've tried to be very patient waiting for this to happen, but I'm rather 
frustrated having wasted significant time over the past years (not even 
exaggerating which is even crazier) working around known broken code 
that is unusable by users.


Thoughts?

- Josh


Re: On the future of our commons-vfs2 dynamic classloading implementation

2016-04-25 Thread Josh Elser
Thanks Dave. I know you've spent a bit of time trying to push them in 
the right direction.


I don't have a lot of familiarity with how the commons-* actually work. 
I am more than happy to address any issues of pick-and-choose engagement 
by commons-* folks at the ASF level if you have more info to back this 
up. I trust Benson to raise the issue to the right group if there is 
something inherently wrong.


Based on your emails, it seems like there have been discussions about having a
release for 2+ years, but no discernible progress (despite your offers
to help with the process).


Being the one who has been making the most effort on this front, what do 
you feel is the best course of action to unblock us?


dlmar...@comcast.net wrote:

I feel your pain and am very frustrated by the lack of support from the Commons 
team. I have brought up the subject multiple times[1,2,3] and have even 
volunteered to do the release. FWIW, I am using the features of the new 
classloader in production with little issue (using the 2.1 snapshot code). A 
few months ago I discussed this with Christopher and the topic of forking did 
come up. Also note that Benson just went through the release process for 
Commons IO and it was not pain free. Apparently they are willing to work with 
some people and not others.

[1] 
http://markmail.org/message/4iczynn2tqbtwdhd?q=VFS+2.1+release+list:org.apache.commons.dev/+from:%22dlmarion%40comcast.net%22&page=1
[2] 
http://markmail.org/search/?q=VFS%202.1%20release%20list%3Aorg.apache.commons.dev%2F#query:VFS%202.1%20release%20list%3Aorg.apache.commons.dev%2F%20from%3A%22dlmarion%40comcast.net%22+page:1+mid:zkjkgvpsrh4blvtj+state:results
[3] 
http://markmail.org/message/ojgizsfevkjjl6jv?q=VFS+2.1+release+list:org.apache.commons.dev/+from:%22dlmarion%40comcast.net%22&page=1

- Original Message -

From: "Mike Drob"
To: dev@accumulo.apache.org
Sent: Monday, April 25, 2016 11:37:25 AM
Subject: Re: On the future of our commons-vfs2 dynamic classloading 
implementation

Have we asked them about making a release?

On Mon, Apr 25, 2016 at 10:35 AM, Josh Elser  wrote:


I was trying to test out Dylan's patch this weekend and was met with a
repeated failure of another VFS unit test due to the same race condition
we've been fighting against for years.

A cursory glance at vfs' website still shows that they haven't made
the 2.1 release which supposedly fixes this issue. In other words, they
have not made a release since 2011.

I'm now under the assumption that we cannot rely on them to make a
release. As such, if the community wishes to continue to support this
feature, something needs to happen before 1.8.0 -- fork vfs, use/build some
other library for this functionality, or remove the dynamic-classloader
functionality from Accumulo completely.

I've tried to be very patient waiting for this to happen, but I'm rather
frustrated having wasted significant time over the past years (not even
exaggerating which is even crazier) working around known broken code that
is unusable by users.

Thoughts?

- Josh






Re: Accumulo on s3

2016-04-25 Thread Josh Elser
Yeah, ec2's EBS and ephemeral storage are fine AFAIK. I just don't know 
much anything at all about S3 (which might be why I'm inherently so 
pessimistic about it working :P).


Dylan Hutchison wrote:

Hey Josh,

Are there other platforms on AWS (or another cloud provider) that
Accumulo/HDFS are friendly to run on?  I thought I remembered you and
others running the agitation tests on Amazon instances during
release-testing time.  If there are alternatives, what advantages would S3
have over the current method?

On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser  wrote:


I'm not sure on the guarantees of s3 (much less the s3 or s3a Hadoop
FileSystem implementations), but, historically, the common issue is
lacking/incorrect implementations of sync(). For durability (read-as: not
losing your data), Accumulo *must* know that when it calls sync() on a
file, the data is persisted.

I don't know definitively what S3 guarantees (or asserts to guarantee),
but I would be very afraid until I ran some testing (we have one good test
in Accumulo that can run for days and verify data integrity called
continuous ingest).

You might have luck reaching out to the Hadoop community to get some
understanding from them about what can reasonably be expected with the
current S3 FileSystem implementations, and then run your own tests to make
sure that data is not lost.


vdelmeglio wrote:


Hi everyone,

I recently got this answer on stackoverflow (link:

http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874
):


   Yes, I would expect that running Accumulo with S3 would result in

problems. Even though S3 has a FileSystem implementation, it does not
behave like a normal file system. Some examples of the differences are
that operations we would expect to be atomic are not atomic in S3,
exceptions may mean different things than we expect, and we assume our
view of files and their metadata is consistent rather than the eventual
consistency S3 provides.

It's possible these issues could be mitigated if we made some
modifications to the Accumulo code, but as far as I know no one has tried
running Accumulo on S3 to figure out the problems and whether those could
be fixed or not.


Since we're currently running an Accumulo cluster on AWS with S3 for
evaluation purposes, this answer makes me wonder: could someone explain to me
why running Accumulo on S3 is not a good idea? Specifically, which
operations are expected to be atomic in Accumulo?

Is there perhaps a roadmap for S3 compatibility?

Thanks!
Valerio



--
View this message in context:
http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737.html
Sent from the Developers mailing list archive at Nabble.com.





Re: On the future of our commons-vfs2 dynamic classloading implementation

2016-04-25 Thread Josh Elser

Alright, I'll suck it up and start the conversation yet again.

At least that will remove me from any future guilt ;)

dlmar...@comcast.net wrote:

My email from Dec 2015 was sent as a last ditch effort before we fork. I don't 
remember receiving a response to it. I may have worn out my welcome in that 
community as I have been outspoken on the lack of movement. It might be useful 
for someone else to try once more, and if not, then we fork VFS, remove all of 
the things we don't need, and make it better.


- Original Message -----

From: "Josh Elser"
To: dev@accumulo.apache.org
Sent: Monday, April 25, 2016 1:02:08 PM
Subject: Re: On the future of our commons-vfs2 dynamic classloading 
implementation

Thanks Dave. I know you've spent a bit of time trying to push them in
the right direction.

I don't have a lot of familiarity with how the commons-* actually work.
I am more than happy to address any issues of pick-and-choose engagement
by commons-* folks at the ASF level if you have more info to back this
up. I trust Benson to raise the issue to the right group if there is
something inherently wrong.

Based on your emails, it seems like there have been discussions having a
release for 2+ years, but no discernible progress (despite your offers
to help with the process).

Being the one who has been making the most effort on this front, what do
you feel is the best course of action to unblock us?

dlmar...@comcast.net wrote:

I feel your pain and am very frustrated by the lack of support from the Commons 
team. I have brought up the subject multiple times[1,2,3] and have even 
volunteered to do the release. FWIW, I am using the features of the new 
classloader in production with little issue (using the 2.1 snapshot code). A 
few months ago I discussed this with Christopher and the topic of forking did 
come up. Also note that Benson just went through the release process for 
Commons IO and it was not pain free. Apparently they are willing to work with 
some people and not others.

[1] 
http://markmail.org/message/4iczynn2tqbtwdhd?q=VFS+2.1+release+list:org.apache.commons.dev/+from:%22dlmarion%40comcast.net%22&page=1
[2] 
http://markmail.org/search/?q=VFS%202.1%20release%20list%3Aorg.apache.commons.dev%2F#query:VFS%202.1%20release%20list%3Aorg.apache.commons.dev%2F%20from%3A%22dlmarion%40comcast.net%22+page:1+mid:zkjkgvpsrh4blvtj+state:results
[3] 
http://markmail.org/message/ojgizsfevkjjl6jv?q=VFS+2.1+release+list:org.apache.commons.dev/+from:%22dlmarion%40comcast.net%22&page=1

- Original Message -

From: "Mike Drob"
To: dev@accumulo.apache.org
Sent: Monday, April 25, 2016 11:37:25 AM
Subject: Re: On the future of our commons-vfs2 dynamic classloading 
implementation

Have we asked them about making a release?

On Mon, Apr 25, 2016 at 10:35 AM, Josh Elser  wrote:


I was trying to test out Dylan's patch this weekend and was met with a
repeated failure of another VFS unit test due to the same race condition
we've been fighting against for years.

A cursory glance at vfs' website still shows that they haven't made
the 2.1 release which supposedly fixes this issue. In other words, they
have not made a release since 2011.

I'm now under the assumption that we cannot rely on them to make a
release. As such, if the community wishes to continue to support this
feature, something needs to happen before 1.8.0 -- fork vfs, use/build some
other library for this functionality, or remove the dynamic-classloader
functionality from Accumulo completely.

I've tried to be very patient waiting for this to happen, but I'm rather
frustrated having wasted significant time over the past years (not even
exaggerating which is even crazier) working around known broken code that
is unusable by users.

Thoughts?

- Josh








Fwd: [VFS] 2.1 Release Plan

2016-04-25 Thread Josh Elser

FYI -- if anyone would like to follow on, see dev@c.a.o

 Original Message 
Subject: [VFS] 2.1 Release Plan
Date: Mon, 25 Apr 2016 15:06:03 -0400
From: Josh Elser 
To: d...@commons.apache.org

Hi all,

There are presently 171 resolved issues sitting in commons-vfs2
2.1-SNAPSHOT, with 4 outstanding (none of which look like blockers to me).

The lack of any release of commons-vfs2 in years has been a big problem
downstream. This past weekend, I was again annoyed by bugs that have
been fixed (but not released), which is spurring me to take some action.
There have been emails reaching back as far as 2014 asking when the next
release might be, not to mention the fact that vfs-2.0 was released in
2011 (!).

History aside, I'm reaching out today to:

1) See if anyone on the PMC has the cycles to volunteer as RM.
  1a) If not, how can you empower me (or others) to make the release on
your behalf.
2) Understand, specifically, what (if any) roadblocks exist to release
this version.

Thanks.

- Josh



Re: Accumulo on s3

2016-04-26 Thread Josh Elser
Shawn -- you win the gold star for the day from me. This is exactly the
fear I had, but I was unable to put it into words correctly :)


Valerio/chutium -- The common scenario I have run into is that 
processing jobs (your use of Spark) can read data from S3 and ingest it 
into the database (Accumulo here, but commonly Hive or HBase for others).


One thought would be that you could use Spark to create Accumulo RFiles,
store those in S3, and then bulk load them into an Accumulo instance that is
not running on S3. You could use EC2 to run Accumulo instead and bulk load
your pre-created Accumulo RFiles from S3 (this should be fine, but I haven't
tried it myself). It isn't quite the same as what you were hoping to get
via S3, but I think it could be very close (easy to reprovision EC2 and
re-import all of your current data to a new Accumulo instance).


I don't think S3 provides enough of a "real filesystem" implementation 
to run Accumulo natively over -- Shawn's points really drive the "why" 
home. This isn't something we can "fix Accumulo" to do as it would 
change the entire characteristics of the system. Accumulo needs to be 
able to append data to a file and sync it to make it durable -- 
otherwise, Accumulo *will* eventually lose data. You may not see it in 
trivial testing, but I guarantee you 100% that you will run into data 
loss issues.


Does that make sense?

Shawn Walker wrote:

RFiles (Accumulo's primary data storage mechanism) are immutable and lazily
deleted (by the GC process).  Read-after-write consistency for new files
should be sufficient for them.  I suspect the only real gotchas would be:
NativeS3FileSystem has a 5G max file size, and NativeS3FileSystem is very
slow when renaming files.  One might consider using the Hadoop S3 block
filesystem instead, for better rename performance.

On the other hand, write-ahead logs simply can't function as expected atop
the NativeS3FileSystem:  A write-ahead log is an incrementally built file,
and S3 doesn't support a compatible concept of incremental writes to a
stored object.  Neither hflush()'ing nor hsync()'ing an FSDataOutputStream
sourced from a NativeS3FileSystem actually makes any data available outside
the process.  The S3 block filesystem improves matters slightly, but not
enough to

This means that a tablet server death (e.g. caused by a lost Zookeeper
lock) can (almost certainly will) lead to lost mutations.  It strikes me
that this would be particularly bad for mutations against the root tablet
or metadata tablets, and can leave Accumulo in an inconsistent state.

Working around that limitation in Accumulo would likely prove somewhere
between difficult and impossible.  At the least, it might mean redesigning
the entire WAL concept.

--
Shawn Walker

On Tue, Apr 26, 2016 at 5:12 AM, chutium  wrote:


Hi Josh,

about the guarantees of s3, according to this doc from amazon:

https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-plan-consistent-view.html


Amazon S3 buckets in xxx, xxx regions provide read-after-write consistency
for put requests of new objects and eventual consistency for overwrite put
and delete requests.

so maybe Accumulo will only run into consistency problems from major
compactions, right? It seems no other operation is overwriting or deleting
files on HDFS.

let me describe our usage of Accumulo on S3: basically, we want to combine
the unlimited storage of S3 with the fine-grained access control
provided by Accumulo.

we are using "Accumulo on S3" as secured storage behind a data processing
engine (Spark); data is ingested into Accumulo regularly, not in real time
(no single puts, batch ingestion every X hours), and most data access use
cases are batch processing, so there are no realtime reads or writes.

in that case, will consistency or sync still be a problem or not?

I added some thoughts of mine in that stackoverflow thread:
http://stackoverflow.com/a/36845743/5630352 . I really want to know whether it
is possible to solve the S3 problem for our use case, because it seems that,
until now, no other tool provides such a powerful access control framework
like Accumulo does.

Thanks!



--
View this message in context:
http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737p16764.html
Sent from the Developers mailing list archive at Nabble.com.





Re: Fwd: Data authorization/visibility limit in Accumulo

2016-04-26 Thread Josh Elser
Yeah, for building a real security-tagging system, the labeling that 
Accumulo does is only one "piece of the puzzle". For example, you would 
likely have external systems that define the authorizations that your 
users would have. The authorization and labeling that Accumulo does is a 
hard piece of that puzzle on its own; it's just not, in itself, a 
complete solution for you to pull off the shelf.


A very valid criticism of Accumulo would be that there aren't good 
public examples available that show how this can be built (they're 
always talked about in the abstract).


Such a practical example would be of great worth to the project/community.

Fikri Akbar wrote:

Hi Again guys,

Thanks Josh & Dylan for the suggestions; here are our updates regarding the
data ingestion & visibility issues.

We've tried several workarounds and used some of the suggestions offered for
our particular issues, and for the data ingestion we've decided to use
bulk ingestion as the most suitable approach for getting large data sets
into Accumulo.

"creating Accumulo RFiles and performing bulk ingest/bulk loading is by far
the most efficient way to get data into Accumulo."

However, for the visibility issue, we haven't found any suitable solution from
Accumulo (at the moment) to handle the situation. We had suspected that
Accumulo's visibility is more of a labeling mechanism, and it turns out to be
true :(. Hence, we decided to build a (sort of) security layer for users' data
visibility levels. We're on a tight schedule, so this won't be the final
solution, but it will definitely help us mend the situation a bit. Hopefully
we'll come up with/find something better in the near future.

Thanks again for the help guys, much appreciated. Cheers

Regards,

*Fikri Akbar*
Technology


*PT Mediatrac Sistem Komunikasi*
Grha Tirtadi 2nd Floor   |   Jl. Senopati 71-73   |   Jakarta 12110   |
Indonesia   |   *M**ap* 6°13'57.37"S 106°48'42.29"E
*P* +62 21 520 2568   |   *F* +62 21 520 4180   |   *M*  +62 812 1243 4786
|   *www.mediatrac.co.id*



Re: On the future of our commons-vfs2 dynamic classloading implementation

2016-04-26 Thread Josh Elser

Thanks, Dave.

For those not following along on dev@commons.a.o -- I do not see any 
roadblocks which would completely negate my ability to act as release 
manager. I believe history can show us that the community there is not 
capable of making a release of commons-vfs (as it's been 4+ years). As 
such, I volunteered to be an RM for 2.1. I'm waiting on an ACK from 
them, but I don't anticipate any negative feedback.


dlmar...@comcast.net wrote:

Josh,

I see that you have made progress. Let me know how I can help get this released.

Dave
- Original Message -----

From: "Josh Elser"
To: dev@accumulo.apache.org
Sent: Monday, April 25, 2016 2:57:45 PM
Subject: Re: On the future of our commons-vfs2 dynamic classloading 
implementation

Alright, I'll suck it up and start the conversation yet again.

At least that will remove me from any future guilt ;)

dlmar...@comcast.net wrote:

My email from Dec 2015 was sent as a last ditch effort before we fork. I don't 
remember receiving a response to it. I may have worn out my welcome in that 
community as I have been outspoken on the lack of movement. It might be useful 
for someone else to try once more, and if not, then we fork VFS, remove all of 
the things we don't need, and make it better.


----- Original Message -

From: "Josh Elser"
To: dev@accumulo.apache.org
Sent: Monday, April 25, 2016 1:02:08 PM
Subject: Re: On the future of our commons-vfs2 dynamic classloading 
implementation

Thanks Dave. I know you've spent a bit of time trying to push them in
the right direction.

I don't have a lot of familiarity with how the commons-* actually work.
I am more than happy to address any issues of pick-and-choose engagement
by commons-* folks at the ASF level if you have more info to back this
up. I trust Benson to raise the issue to the right group if there is
something inherently wrong.

Based on your emails, it seems like there have been discussions having a
release for 2+ years, but no discernible progress (despite your offers
to help with the process).

Being the one who has been making the most effort on this front, what do
you feel is the best course of action to unblock us?

dlmar...@comcast.net wrote:

I feel your pain and am very frustrated by the lack of support from the Commons 
team. I have brought up the subject multiple times[1,2,3] and have even 
volunteered to do the release. FWIW, I am using the features of the new 
classloader in production with little issue (using the 2.1 snapshot code). A 
few months ago I discussed this with Christopher and the topic of forking did 
come up. Also note that Benson just went through the release process for 
Commons IO and it was not pain free. Apparently they are willing to work with 
some people and not others.

[1] 
http://markmail.org/message/4iczynn2tqbtwdhd?q=VFS+2.1+release+list:org.apache.commons.dev/+from:%22dlmarion%40comcast.net%22&page=1
[2] 
http://markmail.org/search/?q=VFS%202.1%20release%20list%3Aorg.apache.commons.dev%2F#query:VFS%202.1%20release%20list%3Aorg.apache.commons.dev%2F%20from%3A%22dlmarion%40comcast.net%22+page:1+mid:zkjkgvpsrh4blvtj+state:results
[3] 
http://markmail.org/message/ojgizsfevkjjl6jv?q=VFS+2.1+release+list:org.apache.commons.dev/+from:%22dlmarion%40comcast.net%22&page=1

- Original Message -

From: "Mike Drob"
To: dev@accumulo.apache.org
Sent: Monday, April 25, 2016 11:37:25 AM
Subject: Re: On the future of our commons-vfs2 dynamic classloading 
implementation

Have we asked them about making a release?

On Mon, Apr 25, 2016 at 10:35 AM, Josh Elser  wrote:


I was trying to test out Dylan's patch this weekend and was met with a
repeated failure of another VFS unit test due to the same race condition
we've been fighting against for years.

A cursory glance at vfs' website still shows that they haven't made
the 2.1 release which supposedly fixes this issue. In other words, they
have not made a release since 2011.

I'm now under the assumption that we cannot rely on them to make a
release. As such, if the community wishes to continue to support this
feature, something needs to happen before 1.8.0 -- fork vfs, use/build some
other library for this functionality, or remove the dynamic-classloader
functionality from Accumulo completely.

I've tried to be very patient waiting for this to happen, but I'm rather
frustrated having wasted significant time over the past years (not even
exaggerating which is even crazier) working around known broken code that
is unusable by users.

Thoughts?

- Josh








[DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-01 Thread Josh Elser

Folks --

Let's come up with a plan for Java 8 support. Do we bump minJdk for 
accumulo-1.8.0 to 8? Should we fork a branch for 1.8 and make master 
2.0.0-SNAPSHOT (and do the bump there)?


Other approaches?

- Josh

 Original Message 
Subject: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache
Date: Sat, 30 Apr 2016 01:06:12 + (UTC)
From: Ben Manes (JIRA) 
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org


[ 
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265032#comment-15265032 
]


Ben Manes commented on ACCUMULO-4177:
-

I can put something together when Accumulo is ready to accept Java 8 
patches. Let me know.



TinyLFU-based BlockCache


Key: ACCUMULO-4177
URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
Project: Accumulo
 Issue Type: Improvement
   Reporter: Ben Manes

[LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
 appears to be based on HBase's. I currently have a patch being reviewed in 
[HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
the cache to make [better 
predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
frequency and recency, such as improved scan resistance. The implementation 
uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's 
cache, to provide concurrency and keep the patch small.
Full details are in the JIRA ticket. I think it should be easy to port if there 
is interest.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-02 Thread Josh Elser

Thanks for the input, Sean.

Playing devil's advocate: we didn't have a major version bump when we 
dropped JDK6 support (in Accumulo-1.7.0). Oracle has EOL'ed java 7 back 
in April 2015. Was the 6->7 upgrade different than a 7->8 upgrade?


Sean Busbey wrote:

If we drop jdk7 support, I would strongly prefer a major version bump.

On Sun, May 1, 2016 at 1:43 PM, Josh Elser  wrote:

Folks --

Let's come up with a plan for Java 8 support. Do we bump minJdk for
accumulo-1.8.0 to 8? Should we fork a branch for 1.8 and make master
2.0.0-SNAPSHOT (and do the bump there)?

Other approaches?

- Josh

 Original Message 
Subject: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache
Date: Sat, 30 Apr 2016 01:06:12 + (UTC)
From: Ben Manes (JIRA)
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org


 [
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265032#comment-15265032
]

Ben Manes commented on ACCUMULO-4177:
-

I can put something together when Accumulo is ready to accept Java 8
patches. Let me know.


TinyLFU-based BlockCache


 Key: ACCUMULO-4177
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
 Project: Accumulo
  Issue Type: Improvement
Reporter: Ben Manes


[LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
appears to be based on HBase's. I currently have a patch being reviewed in
[HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that
replaces the pseudo Segmented LRU with the TinyLFU eviction policy. That
should allow the cache to make [better
predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on
frequency and recency, such as improved scan resistance. The implementation
uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to
Guava's cache, to provide concurrency and keep the patch small.
Full details are in the JIRA ticket. I think it should be easy to port if
there is interest.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)






Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-02 Thread Josh Elser

Thanks, Dave.

What to do with 1.8.0 has come up a few times now with the general 
consensus that it's time. We probably need someone to get behind it and 
push.


I will hold my tongue about commons-vfs stuff as I have absolutely 
nothing nice to say about it :)


(at risk of derailing this conversation) If anyone is interested in 
being the 1.8.0 release manager, please start a thread so we can hash 
that out too.


dlmar...@comcast.net wrote:

I was thinking the same thing, and if we are going to do that, let's bump the 
versions of our major dependencies (Hadoop, ZK, etc). Josh is making good 
progress on the VFS front, much more than I was able to achieve. 1.8-SNAPSHOT 
has over 260 resolved issues. We could resolve the 5 open blockers, bump the 
JDK and dependency versions, and rename 1.8 to 2.0.

- Original Message -

From: "Sean Busbey"
To: "dev@accumulo apache. org"
Sent: Monday, May 2, 2016 1:54:53 AM
Subject: Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] 
(ACCUMULO-4177) TinyLFU-based BlockCache)

If we drop jdk7 support, I would strongly prefer a major version bump.

On Sun, May 1, 2016 at 1:43 PM, Josh Elser  wrote:

Folks --

Let's come up with a plan for Java 8 support. Do we bump minJdk for
accumulo-1.8.0 to 8? Should we fork a branch for 1.8 and make master
2.0.0-SNAPSHOT (and do the bump there)?

Other approaches?

- Josh

 Original Message 
Subject: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache
Date: Sat, 30 Apr 2016 01:06:12 + (UTC)
From: Ben Manes (JIRA)
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org


[
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265032#comment-15265032
]

Ben Manes commented on ACCUMULO-4177:
-

I can put something together when Accumulo is ready to accept Java 8
patches. Let me know.


TinyLFU-based BlockCache


Key: ACCUMULO-4177
URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
Project: Accumulo
Issue Type: Improvement
Reporter: Ben Manes


[LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
appears to be based on HBase's. I currently have a patch being reviewed in
[HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that
replaces the pseudo Segmented LRU with the TinyLFU eviction policy. That
should allow the cache to make [better
predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on
frequency and recency, such as improved scan resistance. The implementation
uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to
Guava's cache, to provide concurrency and keep the patch small.
Full details are in the JIRA ticket. I think it should be easy to port if
there is interest.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)






Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-02 Thread Josh Elser

I think Mike meant "Accumulo wasn't following SemVer".

To your question, Mike, per https://accumulo.apache.org/versioning.html

"The Apache Accumulo PMC closed a vote on 2014/12/12 which adopted 
Semantic Versioning 2.0.0 as the reference document"


According to reporter.a.o, 1.7.0 was released 2015/05/15.

I think that means by the time 1.7.0 landed, we were well versed in 
doing semver properly (maybe it was the 1.6 line that wasn't?).


William Slacum wrote:

Kind of hard to pull the semver card when java 1.7 technically isn't semver
compliant.

On Mon, May 2, 2016 at 10:31 AM, Mike Drob  wrote:


Wasn't 1.7.0 pre SemVer?

On Mon, May 2, 2016 at 8:55 AM, Josh Elser  wrote:


Thanks for the input, Sean.

Playing devil's advocate: we didn't have a major version bump when we
dropped JDK6 support (in Accumulo-1.7.0). Oracle has EOL'ed java 7 back in
April 2015. Was the 6->7 upgrade different than a 7->8 upgrade?


Sean Busbey wrote:


If we drop jdk7 support, I would strongly prefer a major version bump.

On Sun, May 1, 2016 at 1:43 PM, Josh Elser

wrote:

Folks --

Let's come up with a plan for Java 8 support. Do we bump minJdk for
accumulo-1.8.0 to 8? Should we fork a branch for 1.8 and make master
2.0.0-SNAPSHOT (and do the bump there)?

Other approaches?

- Josh

 Original Message 
Subject: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache
Date: Sat, 30 Apr 2016 01:06:12 + (UTC)
From: Ben Manes (JIRA)
Reply-To: j...@apache.org
To: notificati...@accumulo.apache.org


  [
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265032#comment-15265032
]

Ben Manes commented on ACCUMULO-4177:
-

I can put something together when Accumulo is ready to accept Java 8
patches. Let me know.

TinyLFU-based BlockCache



  Key: ACCUMULO-4177
  URL:
https://issues.apache.org/jira/browse/ACCUMULO-4177
  Project: Accumulo
   Issue Type: Improvement
 Reporter: Ben Manes


[LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
appears to be based on HBase's. I currently have a patch being reviewed in
[HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that
replaces the pseudo Segmented LRU with the TinyLFU eviction policy. That
should allow the cache to make [better
predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on
frequency and recency, such as improved scan resistance. The implementation
uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to
Guava's cache, to provide concurrency and keep the patch small.
Full details are in the JIRA ticket. I think it should be easy to port if
there is interest.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)









Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-02 Thread Josh Elser

Sean Busbey wrote:

On Mon, May 2, 2016 at 8:55 AM, Josh Elser  wrote:

>  Thanks for the input, Sean.
>
>  Playing devil's advocate: we didn't have a major version bump when we
>  dropped JDK6 support (in Accumulo-1.7.0). Oracle has EOL'ed java 7 back in
>  April  2015. Was the 6->7 upgrade different than a 7->8 upgrade?
>


On Mon, May 2, 2016 at 10:31 AM, Keith Turner  wrote:

>  On Mon, May 2, 2016 at 1:54 AM, Sean Busbey  wrote:
>

>>  If we drop jdk7 support, I would strongly prefer a major version bump.
>>

>
>
>  Whats the rationale for binding a bump to Accumulo 2.0 with a bump in the
>  JDK version?
>


The decision to drop JDK6 support happened in late March / early April
2014[1], long before any of our discussions or decisions on semver.
AFAICT it didn't get discussed again, presumably because by the time
we got to 1.7.0 RCs it was too far in the past.


Thanks for the correction, Sean. I hadn't dug around closely enough.


Re: using Range.prefix

2016-05-03 Thread Josh Elser
Suggestions (preferably in patch form ;D) about how to improve the 
documentation's clarity are always appreciated.


z11373 wrote:

Ah.. I think I've gone thru the same issue some time back, and I forgot.
The documentation is somewhat not obvious, at least for me.
Anyway, it's working now. Thanks Billie!



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/using-Range-prefix-tp16835p16837.html
Sent from the Developers mailing list archive at Nabble.com.


Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-03 Thread Josh Elser
That's a new assertion ("we can't actually use Java 8 features until 
Accumulo-2"), isn't it? We could use new Java 8 features internally 
which would require a minimum of Java 8 and not affect the public API. 
These are related, not mutually exclusive, IMO.


To Shawn's point: introducing Java 8 types/APIs was exactly the point -- 
we got here from ACCUMULO-4177 which does exactly that.


Mike Drob wrote:

I agree with Shawn's implied statement -- why bother dropping Java 7 in any
Accumulo 1.x if we can't actually make use of Java 8 features until
Accumulo 2.0?

On Tue, May 3, 2016 at 1:29 PM, Christopher  wrote:


Right, these are competing and mutually exclusive goals, so we need to
decide which is a priority and on what timeline we should transition to
Java 8 to support those goals.

On Tue, May 3, 2016 at 9:16 AM Shawn Walker
wrote:


I'm not sure that guaranteeing build-ability under Java 7 would address the
issue that raised this discussion:  We (might) want to add a dependency
which requires Java 8.  Or, following Keith's comment, we might wish to
introduce Java 8 types (e.g. CompletableFuture) into Accumulo's "public"
API.



On Mon, May 2, 2016 at 6:42 PM, Christopher  wrote:


I don't feel strongly about this, but I was kind of thinking that we'd bump
to a Java 8 dependency (opportunistically) when we were ready to develop a
2.0 version. But, I'm not opposed to doing it on the 1.8 branch.

On Mon, May 2, 2016 at 2:50 PM William Slacum wrote:

So my point about versioning WRT the Java runtime is more about how
there are incompatibilities within the granularity of Java versions we talk
about (I'm specifically referencing a Kerberos incompatibility within
versions of Java 7), so I think that just blanket saying "We support Java X
or Y" really isn't enough. I personally feel some kind of version bump is
nice to say that something has changed, but until the public API starts
exposing Java 8 features, it's a total cop out to say, "Here's all these
bug fixes and some new features, oh by the way upgrade your infrastructure
because we decided to use a new Java version for an optional feature".

The best parallel I can think of is in Scala, where there's no binary
compatibility between minor versions (ie, 2.10, 2.11, etc), so there's
generally an extra qualifier on libraries marking the scala compatibility
level. Would we ever want to have accumulo-server-1.7-j[7|8] styled
artifacts to signal some general JRE compatibility? It's a total mess, but
I haven't seen a better solution.

Another idea is we could potentially have some guarantee for Java 7, such
as making sure we can build a distribution using Java 7, but only
distribute Java 8 artifacts by default?

On Mon, May 2, 2016 at 2:30 PM, Josh Elser wrote:

Sean Busbey wrote:

On Mon, May 2, 2016 at 8:55 AM, Josh Elser wrote:

  Thanks for the input, Sean.

  Playing devil's advocate: we didn't have a major version bump when we
  dropped JDK6 support (in Accumulo-1.7.0). Oracle EOL'ed Java 7 back in
  April 2015. Was the 6->7 upgrade different than a 7->8 upgrade?

On Mon, May 2, 2016 at 10:31 AM, Keith Turner wrote:

  On Mon, May 2, 2016 at 1:54 AM, Sean Busbey <bus...@cloudera.com> wrote:

  If we drop jdk7 support, I would strongly prefer a major version bump.

  What's the rationale for binding a bump to Accumulo 2.0 with a bump in
  the JDK version?

The decision to drop JDK6 support happened in late March / early April
2014 [1], long before any of our discussions or decisions on semver.
AFAICT it didn't get discussed again, presumably because by the time
we got to 1.7.0 RCs it was too far in the past.

Thanks for the correction, Sean. I hadn't dug around closely enough.




Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-03 Thread Josh Elser
Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree with you. I 
can't think of a reason why we would upgrade to Java8 and not make use 
of it in some way (publicly or privately).


That being said, I don't think I see consensus. How about we regroup in 
the form of a vote? (normal semver rules are an invariant -- no changes 
to our public API compatibility rules are implied by the below)


* Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT" and move to 
jdk8

* Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0 goes jdk8

Please chime in if I missed another option or am calling discussion too 
soon. It just seems like we might have veered off-track and I don't want 
this to fall by the wayside (again) without a decision.


Mike Drob wrote:

If our code ends up using Java 8 bytecode in any classes required by a
consumer, then I think they will get compilation (linking?) errors,
regardless of Java 8 types in our method signatures.

On Tue, May 3, 2016 at 3:09 PM, Josh Elser  wrote:


That's a new assertion ("we can't actually use Java 8 features util
Accumulo-2"), isn't it? We could use new Java 8 features internally which
would require a minimum of Java 8 and not affect the public API. These are
related, not mutally exclusive, IMO.

To Shawn's point: introducing Java 8 types/APIs was exactly the point --
we got here from ACCUMULO-4177 which does exactly that.


Mike Drob wrote:


I agree with Shawn's implied statement -- why bother dropping Java 7 in
any
Accumulo 1.x if we can't actually make use of Java 8 features.until
Accumulo 2.0

On Tue, May 3, 2016 at 1:29 PM, Christopher   wrote:

Right, these are competing and mutually exclusive goals, so we need to

decide which is a priority and on what timeline we should transition to
Java 8 to support those goals.

On Tue, May 3, 2016 at 9:16 AM Shawn Walker
wrote:

I'm not sure that guaranteeing build-ability under Java 7 would address
the


issue that raised this discussion:  We (might) want to add a dependency
which requires Java 8.  Or, following Keith's comment, we might wish to
introduce Java 8 types (e.g. CompletableFuture) into Accumulo's


"public"


API.



On Mon, May 2, 2016 at 6:42 PM, Christopher
wrote:

I don't feel strongly about this, but I was kind of thinking that we'd
bump


to Java 8 dependency (opportunistically) when we were ready to develop


a
2.0 version. But, I'm not opposed to doing it on the 1.8 branch.

On Mon, May 2, 2016 at 2:50 PM William Slacum


wrote:
So my point about versioning WRT to the Java runtime is more about

how

there are incompatibilities within the granularity of Java versions

we

talk

about (I'm specifically referencing a Kerberos incompatibility within
versions of Java 7), so I think that just blanket saying "We support


Java X


or Y" really isn't enough. I personally feel some kind of version


bump

is


nice to say that something has changed, but until the public API
starts

exposing Java 8 features, it's a total cop out to say, "Here's all

these
bug fixes and some new features, oh by the way upgrade your
infrastructure


because we decided to use a new Java version for an optional


feature".

The best parallel I can think of is in Scala, where there's no binary

compatibility between minor versions (ie, 2.10, 2.11,etc), so there's
generally an extra qualifier on libraries marking the scala


compability

level. Would we ever want to have accumulo-server-1.7-j[7|8]  styled

artifacts to signal some general JRE compatibility? It's a total


mess,

but

I haven't seen a better solution.

Another idea is we could potentially have some guarantee for Java 7,


such
as making sure we can build a distribution using Java 7, but only

distribute Java 8 artifacts by default?

On Mon, May 2, 2016 at 2:30 PM, Josh Elser


wrote:
Sean Busbey wrote:

On Mon, May 2, 2016 at 8:55 AM, Josh Elser
wrote:
   Thanks for the input, Sean.

   Playing devil's advocate: we didn't have a major version bump


when

we

   dropped JDK6 support (in Accumulo-1.7.0). Oracle has EOL'ed

java 7

back in

   April  2015. Was the 6->7 upgrade different than a 7->8


upgrade?

On Mon, May 2, 2016 at 10:31 AM, Keith Turner

wrote:

   On Mon, May 2, 2016 at 1:54 AM, Sean Busbey<

bus...@cloudera.com

wrote:

   If we drop jdk7 support, I would strongly prefer a major

version

bump.


   Whats the rationale for binding a bump to Accumulo 2.0 with a


bump

in

the

   JDK version?

The decision to drop JDK6 support happened in latemarch  /

earlyApril

2014[1], long before any of our discussions or decisions on

semver.

AFAICT it didn't get discussed again, presumably because by the

time

we got to 1.7.0 RCs it was too far in the past.

Thanks for the correction, Sean. I hadn't dug around closely

enough.






Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-03 Thread Josh Elser
Anything is possible! Even more if you figured out how to go faster than 
the speed of light ;)


dlmar...@comcast.net wrote:

Branch 1.8, leave at JDK 7
Move master to 2.0.0-SNAPSHOT and:
->  Move to JDK 8
->  Remove deprecated items
->  Bump versions for dependencies (*Htrace issue)

Question: Could we release 1.8.0 and 2.0.0 around the same time such that 2.0.0 
is equivalent to 1.8.0 except for the changes mentioned above?

- Original Message -

From: "Josh Elser"
To: dev@accumulo.apache.org
Sent: Tuesday, May 3, 2016 4:33:39 PM
Subject: Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] 
(ACCUMULO-4177) TinyLFU-based BlockCache)

Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree with you. I
can't think of a reason why we would upgrade to Java8 and not make use
of it in some way (publicly or privately).

That being said, I don't think I see consensus. How about we regroup in
the form of a vote? (normal semver rules are an invariant -- no changes
to our public API compatibility rules are implied by the below)

* Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT" and move to
jdk8
* Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0 goes jdk8

Please chime in if I missed another option or am calling discussion too
soon. It just seems like we might have veered off-track and I don't want
this to fall to the wayside (again) without decision.

Mike Drob wrote:

If our code ends up using java 8 bytecode in any classes required by a
consumer, then I think they will get compilation (linking?) errors,
regardless of java 8 types in our methods signatures.

On Tue, May 3, 2016 at 3:09 PM, Josh Elser  wrote:


That's a new assertion ("we can't actually use Java 8 features util
Accumulo-2"), isn't it? We could use new Java 8 features internally which
would require a minimum of Java 8 and not affect the public API. These are
related, not mutally exclusive, IMO.

To Shawn's point: introducing Java 8 types/APIs was exactly the point --
we got here from ACCUMULO-4177 which does exactly that.


Mike Drob wrote:


I agree with Shawn's implied statement -- why bother dropping Java 7 in
any
Accumulo 1.x if we can't actually make use of Java 8 features.until
Accumulo 2.0

On Tue, May 3, 2016 at 1:29 PM, Christopher  wrote:

Right, these are competing and mutually exclusive goals, so we need to

decide which is a priority and on what timeline we should transition to
Java 8 to support those goals.

On Tue, May 3, 2016 at 9:16 AM Shawn Walker
wrote:

I'm not sure that guaranteeing build-ability under Java 7 would address
the


issue that raised this discussion: We (might) want to add a dependency
which requires Java 8. Or, following Keith's comment, we might wish to
introduce Java 8 types (e.g. CompletableFuture) into Accumulo's


"public"


API.



On Mon, May 2, 2016 at 6:42 PM, Christopher
wrote:

I don't feel strongly about this, but I was kind of thinking that we'd
bump


to Java 8 dependency (opportunistically) when we were ready to develop


a
2.0 version. But, I'm not opposed to doing it on the 1.8 branch.

On Mon, May 2, 2016 at 2:50 PM William Slacum


wrote:
So my point about versioning WRT to the Java runtime is more about

how

there are incompatibilities within the granularity of Java versions

we

talk

about (I'm specifically referencing a Kerberos incompatibility within
versions of Java 7), so I think that just blanket saying "We support


Java X


or Y" really isn't enough. I personally feel some kind of version


bump

is


nice to say that something has changed, but until the public API
starts

exposing Java 8 features, it's a total cop out to say, "Here's all

these
bug fixes and some new features, oh by the way upgrade your
infrastructure


because we decided to use a new Java version for an optional


feature".

The best parallel I can think of is in Scala, where there's no binary

compatibility between minor versions (ie, 2.10, 2.11,etc), so there's
generally an extra qualifier on libraries marking the scala


compability

level. Would we ever want to have accumulo-server-1.7-j[7|8] styled

artifacts to signal some general JRE compatibility? It's a total


mess,

but

I haven't seen a better solution.

Another idea is we could potentially have some guarantee for Java 7,


such
as making sure we can build a distribution using Java 7, but only

distribute Java 8 artifacts by default?

On Mon, May 2, 2016 at 2:30 PM, Josh Elser


wrote:
Sean Busbey wrote:

On Mon, May 2, 2016 at 8:55 AM, Josh Elser
wrote:
Thanks for the input, Sean.

Playing devil's advocate: we didn't have a major version bump


when

we

dropped JDK6 support (in Accumulo-1.7.0). Oracle has EOL'ed

java 7

back in

April 2015. Was the 6->7 upgrade different than a 7->8


upgrade?

On 

Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-05 Thread Josh Elser

Ok, looks to me that we are in agreement now and don't need a vote.

I will create a 1.8 branch today (updating Jenkins appropriately) so we 
can get master in a state that would be ready for the changes in 4177.


Keith Turner wrote:

On Tue, May 3, 2016 at 4:54 PM, Christopher  wrote:


I think I'd prefer leaving 1.8 as it stands, with the expectation to have a
release line of 1.8 which only requires Java 7.



+1

I cannot see any reason to switch to JDK8 before releasing 1.8... assuming
that's going to happen soonish



We can create a 2.0 branch, which bumps the Java version, and can accept
changes which require Java 8 or API-breaking changes (as per semver) for
the next major release line after 1.8.

That would put us on a solid roadmap for 2.0 without disrupting 1.8
development, which is probably already nearing release readiness.

On Tue, May 3, 2016 at 4:33 PM Josh Elser  wrote:


Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree with you. I
can't think of a reason why we would upgrade to Java8 and not make use
of it in some way (publicly or privately).

That being said, I don't think I see consensus. How about we regroup in
the form of a vote? (normal semver rules are an invariant -- no changes
to our public API compatibility rules are implied by the below)

* Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT" and move to
jdk8
* Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0 goes jdk8

Please chime in if I missed another option or am calling discussion too
soon. It just seems like we might have veered off-track and I don't want
this to fall to the wayside (again) without decision.

Mike Drob wrote:

If our code ends up using java 8 bytecode in any classes required by a
consumer, then I think they will get compilation (linking?) errors,
regardless of java 8 types in our methods signatures.

On Tue, May 3, 2016 at 3:09 PM, Josh Elser

wrote:

That's a new assertion ("we can't actually use Java 8 features util
Accumulo-2"), isn't it? We could use new Java 8 features internally

which

would require a minimum of Java 8 and not affect the public API. These

are

related, not mutally exclusive, IMO.

To Shawn's point: introducing Java 8 types/APIs was exactly the point

--

we got here from ACCUMULO-4177 which does exactly that.


Mike Drob wrote:


I agree with Shawn's implied statement -- why bother dropping Java 7

in

any
Accumulo 1.x if we can't actually make use of Java 8 features.until
Accumulo 2.0

On Tue, May 3, 2016 at 1:29 PM, Christopher

  wrote:

Right, these are competing and mutually exclusive goals, so we need

to

decide which is a priority and on what timeline we should transition

to

Java 8 to support those goals.

On Tue, May 3, 2016 at 9:16 AM Shawn Walker<

accum...@shawn-walker.net

wrote:

I'm not sure that guaranteeing build-ability under Java 7 would

address

the


issue that raised this discussion:  We (might) want to add a

dependency

which requires Java 8.  Or, following Keith's comment, we might

wish

to

introduce Java 8 types (e.g. CompletableFuture) into Accumulo's


"public"


API.



On Mon, May 2, 2016 at 6:42 PM, Christopher
wrote:

I don't feel strongly about this, but I was kind of thinking that

we'd

bump


to Java 8 dependency (opportunistically) when we were ready to

develop

a
2.0 version. But, I'm not opposed to doing it on the 1.8 branch.

On Mon, May 2, 2016 at 2:50 PM William Slacum


wrote:
So my point about versioning WRT to the Java runtime is more about

how

there are incompatibilities within the granularity of Java versions

we

talk

about (I'm specifically referencing a Kerberos incompatibility

within

versions of Java 7), so I think that just blanket saying "We

support

Java X


or Y" really isn't enough. I personally feel some kind of version


bump

is


nice to say that something has changed, but until the public API
starts

exposing Java 8 features, it's a total cop out to say, "Here's all

these
bug fixes and some new features, oh by the way upgrade your
infrastructure


because we decided to use a new Java version for an optional


feature".

The best parallel I can think of is in Scala, where there's no

binary

compatibility between minor versions (ie, 2.10, 2.11,etc), so

there's

generally an extra qualifier on libraries marking the scala


compability

level. Would we ever want to have accumulo-server-1.7-j[7|8]

styled

artifacts to signal some general JRE compatibility? It's a total


mess,

but

I haven't seen a better solution.

Another idea is we could potentially have some guarantee for Java

7,

such
as making sure we can build a distribution using Java 7, but only

distribute Java 8 artifacts by default?

On Mon, May 2, 2016 at 2:30 PM, Josh Elser


wrote:
Sean Busbey wrote:

On Mon, May 2, 2016 at 8:55 A

Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-05 Thread Josh Elser

Sounds good!

I had tried to switch master to jdk8 as well, but ran into modernizer 
plugin issues. I've since been on a call, so I haven't been able to push 
that update. I'll get to it when I can, but perhaps someone has beaten 
me to it already.


Christopher wrote:

Okay, so if we're okay treating the master branch as a 2.0 development
branch, then I'm going to go ahead and start focusing on some 2.0 tickets
that may involve refactoring which have breaking changes that I've been
reluctant to do before without an explicit 2.0 development branch. Of
course, none of this says we have to stop development on 1.x stuffs, or
says anything about when we'll release a 2.0, but it'd be nice to have a
place to start putting in stuff for an eventual 2.0.

On Thu, May 5, 2016 at 11:07 AM Josh Elser  wrote:


Ok, looks to me that we are in agreement now and don't need a vote.

I will create a 1.8 branch today (updating Jenkins appropriately) so we
can get master in a state that would be ready for the changes in 4177.

Keith Turner wrote:

On Tue, May 3, 2016 at 4:54 PM, Christopher   wrote:


I think I'd prefer leaving 1.8 as it stands, with the expectation to

have a

release line of 1.8 which only requires Java 7.


+1

I can not see any reason to switch to JDK8 before releasing 1.8...

assuming

thats going to happen soonish



We can create a 2.0 branch, which bumps the Java version, and can accept
changes which require Java 8 or API-breaking changes (as per semver) for
the next major release line after 1.8.

That would put us on a solid roadmap for 2.0 without disrupting 1.8
development, which is probably already nearing release readiness.

On Tue, May 3, 2016 at 4:33 PM Josh Elser   wrote:


Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree with you.

I

can't think of a reason why we would upgrade to Java8 and not make use
of it in some way (publicly or privately).

That being said, I don't think I see consensus. How about we regroup in
the form of a vote? (normal semver rules are an invariant -- no changes
to our public API compatibility rules are implied by the below)

* Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT" and move to
jdk8
* Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0 goes jdk8

Please chime in if I missed another option or am calling discussion too
soon. It just seems like we might have veered off-track and I don't

want

this to fall to the wayside (again) without decision.

Mike Drob wrote:

If our code ends up using java 8 bytecode in any classes required by a
consumer, then I think they will get compilation (linking?) errors,
regardless of java 8 types in our methods signatures.

On Tue, May 3, 2016 at 3:09 PM, Josh Elser

wrote:

That's a new assertion ("we can't actually use Java 8 features util
Accumulo-2"), isn't it? We could use new Java 8 features internally

which

would require a minimum of Java 8 and not affect the public API.

These

are

related, not mutally exclusive, IMO.

To Shawn's point: introducing Java 8 types/APIs was exactly the point

--

we got here from ACCUMULO-4177 which does exactly that.


Mike Drob wrote:


I agree with Shawn's implied statement -- why bother dropping Java 7

in

any
Accumulo 1.x if we can't actually make use of Java 8 features.until
Accumulo 2.0

On Tue, May 3, 2016 at 1:29 PM, Christopher

   wrote:

Right, these are competing and mutually exclusive goals, so we need

to

decide which is a priority and on what timeline we should

transition

to

Java 8 to support those goals.

On Tue, May 3, 2016 at 9:16 AM Shawn Walker<

accum...@shawn-walker.net

wrote:

I'm not sure that guaranteeing build-ability under Java 7 would

address

the


issue that raised this discussion:  We (might) want to add a

dependency

which requires Java 8.  Or, following Keith's comment, we might

wish

to

introduce Java 8 types (e.g. CompletableFuture) into Accumulo's


"public"


API.



On Mon, May 2, 2016 at 6:42 PM, Christopher
wrote:

I don't feel strongly about this, but I was kind of thinking that

we'd

bump


to Java 8 dependency (opportunistically) when we were ready to

develop

a
2.0 version. But, I'm not opposed to doing it on the 1.8 branch.

On Mon, May 2, 2016 at 2:50 PM William Slacum


wrote:
So my point about versioning WRT to the Java runtime is more about

how

there are incompatibilities within the granularity of Java

versions

we

talk

about (I'm specifically referencing a Kerberos incompatibility

within

versions of Java 7), so I think that just blanket saying "We

support

Java X


or Y" really isn't enough. I personally feel some kind of

version

bump

is


nice to say that something has changed, but until the public API
starts

exposing Java 8 features, it's a total cop out to say, "Here's all

these
bug 

Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-05 Thread Josh Elser
Thanks boss. I figured you'd have my back :)
On May 5, 2016 9:43 PM, "Christopher"  wrote:

> Already pushed. Initially forgot about modernizer, but I'm working through
> it now.
>
> On Thu, May 5, 2016 at 7:25 PM Josh Elser  wrote:
>
> > Sounds good!
> >
> > I had tried to switch master to jdk8 as well, but ran into modernizer
> > plugin issues. I've since been on a call, so I haven't been able to push
> > that update. I'll get to it when I can, but perhaps someone has beaten
> > me to it already.
> >
> > Christopher wrote:
> > > Okay, so if we're okay treating the master branch as a 2.0 development
> > > branch, then I'm going to go ahead and start focusing on some 2.0
> tickets
> > > that may involve refactoring which have breaking changes that I've been
> > > reluctant to do before without an explicit 2.0 development branch. Of
> > > course, none of this says we have to stop development on 1.x stuffs, or
> > > says anything about when we'll release a 2.0, but it'd be nice to have
> a
> > > place to start putting in stuff for an eventual 2.0.
> > >
> > > On Thu, May 5, 2016 at 11:07 AM Josh Elser
> wrote:
> > >
> > >> Ok, looks to me that we are in agreement now and don't need a vote.
> > >>
> > >> I will create a 1.8 branch today (updating Jenkins appropriately) so
> we
> > >> can get master in a state that would be ready for the changes in 4177.
> > >>
> > >> Keith Turner wrote:
> > >>> On Tue, May 3, 2016 at 4:54 PM, Christopher
> >  wrote:
> > >>>
> > >>>> I think I'd prefer leaving 1.8 as it stands, with the expectation to
> > >> have a
> > >>>> release line of 1.8 which only requires Java 7.
> > >>>>
> > >>> +1
> > >>>
> > >>> I can not see any reason to switch to JDK8 before releasing 1.8...
> > >> assuming
> > >>> thats going to happen soonish
> > >>>
> > >>>
> > >>>> We can create a 2.0 branch, which bumps the Java version, and can
> > accept
> > >>>> changes which require Java 8 or API-breaking changes (as per semver)
> > for
> > >>>> the next major release line after 1.8.
> > >>>>
> > >>>> That would put us on a solid roadmap for 2.0 without disrupting 1.8
> > >>>> development, which is probably already nearing release readiness.
> > >>>>
> > >>>> On Tue, May 3, 2016 at 4:33 PM Josh Elser
> >  wrote:
> > >>>>
> > >>>>> Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree with
> > you.
> > >> I
> > >>>>> can't think of a reason why we would upgrade to Java8 and not make
> > use
> > >>>>> of it in some way (publicly or privately).
> > >>>>>
> > >>>>> That being said, I don't think I see consensus. How about we
> regroup
> > in
> > >>>>> the form of a vote? (normal semver rules are an invariant -- no
> > changes
> > >>>>> to our public API compatibility rules are implied by the below)
> > >>>>>
> > >>>>> * Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT" and
> move
> > to
> > >>>>> jdk8
> > >>>>> * Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0 goes
> > jdk8
> > >>>>>
> > >>>>> Please chime in if I missed another option or am calling discussion
> > too
> > >>>>> soon. It just seems like we might have veered off-track and I don't
> > >> want
> > >>>>> this to fall to the wayside (again) without decision.
> > >>>>>
> > >>>>> Mike Drob wrote:
> > >>>>>> If our code ends up using java 8 bytecode in any classes required
> > by a
> > >>>>>> consumer, then I think they will get compilation (linking?)
> errors,
> > >>>>>> regardless of java 8 types in our methods signatures.
> > >>>>>>
> > >>>>>> On Tue, May 3, 2016 at 3:09 PM, Josh Elser
> > >>>> wrote:
> > >>>>>>> That's a new assertion ("we can't actually use Java 8 features
> util
> > >>

Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-06 Thread Josh Elser
We can't disable modernizer just for mock? Or really, any code which we
intentionally don't want to modernize?
On May 5, 2016 11:43 PM, "Christopher"  wrote:

> Another interesting point... didn't realize until actually doing it:
> bumping to JDK8 *requires* a bump in the major version, because modernizer
> will block on some incompatible API changes in Mock, which is already
> deprecated. (Unless we're okay with disabling modernizer... which I guess
> is an acceptable solution... but it makes me unhappy :) )
>
> On Thu, May 5, 2016 at 11:39 PM Josh Elser  wrote:
>
> > Thanks boss. I figured you'd have my back :)
> > On May 5, 2016 9:43 PM, "Christopher"  wrote:
> >
> > > Already pushed. Initially forgot about modernizer, but I'm working
> > through
> > > it now.
> > >
> > > On Thu, May 5, 2016 at 7:25 PM Josh Elser 
> wrote:
> > >
> > > > Sounds good!
> > > >
> > > > I had tried to switch master to jdk8 as well, but ran into modernizer
> > > > plugin issues. I've since been on a call, so I haven't been able to
> > push
> > > > that update. I'll get to it when I can, but perhaps someone has
> beaten
> > > > me to it already.
> > > >
> > > > Christopher wrote:
> > > > > Okay, so if we're okay treating the master branch as a 2.0
> > development
> > > > > branch, then I'm going to go ahead and start focusing on some 2.0
> > > tickets
> > > > > that may involve refactoring which have breaking changes that I've
> > been
> > > > > reluctant to do before without an explicit 2.0 development branch.
> Of
> > > > > course, none of this says we have to stop development on 1.x
> stuffs,
> > or
> > > > > says anything about when we'll release a 2.0, but it'd be nice to
> > have
> > > a
> > > > > place to start putting in stuff for an eventual 2.0.
> > > > >
> > > > > On Thu, May 5, 2016 at 11:07 AM Josh Elser
> > > wrote:
> > > > >
> > > > >> Ok, looks to me that we are in agreement now and don't need a
> vote.
> > > > >>
> > > > >> I will create a 1.8 branch today (updating Jenkins appropriately)
> so
> > > we
> > > > >> can get master in a state that would be ready for the changes in
> > 4177.
> > > > >>
> > > > >> Keith Turner wrote:
> > > > >>> On Tue, May 3, 2016 at 4:54 PM, Christopher
> > > >  wrote:
> > > > >>>
> > > > >>>> I think I'd prefer leaving 1.8 as it stands, with the
> expectation
> > to
> > > > >> have a
> > > > >>>> release line of 1.8 which only requires Java 7.
> > > > >>>>
> > > > >>> +1
> > > > >>>
> > > > >>> I can not see any reason to switch to JDK8 before releasing
> 1.8...
> > > > >> assuming
> > > > >>> thats going to happen soonish
> > > > >>>
> > > > >>>
> > > > >>>> We can create a 2.0 branch, which bumps the Java version, and
> can
> > > > accept
> > > > >>>> changes which require Java 8 or API-breaking changes (as per
> > semver)
> > > > for
> > > > >>>> the next major release line after 1.8.
> > > > >>>>
> > > > >>>> That would put us on a solid roadmap for 2.0 without disrupting
> > 1.8
> > > > >>>> development, which is probably already nearing release
> readiness.
> > > > >>>>
> > > > >>>> On Tue, May 3, 2016 at 4:33 PM Josh Elser
> > > >  wrote:
> > > > >>>>
> > > > >>>>> Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree
> with
> > > > you.
> > > > >> I
> > > > >>>>> can't think of a reason why we would upgrade to Java8 and not
> > make
> > > > use
> > > > >>>>> of it in some way (publicly or privately).
> > > > >>>>>
> > > > >>>>> That being said, I don't think I see consensus. How about we
> > > regroup
> > > > in
> > > > >>>>> the for

Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-06 Thread Josh Elser

+1 to that, too

Dave Marion wrote:

It's 2.0, remove mock and deprecate it in 1.8 if it's not already.
On May 6, 2016 10:25 AM, "Josh Elser"  wrote:

We can't disable modernizer just for mock? Or really, any code which we
intentionally don't want to modernize?
On May 5, 2016 11:43 PM, "Christopher"  wrote:


Another interesting point... didn't realize until actually doing it:
bumping to JDK8 *requires* a bump in the major version, because modernizer
will block on some incompatible API changes in Mock, which is already
deprecated. (Unless we're okay with disabling modernizer... which I guess
is an acceptable solution... but it makes me unhappy :) )

On Thu, May 5, 2016 at 11:39 PM Josh Elser  wrote:


Thanks boss. I figured you'd have my back :)
On May 5, 2016 9:43 PM, "Christopher"  wrote:


Already pushed. Initially forgot about modernizer, but I'm working

through

it now.

On Thu, May 5, 2016 at 7:25 PM Josh Elser

wrote:

Sounds good!

I had tried to switch master to jdk8 as well, but ran into

modernizer

plugin issues. I've since been on a call, so I haven't been able to

push

that update. I'll get to it when I can, but perhaps someone has

beaten

me to it already.

Christopher wrote:

Okay, so if we're okay treating the master branch as a 2.0

development

branch, then I'm going to go ahead and start focusing on some 2.0

tickets

that may involve refactoring which have breaking changes that I've

been

reluctant to do before without an explicit 2.0 development branch.

Of

course, none of this says we have to stop development on 1.x

stuffs,

or

says anything about when we'll release a 2.0, but it'd be nice to

have

a

place to start putting in stuff for an eventual 2.0.

On Thu, May 5, 2016 at 11:07 AM Josh Elser

wrote:

Ok, looks to me that we are in agreement now and don't need a

vote.

I will create a 1.8 branch today (updating Jenkins appropriately)

so

we

can get master in a state that would be ready for the changes in

4177.

Keith Turner wrote:

On Tue, May 3, 2016 at 4:54 PM, Christopher

  wrote:

I think I'd prefer leaving 1.8 as it stands, with the

expectation

to

have a

release line of 1.8 which only requires Java 7.


+1

I can not see any reason to switch to JDK8 before releasing

1.8...

assuming

thats going to happen soonish



We can create a 2.0 branch, which bumps the Java version, and

can

accept

changes which require Java 8 or API-breaking changes (as per

semver)

for

the next major release line after 1.8.

That would put us on a solid roadmap for 2.0 without disrupting

1.8

development, which is probably already nearing release

readiness.

On Tue, May 3, 2016 at 4:33 PM Josh Elser

  wrote:

Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree

with

you.

I

can't think of a reason why we would upgrade to Java8 and not

make

use

of it in some way (publicly or privately).

That being said, I don't think I see consensus. How about we

regroup

in

the form of a vote? (normal semver rules are an invariant --

no

changes

to our public API compatibility rules are implied by the

below)

* Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT"

and

move

to

jdk8
* Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0

goes

jdk8

Please chime in if I missed another option or am calling

discussion

too

soon. It just seems like we might have veered off-track and I

don't

want

this to fall to the wayside (again) without decision.

Mike Drob wrote:

If our code ends up using java 8 bytecode in any classes

required

by a

consumer, then I think they will get compilation (linking?)

errors,

regardless of java 8 types in our methods signatures.

On Tue, May 3, 2016 at 3:09 PM, Josh Elser<

josh.el...@gmail.com

wrote:

That's a new assertion ("we can't actually use Java 8

features

util

Accumulo-2"), isn't it? We could use new Java 8 features

internally

which

would require a minimum of Java 8 and not affect the public

API.

These

are

related, not mutally exclusive, IMO.

To Shawn's point: introducing Java 8 types/APIs was exactly

the

point

--

we got here from ACCUMULO-4177 which does exactly that.


Mike Drob wrote:


I agree with Shawn's implied statement -- why bother

dropping

Java 7

in

any
Accumulo 1.x if we can't actually make use of Java 8

features.until

Accumulo 2.0

On Tue, May 3, 2016 at 1:29 PM, Christopher<

ctubb...@apache.org

wrote:

Right, these are competing and mutually exclusive goals, so

we

need

to

decide which is a priority and on what timeline we should

transition

to

Java 8 to support those goals.

On Tue, May 3, 2016 at 9:16 AM Shawn Walker<

accum...@shawn-walker.net

wrote:

I'm not sure that guaranteeing build-ability under Java 7

would

address

the


issue that raised this disc

Re: [DISCUSS] Java 8 support (was Fwd: [jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache)

2016-05-06 Thread Josh Elser

Thanks for the info (could've looked it up myself, I'm sure).

If there's anything contentious, let's split out another thread for 
deprecated removals, please. I appreciate you being proactive on this 
already, Christopher.


Mock, I think, everyone could be in agreement to remove. Aggregators, 
while not really maintained, don't seem like something that is 
holding us back or creating more work for us.


In general, we made the JDK8 decision. Let's address other potential 2.0 
changes separately :)


Christopher wrote:

I intend to remove mock, and other deprecated stuffs (aggregators!) in 2.0.
But, that's exactly my point. Removing or changing these things required a
bump to 2.0, so discussions about whether or not we'd need to bump to 2.0
with the jdk8 switch were moot (unless we were willing to disable
modernizer, of course).

To Josh's question, unfortunately, modernizer has a fail/no-fail mode, but
it doesn't allow custom exceptions like findbugs. It's more like checkstyle
in that way. It's either on or off.

On Fri, May 6, 2016 at 12:11 PM Josh Elser  wrote:


+1 to that, too

Dave Marion wrote:

It's 2.0, remove mock and deprecate it in 1.8 if it's not already.
On May 6, 2016 10:25 AM, "Josh Elser"   wrote:

We can't disable modernizer just for mock? Or really, any code which we
intentionally don't want to modernize?
On May 5, 2016 11:43 PM, "Christopher"   wrote:


Another interesting point... didn't realize until actually doing it:
bumping to JDK8 *requires* a bump in the major version, because

modernizer

will block on some incompatible API changes in Mock, which is already
deprecated. (Unless we're okay with disabling modernizer... which I

guess

is an acceptable solution... but it makes me unhappy :) )

On Thu, May 5, 2016 at 11:39 PM Josh Elser

wrote:

Thanks boss. I figured you'd have my back :)
On May 5, 2016 9:43 PM, "Christopher"   wrote:


Already pushed. Initially forgot about modernizer, but I'm working

through

it now.

On Thu, May 5, 2016 at 7:25 PM Josh Elser

wrote:

Sounds good!

I had tried to switch master to jdk8 as well, but ran into

modernizer

plugin issues. I've since been on a call, so I haven't been able to

push

that update. I'll get to it when I can, but perhaps someone has

beaten

me to it already.

Christopher wrote:

Okay, so if we're okay treating the master branch as a 2.0

development

branch, then I'm going to go ahead and start focusing on some 2.0

tickets

that may involve refactoring which have breaking changes that I've

been

reluctant to do before without an explicit 2.0 development branch.

Of

course, none of this says we have to stop development on 1.x

stuffs,

or

says anything about when we'll release a 2.0, but it'd be nice to

have

a

place to start putting in stuff for an eventual 2.0.

On Thu, May 5, 2016 at 11:07 AM Josh Elser

wrote:

Ok, looks to me that we are in agreement now and don't need a

vote.

I will create a 1.8 branch today (updating Jenkins appropriately)

so

we

can get master in a state that would be ready for the changes in

4177.

Keith Turner wrote:

On Tue, May 3, 2016 at 4:54 PM, Christopher

   wrote:

I think I'd prefer leaving 1.8 as it stands, with the

expectation

to

have a

release line of 1.8 which only requires Java 7.


+1

I can not see any reason to switch to JDK8 before releasing

1.8...

assuming

thats going to happen soonish



We can create a 2.0 branch, which bumps the Java version, and

can

accept

changes which require Java 8 or API-breaking changes (as per

semver)

for

the next major release line after 1.8.

That would put us on a solid roadmap for 2.0 without disrupting

1.8

development, which is probably already nearing release

readiness.

On Tue, May 3, 2016 at 4:33 PM Josh Elser

   wrote:

Gotcha. Thanks for clarifying, Mike -- I'm inclined to agree

with

you.

I

can't think of a reason why we would upgrade to Java8 and not

make

use

of it in some way (publicly or privately).

That being said, I don't think I see consensus. How about we

regroup

in

the form of a vote? (normal semver rules are an invariant --

no

changes

to our public API compatibility rules are implied by the

below)

* Call the current 1.8.0-SNAPSHOT (master) "2.0.0-SNAPSHOT"

and

move

to

jdk8
* Branch 1.8, make master 2.0.0-SNAPSHOT. 1.8 stays jdk7, 2.0

goes

jdk8

Please chime in if I missed another option or am calling

discussion

too

soon. It just seems like we might have veered off-track and I

don't

want

this to fall to the wayside (again) without decision.

Mike Drob wrote:

If our code ends up using java 8 bytecode in any classes

required

by a

consumer, then I think they will get compilation (linking?)

errors,

regardless of java 8 types in our methods signatures.

On T

Re: Pros and Cons of moving SKVI to public API

2016-05-10 Thread Josh Elser
Oy, that really did not come across well for me in email-form. Can you 
use paste.a.o or something?


+1 for moving "internal-only" iterators and IteratorUtils. Neither are 
things that we intend for users to need.


IMO, IteratorEnvironment was also kind of hokey/goofy (I never really 
used it either). I could go either way on it.


Keith Turner wrote:

I modified the apilyzer config in the 1.8 branch to include iterators.   It
spit out the following problems with types.

I think we could move IteratorUtil out of
org.apache.accumulo.core.iterators.  Also we could possibly move the
iterators in org.apache.accumulo.core.iterators.system somewhere else (or
declare it as an exception).  We could possibly deprecate
IteratorEnvironment.getConfig() and add a replacement method that returns a Map.
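(A purely hypothetical sketch of that suggestion -- the interface and method names below are invented for illustration and are not Accumulo's actual API.)

```java
// Invented names, for illustration only: expose iterator settings through
// public JDK types instead of the internal AccumuloConfiguration class.
import java.util.Map;

public interface IteratorEnvironmentSketch {

  // Possible replacement for getConfig(): hand iterators their configuration
  // as an immutable map of property name to value.
  Map<String,String> getConfigProperties();
}
```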

Problems :

   CONTEXT        TYPE                                                             FIELD/METHOD        NON-PUBLIC REFERENCE

   METHOD_RETURN  org.apache.accumulo.core.iterators.IteratorEnvironment          getConfig(...)      org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 parseIterConf(...)  org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.data.impl.KeyExtent
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.data.impl.KeyExtent
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.data.impl.KeyExtent
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.data.impl.KeyExtent
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.data.impl.KeyExtent
   METHOD_PARAM   org.apache.accumulo.core.iterators.IteratorUtil                 loadIterators(...)  org.apache.accumulo.core.conf.AccumuloConfiguration
   CTOR_PARAM     org.apache.accumulo.core.iterators.system.MapFileIterator       (...)               org.apache.accumulo.core.conf.AccumuloConfiguration
   METHOD_RETURN  org.apache.accumulo.core.iterators.system.MapFileIterator       getSample(...)      org.apache.accumulo.core.file.FileSKVIterator
   METHOD_PARAM   org.apache.accumulo.core.iterators.system.MapFileIterator       getSample(...)      org.apache.accumulo.core.sample.impl.SamplerConfigurationImpl
   CTOR_PARAM     org.apache.accumulo.core.iterators.system.MultiIterator         (...)               org.apache.accumulo.core.data.impl.KeyExtent
   CTOR_PARAM     org.apache.accumulo.core.iterators.system.SampleIterator        (...)               org.apache.accumulo.core.sample.Sampler
   CTOR_PARAM     org.apache.accumulo.core.iterators.system.SequenceFileIterator  (...)               org.apache.hadoop.io.SequenceFile$Reader
   METHOD_RETURN  org.apache.accumulo.core.iterators.system.SequenceFileIterator  getSample(...)      org.apache.accumulo.core.file.FileSKVIterator
   METHOD_PARAM   org.apache.accumulo.core.iterators.system.SequenceFileIterator  getSample(...)      org.apache.accumulo.core.sample.impl.SamplerConfigurationImpl
   FIELD          org.apache.accumulo.core.iterators.system.VisibilityFilter      cache               org.apache.commons.collections.map.LRUMap
   FIELD          org.apache.accumulo.core.iterators.user.TransformingIterator    log                 org.slf4j.Logger

Total : 22


On Mon, May 9, 2016 at 4:54 PM, Christopher  wrote:


Hey Keith. Just wanted to ping you on this to see if you've had a chance to
look.

On Fri, Mar 25, 2016 at 12:17 PM Keith Turner  wrote:


On Thu, Mar 24, 2016 at 8:27 PM, Christopher

wrote:

It seems there's a general agreement that we treat it like public API

and

we should just call it public API.

I just don't know how we're going to actually say this is public API,
without addressing the issue of the other iterators in the same

package,

unless we're okay with going back to a more complicated API definition
which calls out specific classes instead of whole packages.



Keith, you spent the most time thinking about how to convey the public

API

in the README. What do you think about how to actually make this

happen?


I will look into it at the end of next week and try to come up with a
strategy for incorporating it into the public API.  Need to analyze what
types are used by the existing iterators.  In 1.7.0 I tried to make API
types only reference other API types.  Things that violated this type
constraint were deprecated and a replacement that met the 

Re: Accumulo not initializing

2016-05-16 Thread Josh Elser

Adina Crainiceanu wrote:

Amila,


2 ideas:

1) Are you sure that Zookeeper started successfully? Maybe it did not. Make
sure the permissions on that local dir you used for dirData are correct.

Try the following:
sudo echo ruok | nc 127.0.0.1 2181

If you don't get imok back, start ZooKeeper.



Thread "org.apache.accumulo.master.state.SetGoalState" died Failed to
connect to zookeeper (localhost:2181) within 2x zookeeper timeout period
3

java.lang.RuntimeException: Failed to connect to zookeeper (localhost:2181)
within 2x zookeeper timeout period 3

at org.apache.accumulo.fate.zookeeper.ZooSession.connect(ZooSession.java:123)
at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(ZooSession.java:167)
at org.apache.accumulo.fate.zookeeper.ZooReader.getSession(ZooReader.java:39)
at 
org.apache.accumulo.fate.zookeeper.ZooReaderWriter.getZooKeeper(ZooReaderWriter.java:50)
at 

org.apache.accumulo.fate.zookeeper.ZooReader.getChildren(ZooReader.java:128)

at 
org.apache.accumulo.server.Accumulo.waitForZookeeperAndHdfs(Accumulo.java:269)
at org.apache.accumulo.master.state.SetGoalState.main(SetGoalState.java:46)



Did you try Adina's recommendation? If you're running it inside of a 
constrained VM, it's possible that you just don't have enough resources 
to run it (although I don't think that's extremely likely). 
Essentially, ZooKeeper didn't respond in 60s -- which likely means it's not 
reachable at the address you gave it.
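(If it helps to check programmatically, here's a minimal sketch that sends ZooKeeper's "ruok" four-letter command over a plain socket; the host and port are assumptions matching the defaults mentioned above.)

```java
// Minimal sketch: send ZooKeeper's "ruok" command and print the reply.
// A healthy server answers "imok"; no reply suggests it isn't reachable.
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkRuokCheck {
  public static void main(String[] args) throws Exception {
    String host = args.length > 0 ? args[0] : "127.0.0.1"; // assumed default
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
    try (Socket socket = new Socket(host, port)) {
      OutputStream out = socket.getOutputStream();
      out.write("ruok".getBytes(StandardCharsets.UTF_8));
      out.flush();
      socket.shutdownOutput(); // done sending; wait for the reply
      InputStream in = socket.getInputStream();
      byte[] buf = new byte[16];
      int n = in.read(buf);
      System.out.println(n > 0 ? new String(buf, 0, n, StandardCharsets.UTF_8) : "<no reply>");
    }
  }
}
```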


Have you properly configured your /etc/hosts file for your system?


>>
>> WARN : Max open files on localhost is 1024, recommend 32768

Don't ignore this, btw! :)


Fwd: [RESULT] [VOTE] Apache Commons VFS 2.1 rc2

2016-05-18 Thread Josh Elser

Holy . We actually got it done :P

 Original Message 
Subject: [RESULT] [VOTE] Apache Commons VFS 2.1 rc2
Date: Wed, 18 May 2016 11:40:21 -0400
From: Josh Elser 
To: Commons Developers List 

We've let this go a bit longer than the originally specified date range,
but let's close it now that we have consensus.

This VOTE passes with 3 binding +1's and 1 non-binding +1.

Thanks again for everyone's help who made this a reality. I'll try to
follow through and find some docs on the steps to promote these
artifacts (with the concerns acknowledged that were mentioned during the
vote).

- Josh

Josh Elser wrote:

All,

Please consider the following for Apache Commons VFS2 version 2.1 (rc2).

Maven repository:
https://repository.apache.org/content/repositories/orgapachecommons-1166
Artifacts: https://dist.apache.org/repos/dist/dev/commons/vfs/ r13608

MD5 commons-vfs-distribution-2.1-bin.tar.gz
8cc35a3169e1faee727c5af94c7dd904
SHA1 commons-vfs-distribution-2.1-bin.tar.gz
72b7557c4e8b1789b8aa0a9c1e0cb2c9daecec30
MD5 commons-vfs-distribution-2.1-src.tar.gz
a182ac642874e85fbc7d1086f7663482
SHA1 commons-vfs-distribution-2.1-src.tar.gz
e42ac2053deb314277213e43f6f1d6b43eff3de9

MD5 commons-vfs-distribution-2.1-bin.zip b512a45c63d824eef826f174fd7a9245
SHA1 commons-vfs-distribution-2.1-bin.zip
886cb5a430da58b3a68e2abf297b6f735d3ffaf0
MD5 commons-vfs-distribution-2.1-src.zip 29645a0ad091c15b2cf76b619c33069f
SHA1 commons-vfs-distribution-2.1-src.zip
5824cf4d802865b5e93ff62c90affa58df6c4384

Signed with 4677D66C from
https://dist.apache.org/repos/dist/release/commons/KEYS

SVN tag is available at
https://svn.apache.org/repos/asf/commons/proper/vfs/tags/commons-vfs2-project-2.1-rc2/
r1743451

Staged Maven website:
http://home.apache.org/~elserj/commons/commons-vfs-2.1/

All reports are available in the provided staged Maven site (see "Project
Reports" at the root-level as well as under each sub-module).
JIRA-generated
release notes are available in the dist.a.o "Artifacts" repository. Unit
tests pass and the RC was built with JDK6.

(For Sebb) A direct Clirr link
http://home.apache.org/~elserj/commons/commons-vfs-2.1/commons-vfs2/clirr-report.html


Changes since rc1:

* Fixed more compatibility concerns against 2.0 (thanks, Greg)
* Improved release notes (thanks, Sebb)

This vote will be open for 72 hours, until 2016/05/14 0400 UTC.

[ ] +1 Release these artifacts as version 2.1
[ ] 0 OK, but...
[ ] -1 I oppose these artifacts as version 2.1 because..

- Josh

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [RESULT] [VOTE] Apache Commons VFS 2.1 rc2

2016-05-18 Thread Josh Elser

Hot damn. A whole case! Maybe it was worth it ;)

dlmar...@comcast.net wrote:

I've been silently watching all the RC threads. Kudos to you for sticking with 
it through all the different issues raised, the odd (and poorly documented) 
release process, and lack of responses to the RC votes. I owe you a case of 
beer.


- Original Message -

From: "Josh Elser"
To: "dev"
Sent: Wednesday, May 18, 2016 11:41:05 AM
Subject: Fwd: [RESULT] [VOTE] Apache Commons VFS 2.1 rc2

Holy . We actually got it done :P

 Original Message 
Subject: [RESULT] [VOTE] Apache Commons VFS 2.1 rc2
Date: Wed, 18 May 2016 11:40:21 -0400
From: Josh Elser
To: Commons Developers List

We've let this go a bit longer than the originally specified date range,
but let's close it now that we have consensus.

This VOTE passes with 3 binding +1's and 1 non-binding +1.

Thanks again for everyone's help who made this a reality. I'll try to
follow through and find some docs on the steps to promote these
artifacts (with the concerns acknowledged that were mentioned during the
vote).

- Josh

Josh Elser wrote:

All,

Please consider the following for Apache Commons VFS2 version 2.1 (rc2).

Maven repository:
https://repository.apache.org/content/repositories/orgapachecommons-1166
Artifacts: https://dist.apache.org/repos/dist/dev/commons/vfs/ r13608

MD5 commons-vfs-distribution-2.1-bin.tar.gz
8cc35a3169e1faee727c5af94c7dd904
SHA1 commons-vfs-distribution-2.1-bin.tar.gz
72b7557c4e8b1789b8aa0a9c1e0cb2c9daecec30
MD5 commons-vfs-distribution-2.1-src.tar.gz
a182ac642874e85fbc7d1086f7663482
SHA1 commons-vfs-distribution-2.1-src.tar.gz
e42ac2053deb314277213e43f6f1d6b43eff3de9

MD5 commons-vfs-distribution-2.1-bin.zip b512a45c63d824eef826f174fd7a9245
SHA1 commons-vfs-distribution-2.1-bin.zip
886cb5a430da58b3a68e2abf297b6f735d3ffaf0
MD5 commons-vfs-distribution-2.1-src.zip 29645a0ad091c15b2cf76b619c33069f
SHA1 commons-vfs-distribution-2.1-src.zip
5824cf4d802865b5e93ff62c90affa58df6c4384

Signed with 4677D66C from
https://dist.apache.org/repos/dist/release/commons/KEYS

SVN tag is available at
https://svn.apache.org/repos/asf/commons/proper/vfs/tags/commons-vfs2-project-2.1-rc2/
r1743451

Staged Maven website:
http://home.apache.org/~elserj/commons/commons-vfs-2.1/

All reports are available in the provided staged Maven site (see "Project
Reports" at the root-level as well as under each sub-module).
JIRA-generated
release notes are available in the dist.a.o "Artifacts" repository. Unit
tests pass and the RC was built util JDK6.

(For Sebb) A direct Clirr link
http://home.apache.org/~elserj/commons/commons-vfs-2.1/commons-vfs2/clirr-report.html


Changes since rc1:

* Fixed more compatibility concerns against 2.0 (thanks, Greg)
* Improved release notes (thanks, Sebb)

This vote will be open for 72-hours, 2016/05/14 0400 UTC.

[ ] +1 Release these artifacts as version 2.1
[ ] 0 OK, but...
[ ] -1 I oppose these artifacts as version 2.1 because..

- Josh

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org






Accumulo folks at Hadoop Summit San Jose

2016-05-19 Thread Josh Elser
Out of curiosity, are there going to be any Accumulo-folks at Hadoop 
Summit in San Jose, CA at the end of June?


- Josh


Re: Accumulo folks at Hadoop Summit San Jose

2016-05-19 Thread Josh Elser
Try to make the meetup that Billie is setting up and be sure to 
introduce yourself :)


I'll be there this year (if that wasn't obvious by me asking)

Claudia Rose wrote:

I'll be there although I don't know the other "folks" yet.

-----Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Thursday, May 19, 2016 11:01 AM
To: dev
Cc: u...@accumulo.apache.org
Subject: Accumulo folks at Hadoop Summit San Jose

Out of curiosity, are there going to be any Accumulo-folks at Hadoop Summit in 
San Jose, CA at the end of June?

- Josh



Co-working group on May 17th

2016-05-19 Thread Josh Elser
I forgot to mention earlier that a small number of us got together on 
the 17th to work in proximity to each other.


Myself, Billie, Keith, Mike Wall, Mike Walch, and Christopher were 
present at points throughout the day (The common prefix on the "Mike 
Wal*"'s is pretty great, IMO).


There really weren't any discussions on Accumulo; I believe the only 
thing that stood out was Mike Wall asking some questions as he worked 
through the release process. I believe we can hope to see a volunteer 
from him to be RM for 1.8.0 soon, as well as some discussion on what 
needs to happen before 1.8.0-rc0.


For those who haven't noticed otherwise yet, check out Apache Fluo 
(incubating)[1]! This is pretty exciting news.


- Josh

[1] http://incubator.apache.org/projects/fluo.html


[ANNOUNCE] Apache Commons VFS 2.1 released

2016-05-20 Thread Josh Elser
The Apache Commons team is pleased to announce the release of Apache 
Commons VFS 2.1. Apache Commons VFS provides a single API for accessing 
various different file systems. It presents a uniform view of the files 
from various different sources, such as the files on local disk, on an 
HTTP server, or inside a Zip archive.
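(A minimal sketch of that uniform view, using the standard VFS2 entry points; the file URI below is just an example.)

```java
// Sketch: the same resolve/inspect calls work regardless of which file system
// backs the URI (local disk, HTTP, zip, ...).
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

public class VfsQuickLook {
  public static void main(String[] args) throws Exception {
    FileSystemManager fsManager = VFS.getManager();
    // Any supported scheme can be used here; this local path is illustrative.
    FileObject file = fsManager.resolveFile("file:///tmp/example.txt");
    System.out.println(file.getName() + " exists? " + file.exists());
  }
}
```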


Details of the changes and bug fixes in this release can be found in
the release notes:

http://www.apache.org/dist/commons/vfs/RELEASE_NOTES.txt

This release is binary compatible with 2.0; users are encouraged to 
upgrade at their earliest convenience.


For information on Commons VFS please visit the VFS website:
http://commons.apache.org/vfs/

Commons VFS can be downloaded from the following page:
http://commons.apache.org/vfs/download_vfs.cgi

- The Apache Commons community


Fwd: Accumulo-Test-1.6-Hadoop-1 - Build # 1029 - Unstable!

2016-05-20 Thread Josh Elser
Looks like the 2.1 upgrade failed on 1.6 with Hadoop-1. Did you happen 
to notice this, Dave?


 Original Message 
Subject: Accumulo-Test-1.6-Hadoop-1 - Build # 1029 - Unstable!
Date: Fri, 20 May 2016 17:29:15 + (UTC)
From: els...@apache.org
To: josh.el...@gmail.com

Accumulo-Test-1.6-Hadoop-1 - Build # 1029 - Unstable:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Test-1.6-Hadoop-1/1029/ 
to view the results.


Re: [DISCUSS] Time for a 1.8.0 release?

2016-05-23 Thread Josh Elser
You can always feel free to move them all out to a 1.9 and people who care
about certain ones can move them back into the release and make them
blockers. Less work for you :)

Thanks for volunteering to be RM!
On May 22, 2016 9:42 PM, "Michael Wall"  wrote:

> After last weeks discussion with Josh, Christopher and others at the
> Accumulo Working Day, I am going to shepherd the 1.8 release.  First step
> is to create a release candidate?  Before I do that, are there any tickets
> that need to get into the release?  I know Keith mentioned 1 or 2 and I
> have one I'd like to finish.
>
> Here is what Jira says is unresolved,
> https://s.apache.org/accumulo-1.8-unresolved
>
> On Wed I would like to move all tickets not identified for the 1.8 release
> to 2.0.  Then on Friday I would like to cut the first release candidate for
> 1.8.  Is that enough time?  Anything I am missing?
>
> Thanks
>
> Mike
>


Re: [DISCUSS] Time for a 1.8.0 release?

2016-05-23 Thread Josh Elser
I'll try to take a look at this one tonight to have another set of eyes 
on it.


Keith Turner wrote:

I just opened a PR for ACCUMULO-1124; that's one change I wanted to get in
for 1.8.  The other change I would like to get in is ACCUMULO-4165.  I will
try my best to get a PR for that in tomorrow.

https://issues.apache.org/jira/browse/ACCUMULO-1124
https://issues.apache.org/jira/browse/ACCUMULO-4165

On Sun, May 22, 2016 at 9:42 PM, Michael Wall  wrote:


After last weeks discussion with Josh, Christopher and others at the
Accumulo Working Day, I am going to shepherd the 1.8 release.  First step
is to create a release candidate?  Before I do that, are there any tickets
that need to get into the release?  I know Keith mentioned 1 or 2 and I
have one I'd like to finish.

Here is what Jira says is unresolved,
https://s.apache.org/accumulo-1.8-unresolved

On Wed I would like to move all tickets not identified for the 1.8 release
to 2.0.  Then on Friday I would like to cut the first release candidate for
1.8.  Is that enough time?  Anything I am missing?

Thanks

Mike





Re: [DISCUSS] Release 1.7.2

2016-05-25 Thread Josh Elser

Yay! That would be awesome, Mike!

I'll go through the 1.7.2 ones I see right now and tag any that I think 
need special attention.


Mike Drob wrote:

Following up on the 1.8.0 release thread, maybe we should also get a 1.7.2
release going as well. I'll probably go through and move issues out to
1.7.3 either this week or next week. Does anybody have issues that they
believe are blockers for 1.7.2?

Mike



Re: [DISCUSS] Time for a 1.8.0 release?

2016-05-26 Thread Josh Elser
Looks great to me. You now have the power to decide if RCs for 1.8.0 are
zero or one indexed. Choose wisely ;)
On May 26, 2016 5:57 PM, "Michael Wall"  wrote:

> Didn't get a chance to talk to Christopher so hopefully what I understood
> from emails with Josh and him is correct.
>
> Moved issues out of 1.8.0.  Here is a summary of the fix version changes
>
> 8 issues - 1.7.2, 1.8.0 => 1.7.2, 1.8.1
> 9 issues - 1.6.6, 1.7.3, 1.8.0 => 1.6.6, 1.7.3, 1.8.1
> 34 issues - 1.7.3, 1.8.0 => 1.7.3, 1.8.1
> 102 issues (BUG) - 1.8.0 => 1.8.1
> 248 issues (not BUG) - 1.8.0 => 1.9.1
>
> That leaves 3 issues in 1.8.0, I made them blockers
> - https://issues.apache.org/jira/browse/ACCUMULO-4157 (WAL can be
> prematurely deleted)
> - https://issues.apache.org/jira/browse/ACCUMULO-4165 (Create a user level
> API for RFile)
> - https://issues.apache.org/jira/browse/ACCUMULO-1124 (optimize index size
> in RFile)
>
> Keith has a PR in for 1124.  I am looking to put in a PR for 4157
> tomorrow/Sat.  Keith, if I need to move 4165 to 1.8.1 let me know.
>
> Once those are closed/moved, I will cut an RC1.
>
> Mike
>
>
> On Thu, May 26, 2016 at 8:18 AM, Michael Wall  wrote:
>
> > Christopher,
> >
> > I'd like to talk this through with you before I move the tickets to make
> > sure I understand what you are saying here.
> >
> > Thanks for the note, it is helpful.
> >
> > Mike
> >
> > On Tue, May 24, 2016 at 6:41 PM, Christopher 
> wrote:
> >
> >> On Sun, May 22, 2016 at 9:42 PM Michael Wall  wrote:
> >>
> >> > After last week's discussion with Josh, Christopher and others at the
> >> > Accumulo Working Day, I am going to shepherd the 1.8 release.  First
> >> step
> >> > is to create a release candidate?  Before I do that, are there any
> >> tickets
> >> > that need to get into the release?  I know Keith mentioned 1 or 2 and
> I
> >> > have one I'd like to finish.
> >> >
> >> > Here is what Jira says is unresolved,
> >> > https://s.apache.org/accumulo-1.8-unresolved
> >> >
> >> > On Wed I would like to move all tickets not identified for the 1.8
> >> release
> >> > to 2.0.  Then on Friday I would like to cut the first release
> candidate
> >> for
> >> > 1.8.  Is that enough time?  Anything I am missing?
> >> >
> >> > Thanks
> >> >
> >> > Mike
> >> >
> >>
> >> I think it's probably time. I don't know that I'd bump the stuff to 2.0.
> >> I'd rather bump it to 1.9, just because we've been on a roll with this
> >> backwards compatibility thing, and I think there's probably ongoing
> demand
> >> for updated 1.x versions.
> >>
> >> I'll try to go through the issues I've created (or have assigned to me)
> >> and
> >> bump them myself. So, if you could hold off on that for a few more days,
> >> it
> >> would help.
> >>
> >> Also, keep in mind, if you do bump using JIRA's batch features, you've
> got
> >> to do it multiple times, depending on if they have more than one
> >> fixVersion
> >> on them, otherwise you'll overwrite the multiple versions with a single
> >> one
> >> (or vice versa).
> >>
> >> Eg.
> >> (1.6.6, 1.7.2, 1.8.0) -> (1.6.6, 1.7.2, 1.8.1) // should just be bug
> fixes
> >> (1.7.2, 1.8.0) -> (1.7.2, 1.8.1) // should just be bug fixes
> >> (1.8.0) -> (1.8.1 or 1.9.0) // depends on if bugfix or feature addition
> >>
> >
> >
>


Fwd: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!

2016-05-26 Thread Josh Elser
Looks like hadoop-1 is still having problems on 1.6.
-- Forwarded message --
From: 
Date: May 26, 2016 8:40 PM
Subject: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!
To: 
Cc:

Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure:

Check console output at
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Test-1.6-Hadoop-1/1032/
to view the results.


Re: Fwd: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!

2016-05-26 Thread Josh Elser
Correction, all of 1.6 and 1.7 appear busted after Christopher's asf pom
version update.
On May 26, 2016 11:11 PM, "Josh Elser"  wrote:

> Looks like hadoop-1 is still having problems on 1.6.
> -- Forwarded message --
> From: 
> Date: May 26, 2016 8:40 PM
> Subject: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!
> To: 
> Cc:
>
> Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure:
>
> Check console output at
> https://secure.penguinsinabox.com/jenkins/job/Accumulo-Test-1.6-Hadoop-1/1032/
> to view the results.
>


Re: Fwd: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!

2016-05-27 Thread Josh Elser

Thanks :)

Christopher wrote:

Oh, crap, I messed up jar sealing (didn't notice because we normally skip
tests during a release, and we previously only sealed jars during a
release). I will fix that later today.

On Fri, May 27, 2016 at 3:05 AM Christopher  wrote:


Hmm. I tested all of them. The PIAB builds were already failing... but
I'll look at it later today.

On Fri, May 27, 2016 at 2:13 AM Josh Elser  wrote:


Correction, all of 1.6 and 1.7 appear busted after Christopher's asf pom
version update.
On May 26, 2016 11:11 PM, "Josh Elser"  wrote:


Looks like hadoop-1 is still having problems on 1.6.
-- Forwarded message --
From:
Date: May 26, 2016 8:40 PM
Subject: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!
To:
Cc:

Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure:

Check console output at


https://secure.penguinsinabox.com/jenkins/job/Accumulo-Test-1.6-Hadoop-1/1032/

to view the results.





Re: Fwd: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!

2016-05-27 Thread Josh Elser
Looks like I had just done an upgrade on the box which bumped some JDK 
versions, but I didn't update Jenkins to point at the new installations. 
Should be taken care of now.


Christopher wrote:

Should be fixed now. Looks like I had already done this with the 1.8 branch
at one point, but not in the old ones.
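
(For anyone wanting to double-check the sealing fix locally, something like 
the following should show whether the manifests carry the sealing attribute -- 
this assumes the binary tarball layout with the jars under lib/:

  # A sealed jar carries "Sealed: true" entries in META-INF/MANIFEST.MF
  for jar in lib/accumulo-*.jar; do
    printf '%s: ' "$jar"
    unzip -p "$jar" META-INF/MANIFEST.MF | grep -i 'Sealed' || echo '(no sealing attribute)'
  done
)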

On Fri, May 27, 2016 at 11:58 AM Josh Elser  wrote:


Thanks :)

Christopher wrote:

Oh, crap, I messed up jar sealing (didn't notice because we normally skip
tests during a release, and we previously only sealed jars during a
release). I will fix that later today.

On Fri, May 27, 2016 at 3:05 AM Christopher   wrote:


Hmm. I tested all of them. The PIAB builds were already failing... but
I'll look at it later today.

On Fri, May 27, 2016 at 2:13 AM Josh Elser

wrote:

Correction, all of 1.6 and 1.7 appear busted after Christopher's asf

pom

version update.
On May 26, 2016 11:11 PM, "Josh Elser"   wrote:


Looks like hadoop-1 is still having problems on 1.6.
-- Forwarded message --
From:
Date: May 26, 2016 8:40 PM
Subject: Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure!
To:
Cc:

Accumulo-Test-1.6-Hadoop-1 - Build # 1032 - Failure:

Check console output at


https://secure.penguinsinabox.com/jenkins/job/Accumulo-Test-1.6-Hadoop-1/1032/

to view the results.





Re: Discussion: Address binding for monitor.

2016-05-31 Thread Josh Elser



Ed Coleman wrote:

Discovered that the way the monitor determines the hostname and publishes the
address for monitor log forwarding (the one written to zookeeper for clients)
changed slightly between 1.6.4 and 1.6.5.


Being (one of?) the people who has been messing with this recently, this 
seems rather unintended.



In 1.6.4 the monitor uses InetAddress.getLocalHost() to determine the
hostname that is written to zookeeper for discovery by the tservers.



In 1.6.5 it uses the -address command line parameter. This is getting set by
the server-start script, which calls accumulo-env.sh. In accumulo-env.sh we
had the monitor set to bind to all interfaces at the default port, 4560.
This resolved to the monitor binding to 0.0.0.0:4560 - which is correct for
the monitor. However, it then put 0.0.0.0:4560 into zookeeper for the
tservers - at which point they could not publish log messages to the
monitor.


Advertising 0.0.0.0 is just flat out wrong. Determining which interface 
to use (when presented with multiple) is difficult, though.



Setting the monitor to not bind to all interfaces in accumulo-env.sh, the
server start script then uses the monitor hostname, and this is published
and the loggers had a valid hostname:port to forward logs to. But now, the
monitor is only bound to the interface that resolves to the hostname - if
other interface(s) were used, the monitor is not going to receive log
messages sent to them. (If the interfaces are bound together through the OS,
this is not an issue.)



It seems like there should be two parameters to control this behavior - one
for setting the bind address for the monitor, another to set the "external"
address that is published into zookeeper so that tservers can find the
host:port to forward log messages to.


At first glance, I would say the quick fix would be to use 
InetAddress.getLocalHost and do the rDNS lookup on that (which, IIRC, 
should use the default name resolution configured on the box and 
/etc/hosts). I'm still not 100% sure how this would work on systems with 
multiple NICs, but I think it would be consistent with how 1.6.4 worked?
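
For anyone poking at this on their own boxes, a rough way to preview what that 
name resolution would likely give you (standard Linux tools assumed; not 
exactly what the JVM does, but usually close enough):

  # What the OS thinks the local hostname is
  hostname -f
  # How that name resolves (consults /etc/hosts first under the usual nsswitch order)
  getent hosts "$(hostname)"
  # The interfaces actually present on the box
  ip addr show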



It seems that this could be useful for running Accumulo in containers that
may have different "virtual" interface / ports. And there may be other
similar configuration changes that we can consider. Wanted to open a
discussion to see if there are other considerations / requirements and
services that should be considered before any change is recommended.


Completely agree that this is something where we don't entirely have the 
necessary configuration in place. It would be great to evaluate what is 
necessary and how best we can improve on what we have now (maybe even 
surveying what other people do).


Thanks for writing this up. This is a really good catch.


Ed Coleman






No more Jenkins ITs from me

2016-05-31 Thread Josh Elser
I had a talk with Christopher today in IRC which ultimately boiled down 
to him asking me to disable the Jenkins emails which I run which are 
sent to notifications@a.a.o.


For a long time now, my little server, which can usually run the 1.6 ITs, has 
been unable to run the full IT suite on >=1.7 (read as: since we've been 
doing 1.7 dev in earnest). I've never found the time to figure out what 
changed across that gap that now consistently triggers the kernel's 
OOM killer.


Anyways, if anyone did look at these, sorry. You won't be seeing them 
anymore.


Re: Minimum supported Hadoop?

2016-06-01 Thread Josh Elser
By that reasoning, wouldn't bumping to 2.6.4 be better (as long as 
Hadoop didn't do anything screwy that they shouldn't have in a 
maintenance release...)?


I have not looked at deltas between 2.6.1 and 2.6.4
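
For what it's worth, a quick way to eyeball those deltas from a Hadoop checkout 
would be something like the following (the release tag names are my assumption 
of Hadoop's tagging convention; adjust to whatever tags the repo actually carries):

  # List the commits between the two maintenance releases (JIRA keys are in the subjects)
  git -C hadoop log --oneline release-2.6.1..release-2.6.4
  # Or just count them
  git -C hadoop rev-list --count release-2.6.1..release-2.6.4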

Christopher wrote:

I was looking at the recently bumped tickets and noticed
https://issues.apache.org/jira/browse/ACCUMULO-4150

It seems to me that we may want to make our minimum supported Hadoop
version 2.6.1, at least for the 1.8.0 release.

That's not to say it won't work with other versions... just that it's not
something we're testing for in the latest release, and isn't recommended
(and possibly, a downstream packager may need to patch Accumulo to support
the older version).



Re: Apache Accumulo integrated with Presto

2016-06-13 Thread Josh Elser



Adam J. Shook wrote:

A few clarifications:

- Presto supports hash-based distributed joins as well as broadcast joins

- Presto metadata is stored in ZooKeeper, but metadata storage is pluggable
and could be stored in Accumulo instead

- The connector does use tablet locality when scanning Accumulo, but our
testing has shown you get better performance by giving Accumulo and Presto
their own dedicated machines, making locality a moot point.  This will
certainly change based on types of queries, data sizes, network quality,
etc.


I'm a little confused by this. Co-locating presto workers and tservers 
doesn't necessarily mean that you're going to get local reads/writes at 
the Accumulo layer. I remember this is something that the Argyle Data 
folks had found was a big gain for them when they were doing 
Presto+Accumulo. Maybe your findings were more based on your specific 
circumstances?



- You can insert the results of a query into a Presto table using INSERT
INTO foo SELECT ..., as well as create a table from the results of a query
(CTAS).  Though, for large inserts, it is typically best to bypass the
Presto layer and insert directly into the Accumulo tables using the
PrestoBatchWriter API

Cheers,
--Adam

On Mon, Jun 13, 2016 at 7:20 AM, Christopher  wrote:


Thanks for that summary, Dylan! Very helpful.

On Mon, Jun 13, 2016, 01:36 Dylan Hutchison
wrote:


Thanks for sharing Sean.  Here are some notes I wrote after reading the
article on Presto-Accumulo design.  I have a research interest in the
relationship between relational (SQL) and non-relational (Accumulo)
systems, so I couldn't resist reading the post in detail.

- Places the primary key in the Accumulo row.
- Performs row-at-a-time processing (each tuple is one row in
Accumulo) using WholeRowIterator behavior.
- Relational table metadata is stored in the Presto infrastructure (as
opposed to an Accumulo table).
- Supports the creation of index tables for any attributes. These
index tables speed up queries that filter on indexed attributes.  It

is

standard secondary indexing, which provides speedups when the

selectivity

of the query is roughly <10% of the original table.
- Only database->client querying is supported.  You cannot run "select
... into result_table".
- As far as I can see, Presto only has one join strategy: *broadcast
join*.  The right table of every join is scanned into one of the
Presto worker's memory.  Subsequently the size of the right table is
limited by worker memory.
- There is one Presto worker for each Accumulo tablet, which enables
good scaling.
- The Presto bridge classes track internal Accumulo information such
as the assignment of tablets to tablet servers by reading Accumulo's
Metadata table. Presto uses tablet locations to provide better

locality.

- The Presto bridge comes with several Accumulo server-side iterators
for filtering and aggregating.
- The code is quite nice and clean.

This image below gives Presto's architecture.  Accumulo takes the role of
the DB icon in the bottom-right corner.

[image: Inline image 2]

Bloomberg ran 13 out of the 22 TPC-H queries.  There is no fundamental
reason why they cannot run all the queries; they just have not

implemented

everything required ('exists' clauses, non-equi join, etc.).

The interface looks like this, though they use a compiled java jar to
insert entries from a csv file (it wraps around a BatchWriter).

[image: Inline image 3]

Here are performance results.  They don't say what hardware or data sizes
they use.  Whatever it is, they must have the ability to fit the smaller
table of any join into memory as a result of Presto's broadcast join
strategy.  The strong scaling looks very nice.

[image: Inline image 4]

They have one other plot that shows how secondary indexing speeds up some
queries with low selectivity.

Cheers, Dylan



On Sun, Jun 12, 2016 at 7:06 PM, Sean Busbey

wrote:

Bloomberg have a post about a connector they made to query Accumulo from
Presto:




http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/

--
Sean Busbey







Re: Apache Accumulo integrated with Presto

2016-06-13 Thread Josh Elser
Interesting! Yeah, I'm not sure anymore (assuming I even knew at one 
time). I think they've moved away from it entirely and the code wasn't 
open-sourced, so it may just be lost to the ages :)


Either way, thanks for the extra details!

Adam J. Shook wrote:

Maybe.  I'd be interested in what they'd done to get the big gains.  My
approach was, for a full table scan (vs. using the index), the connector
creates one Presto split for each tablet.  The Accumulo metadata table is
scanned to get the tablet location, and the host hint for the Presto split
is set to where the tablet is located.  Using this approach, the query
times were nearly identical whether the workers were co-located with tablet
servers or not.

On Mon, Jun 13, 2016 at 11:55 AM, Josh Elser  wrote:



Adam J. Shook wrote:


A few clarifications:

- Presto supports hash-based distributed joins as well as broadcast joins

- Presto metadata is stored in ZooKeeper, but metadata storage is
pluggable
and could be stored in Accumulo instead

- The connector does use tablet locality when scanning Accumulo, but our
testing has shown you get better performance by giving Accumulo and Presto
their own dedicated machines, making locality a moot point.  This will
certainly change based on types of queries, data sizes, network quality,
etc.


I'm a little confused by this. Co-locating presto workers and tservers
doesn't necessarily mean that you're going to get local reads/writes at the
Accumulo layer. I remember this is something that the Argyle Data folks had
found was a big gain for them when they were doing Presto+Accumulo. Maybe
your findings were more based on your specific circumstances?


- You can insert the results of a query into a Presto table using INSERT

INTO foo SELECT ..., as well as create a table from the results of a query
(CTAS).  Though, for large inserts, it is typically best to bypass the
Presto layer and insert directly into the Accumulo tables using the
PrestoBatchWriter API

Cheers,
--Adam

On Mon, Jun 13, 2016 at 7:20 AM, Christopher   wrote:

Thanks for that summary, Dylan! Very helpful.

On Mon, Jun 13, 2016, 01:36 Dylan Hutchison
wrote:

Thanks for sharing Sean.  Here are some notes I wrote after reading the

article on Presto-Accumulo design.  I have a research interest in the
relationship between relational (SQL) and non-relational (Accumulo)
systems, so I couldn't resist reading the post in detail.

 - Places the primary key in the Accumulo row.
 - Performs row-at-a-time processing (each tuple is one row in
 Accumulo) using WholeRowIterator behavior.
 - Relational table metadata is stored in the Presto infrastructure
(as
 opposed to an Accumulo table).
 - Supports the creation of index tables for any attributes. These
 index tables speed up queries that filter on indexed attributes.  It


is


 standard secondary indexing, which provides speedups when the


selectivity


 of the query is roughly <10% of the original table.
 - Only database->client querying is supported.  You cannot run
"select
 ... into result_table".
 - As far as I can see, Presto only has one join strategy: *broadcast
 join*.  The right table of every join is scanned into one of the
 Presto worker's memory.  Subsequently the size of the right table is
 limited by worker memory.
 - There is one Presto worker for each Accumulo tablet, which enables
 good scaling.
 - The Presto bridge classes track internal Accumulo information such
 as the assignment of tablets to tablet servers by reading Accumulo's
 Metadata table. Presto uses tablet locations to provide better


locality.


 - The Presto bridge comes with several Accumulo server-side
iterators
 for filtering and aggregating.
 - The code is quite nice and clean.

This image below gives Presto's architecture.  Accumulo takes the role
of
the DB icon in the bottom-right corner.

[image: Inline image 2]

Bloomberg ran 13 out of the 22 TPC-H queries.  There is no fundamental
reason why they cannot run all the queries; they just have not


implemented


everything required ('exists' clauses, non-equi join, etc.).

The interface looks like this, though they use a compiled java jar to
insert entries from a csv file (it wraps around a BatchWriter).

[image: Inline image 3]

Here are performance results.  They don't say what hardware or data
sizes
they use.  Whatever it is, they must have the ability to fit the smaller
table of any join into memory as a result of Presto's broadcast join
strategy.  The strong scaling looks very nice.

[image: Inline image 4]

They have one other plot that shows how secondary indexing speeds up
some
queries with low selectivity.

Cheers, Dylan



On Sun, Jun 12, 2016 at 7:06 PM, Sean Busbey


wrote:


Bloomberg have a post about a connector they made to query Accumulo from

Presto:





http://www.bl

Re: [VOTE] Accumulo 1.7.2-rc1

2016-06-14 Thread Josh Elser

Thanks for putting this together, Mike.

What kind of testing have you done so far and how have the results looked?

Mike Drob wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.2.

All content generated via
 assemble/build.sh --create-release-candidate -P '!thrift'

Git Commit:
 a49291b8aa85b81650f2b79f80b400e10b594795
Branch:
 1.7.2-rc1

If this vote passes, a gpg-signed tag will be created using:
 git tag -f -m 'Apache Accumulo 1.7.2' -s rel/1.7.2
a49291b8aa85b81650f2b79f80b400e10b594795

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1051
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1051/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-src.tar.gz
Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1051/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
 mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 6A90F817481A73C1C419B62D8F2F220786C4FB2A
86EDB9C33B8517228E88A8F93E48C0C6EF362B9E)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release_notes/1.7.2

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.2 release of Apache Accumulo.

This vote will end on Fri Jun 17 19:30:00 UTC 2016
(Fri Jun 17 15:30:00 EDT 2016 / Fri Jun 17 12:30:00 PDT 2016)

Thanks!

P.S. Hint: download the whole staging repo with
 wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1051/
 # note the trailing slash is needed



Re: [VOTE] Accumulo 1.7.2-rc1

2016-06-15 Thread Josh Elser

Thanks for the info, Mike!

Keith Turner wrote:

On Tue, Jun 14, 2016 at 5:47 PM, Mike Drob  wrote:


>  Unit tests pass.
>
>  ITs mostly pass. Have had transient failures on some, but have seen them
>  all pass as well:
> DurabilityIT (ACCUMULO-4343)
> ChaoticBalancerIT (times out sometimes, completes other times, unable to
>  consistently reproduce)
> AssignmentThreadsIT.testConcurrentAssignmentPerformance (was not
>  performant enough, I assume this was a hardware contention issue)
>
>  Installed on a cdh5.7.0 cluster and did some basic insert and query
>  operations.
>
>  Have not run CI, RW, or anything for replication tests yet. I'm about 50/50
>  on having time to do that this week, so if anybody else wants to volunteer,
>


I started CI w/o agitation running on 9 EC2 nodes yesterday. I will
probably start a run w/ agitation.


Thanks, Keith. I wouldn't be able to get anything together until 
Thursday night at best. If you want to extend the VOTE over the weekend, 
I'll try to run stuff too, but I don't think it's necessary if Keith is 
on the ball.
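
For anyone else wanting to fire up a run, this is roughly what I remember the 
continuous ingest setup looking like -- the directory and script names below 
are from memory, so treat them as assumptions and check test/system/continuous 
in your checkout first:

  cd test/system/continuous
  cp continuous-env.sh.example continuous-env.sh   # edit for your cluster
  ./start-ingest.sh
  ./start-walkers.sh      # optional query load
  ./start-agitator.sh     # only for the "with agitation" runs
  # later: ./stop-ingest.sh and then ./start-verify.sh to check referential integrity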



>  that would be swell. I know we've discussed a lower test bar for bugfix
>  releases in the past, so maybe we don't strictly need them? (A vote thread
>  is probably not the best place to discuss this, though).
>


I agree.  I think these decisions can be made per release.  If someone
feels inadequate testing was done for a release they can vote accordingly
and explain the vote.


Ditto. I was asking just to understand what you had done so far, not to 
turn around and tell you that you didn't do enough :)


A quick CI and/or RW is always nice, but definitely not the 1 and 3 day 
runs that we do for the "major" (in our old terms, "minor" in terms of 
semver) releases.


Re: [VOTE] Accumulo 1.7.2-rc1

2016-06-17 Thread Josh Elser



Mike Drob wrote:

Thanks for taking a look, Sean.

The LICENSE file in the source tarball refers to the BSD license and
includes "for details see
core/src/main/java/org/apache/accumulo/core/bloomfilter" and all files
there (BloomFilter.java, DynamicBloomFilter.java, and Filter.java) include
the full 3-Clause BSD license in the header. Similarly, the MIT clause has
"for details see server/monitor/src/main/resources/web/flot" which has a
LICENSE.txt


I have absolutely no idea if this is "sufficient" or not. I can 
understand Sean's confusion in not seeing relevant licenses in the 
LICENSE file.



Regarding the crypto, according to www.apache.org/dev/crypto.html#inform it
looks like we need to place that disclaimer in the README and not the
NOTICE file anyway. If you prefer this reading of the policy, can you file
a JIRA for making these changes and set it as a blocker? Thanks.


Yeah, NOTICE is not the right place for this, AFAIU. I wouldn't think 
this is a blocker, but something we should just remove from the 
src-release's NOTICE file.



Mike

On Fri, Jun 17, 2016 at 12:28 PM, Sean Busbey  wrote:


>  -1
>
>  good:
>
>  * verified checksums and signatures
>  * source artifact corresponds to referenced commit
>  * source builds correctly with Oracle JDK 1.7.0_80 / Apache Maven
>  3.3.9 (including unit tests, not including ITs)
>
>  bad:
>
>  * LICENSE in source tarball references the "3 clause BSD" and "MIT"
>  licenses but does not provide their text or a pointer to where the
>  text can be found in the artifact.
>  * NOTICE in the binary tarball doesn't include any of the encryption
>  notice stuff that's in the source tarball NOTICE (I don't know if this
>  information is required in the NOTICE file, but it seems like we
>  should be consistent for things that are equally applicable in the
>  two).


Re: Code Quality Improvements for accumulo

2016-06-19 Thread Josh Elser
Hi George,

We would be happy for any contributions you'd like to make to Accumulo. We
do try to keep up on what Findbugs reports already but I'm sure some slip
through.

I've cc'ed the developer list. Please use that for future correspondence.

Thanks!

- Josh
On Jun 19, 2016 12:51 PM, "George Kankava" 
wrote:

> Hello,
>
> I'd like to send you some pull requests to improve the maintainability
> of accumulo.
>
> My company - DevFactory - is sponsoring me to identify and fix code
> quality issues and improve unit test coverage in open source projects.
> DevFactory is obsessed with code quality and is providing its commercially
> available code quality improvement service for free to qualified
> open-source projects.
>
> If you are interested, please let me know and we will add it to our
> pipeline. Our first step will be to utilize tools like PMD, FindBugs and
> Sonar to identify the most important issues to fix. Once we fix them, we'll
> follow up with some pull requests.
>
> Thanks,
> George Kankava
>


Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-19 Thread Josh Elser



Dylan Hutchison wrote:

+1 with notes below~

* NOTICE and LICENSE look good to my inexperienced eyes.
* Source-compiled binary tar.gz matches the binary tar.gz artifact, except
for META-INF entries.
* Unit tests pass.
* Good checksums and sigs. Fingerprint matches Mike's key.
* Graphulo tests pass.


Yay, API compatibility :)


* Sunny integration tests pass on a single-node standalone deployment.
Tested on Zookeeper 3.4.6 and both Hadoop 2.4.1 and 2.7.2.

Notes / Questions:

1. On the ITs: for some reason I can't figure out, the "stop Accumulo
processes" part of ReadWriteIT#sunnyDay gives me trouble when I run it
alongside the others, but it passes when I run it alone.  Similar story for
ExamplesIT#testBulkIngest.


Interesting. Are you setting forkMode > 1? Or running multiple 
invocations of the build at the same time? I wouldn't be surprised if 
some of the logic we have to 'test' is actually wrong when we have 
concurrent processes running, but I'm not sure why these two in 
particular would have troubles.
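
If it helps narrow it down, a sketch of how I'd isolate them (standard Failsafe 
properties; note newer plugin versions spell the fork setting forkCount rather 
than forkMode):

  # Run just the one IT, to rule out interference from concurrently-running tests
  mvn verify -Dit.test=ReadWriteIT
  # Run the full suite but force a single fork, to see whether parallelism is the culprit
  mvn verify -DforkCount=1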



2. On diffing the source-built binary with the binary artifact: it seems
the source-built binary has more license information in
the META-INF/DEPENDENCIES than the binary artifact, in addition to a few of
the entries being permuted.  This holds true for all the jars except
accumulo-fate.jar.  Here is a pastebin for the source-built binary deps,
and a pastebin for the binary artifact deps, for accumulo-core.jar.  Here is
a pastebin of their diff.  I don't know how
significant the difference is; maybe Sean or Christopher could comment.


This is probably due to the difference in the release-process creation 
of the binary tarball and what gets built when you just do a `mvn 
package` on your computer (e.g. activating the 'apache-release' Maven 
profile). I also see findbugs in the list, so that's likely unintended.


Overall, for the purposes of the ASF licensing, the DEPENDENCIES file is 
a "nice to have" (LICENSE and NOTICE are the ones we really need to get 
right).


Also, with your commit bit, you can use paste.apache.org if you 
want to avoid the ads on pastebin :)



3. Is it good practice to use a code-signing key with no expiration date?


As I understand it, it's not bad like a non-expiring password, but it's 
good to have an expiration date. If you do lose/compromise your key, at 
least everyone knows that there is a certain date the key is no longer 
valid. It's also easy to extend the validity of your key, IIRC.
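
Roughly, extending the expiration is an interactive edit in GnuPG (the key id 
below is a placeholder for your own):

  gpg --edit-key YOUR_KEY_ID
  # at the gpg> prompt:
  #   expire      # set a new expiration on the primary key
  #   key 1       # select the first subkey
  #   expire      # extend it as well
  #   save
  gpg --send-keys YOUR_KEY_ID   # republish so others pick up the new expiry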





On Fri, Jun 17, 2016 at 9:31 PM, Mike Drob  wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.2.

All content generated via
 assemble/build.sh --create-release-candidate -P '!thrift'

Changes from 1.7.2-rc1

ACCUMULO-4346 correct LICENSE file for source to include text of reference
ACCUMULO-4347 Crypto notification should be in README files instead of
NOTICE

Git Commit:
 a01e67741d101c3d87f1d6e16d54ff7a96951ad0
Branch:
 1.7.2-rc2

If this vote passes, a gpg-signed tag will be created using:
 git tag -f -m 'Apache Accumulo 1.7.2' -s rel/1.7.2
a01e67741d101c3d87f1d6e16d54ff7a96951ad0

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1052
Source (official release artifact):

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-src.tar.gz
Binary:

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
 mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 86EDB9C33B8517228E88A8F93E48C0C6EF362B9E)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release_notes/1.7.2

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.2 release of Apache Accumulo.

This vote will end on Tue Jun 21 05:00:00 UTC 2016
(Tue Jun 21 01:00:00 EDT 2016 / Mon Jun 20 22:00:00 PDT 2016)

Thanks!

P.S. Hint: download the whole staging repo with
 wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/
 # note the trailing slash is needed





Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-19 Thread Josh Elser

-1 (binding)

HTrace's NOTICE is missing in -bin's NOTICE file

"""
htrace-core
Copyright 2015 The Apache Software Foundation
"""

The good:

* Can run with bin tarball out of the box. Simple write/read/update/read 
works in the shell.

* `mvn verify -Psunny` passes on src tarball
* xsums/sigs are fine.
* Good on you using the long-form keyid
* Verified no Thrift changes (accounting for the !thrift)
* Verified no changes to public API code (to avoid running japi)
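
For anyone following along, the xsum/sig portion of that boils down to something 
like the following (GnuPG and coreutils assumed; artifact names per the staging 
repo in the vote email, and note the Nexus .sha1/.md5 files contain only the bare hash):

  curl -O https://www.apache.org/dist/accumulo/KEYS && gpg --import KEYS
  gpg --verify accumulo-1.7.2-src.tar.gz.asc accumulo-1.7.2-src.tar.gz
  echo "$(cat accumulo-1.7.2-src.tar.gz.sha1)  accumulo-1.7.2-src.tar.gz" | sha1sum -c -
  echo "$(cat accumulo-1.7.2-src.tar.gz.md5)  accumulo-1.7.2-src.tar.gz" | md5sum -c -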

Sorry, Mike. Wish I could've caught this one on rc1.

- Josh

Mike Drob wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.2.

All content generated via
 assemble/build.sh --create-release-candidate -P '!thrift'

Changes from 1.7.2-rc1

ACCUMULO-4346 correct LICENSE file for source to include text of reference
ACCUMULO-4347 Crypto notification should be in README files instead of
NOTICE

Git Commit:
 a01e67741d101c3d87f1d6e16d54ff7a96951ad0
Branch:
 1.7.2-rc2

If this vote passes, a gpg-signed tag will be created using:
 git tag -f -m 'Apache Accumulo 1.7.2' -s rel/1.7.2
a01e67741d101c3d87f1d6e16d54ff7a96951ad0

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1052
Source (official release artifact):
https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-src.tar.gz
Binary:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
 mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 86EDB9C33B8517228E88A8F93E48C0C6EF362B9E)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release_notes/1.7.2

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.2 release of Apache Accumulo.

This vote will end on Tue Jun 21 05:00:00 UTC 2016
(Tue Jun 21 01:00:00 EDT 2016 / Mon Jun 20 22:00:00 PDT 2016)

Thanks!

P.S. Hint: download the whole staging repo with
 wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/
 # note the trailing slash is needed



Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-19 Thread Josh Elser
I just did a `git log --numstat rel/1.7.1..origin/1.7.2-rc2` and 
manually inspected the output. There were only about 30 commits this 
time, so it was pretty simple.
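
(For the curious: restricting that to the public API and Thrift paths makes the 
manual inspection even quicker. The paths below are my recollection of where 
those live in the tree, so double-check them:

  git log --numstat rel/1.7.1..origin/1.7.2-rc2 -- \
      core/src/main/java/org/apache/accumulo/core/client \
      core/src/main/java/org/apache/accumulo/core/data \
      core/src/main/thrift
)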


Dylan Hutchison wrote:

Out of curiosity Josh, what tool do you use to verify no public API
changes, if not japi? I should also read the pom on !thrift.
On Jun 19, 2016 9:26 PM, "Josh Elser"  wrote:

-1 (binding)

HTrace's NOTICE is missing in -bin's NOTICE file

"""
htrace-core
Copyright 2015 The Apache Software Foundation
"""

The good:

* Can run with bin tarball out of the box. Simple write/read/update/read
works in the shell.
* `mvn verify -Psunny` passes on src tarball
* xsums/sigs are fine.
* Good on you using the long-form keyid
* Verified no Thrift changes (accounting for the !thrift)
* Verified no changes to public API code (to avoid running japi)

Sorry, Mike. Wish I could've caught this one on rc1.

- Josh

Mike Drob wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.2.

All content generated via
  assemble/build.sh --create-release-candidate -P '!thrift'

Changes from 1.7.2-rc1

ACCUMULO-4346 correct LICENSE file for source to include text of reference
ACCUMULO-4347 Crypto notification should be in README files instead of
NOTICE

Git Commit:
  a01e67741d101c3d87f1d6e16d54ff7a96951ad0
Branch:
  1.7.2-rc2

If this vote passes, a gpg-signed tag will be created using:
  git tag -f -m 'Apache Accumulo 1.7.2' -s rel/1.7.2
a01e67741d101c3d87f1d6e16d54ff7a96951ad0

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1052
Source (official release artifact):

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-src.tar.gz
Binary:

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
  mvn release:prepare && mvn release:perform


Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 86EDB9C33B8517228E88A8F93E48C0C6EF362B9E)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release_notes/1.7.2

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.2 release of Apache Accumulo.

This vote will end on Tue Jun 21 05:00:00 UTC 2016
(Tue Jun 21 01:00:00 EDT 2016 / Mon Jun 20 22:00:00 PDT 2016)

Thanks!

P.S. Hint: download the whole staging repo with
  wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/
  # note the trailing slash is needed






Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-20 Thread Josh Elser
My understanding was that you don't need the "developed at the Apache 
Software Foundation" line, but the copyright line was still necessary.


http://www.apache.org/dev/licensing-howto.html#bundle-asf-product

However, I forgot about LEGAL-234. It looks like we just dropped the 
ball on correcting both our NOTICE files and the "licensing howto" 
documentation. Thanks for reminding me, Christopher.


This changes my -1 to a +1 (binding). Let's also get issues filed to fix 
this stuff before the next release (so we avoid hitting it again).


Mike Drob wrote:

Ah, I didn't see this response while composing my other one. A similar
question then, Christopher. Do we need to remove all the individual project
lines?

On Mon, Jun 20, 2016 at 11:31 AM, Christopher  wrote:


On Sun, Jun 19, 2016 at 9:26 PM Josh Elser  wrote:


-1 (binding)

HTrace's NOTICE is missing in -bin's NOTICE file

"""
htrace-core
Copyright 2015 The Apache Software Foundation
"""



The NOTICE file should not have entries for other ASF-copyrighted
materials. The ones that are there already should not be there. This was
discussed on https://issues.apache.org/jira/browse/LEGAL-234 with the
conclusion that these were superfluous and the NOTICE could be simplified
by omitting these. The fact that these aren't necessary, and the general
instructions to omit anything that isn't legally required (
http://www.apache.org/dev/licensing-howto.html#mod-notice), means these
shouldn't be there.

I provide this information in case it changes your vote.



The good:

* Can run with bin tarball out of the box. Simple write/read/update/read
works in the shell.
* `mvn verify -Psunny` passes on src tarball
* xsums/sigs are fine.
* Good on you using the long-form keyid
* Verified no Thrift changes (accounting for the !thrift)
* Verified no changes to public API code (to avoid running japi)

Sorry, Mike. Wish I could've caught this one on rc1.

- Josh

Mike Drob wrote:

Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.2.

All content generated via
  assemble/build.sh --create-release-candidate -P '!thrift'

Changes from 1.7.2-rc1

ACCUMULO-4346 correct LICENSE file for source to include text of

reference

ACCUMULO-4347 Crypto notification should be in README files instead of
NOTICE

Git Commit:
  a01e67741d101c3d87f1d6e16d54ff7a96951ad0
Branch:
  1.7.2-rc2

If this vote passes, a gpg-signed tag will be created using:
  git tag -f -m 'Apache Accumulo 1.7.2' -s rel/1.7.2
a01e67741d101c3d87f1d6e16d54ff7a96951ad0

Staging repo:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1052

Source (official release artifact):


https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-src.tar.gz

Binary:


https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-bin.tar.gz

(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
  mvn release:prepare && mvn release:perform

Signing keys are available at

https://www.apache.org/dist/accumulo/KEYS

(Expected fingerprint: 86EDB9C33B8517228E88A8F93E48C0C6EF362B9E)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release_notes/1.7.2

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.2 release of Apache Accumulo.

This vote will end on Tue Jun 21 05:00:00 UTC 2016
(Tue Jun 21 01:00:00 EDT 2016 / Mon Jun 20 22:00:00 PDT 2016)

Thanks!

P.S. Hint: download the whole staging repo with
  wget -erobots=off -r -l inf -np -nH \



https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/

  # note the trailing slash is needed





Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-20 Thread Josh Elser



Christopher wrote:

+0

* Verified all hashes, sigs
* Unit tests and ITs all pass
(org.apache.accumulo.test.BadDeleteMarkersCreatedIT.test timed out the
first time, but passed on re-run)
* Verified contents of bin tarball match jars in staging repo and src
tarball match rc branch in git

My only concern would be that
https://issues.apache.org/jira/browse/ACCUMULO-4317 was marked as a
blocker, but was bumped. So, I'll hold my +1 until it's agreed whether
that's actually a blocker and should be included in 1.7.2, or if it's not a
blocker and okay to bump.


I'd encourage you to weigh in on how critical you think it is on the 
JIRA issue. It seems pretty bad to me that we fail like this, but as 
Mike rightfully points out, it isn't a new bug. We failed to get the 
patch applied and I think that looks sloppy on our part (I would hate 
for someone else to run into the same issue with a 1.7 and still have no 
release which contains the fix available).


It's on all of us to decide the severity of the issue, 
not just me throwing my weight around :)



On Sat, Jun 18, 2016 at 12:31 AM Mike Drob  wrote:


Accumulo Developers,

Please consider the following candidate for Accumulo 1.7.2.

All content generated via
 assemble/build.sh --create-release-candidate -P '!thrift'

Changes from 1.7.2-rc1

ACCUMULO-4346 correct LICENSE file for source to include text of reference
ACCUMULO-4347 Crypto notification should be in README files instead of
NOTICE

Git Commit:
 a01e67741d101c3d87f1d6e16d54ff7a96951ad0
Branch:
 1.7.2-rc2

If this vote passes, a gpg-signed tag will be created using:
 git tag -f -m 'Apache Accumulo 1.7.2' -s rel/1.7.2
a01e67741d101c3d87f1d6e16d54ff7a96951ad0

Staging repo:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1052
Source (official release artifact):

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-src.tar.gz
Binary:

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/org/apache/accumulo/accumulo/1.7.2/accumulo-1.7.2-bin.tar.gz
(Append ".sha1", ".md5", or ".asc" to download the signature/hash for a
given artifact.)

All artifacts were built and staged with:
 mvn release:prepare && mvn release:perform

Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
(Expected fingerprint: 86EDB9C33B8517228E88A8F93E48C0C6EF362B9E)

Release notes (in progress) can be found at:
https://accumulo.apache.org/release_notes/1.7.2

Please vote one of:
[ ] +1 - I have verified and accept...
[ ] +0 - I have reservations, but not strong enough to vote against...
[ ] -1 - Because..., I do not accept...
... these artifacts as the 1.7.2 release of Apache Accumulo.

This vote will end on Tue Jun 21 05:00:00 UTC 2016
(Tue Jun 21 01:00:00 EDT 2016 / Mon Jun 20 22:00:00 PDT 2016)

Thanks!

P.S. Hint: download the whole staging repo with
 wget -erobots=off -r -l inf -np -nH \

https://repository.apache.org/content/repositories/orgapacheaccumulo-1052/
 # note the trailing slash is needed





Re: Code Quality Improvements for accumulo

2016-06-21 Thread Josh Elser
Again, please use the mailing list dev@accumulo.apache.org for all 
future Apache Accumulo related correspondence.


Thanks.

George Kankava wrote:

Hi Josh,

I have created several PRs for your project.
Hope they will be useful and you will like them.
Please feel free to choose which of them you would like to merge,
and let me know if there are any issues; I will be happy to fix them.

Regards,
George

On Sun, Jun 19, 2016 at 9:57 PM, Josh Elser <josh.el...@gmail.com> wrote:

Hi George,

We would be happy for any contributions you'd like to make to
Accumulo. We do try to keep up on what Findbugs reports already but
I'm sure some slip through.

I've cc'ed the developer list. Please use that for future
correspondence.

Thanks!

- Josh

On Jun 19, 2016 12:51 PM, "George Kankava" <george.kank...@devfactory.com> wrote:

Hello,

I'd like to send you some pull requests to improve the
maintainability of accumulo.

My company - DevFactory - is sponsoring me to identify and fix
code quality issues and improve unit test coverage in open
source projects. DevFactory is obsessed with code quality and is
providing its commercially available code quality improvement
service for free to qualified open-source projects.

If you are interested, please let me know and we will add it to
our pipeline. Our first step will be to utilize tools like PMD,
FindBugs and Sonar to identify the most important issues to fix.
Once we fix them, we'll follow up with some pull requests.

Thanks,
George Kankava




Re: Code Quality Improvements for accumulo

2016-06-21 Thread Josh Elser
Superb. I half wondered what was going on and why I was emailed in the 
first place.


Let's give them the benefit of the doubt that they're going to actually 
respond, and then close the PRs out if they in fact never do.


Dylan Hutchison wrote:

What a funny series of PRs to see this morning. As you probably figured out
Josh, the DevFactory folks run scripts that download the code of
open-source repos, run them with lots of lint, and create PRs with
generated fixes.

No human created these PRs; no human will respond to your comments. It is
up to us to accept real issues and reject false positives (e.g. changes to
thrift classes).
On Jun 21, 2016 10:55 AM, "Josh Elser"  wrote:


Again, please use the mailing list dev@accumulo.apache.org for all future
Apache Accumulo related correspondence.

Thanks.

George Kankava wrote:


Hi Josh,

I have created several PRs for your project.
Hope they will be useful and you will like them.
Please feel free to choose which of them you would like to merge,
and let me know if there are any issues; I will be happy to fix them.

Regards,
George

On Sun, Jun 19, 2016 at 9:57 PM, Josh Elser <josh.el...@gmail.com> wrote:

 Hi George,

 We would be happy for any contributions you'd like to make to
 Accumulo. We do try to keep up on what Findbugs reports already but
 I'm sure some slip through.

 I've cc'ed the developer list. Please use that for future
 correspondence.

 Thanks!

 - Josh

 On Jun 19, 2016 12:51 PM, "George Kankava" <george.kank...@devfactory.com> wrote:

 Hello,

 I'd like to send you some pull requests to improve the
 maintainability of accumulo.

 My company - DevFactory - is sponsoring me to identify and fix
 code quality issues and improve unit test coverage in open
 source projects. DevFactory is obsessed with code quality and is
 providing its commercially available code quality improvement
 service for free to qualified open-source projects.

 If you are interested, please let me know and we will add it to
 our pipeline. Our first step will be to utilize tools like PMD,
 FindBugs and Sonar to identify the most important issues to fix.
 Once we fix them, we'll follow up with some pull requests.

 Thanks,
 George Kankava







Re: Using Iterator-Test-Harness

2016-06-21 Thread Josh Elser

Happy to receive improvements!

Feel free to ping me if you'd like some review on any changes. You're 
likely the one first using it other than me -- I would love to hear what 
works well and what doesn't. I likely have tunnel-vision because I wrote it.


I think my biggest concern is making sure we have a stable API for 
developers to integrate with the harness but still letting you have the 
ability to tweak it to your needs. This is a fine balance, but one that 
is definitely achievable.


Dylan Hutchison wrote:

Hi all,

I like the structure of Accumulo's iterator-test-harness

module and would like to use it in my own code.  I was wondering if there
is a better alternative than (1) copying the code, (2) modifying it to suit
my own needs, (3) pushing back improvements I make if any, by
re-implementing them in the Accumulo project and filing a JIRA.

Cheers, Dylan



Re: Using Iterator-Test-Harness

2016-06-21 Thread Josh Elser
Sonatype has some nice resources which you could look into as a 
short-term option too. You could change the GAV for the iterator-test-harness 
module and push it to Sonatype. Then, Maven would be able to find it 
like normal via Maven Central.


A little bit more work though :)
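
Either way, the quicker local-install route is roughly as follows (the artifact 
coordinates below are my guess from the module name; double-check its pom):

  # From a checkout of master (currently 1.8.0-SNAPSHOT), build and install just the harness
  mvn -pl iterator-test-harness -am -DskipTests install
  # Then depend on it from your own project, e.g.
  #   groupId:    org.apache.accumulo
  #   artifactId: accumulo-iterator-test-harness
  #   version:    1.8.0-SNAPSHOT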

Dylan Hutchison wrote:

Thanks Josh and Russ.  I like Russ's idea to do a mvn install on 1.8.0.
Then I can test using the harness as an external dependency.

On Tue, Jun 21, 2016 at 2:25 PM, Josh Elser  wrote:


Happy to receive improvements!

Feel free to ping me if you'd like some review on any changes. You're
likely the one first using it other than me -- I would love to hear what
works well and what doesn't. I likely have tunnel-vision because I wrote it.

I think my biggest concern is making sure we have a stable API for
developers to integrate with the harness but still letting you have the
ability to tweak it to your needs. This is a fine balance, but one that is
definitely achievable.

Dylan Hutchison wrote:


Hi all,

I like the structure of Accumulo's iterator-test-harness
<https://github.com/apache/accumulo/tree/master/iterator-test-harness>
module and would like to use it in my own code.  I was wondering if there
is a better alternative than (1) copying the code, (2) modifying it to
suit
my own needs, (3) pushing back improvements I make if any, by
re-implementing them in the Accumulo project and filing a JIRA.

Cheers, Dylan






Re: [ANNOUNCE] Timely - Secure Time Series Database

2016-06-22 Thread Josh Elser

Awesome!

dlmar...@comcast.net wrote:

Timely is a time series database application that provides secure access to 
time series data. It is designed to be used with Apache Accumulo for 
persistence and Grafana for visualization. Timely is located at 
https://github.com/NationalSecurityAgency/timely .



Publicity/blog on Timely

2016-06-22 Thread Josh Elser
My single word "Awesome" really doesn't do justice to how excited I am 
about this. I've long hoped that we'd be able to push Accumulo to fit 
well into the time-series realm (given our ability to handle writes 
very well -- modulo the inherent limitations of the BigTable architecture, 
I suppose).


The documentation in the README is super thorough too.

I'd love to read/see more. Any chance I could persuade you into a blog 
post or similar? Some perf numbers by node (e.g. one 
Collector/TabletServer can support an ingest rate of X), screenshots in 
action, hypothetical (or real) use cases?


Josh Elser wrote:

Awesome!

dlmar...@comcast.net wrote:

Timely is a time series database application that provides secure
access to time series data. It is designed to be used with Apache
Accumulo for persistence and Grafana for visualization. Timely is
located at https://github.com/NationalSecurityAgency/timely .



Re: Feedback for 1.7.2 release notes

2016-06-23 Thread Josh Elser

Making a few changes and pushing the HTML for ya, Mike :)

Mike Drob wrote:

Thanks. Changes made and notes pushed to the website.

On Wed, Jun 22, 2016 at 5:51 PM, Michael Wall  wrote:


Follow on to

https://lists.apache.org/thread.html/962102c6c3745b759a2d94d59d40b7f213f2013bfdc7f9d6bc7ef48f@%3Cdev.accumulo.apache.org%3E

Feedback for draft release notes at
https://github.com/madrob/accumulo/blob/gh-pages/release_notes/1.7.2.md

1 - Should link to Jira in the third sentence be to the Jira release notes
for 1.7.2 at

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312121&version=12333776

Apache Accumulo 1.7.2 is a maintenance release on the 1.7 version branch.
This
release contains changes from more than 150 issues, comprised of bug-fixes,
performance improvements, build quality improvements, and more. See
[JIRA][*JIRA_172*]

2 - Inadvertently misspelled in

### Write-Ahead Logs can be prematurely deleted

There were cases where the Accumulo Garbage Collector may *inadvetrantly*

3 - Second ticket number should be 4148 in

### Native Map failed to increment mutation count properly

There was a bug ([ACCUMULO-4148][ACCUMULO-*4158*])

4 - conditions is misspelled in

## Other Notable Changes

*  [ACCUMULO-4335][ACCUMULO-4335] Error *conditionas*

5 - testing table has no entries

Thanks for releasing Mike.





Re: Feedback for 1.7.2 release notes

2016-06-23 Thread Josh Elser

+1 from me now. Thanks for putting these together.

Josh Elser wrote:

Making a few changes and pushing the HTML for ya, Mike :)

Mike Drob wrote:

Thanks. Changes made and notes pushed to the website.

On Wed, Jun 22, 2016 at 5:51 PM, Michael Wall wrote:


Follow on to

https://lists.apache.org/thread.html/962102c6c3745b759a2d94d59d40b7f213f2013bfdc7f9d6bc7ef48f@%3Cdev.accumulo.apache.org%3E


Feedback for draft release notes at
https://github.com/madrob/accumulo/blob/gh-pages/release_notes/1.7.2.md

1 - Should link to Jira in the third sentence be to the Jira release
notes
for 1.7.2 at

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312121&version=12333776


Apache Accumulo 1.7.2 is a maintenance release on the 1.7 version
branch.
This
release contains changes from more than 150 issues, comprised of
bug-fixes,
performance improvements, build quality improvements, and more. See
[JIRA][*JIRA_172*]

2 - Inadvertently misspelled in

### Write-Ahead Logs can be prematurely deleted

There were cases where the Accumulo Garbage Collector may
*inadvetrantly*

3 - Second ticket number should be 4148 in

### Native Map failed to increment mutation count properly

There was a bug ([ACCUMULO-4148][ACCUMULO-*4158*])

4 - conditions is misspelled in

## Other Notable Changes

* [ACCUMULO-4335][ACCUMULO-4335] Error *conditionas*

5 - testing table has no entries

Thanks for releasing Mike.





Re: [ANNOUNCE] Apache Accumulo 1.7.2 Released

2016-06-23 Thread Josh Elser

Don't forget to fwd the corrected version to announce@a.o, please.

Mike Drob wrote:

Whoops, meant to say that we are proud to announce the release of
Accumulo version 1.7.2!

On Thu, Jun 23, 2016 at 10:47 AM, Mike Drob <md...@apache.org> wrote:

The Accumulo team is proud to announce the release of Accumulo version
1.7.1!

This release contains over 30 bugfixes and improvements over 1.7.1,
and is
backwards-compatible with 1.7.0 and 1.7.1. Existing users of 1.7.1
are encouraged to
upgrade immediately.

This version is now available in Maven Central, and at:
https://accumulo.apache.org/downloads/

The full release notes can be viewed at:
https://accumulo.apache.org/release_notes/1.7.2.html

The Apache Accumulo™ sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing. It is based on
Google's BigTable design and is built on top of Apache Hadoop, Apache
ZooKeeper, and Apache Thrift.

--
The Apache Accumulo Team




Re: [ANNOUNCE] Apache Accumulo 1.7.2 Released

2016-06-24 Thread Josh Elser

*ping*

Josh Elser wrote:

Don't forget to fwd the corrected version to announce@a.o, please.

Mike Drob wrote:

Whoops, meant to say that we are proud to announce the release of
Accumulo version 1.7.2!

On Thu, Jun 23, 2016 at 10:47 AM, Mike Drob <md...@apache.org> wrote:

The Accumulo team is proud to announce the release of Accumulo version
1.7.1!

This release contains over 30 bugfixes and improvements over 1.7.1,
and is
backwards-compatible with 1.7.0 and 1.7.1. Existing users of 1.7.1
are encouraged to
upgrade immediately.

This version is now available in Maven Central, and at:
https://accumulo.apache.org/downloads/

The full release notes can be viewed at:
https://accumulo.apache.org/release_notes/1.7.2.html

The Apache Accumulo™ sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing. It is based on
Google's BigTable design and is built on top of Apache Hadoop, Apache
ZooKeeper, and Apache Thrift.

--
The Apache Accumulo Team




Re: [ANNOUNCE] Apache Accumulo 1.7.2 Released

2016-06-25 Thread Josh Elser
Sorry. Not sure if gmail filtered it weirdly or something, but I never 
saw it here. Thanks for handling all of 1.7.2.


Mike Drob wrote:

Only the correct version went out to announce@a.o --
https://lists.apache.org/thread.html/60eec03a659d0784bb367a7c656e27dd1bc8cf51c40f6dce2aa11cfb@%3Cannounce.apache.org%3E

On Fri, Jun 24, 2016 at 4:02 PM, Josh Elser  wrote:


*ping*


Josh Elser wrote:


Don't forget to fwd the corrected version to announce@a.o, please.

Mike Drob wrote:


Whoops, meant to say that we are proud to announce the release of
Accumulo version 1.7.2!

On Thu, Jun 23, 2016 at 10:47 AM, Mike Drob <md...@apache.org> wrote:

The Accumulo team is proud to announce the release of Accumulo version
1.7.1!

This release contains over 30 bugfixes and improvements over 1.7.1,
and is
backwards-compatible with 1.7.0 and 1.7.1. Existing users of 1.7.1
are encouraged to
upgrade immediately.

This version is now available in Maven Central, and at:
https://accumulo.apache.org/downloads/

The full release notes can be viewed at:
https://accumulo.apache.org/release_notes/1.7.2.html

The Apache Accumulo™ sorted, distributed key/value store is a robust,
scalable, high performance data storage system that features cell-based
access control and customizable server-side processing. It is based on
Google's BigTable design and is built on top of Apache Hadoop, Apache
ZooKeeper, and Apache Thrift.

--
The Apache Accumulo Team







Potential delayed responses from Josh

2016-06-26 Thread Josh Elser
FYI, I'll be traveling, conference-ing, and then traveling some more 
over the next week (June27-July3). If there's anything important that 
needs attention, feel free to shoot me a direct email. Otherwise, I 
might not be as responsive as usual to list traffic.


Thanks!

- Josh


  1   2   3   4   5   6   7   8   9   10   >