Oh. My. Thank you *so* much for writing this all up. It's extremely helpful. Some comments inline.

Joe Stein wrote:
I have had a lot of feedback in the market place on Accumulo. This feedback
was 100% from folks that didn't have Accumulo as a requirement to run and
feel that it is very relevant to broader adoption. All of the below
comments are a combination of my own opinions and what I have heard from
others in the market in discussion about Accumulo.

1) Iterators are awesome from a software architecture perspective. From a
development perspective if you have worked with them you have an experience
or two to share on how to improve them. Anything that can be done to
improve this experience for developers will be welcomed for new and
existing users.

This comes up a lot. I know I always struggle with actually describing the *why* to someone. Maybe more concrete examples are the best route -- e.g. expand our existing examples in the codebase or create some PMC-managed repos with examples?

2) Lots of little cosmetic surface things in lots of places and attention
to detail, e.g. at https://github.com/apache/accumulo the branch is not the
latest and even the latest branch (master?) README isn't really welcoming
or appealing from a "my first time visiting the project" perspective. For
new users you only get one chance at a first impression; this is
important under the "technical marketing umbrella". Some Vagrant and/or
Docker will make getting up and running quickly fantastic for folks that
have to (or want to) interact with Accumulo.

I will file an INFRA issue tonight to switch this to `master` (most recent/unstable). The tags should be self-explanatory for users finding and building stable releases.

3) The project should/could have more out of the box integrations and
support from the core project release cycles. e.g. Accumulo Framework for
Apache Mesos. I don't think the drive for this (Mesos support) is lacking
but having spoken to other Accumulo users there is no clear path how folks
can help to make this happen. The ecosystem just isn't big enough for
these types of projects to exist successfully outside the core project on
some github url.

I know we have some hooks into YARN integration with Apache Slider, but I haven't really looked into Mesos integration (nor am I familiar with how to go about it). I'm sure we could reach out to someone like Paco Nathan and get some direction if no one else has a good feel for it.

4) Some ecosystem page or place where "all things accumulo" can be sought
after... planet accumulo, something like that (no reason to reinvent this
wheel).  This is probably a combined issue of lack of aggregatable things
(which we should try to improve) and the ability to have them seen in one
place.  One of the coolest things I have seen Accumulo release since
following the project has been
https://blogs.apache.org/accumulo/entry/scaling_accumulo_with_multi_volume
but haven't seen anything else since this posting. Is it that the
information isn't bubbling up or that people aren't posting more about cool
things in place? Are people even using it?

I think this is one clear direction we can make easy progress in. I know there are lots of neat things happening, both in production and development. I'm not sure how much we lack in outward posting due to "developers not liking to write" and how much is just "other reasons".

5) Not; just; Java; please; =>  how about more Scala (maybe Iterator
examples) and/or Go with some ProtoBuf interface? From an implementation
perspective Java; just; kills; things; in; their; tracks; ! and Thrift has
a way to do that too...

:) -- I really think Protobufs with Accumulo Combiners (formerly Aggregators) are pretty darn slick to use (and I used them to build the multi-DC replication). That's an obvious win in the form of example+blogpost.

I know others have experience with Scala. Any good examples that can be shared for how it works well with Accumulo? Go, as well?

6) Operations is almost an opaque box. Getting something up and running for
development is important but so is pushing it into production and
sustaining it at scale. The more information about how this is done and
where things work and do not work will be a  *HUGE* driver for the
community (IMHO). Again, maybe all this stuff is out there and #4 is really
how to solve this for folks to not spend their nights and weekends googling.

Indeed. This is a very hard problem in general (and I think the market very obviously confirms it). Overall, I do want to say that I think we do a good job in helping people who come to us and ask questions (go us!). The hard part is making it self-service: a solution for a problem can range from DNS all the way up to an Iterator implementation.

How do other projects deal with this? Is it primarily good answers that eventually get indexed by Google and people can find them? How can we be more aggressive in this regard?

7) Apache Spark support. While arguably this goes under #3 I think it has
to be called out as another (better?) option for MapReduce. It is really
easy to get Spark to use AccumuloInputFormat which is wonderful and a
fantastic opportunity for making Accumulo shine with Spark. A few samples
people can run with Spark and Accumulo together that do something more than
word count will go a long way to attracting an audience too.

I lack experience here as well, but again I know that others have it. Spark users -- give us some more direction :)

8) More ways to highlight the workloads that Accumulo was built for,
what it does now, and how it is not about websites or social or ads. This is
important to organizations in verticals that care differently about their
data.

That's a good point. I know that many of our people have put a lot of thought into these sorts of verticals in the past, but they haven't made it into "official" write-ups. This would be a good area we can improve through our own "marketing".

9) Better call out features and highlight them with more examples
explicitly. I might be repeating myself at this point but wanted to bring
up "Tracing" as another good example of a REALLY cool feature that folks,
when they see it, don't entirely understand what to do with or how. Whether
Googling for "accumulo trace" or going through the documentation, it is
impossible to figure out how to use it and make it work without late nights
and tender loving care.

Good point. Examples + documentation + blog posts would help here. Perhaps focused usages of the novel features are a better way to go about this? A concrete implementation is a better read than an abstract concept and lends itself well to avoiding "so what?" questions.

None of these things are easy, and they are very demanding for open source
projects and communities. I think this is a great discussion and hope to
continue to contribute moving forward.

Thanks so much, again, for taking the time to write this down!

/*******************************************
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop<http://www.twitter.com/allthingshadoop>
********************************************/

On Tue, Jan 13, 2015 at 4:37 PM, Keith Turner wrote:

I think a minimal getting started guide is needed on the web site.
Something that will take a user from download to running on a cluster in as
few steps as possible.  This info is buried in the README, but there is too
much other stuff in the README.

On Tue, Jan 13, 2015 at 4:09 PM, Josh Elser wrote:

I meant to send this out closer to the new year (to ride on the new year
resolution stereotype), but I slacked. Forgive me.

As those paying attention should be aware, we have had very little
growth within the project over the past 6-9 months. We've had our normal
spattering of contributions, a few from some repeat people, but I don't
think we've grown as much as we could.

I wanted to see if anyone has any suggestions on what we could try to do
better in the coming year to help more people get involved with the
project. I don't want this to turn into a "we do X wrong" discussion, so
please try to stay positive and include suggestion(s) for every problem
presented when possible.

Also, everyone should feel welcome to participate in the discussion here.
If you fall into the "bucket" described, I'd love to hear from you. If
anyone doesn't want to publicly respond, please feel free to email me
privately and I'll anonymously post to the list on your behalf.

Some ideas to start off discussion:

* Help reduce barrier to entry for new developers
   - Ensure simple/easy-to-process instructions for getting and building
code in common environments
   - Instructions on running tests and reporting issues

* More high-level examples
   - Maybe we start too deep in distributed-systems land and we scare away
devs who think they "don't know enough to help"
   - Recording "newbie" tickets and providing adequate information for
anyone to come along and try to take it on
   - Encourage/help/promote "concrete" ideas/code in the project. Something
that is more tangible for devs to wrap their head around (also can help
with adoption from new users)

* Better documentation and "marketing"
   - We do "ok" with the occasional blog post, and the user manual is
usually thorough, but we can obviously do better.
   - Can we create more "literature" to encourage more users and devs to
get involved, trying to lower the barrier to entry?

Thanks all.

