I have had a lot of feedback in the marketplace on Accumulo. This feedback came 100% from folks who were not required to run Accumulo, and I feel that it is very relevant to broader adoption. All of the comments below combine my own opinions with what I have heard from others in the market in discussions about Accumulo.
1) Iterators are awesome from a software architecture perspective. From a development perspective, if you have worked with them you have an experience or two to share on how to improve them. Anything that can be done to improve this experience for developers will be welcomed by new and existing users.

2) Lots of little cosmetic, surface-level things in lots of places, and attention to detail. e.g. at https://github.com/apache/accumulo the default branch is not the latest, and even the latest branch's (master?) README isn't really welcoming or appealing from a "my first time visiting the project" perspective. New users only get one chance at a first impression, and this is important under the "technical marketing umbrella". Some Vagrant and/or Docker setups would make getting up and running quickly fantastic for folks who have to (or want to) interact with Accumulo.

3) The project should/could have more out-of-the-box integrations, with support from the core project release cycles, e.g. an Accumulo Framework for Apache Mesos. I don't think the drive for this (Mesos support) is lacking, but having spoken to other Accumulo users, there is no clear path for how folks can help make this happen. The ecosystem just isn't big enough for these types of projects to exist successfully outside the core project on some GitHub URL.

4) Some ecosystem page or place where "all things Accumulo" can be found... Planet Accumulo, something like that (no reason to reinvent this wheel). This is probably a combination of a lack of aggregatable things (which we should try to improve) and the lack of a way to see them in one place. One of the coolest things I have seen Accumulo release since following the project has been https://blogs.apache.org/accumulo/entry/scaling_accumulo_with_multi_volume but I haven't seen anything else since that posting. Is it that the information isn't bubbling up, or that people aren't posting more about the cool things they have in place? Are people even using it?
5) Not; just; Java; please; => how about more Scala (maybe Iterator examples) and/or Go with some ProtoBuf interface? From an implementation perspective Java; just; kills; things; in; their; tracks; ! and Thrift has a way to do that too...

6) Operations is almost an opaque box. Getting something up and running for development is important, but so is pushing it into production and sustaining it at scale. More information about how this is done, and where things work and where they do not, will be a *HUGE* driver for the community (IMHO). Again, maybe all this stuff is out there and #4 is really how to solve this, so folks don't have to spend their nights and weekends googling.

7) Apache Spark support. While arguably this goes under #3, I think it has to be called out as another (better?) option than MapReduce. It is really easy to get Spark to use AccumuloInputFormat, which is wonderful and a fantastic opportunity for making Accumulo shine with Spark. A few samples people can run with Spark and Accumulo together that do something more than word count will go a long way toward attracting an audience too.

8) More ways to highlight the workloads Accumulo was built for, what it does now, and how it is not about websites, social, or ads. This is important to organizations in verticals that care differently about their data.

9) Call out features better and highlight them explicitly with more examples. I might be repeating myself at this point, but I wanted to bring up "Tracing" as another good example of a REALLY cool feature that folks, when they see it, don't entirely understand what to do with or how to use. Whether you Google "accumulo trace" or go through the documentation, it is impossible to figure out how to use it and make it work without late nights and tender loving care.

None of these things are easy, and they are very demanding for open source projects and communities. I think this is a great discussion and I hope to continue to contribute moving forward.
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Tue, Jan 13, 2015 at 4:37 PM, Keith Turner wrote:

> I think a minimal getting started guide is needed on the web site.
> Something that will take a user from download to running on a cluster in as
> few steps as possible. This info is buried in the README, but there is too
> much other stuff in the readme.
>
> On Tue, Jan 13, 2015 at 4:09 PM, Josh Elser wrote:
>
> > I meant to send this out closer to the new year (to ride on the new year
> > resolution stereotype), but I slacked. Forgive me.
> >
> > As should be aware by those paying attention, we have had very little
> > growth within the project over the past 6-9 months. We've had our normal
> > spattering of contributions, a few from some repeat people, but I don't
> > think we've grown as much as we could.
> >
> > I wanted to see if anyone has any suggestions on what we could try to do
> > better in the coming year to help more people get involved with the
> > project. I don't want this to turn into a "we do X wrong" discussion, so
> > please try to stay positive and include suggestion(s) for every problem
> > presented when possible.
> >
> > Also, everyone should feel welcome to participate in the discussion here.
> > If you fall into the "bucket" described, I'd love to hear from you. If
> > anyone doesn't want to publicly respond, please feel free to email me
> > privately and I'll anonymously post to the list on your behalf.
> >
> > Some ideas to start off discussion:
> >
> > * Help reduce barrier to entry for new developers
> >   - Ensure simple/easy-to-process instructions for getting and building
> >     code in common environments
> >   - Instructions on running tests and reporting issues
> >
> > * More high-level examples
> >   - Maybe we start too deep in distributed-systems land and we scare away
> >     devs who think they "don't know enough to help"
> >   - Recording "newbie" tickets and providing adequate information for
> >     anyone to come along and try to take it on
> >   - Encourage/help/promote "concrete" ideas/code in the project. Something
> >     that is more tangible for devs to wrap their head around (also can help
> >     with adoption from new users)
> >
> > * Better documentation and "marketing"
> >   - We do "ok" with the occasional blog post, and the user manual is
> >     usually thorough, but we can obviously do better.
> >   - Can we create more "literature" to encourage more users and devs to
> >     get involved, trying to lower the barrier to entry?
> >
> > Thanks all.
