I haven't worked professionally with Accumulo in over a year, partly because I haven't found local projects or companies that use it. When not working daily with Accumulo, it's hard to make meaningful contributions. Can we host a page listing organizations using Accumulo?

On Tue, Jan 13, 2015 at 6:06 PM, Josh Elser <[email protected]> wrote:

> Oh. My. Thank you *so* much for writing this all up. It's extremely
> helpful. Some comments inline.
>
> Joe Stein wrote:
>>
>> I have had a lot of feedback in the market place on Accumulo. This
>> feedback was 100% from folks that didn't have Accumulo as a requirement
>> to run and feel that it is very relevant to broader adoption. All of the
>> below comments are a combination of my own opinions and what I have
>> heard from others in the market in discussion about Accumulo.
>>
>> 1) Iterators are awesome from a software architecture perspective. From
>> a development perspective, if you have worked with them you have an
>> experience or two to share on how to improve them. Anything that can be
>> done to improve this experience for developers will be welcomed by new
>> and existing users.
>
> This comes up a lot. I know I always struggle with actually describing
> the *why* to someone. Maybe more concrete examples are the best route --
> e.g. expand our existing examples in the codebase or create some
> PMC-managed repos with examples?
>
>> 2) Lots of little cosmetic surface things in lots of places and
>> attention to detail. e.g. https://github.com/apache/accumulo the branch
>> is not the latest and even the latest branch (master?) README isn't
>> really welcoming or appealing from a "my first time visiting the
>> project" perspective. For new users you only get one chance at a first
>> impression; this is important under the "technical marketing umbrella".
>> Some Vagrant and/or Docker will make getting up and running quickly
>> fantastic for folks that have to (or want to) interact with Accumulo.
>
> I will file an INFRA issue tonight to switch this to `master` (most
> recent/unstable). The tags should be self-explanatory for users
> finding/building stable releases.
>
>> 3) The project should/could have more out of the box integrations and
>> support from the core project release cycles. e.g. Accumulo Framework
>> for Apache Mesos. I don't think the drive for this (Mesos support) is
>> lacking but having spoken to other Accumulo users there is no clear path
>> for how folks can help to make this happen. The ecosystem just isn't big
>> enough for these types of projects to exist successfully outside the
>> core project on some github url.
>
> I know we have some hooks into YARN integration with Apache Slider, but
> I haven't really looked into Mesos integration (nor am I familiar with
> what/how to go about this). I'm sure we could reach out to someone like
> Paco Nathan and get some direction if no one else has a good feeling.
>
>> 4) Some ecosystem page or place where "all things accumulo" can be
>> sought after... planet accumulo, something like that (no reason to
>> reinvent this wheel). This is probably a combined issue of lack of
>> aggregatable things (which we should try to improve) and the ability to
>> have them seen in one place. One of the coolest things I have seen
>> Accumulo release since following the project has been
>> https://blogs.apache.org/accumulo/entry/scaling_accumulo_with_multi_volume
>> but I haven't seen anything else since this posting. Is it that the
>> information isn't bubbling up or that people aren't posting more about
>> cool things in place? Are people even using it?
>
> I think this is one clear direction we can make easy progress in. I know
> there are lots of neat things happening, both in production and
> development. I'm not sure how much we lack in outward posting due to
> "developers not liking to write" and how much is just "other reasons".
>
>> 5) Not; just; Java; please; => how about more Scala (maybe Iterator
>> examples) and/or Go with some ProtoBuf interface? From an implementation
>> perspective Java; just; kills; things; in; their; tracks; ! and Thrift
>> has a way to do that too...
>
> :) -- I really think Protobufs with Accumulo Combiners (formerly
> Aggregators) are pretty darn slick to use (and used it to build the
> multi-DC replication). That's an obvious win in the form of
> example+blogpost.
>
> I know others have experience with Scala. Any good examples that can be
> shared for how it works well with Accumulo? Go, as well?
>
>> 6) Operations is almost an opaque box. Getting something up and running
>> for development is important but so is pushing it into production and
>> sustaining it at scale. The more information about how this is done and
>> where things work and do not work will be a *HUGE* driver for the
>> community (IMHO). Again, maybe all this stuff is out there and #4 is
>> really how to solve this for folks to not spend their nights and
>> weekends googling.
>
> Indeed. This is a very hard problem in general (and I think the market
> very obviously confirms it). Overall, I do want to say that I think we
> do a good job in helping people who come to us and ask questions (go
> us!). The hard part is making it self-service: a solution for a problem
> can range from DNS all the way up to an Iterator implementation.
>
> How do other projects deal with this? Is it primarily good answers that
> eventually get indexed by Google and people can find them? How can we be
> more aggressive in this regard?
>
>> 7) Apache Spark support. While arguably this goes under #3 I think it
>> has to be called out as another (better?) option for MapReduce. It is
>> really easy to get Spark to use AccumuloInputFormat which is wonderful
>> and a fantastic opportunity for making Accumulo shine with Spark. A few
>> samples people can run with Spark and Accumulo together that do
>> something more than word count will go a long way to attracting an
>> audience too.
>
> I lack experience here as well but again know that others have
> experience here. Spark users -- give us some more direction :)
>
>> 8) More ways to highlight the workloads that Accumulo was built for and
>> what it does now and how it is not about website or social or ads is
>> important to organizations in verticals that care differently about
>> their data.
>
> That's a good point. I know that many of our people have put a lot of
> thought into these sorts of verticals in the past, but they haven't made
> it into "official" write-ups. This would be a good area we can improve
> through our own "marketing".
>
>> 9) Better call out features and highlight them with more examples
>> explicitly. I might be repeating myself at this point but wanted to
>> bring up "Tracing" as another good example of a REALLY cool feature
>> that folks, when they see it, don't entirely understand what/how to do
>> with it. Whether Googling for "accumulo trace" or going through the
>> documentation, it is impossible to figure out how to use it and make it
>> work without late nights and tender loving care.
>
> Good point. Examples + documentation + blog posts would help here.
> Perhaps focused usages of the novel features are a better way to go
> about this? A concrete implementation is a better read than an abstract
> concept and lends itself well to avoiding "so what?" questions.
>
>> None of these things are easy and are very demanding for open source
>> projects and communities. I think this is a great discussion and hope
>> to continue to contribute moving forward.
>
> Thanks so much, again, for taking the time to write this down!
>
>> /*******************************************
>> Joe Stein
>> Founder, Principal Consultant
>> Big Data Open Source Security LLC
>> http://www.stealth.ly
>> Twitter: @allthingshadoop<http://www.twitter.com/allthingshadoop>
>> ********************************************/
>>
>> On Tue, Jan 13, 2015 at 4:37 PM, Keith Turner<[email protected]> wrote:
>>
>>> I think a minimal getting started guide is needed on the web site.
>>> Something that will take a user from download to running on a cluster
>>> in as few steps as possible. This info is buried in the README, but
>>> there is too much other stuff in the readme.
>>>
>>> On Tue, Jan 13, 2015 at 4:09 PM, Josh Elser<[email protected]> wrote:
>>>
>>>> I meant to send this out closer to the new year (to ride on the new
>>>> year resolution stereotype), but I slacked. Forgive me.
>>>>
>>>> As those paying attention should be aware, we have had very little
>>>> growth within the project over the past 6-9 months. We've had our
>>>> normal spattering of contributions, a few from some repeat people,
>>>> but I don't think we've grown as much as we could.
>>>>
>>>> I wanted to see if anyone has any suggestions on what we could try to
>>>> do better in the coming year to help more people get involved with
>>>> the project. I don't want this to turn into a "we do X wrong"
>>>> discussion, so please try to stay positive and include suggestion(s)
>>>> for every problem presented when possible.
>>>>
>>>> Also, everyone should feel welcome to participate in the discussion
>>>> here. If you fall into the "bucket" described, I'd love to hear from
>>>> you. If anyone doesn't want to publicly respond, please feel free to
>>>> email me privately and I'll anonymously post to the list on your
>>>> behalf.
>>>>
>>>> Some ideas to start off discussion:
>>>>
>>>> * Help reduce barrier to entry for new developers
>>>>   - Ensure simple/easy-to-process instructions for getting and
>>>>     building code in common environments
>>>>   - Instructions on running tests and reporting issues
>>>>
>>>> * More high-level examples
>>>>   - Maybe we start too deep in distributed-systems land and we scare
>>>>     away devs who think they "don't know enough to help"
>>>>   - Recording "newbie" tickets and providing adequate information for
>>>>     anyone to come along and try to take it on
>>>>   - Encourage/help/promote "concrete" ideas/code in the project.
>>>>     Something that is more tangible for devs to wrap their head
>>>>     around (also can help with adoption from new users)
>>>>
>>>> * Better documentation and "marketing"
>>>>   - We do "ok" with the occasional blog post, and the user manual is
>>>>     usually thorough, but we can obviously do better.
>>>>   - Can we create more "literature" to encourage more users and devs
>>>>     to get involved, trying to lower the barrier to entry?
>>>>
>>>> Thanks all.
