Re: Growing project involvement

David Medinets Tue, 13 Jan 2015 15:40:39 -0800

I haven't worked professionally with Accumulo in over a year partly
because I haven't found local projects or companies that use it. When
not working daily with Accumulo, it's hard to make meaningful
contributions. Can we host a page listing organizations using
Accumulo?


On Tue, Jan 13, 2015 at 6:06 PM, Josh Elser <[email protected]> wrote:
> Oh. My. Thank you *so* much for writing this all up. It's extremely helpful.
> Some comments inline.
>
> Joe Stein wrote:
>>
>> I have had a lot of feedback in the market place on Accumulo. This feedback
>> was 100% from folks that didn't have Accumulo as a requirement to run and
>> feel that it is very relevant to broader adoption. All of the below
>> comments are a combination of my own opinions and what I have heard from
>> others in the market in discussion about Accumulo.
>>
>> 1) Iterators are awesome from a software architecture perspective. From a
>> development perspective if you have worked with them you have an experience
>> or two to share on how to improve them. Anything that can be done to
>> improve this experience for developers will be welcomed for new and
>> existing users.
>
>
> This comes up a lot. I know I always struggle with actually describing the
> *why* to someone. Maybe more concrete examples are the best route -- e.g.
> expand our existing examples in the codebase or create some PMC-managed
> repos with examples?
>
>> 2) Lots of little cosmetic surface things in lots of places and attention
>> to detail. e.g. https://github.com/apache/accumulo the branch is not the
>> latest and even the latest branch (master?) README isn't really welcoming
>> or appealing from a "my first time visiting the project" perspective. For
>> new users you only get 1 impression for a first impression, this is
>> important under the "technical marketing umbrella".  Some Vagrant and/or
>> Docker will make getting up and running quickly fantastic for folks that
>> have to (or want to) interact with Accumulo.
>
>
> I will file an INFRA issue tonight to switch this to `master` (most
> recent/unstable). The tags should be self-explanatory for users
> finding/building stable releases.
>
>> 3) The project should/could have more out of the box integrations and
>> support from the core project release cycles. e.g. Accumulo Framework for
>> Apache Mesos. I don't think the drive for this (Mesos support) is lacking
>> but having spoken to other Accumulo users there is no clear path how folks
>> can help to make this happen. The eco system just isn't big enough for
>> these type of projects to exist successfully outside the core project on
>> some github url.
>
>
> I know we have some hooks into YARN integration with Apache Slider, but I
> haven't really looked into Mesos integration (nor am I familiar with
> what/how to go about this). I'm sure we could reach out to someone like Paco
> Nathan and get some direction if no one else has a good feeling.
>
>> 4) Some eco system page or place where "all things accumulo" can be sought
>> after... planet accumulo, something like that (no reason to reinvent this
>> wheel).  This is probably a combined issue of lack of aggregatable things
>> (which we should try to improve) and the ability to have them seen in one
>> place.  One of the coolest things I have seen Accumulo release since
>> following the project has been
>> https://blogs.apache.org/accumulo/entry/scaling_accumulo_with_multi_volume
>> but haven't seen anything else since this posting. Is it that the
>> information isn't bubbling up or that people aren't posting more about cool
>> things in place? Are people even using it?
>
>
> I think this is one clear direction we can make easy progress in. I know
> there are lots of neat things happening, both in production and development.
> I'm not sure how much we lack in outward posting due to "developers not
> liking to write" and how much is just "other reasons".
>
>> 5) Not; just; Java; please; =>  how about more Scala (maybe Iterator
>> examples) and/or Go with some ProtoBuf interface? from an implementation
>> perspective Java; just; kills; things; in; their; tracks; ! and Thrift has
>> a way to do that too...
>
>
> :) -- I really think Protobufs with Accumulo Combiners (formerly
> Aggregators) are pretty darn slick to use (and used it to build the multi-DC
> replication). That's an obvious win in the form of example+blogpost.
>
> I know others have experience with Scala. Any good examples that can be
> shared for how it works well with Accumulo? Go, as well?
>
>> 6) Operations is almost an opaque box. Getting something up and running for
>> development is important but so is pushing it into production and
>> sustaining it at scale. The more information about how this is done and
>> where things work and do not work will be a *HUGE* driver for the
>> community (IMHO). Again, maybe all this stuff is out there and #4 is really
>> how to solve this for folks to not spend their nights and weekends
>> googling.
>
>
> Indeed. This is a very hard problem in general (and I think the market very
> obviously confirms it). Overall, I do want to say that I think we do a good
> job in helping people who come to us and ask questions (go us!). The hard
> part is making it self-service: a solution for a problem can range from DNS
> all the way up to an Iterator implementation.
>
> How do other projects deal with this? Is it primarily good answers that
> eventually get indexed by Google and people can find them? How can we be
> more aggressive in this regard?
>
>> 7) Apache Spark support. While arguably this goes under #3 I think it has
>> to be called out as another (better?) option for MapReduce. It is really
>> easy to get Spark to use AccumuloInputFormat which is wonderful and a
>> fantastic opportunity for making Accumulo shine with Spark. A few samples
>> people can run with Spark and Accumulo together that do something more than
>> word count will go a long way to attracting an audience too.
>
>
> I lack experience here as well but again know that others have experience
> here. Spark users -- give us some more direction :)
>
>> 8) More ways to highlight the work loads that Accumulo was built for and
>> what it does now and how it is not about website or social or ads is
>> important to organizations in verticals that care differently about their
>> data.
>
>
> That's a good point. I know that many of our people have put a lot of
> thought into these sorts of verticals in the past, but they haven't made it
> into "official" write-ups. This would be a good area we can improve through
> our own "marketing".
>
>> 9) Better call out features and highlight them with more examples
>> explicitly. I might be repeating myself at this point but wanted to bring
>> up "Tracing" as another good example of a REALLY cool feature that folks
>> when they see it don't entirely understand what/how to do with it. Whether
>> Googling for "accumulo trace" or going through the documentation, it is
>> impossible to figure out how to use it and make it work without late nights
>> and tender loving care.
>
>
> Good point. Examples + documentation + blog posts would help here. Perhaps
> focused-usages of the novel features are a better way to go about this? A
> concrete implementation is a better read than an abstract concept and lends
> itself well to avoid "so what?" questions.
>
>> None of these things are easy and are very demanding for open source
>> projects and communities. I think this is a great discussion and hope to
>> continue to contribute moving forward.
>
>
> Thanks so much, again, for taking the time to write this down!
>
>
>> /*******************************************
>>   Joe Stein
>>   Founder, Principal Consultant
>>   Big Data Open Source Security LLC
>>   http://www.stealth.ly
>>   Twitter: @allthingshadoop<http://www.twitter.com/allthingshadoop>
>> ********************************************/
>>
>> On Tue, Jan 13, 2015 at 4:37 PM, Keith Turner<[email protected]>  wrote:
>>
>>> I think a minimal getting started guide is needed on the web site.
>>> Something that will take a user from download to running on a cluster in as
>>> few steps as possible.  This info is buried in the README, but there is too
>>> much other stuff in the readme.
>>>
>>> On Tue, Jan 13, 2015 at 4:09 PM, Josh Elser<[email protected]>  wrote:
>>>
>>>> I meant to send this out closer to the new year (to ride on the new year
>>>> resolution stereotype), but I slacked. Forgive me.
>>>>
>>>> As those paying attention should be aware, we have had very little
>>>> growth within the project over the past 6-9 months. We've had our normal
>>>> spattering of contributions, a few from some repeat people, but I don't
>>>> think we've grown as much as we could.
>>>>
>>>> I wanted to see if anyone has any suggestions on what we could try to do
>>>> better in the coming year to help more people get involved with the
>>>> project. I don't want this to turn into a "we do X wrong" discussion, so
>>>> please try to stay positive and include suggestion(s) for every problem
>>>> presented when possible.
>>>>
>>>> Also, everyone should feel welcome to participate in the discussion here.
>>>> If you fall into the "bucket" described, I'd love to hear from you. If
>>>> anyone doesn't want to publicly respond, please feel free to email me
>>>> privately and I'll anonymously post to the list on your behalf.
>>>>
>>>> Some ideas to start off discussion:
>>>>
>>>> * Help reduce barrier to entry for new developers
>>>>    - Ensure simple/easy-to-process instructions for getting and building
>>>> code in common environments
>>>>    - Instructions on running tests and reporting issues
>>>>
>>>> * More high-level examples
>>>>    - Maybe we start too deep in distributed-systems land and we scare away
>>>> devs who think they "don't know enough to help"
>>>>    - Recording "newbie" tickets and providing adequate information for
>>>> anyone to come along and try to take it on
>>>>    - Encourage/help/promote "concrete" ideas/code in the project. Something
>>>> that is more tangible for devs to wrap their head around (also can help
>>>> with adoption from new users)
>>>>
>>>> * Better documentation and "marketing"
>>>>    - We do "ok" with the occasional blog post, and the user manual is
>>>> usually thorough, but we can obviously do better.
>>>>    - Can we create more "literature" to encourage more users and devs to
>>>> get involved, trying to lower the barrier to entry?
>>>>
>>>> Thanks all.
>>>>
>>
>
