I'd love to see in-depth examples of the cell-level security. How about an example of using Accumulo for HIPAA? Is anyone using Accumulo for genetics?
On Thu, Jan 15, 2015 at 3:13 PM, Josh Elser <[email protected]> wrote:
> Another anonymous response:
>
> <quote>
> I had never looked at the Accumulo front page until this morning. I think
> it does ok with "who are you?", but should do better at "*why* are you?". It
> indirectly mentions the security model and iterators, but I think it should
> make those front and center. And ingest performance is huge.
>
> I don't know how aggressive you want to get, but I think you really ought to
> directly compare to HBase and Cassandra, on various dimensions.
>
> What market segments would you love Accumulo to get into? (health care?
> ...). If I were a developer looking to spend my hobby time, the front page
> might lead me to check out the other projects, and maybe not come back (and
> a google of "hbase vs" lists a number of comparisons that did not even
> include Accumulo).
>
> In general, I think getting more users would get more developers:
> - I think that points to the marketing side of things
> - NiFi is doing a stunningly good job with blog posts about low-pain setup
>   and examples, right out of the gate
>
> Iterators are terrifying to implement/deploy:
>
> - They are clearly a novel paradigm when reading the paper/docs, but
>   implementing and deploying a complex new iterator, or even an update to an
>   iterator that's been working for a long time, on a large cloud, always makes
>   me hold my breath until I'm about to pass out
> - Even after I've added every possible unit test I can think of, I still
>   assume that I will see a storm of crashing tservers when I push out to a
>   large cloud.
> - Some sort of systematic safety harness for vetting a new iterator or
>   combination of iterators would be great
> - I think it's mostly scary because we don't really have a small live
>   playground into which we can copy data and make mistakes. Maybe the
>   solution is to create the playground (with real, non-cherry-picked data),
>   and be able to make mistakes that don't cost days to undo, but that takes a
>   good deal of work, and tools could be written to support that.
> </quote>
>
> Some personal thoughts:
>
> Good points about being more assertive WRT marketing. I think it's fair to
> say that we get "walked" often because we're not aggressive enough in
> stating that Accumulo is a player.
>
> We should make an iterator fuzzing framework. We know what the system does
> that is unexpected and can likely codify that in a test environment. It
> would take a little bit of effort to implement well, but I do think it's
> feasible. Clone()'ing a table is one option if you have real data in a real
> environment -- that will at least prevent you from destroying existing data,
> but it doesn't protect you against tanking your Accumulo instance with some
> thread/memory leak :)
>
> Josh Elser wrote:
>>
>> I meant to send this out closer to the new year (to ride on the new year
>> resolution stereotype), but I slacked. Forgive me.
>>
>> As those paying attention should be aware, we have had very little
>> growth within the project over the past 6-9 months. We've had our normal
>> spattering of contributions, a few from some repeat people, but I don't
>> think we've grown as much as we could.
>>
>> I wanted to see if anyone has any suggestions on what we could try to do
>> better in the coming year to help more people get involved with the
>> project. I don't want this to turn into a "we do X wrong" discussion, so
>> please try to stay positive and include suggestion(s) for every problem
>> presented when possible.
>>
>> Also, everyone should feel welcome to participate in the discussion
>> here. If you fall into the "bucket" described, I'd love to hear from
>> you. If anyone doesn't want to publicly respond, please feel free to
>> email me privately and I'll anonymously post to the list on your behalf.
>>
>> Some ideas to start off discussion:
>>
>> * Help reduce barrier to entry for new developers
>>   - Ensure simple/easy-to-process instructions for getting and building
>>     code in common environments
>>   - Instructions on running tests and reporting issues
>>
>> * More high-level examples
>>   - Maybe we start too deep in distributed-systems land and we scare away
>>     devs who think they "don't know enough to help"
>>   - Recording "newbie" tickets and providing adequate information for
>>     anyone to come along and try to take it on
>>   - Encourage/help/promote "concrete" ideas/code in the project. Something
>>     that is more tangible for devs to wrap their head around (also can help
>>     with adoption from new users)
>>
>> * Better documentation and "marketing"
>>   - We do "ok" with the occasional blog post, and the user manual is
>>     usually thorough, but we can obviously do better.
>>   - Can we create more "literature" to encourage more users and devs to
>>     get involved, trying to lower the barrier to entry?
>>
>> Thanks all.
