I'm assuming you want a random selection of entries in Accumulo, i.e. a random sample of keys/values?
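To make sure we mean the same thing by "random selection", here is a minimal sketch of the simplest reading: keep each entry independently with a fixed probability (Bernoulli sampling). It is plain Java over an in-memory sorted map, not the Accumulo iterator API. The class name `BernoulliSample`, the `sample` method, and the `fraction` and `seed` parameters are all hypothetical names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class BernoulliSample {

    // Keep each entry independently with probability `fraction`.
    // The sample size is only approximately fraction * N, not exact.
    static <K, V> List<Map.Entry<K, V>> sample(
            Iterable<Map.Entry<K, V>> entries, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed); // fixed seed makes the run repeatable
        List<Map.Entry<K, V>> kept = new ArrayList<>();
        for (Map.Entry<K, V> entry : entries) {
            // nextDouble() is uniform in [0, 1), so this passes with
            // probability `fraction`.
            if (rng.nextDouble() < fraction) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for a sorted table of key/value pairs (an assumption,
        // not real Accumulo data).
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10000; i++) {
            table.put(String.format("row%05d", i), "v" + i);
        }
        List<Map.Entry<String, String>> picked = sample(table.entrySet(), 0.1, 42L);
        System.out.println(picked.size() + " of " + table.size() + " entries sampled");
    }
}
```

Inside a real iterator I would expect the same per-entry coin flip to sit wherever entries are accepted or skipped, but that is an assumption on my part. Note that this approach still reads every entry, which is why the questions below about key regularity and ingest statistics matter.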
How are your keys formatted (a conceptual description is fine)? Is there some sort of regularity to them, i.e. can you calculate a random distribution of keys ahead of time without validating which keys are present?

If you can't calculate the key distribution ahead of time, are you keeping (or could you keep) any statistics on ingest (cardinality, distribution, etc.)?

Finally, how rigorous and performant do you need this random sampling to be? Do you just want representative data, or are you trying to do something like BlinkDB [1] (let people specify confidence intervals on queries, and sample only enough data to meet the requisite uncertainty requirements)?

[1] http://blinkdb.org/

Chris

On Sat, Feb 1, 2014 at 3:58 PM, cprigano <[email protected]> wrote:
> I am looking at writing an Accumulo iterator to return a random sample of a
> percentile of a table.
>
> I would appreciate any suggestions.
>
> Thanks,
>
> Chris
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Accumulo-iterator-to-return-a-random-sample-of-a-percentile-of-a-table-tp7354.html
> Sent from the Developers mailing list archive at Nabble.com.
