Re: Accumulo iterator to return a random sample of a percentile of a table

Chris Bennight Tue, 04 Feb 2014 17:27:44 -0800

I'm assuming you want a random selection of entries in Accumulo, i.e. a
random selection of keys/values?

How are your keys formatted (a conceptual description is fine)? Is there
some sort of regularity to them? (I.e., can you generate a random set of
keys ahead of time without checking which keys are actually present?)
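
To make the "calculate ahead of time" case concrete, here is a minimal
sketch in plain Java (no Accumulo calls). It assumes a purely hypothetical
key format, "row_" followed by a 10-digit zero-padded number, and a known
upper bound on the IDs; RandomKeyPicker, formatRow and pickRows are
illustrative names, not part of any library. The idea is to draw the row
IDs up front and look each one up instead of scanning the whole table.

```java
import java.util.Locale;
import java.util.Random;
import java.util.TreeSet;

public class RandomKeyPicker {
    // Hypothetical key format (an assumption): "row_" + 10-digit zero-padded ID.
    static String formatRow(long id) {
        return String.format(Locale.ROOT, "row_%010d", id);
    }

    // Draw `count` distinct row IDs uniformly from [0, maxId) and return the
    // formatted rows in sorted order, so lookups can proceed in key order.
    static TreeSet<String> pickRows(long maxId, int count, long seed) {
        if (maxId <= 0 || count > maxId) {
            throw new IllegalArgumentException("count exceeds the key space");
        }
        Random rng = new Random(seed);
        TreeSet<String> rows = new TreeSet<>();
        while (rows.size() < count) {
            // nextDouble() is in [0, 1), so id is in [0, maxId).
            long id = (long) (rng.nextDouble() * maxId);
            rows.add(formatRow(id)); // duplicates are ignored by the set
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : pickRows(1_000_000L, 5, 42L)) {
            System.out.println(row);
        }
    }
}
```

This only works when every generated key is known to exist (or when a miss
can simply be skipped and redrawn); with sparse or irregular keys it does
not apply.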

If you can't calculate the key distribution ahead of time, are you keeping
any statistics (or could you) on ingest (cardinality, distribution, etc.)?
And finally, how rigorous and performant do you need this random sampling
to be?  Do you just want representative data, or are you trying to do
something like BlinkDB[1] (allow people to specify confidence intervals on
queries, and only sample enough data to meet the requisite uncertainty
requirements)?

[1] http://blinkdb.org/
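
If representative data is all that is needed, one simple option (named
here as an illustration, not something from the original question) is
Bernoulli sampling: keep each entry independently with a fixed
probability while scanning. A minimal sketch in plain Java follows;
BernoulliSample and sample are hypothetical names, and the entries are
just an Iterable rather than an Accumulo scanner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BernoulliSample {
    // Keep each entry independently with probability `fraction` (0.0 to 1.0).
    // The result size is only approximately fraction * N, not exact.
    static <T> List<T> sample(Iterable<T> entries, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed);
        List<T> kept = new ArrayList<>();
        for (T entry : entries) {
            // nextDouble() is in [0, 1): always true for 1.0, never for 0.0.
            if (rng.nextDouble() < fraction) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        System.out.println(sample(data, 0.05, 1L).size() + " of 1000 kept");
    }
}
```

This gives no control over confidence intervals; meeting a stated error
bound, as in the BlinkDB case, would need the ingest statistics asked
about above.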

Chris

On Sat, Feb 1, 2014 at 3:58 PM, cprigano <[email protected]> wrote:

> I am looking at writing an Accumulo iterator to return a random sample of a
> percentile of a table.
>
> I would appreciate any suggestions.
>
> Thanks,
>
> Chris
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/Accumulo-iterator-to-return-a-random-sample-of-a-percentile-of-a-table-tp7354.html
> Sent from the Developers mailing list archive at Nabble.com.
>
