On Fri, Jun 29, 2012 at 12:53 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> I put this on the jira too, but the algo I found whittled a stream of
> 10 million items down to ~19.5k samples. With each sample at ~36B, that's
> ~685KiB. There's a bit more from using a LinkedList and general bookkeeping.
>
> Since the estimator is reset every O(minutes) window, and I doubt very many
> metrics see more than 10 million items in O(minutes), it seems lightweight
> enough to keep going.
>
> I'm planning on doing this in hadoop-common's metrics2 since HDFS is also
> interested, and backporting it to 1.x and 2.x. This would thus depend on the
> metrics2 conversion (HBASE-4050) going through too.
>
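For reference, a rough back-of-envelope sketch of that memory estimate could
look like the following. The ~36B-per-sample figure and the sample count are
the numbers quoted above; the class name, constants, and output format are
hypothetical, for illustration only:

    // Rough memory estimate for a sample-based streaming quantile estimator
    // (GK/CKMS family). The per-sample size is an assumption: the observed
    // value plus two bookkeeping counts, plus object and LinkedList node
    // overhead, taken as ~36 bytes per retained sample.
    public class QuantileMemoryEstimate {
        static final int BYTES_PER_SAMPLE = 36;   // assumed, not measured

        public static void main(String[] args) {
            long itemsSeen   = 10000000L;  // items in one O(minutes) window
            long samplesKept = 19500L;     // samples retained by the estimator
            long bytes = samplesKept * BYTES_PER_SAMPLE;
            System.out.printf("%,d items -> %,d samples -> ~%.1f KiB%n",
                    itemsSeen, samplesKept, bytes / 1024.0);
            // prints: 10,000,000 items -> 19,500 samples -> ~685.5 KiB
        }
    }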

Sounds great Andrew.
St.Ack
