On Fri, Jun 29, 2012 at 12:53 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> I put this on the jira too, but the algo I found whittled a stream of
> 10 million items down to ~19.5k samples. With each sample at ~36B, that's
> ~685KiB. There's a bit more overhead from using a LinkedList and general
> bookkeeping.
>
> Since the estimator is reset every O(minutes) window, and I doubt very many
> metrics see more than 10 million items in O(minutes), it seems lightweight
> enough to keep going.
>
> I'm planning on doing this in hadoop-common's metrics2, since HDFS is also
> interested, and backporting to 1.x and 2.x. This would thus depend on the
> metrics2 conversion (HBASE-4050) going through too.
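(For reference, the memory estimate quoted above can be sanity-checked with back-of-envelope arithmetic; the constants below are the mail's approximations, not measured values, and the real footprint also includes the LinkedList/bookkeeping overhead mentioned:)

```python
# Rough check of the quoted memory estimate: ~19.5k retained samples
# from a 10M-item stream, at ~36 bytes per sample.
SAMPLES = 19_500         # approximate sample count from the mail
BYTES_PER_SAMPLE = 36    # approximate per-sample size from the mail

total_bytes = SAMPLES * BYTES_PER_SAMPLE
total_kib = total_bytes / 1024
print(f"{total_kib:.1f} KiB")  # ~685.5 KiB, consistent with the ~685KiB figure
```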
Sounds great Andrew. St.Ack