max value for a dataset

Edward Capriolo Sat, 18 Apr 2009 09:00:03 -0700

I jumped into Hadoop at the 'deep end'. I know Pig, Hive, and HBase
support max(). I am writing my own max() over a simple one-column
dataset.


The best solution I came up with was using MapRunner. With MapRunner I
can store the highest value in a private member variable. I can read
through the entire data set and emit only one value per mapper once
the map input is exhausted. Then I can specify a single reducer and
carry out the same operation on the per-mapper maxima.
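
To make the idea concrete, here is a rough sketch in plain Java. It does
not use any Hadoop classes; it only simulates the shape of the job. The
names MaxSketch, MaxTracker, mapSplit, reduce and maxOverSplits are made
up for illustration, and each long[] stands in for one mapper's input
split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the approach; no Hadoop API is used here.
class MaxSketch {

    // Stands in for one map task: the running maximum lives in a private field.
    static final class MaxTracker {
        private long max = Long.MIN_VALUE;
        private boolean seen = false;

        void accept(long value) {
            if (!seen || value > max) {
                max = value;
                seen = true;
            }
        }

        boolean hasValue() { return seen; }

        long get() { return max; }
    }

    // "Map" side: scan a whole split, then emit a single value (null if the split is empty).
    static Long mapSplit(long[] split) {
        MaxTracker tracker = new MaxTracker();
        for (long v : split) {
            tracker.accept(v);
        }
        return tracker.hasValue() ? tracker.get() : null;
    }

    // "Reduce" side: one reducer applies the same operation to the per-split maxima.
    static Long reduce(List<Long> partials) {
        MaxTracker tracker = new MaxTracker();
        for (Long p : partials) {
            if (p != null) {
                tracker.accept(p);
            }
        }
        return tracker.hasValue() ? tracker.get() : null;
    }

    // Drives the simulated job: one mapSplit call per split, then a single reduce.
    static Long maxOverSplits(List<long[]> splits) {
        List<Long> partials = new ArrayList<>();
        for (long[] split : splits) {
            partials.add(mapSplit(split));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<long[]> splits = Arrays.asList(
                new long[] {3, 17, 8},
                new long[] {42, -5},
                new long[] {});
        System.out.println(maxOverSplits(splits));
    }
}
```

The point of the sketch is that the reducer sees only one value per
split, however many records each split holds.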

Does anyone have a better tactic? I thought a counter could do this,
but are counters atomic?
