Hadoop Abacus, a package for performing simple counting/aggregation
-------------------------------------------------------------------
Key: HADOOP-908
URL: https://issues.apache.org/jira/browse/HADOOP-908
Project: Hadoop
Issue Type: New Feature
Components: contrib/streaming
Reporter: Runping Qi
The Hadoop Abacus package is a specialization of the map/reduce framework
for performing various kinds of counting and aggregation.
It offers functionality similar to Google's Sawzall.
Generally speaking, to implement an application using the Map/Reduce model,
the developer needs to implement Map and Reduce functions (and possibly a
Combine function).
However, for many applications related to counting and computing statistics,
these functions have very similar characteristics.
Abacus abstracts out the general patterns and provides a package implementing
those patterns.
In particular, the package provides a generic mapper class, a reducer class,
a combiner class, and a set of built-in value aggregators. It also provides a
generic utility class, ValueAggregatorJob, for creating Abacus jobs.
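
To make the idea of a value aggregator concrete, here is a minimal,
self-contained sketch in plain Java. It does not use the Abacus classes: the
Aggregator interface, its method names, and the LongSum and LongMax classes
are hypothetical names chosen for illustration only.

```java
public class ValueAggregatorSketch {

    /** Hypothetical contract: absorb values one at a time, then report a result. */
    interface Aggregator {
        void addNextValue(long value);
        long getReport();
    }

    /** Illustrative aggregator that sums all values it has seen. */
    static class LongSum implements Aggregator {
        private long sum = 0;
        @Override public void addNextValue(long value) { sum += value; }
        @Override public long getReport() { return sum; }
    }

    /** Illustrative aggregator that keeps the largest value it has seen. */
    static class LongMax implements Aggregator {
        private long max = Long.MIN_VALUE;
        @Override public void addNextValue(long value) { max = Math.max(max, value); }
        @Override public long getReport() { return max; }
    }

    /** Feeds every value to the aggregator and returns its report. */
    static long aggregate(Aggregator aggregator, long... values) {
        for (long value : values) {
            aggregator.addNextValue(value);
        }
        return aggregator.getReport();
    }

    public static void main(String[] args) {
        System.out.println(aggregate(new LongSum(), 3, 9, 4)); // prints 16
        System.out.println(aggregate(new LongMax(), 3, 9, 4)); // prints 9
    }
}
```

Because every aggregator exposes the same two operations, generic reducer code
can drive any of them without knowing which statistic is being computed.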
To create an Abacus job, the user only needs to implement one plugin class that
specifies which aggregators to use and which values go to which aggregators.
The mapper calls this class at runtime to generate aggregation ids and values.
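
As a rough illustration of what such a plugin class might look like, the
following self-contained Java sketch emits one (aggregation id, value) pair per
word of an input line. It does not use the actual Abacus classes. The
AggregationPlugin interface, the generateKeyValPairs method name, the
"LongValueSum" aggregator name, and the "type:id" encoding are all assumptions
made for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class WordCountPluginSketch {

    /** Hypothetical plugin contract: turn one input record into (aggregation id, value) pairs. */
    interface AggregationPlugin {
        List<Entry<String, String>> generateKeyValPairs(String key, String value);
    }

    /** Emits ("LongValueSum:<word>", "1") for every word in the record. */
    static class WordCountPlugin implements AggregationPlugin {
        @Override
        public List<Entry<String, String>> generateKeyValPairs(String key, String value) {
            List<Entry<String, String>> pairs = new ArrayList<>();
            for (String word : value.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // The prefix names the (assumed) aggregator; the suffix is the id to count.
                    pairs.add(new SimpleEntry<>("LongValueSum:" + word, "1"));
                }
            }
            return pairs;
        }
    }

    public static void main(String[] args) {
        AggregationPlugin plugin = new WordCountPlugin();
        // A generic mapper would call the plugin once per input record.
        for (Entry<String, String> pair : plugin.generateKeyValPairs("0", "to be or not to be")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

In this sketch the plugin is the only application-specific code. Everything
else (mapping, combining, reducing) could be generic.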
The generic combiner and reducer then aggregate the values associated with the
same aggregation id. Thus, it is much easier to create and run an Abacus job
than a normal map/reduce job. Since a built-in generic combiner is always used,
values are pre-aggregated on the map side before being sent to the reducers,
which keeps execution efficient.
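
The effect of the combiner can be sketched in plain Java without any Hadoop
classes. In the example below, each hypothetical map task's output is summed
per aggregation id first (the combine step), and the partial sums are then
summed again (the reduce step). All class and method names are illustrative
assumptions, and summation stands in for whatever aggregator is configured.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    /** Convenience constructor for an (aggregation id, value) pair. */
    static Map.Entry<String, Long> pair(String id, long value) {
        return new SimpleEntry<>(id, value);
    }

    /** Sums values per aggregation id; serves as both the combine and the reduce step. */
    static Map<String, Long> sumById(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }

    /** Combines each map task's output locally, then reduces the partial sums. */
    static Map<String, Long> combineThenReduce(List<List<Map.Entry<String, Long>>> mapOutputs) {
        List<Map.Entry<String, Long>> shuffled = new ArrayList<>();
        for (List<Map.Entry<String, Long>> output : mapOutputs) {
            // Combiner: at most one pair per id leaves each map task.
            shuffled.addAll(sumById(output).entrySet());
        }
        // Reducer: merge the partial sums into final totals.
        return sumById(shuffled);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> task1 =
            Arrays.asList(pair("to", 1), pair("be", 1), pair("to", 1));
        List<Map.Entry<String, Long>> task2 =
            Arrays.asList(pair("be", 1), pair("or", 1));
        System.out.println(combineThenReduce(Arrays.asList(task1, task2)));
        // prints {be=2, or=1, to=2}
    }
}
```

This works because summation is associative and commutative, so summing
partial sums gives the same totals as summing all values at once while fewer
pairs need to be transferred between the map and reduce steps.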