[
https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Runping Qi updated HADOOP-908:
------------------------------
Status: Patch Available (was: Open)
A new patch that adds package.html for the abacus package and
updates build.xml to include javadoc for abacus.
> Hadoop Abacus, a package for performing simple counting/aggregation
> -------------------------------------------------------------------
>
> Key: HADOOP-908
> URL: https://issues.apache.org/jira/browse/HADOOP-908
> Project: Hadoop
> Issue Type: New Feature
> Components: contrib/streaming
> Reporter: Runping Qi
> Assigned To: Runping Qi
> Attachments: abacus.patch
>
>
> The Hadoop Abacus package is a specialization of the map/reduce framework,
> designed for performing various counting and aggregation tasks.
> It offers functionality similar to Google's Sawzall.
> Generally speaking, to implement an application using the Map/Reduce
> model, the developer needs to implement Map and Reduce functions (and
> possibly a Combine function).
> However, for many applications related to counting and computing
> statistics, these functions have very similar characteristics.
> Abacus abstracts out the general patterns and provides a package
> implementing those patterns.
> In particular, the package provides a generic mapper class, a reducer
> class and a combiner class, and a set of built-in value aggregators.
> It also provides a generic utility class, ValueAggregatorJob, for
> creating Abacus jobs.
> To create an Abacus job, the user just needs to implement one plugin
> class that is responsible for specifying which aggregators to use and
> which values go to which aggregators.
> The mapper will call this class at runtime to generate aggregation ids
> and values.
> The generic combiner and reducer will aggregate the values associated
> with the same aggregation ids accordingly. Thus, it is much easier to
> create and run an Abacus job than a normal map/reduce job. Since a
> built-in generic combiner is always used, values are pre-aggregated on
> the map side, which keeps the execution efficient.
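
The flow described above (a plugin emits aggregation ids and values, and a
generic combiner and reducer aggregate them) can be sketched without Hadoop
as follows. This is only an illustration in plain Python, not the code in
abacus.patch: the "TYPE:key" id format, the aggregator names LongSum and
LongMax, and every function name below are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical built-in aggregators (names are assumptions, not from the patch).
# Each one folds a list of values into a single value and is associative,
# so the same function can serve as both combiner and reducer.
AGGREGATORS = {
    "LongSum": lambda values: sum(int(v) for v in values),
    "LongMax": lambda values: max(int(v) for v in values),
}


def word_count_plugin(record):
    """User-supplied plugin: yield (aggregation id, value) pairs for a record.

    The id encodes which aggregator to use and which key the value belongs to.
    """
    for word in record.split():
        yield ("LongSum:" + word, 1)
    yield ("LongMax:line_length", len(record))


def generic_map(records, plugin):
    """Generic mapper: call the plugin on every input record."""
    for record in records:
        yield from plugin(record)


def generic_aggregate(pairs):
    """Generic combiner/reducer: aggregate values sharing an aggregation id."""
    grouped = defaultdict(list)
    for agg_id, value in pairs:
        grouped[agg_id].append(value)
    result = {}
    for agg_id, values in grouped.items():
        agg_type = agg_id.partition(":")[0]
        result[agg_id] = AGGREGATORS[agg_type](values)
    return result


def run_job(splits, plugin):
    """Simulate a job: map and combine each split, then reduce the partials."""
    partials = []
    for split in splits:
        combined = generic_aggregate(generic_map(split, plugin))
        partials.extend(combined.items())
    return generic_aggregate(partials)
```

Calling run_job([["a b a", "b"], ["a c c c c"]], word_count_plugin) counts
each word across both splits and tracks the longest line. The only
job-specific code is word_count_plugin; the mapper, combiner and reducer
stay generic, which is the point the description makes.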