Mathieu Bastian created DATAFU-51:
-------------------------------------

             Summary: Add DataFu MR project, a lightweight framework for implementing 
Java/Scala MapReduce jobs
                 Key: DATAFU-51
                 URL: https://issues.apache.org/jira/browse/DATAFU-51
             Project: DataFu
          Issue Type: New Feature
            Reporter: Mathieu Bastian


New lightweight framework for developing Java/Scala MapReduce jobs. Inspired by 
Matt's work on Hourglass and my experience developing Java jobs on Hadoop. 
It's a thin layer on top of the Hadoop API that mostly reduces boilerplate 
code and automates configuration.

Features (see the README for details):
* Built-in support for Avro input and output formats
* Though we recommend using Avro, one can use any input/output format class
* Mapper, reducer and intermediate key/value classes are inferred when possible
* Avro schemas are inferred when using POJO objects
* Staged output to avoid deleting the existing file if the job fails
* Estimates the number of reducers needed when none is provided
* Supports `#LATEST` suffix in input paths to work with timestamped folders 
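
To make the last two features concrete, here is a minimal standalone sketch of how they might work. It is not the DataFu MR implementation: the class name, both method names, and the bytes-per-reducer heuristic are assumptions for illustration, and the folder listing is passed in as a plain list instead of being read from HDFS. It also assumes the timestamped folder names sort lexicographically in time order (e.g. `2014-05-28`).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; none of these names come from DataFu MR.
public class JobHelpersSketch {

    static final String LATEST_SUFFIX = "#LATEST";

    // Resolves a path ending in "#LATEST" to its most recent child folder.
    // `children` holds the names of the folders under the base path; the
    // greatest name in natural string order is treated as the latest.
    // Paths without the suffix are returned unchanged.
    static String resolveLatest(String path, List<String> children) {
        if (!path.endsWith(LATEST_SUFFIX)) {
            return path;
        }
        String base = path.substring(0, path.length() - LATEST_SUFFIX.length());
        String latest = children.stream()
                .max(Comparator.naturalOrder())
                .orElseThrow(() -> new IllegalArgumentException(
                        "No timestamped folders under " + base));
        return base.endsWith("/") ? base + latest : base + "/" + latest;
    }

    // Estimates a reducer count as ceil(inputBytes / bytesPerReducer),
    // with a floor of one reducer. The per-reducer target is an assumed
    // tuning knob, not a documented setting.
    static int estimateReducers(long inputBytes, long bytesPerReducer) {
        if (inputBytes < 0 || bytesPerReducer <= 0) {
            throw new IllegalArgumentException("Sizes must be positive");
        }
        long count = inputBytes / bytesPerReducer
                + (inputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(count, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("2014-05-01", "2014-05-28", "2014-04-30");
        // prints "/data/events/2014-05-28"
        System.out.println(resolveLatest("/data/events#LATEST", folders));
        // prints "3"
        System.out.println(estimateReducers(2_500_000_000L, 1_000_000_000L));
    }
}
```

A real job would list the folders through the Hadoop `FileSystem` API and take the input size from the resolved paths; the sketch keeps both as plain arguments so it runs without a cluster.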



--
This message was sent by Atlassian JIRA
(v6.2#6252)
