Mathieu Bastian created DATAFU-51:
-------------------------------------

Summary: Add DataFu MR project, a lightweight framework for implementing Java/Scala MapReduce jobs
Key: DATAFU-51
URL: https://issues.apache.org/jira/browse/DATAFU-51
Project: DataFu
Issue Type: New Feature
Reporter: Mathieu Bastian

New lightweight framework for developing Java/Scala MapReduce jobs. Inspired by Matt's work on Hourglass and my experience developing Java jobs on Hadoop. It is a thin layer on top of the Hadoop API that mostly reduces boilerplate code and automates configuration.

Features (see details in the README):

* Built-in support for Avro input and output formats
* Any input/output format class can be used, though we recommend Avro
* Mapper, reducer and intermediate key/value classes are inferred when possible
* Avro schemas are inferred when using POJOs
* Staged output, to avoid deleting the existing file if the job fails
* The number of reducers is estimated if not provided
* Supports the `#LATEST` suffix in input paths to work with timestamped folders
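
To make the last two features more concrete, here is a rough sketch in plain Java of how `#LATEST` resolution and reducer estimation could work. This is not the DataFu MR implementation: the class `JobSetupSketch`, both method names, the bytes-per-reducer heuristic, and the assumption that timestamped folder names sort correctly as strings are all hypothetical, and the folder listing is passed in as a list rather than read from HDFS so that the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from DataFu MR.
public class JobSetupSketch {

    static final String LATEST_SUFFIX = "#LATEST";

    /**
     * If inputPath ends with #LATEST, replaces the suffix with the greatest
     * child folder name. Assumes folder names are timestamps that sort
     * correctly as strings (for example yyyy-MM-dd).
     */
    static String resolveLatest(String inputPath, List<String> childFolders) {
        if (!inputPath.endsWith(LATEST_SUFFIX)) {
            return inputPath; // nothing to resolve
        }
        String parent = inputPath.substring(0, inputPath.length() - LATEST_SUFFIX.length());
        String latest = childFolders.stream()
                .max(Comparator.naturalOrder())
                .orElseThrow(() -> new IllegalArgumentException("No folders under " + parent));
        return parent.endsWith("/") ? parent + latest : parent + "/" + latest;
    }

    /**
     * Assumed heuristic: one reducer per bytesPerReducer of input,
     * rounded up, with a minimum of one reducer.
     */
    static int estimateReducers(long inputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        long n = inputBytes / bytesPerReducer + (inputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("2014-05-01", "2014-05-02", "2014-04-30");
        // prints /data/events/2014-05-02
        System.out.println(resolveLatest("/data/events/#LATEST", folders));
        // prints 3
        System.out.println(estimateReducers(2_500_000_000L, 1_000_000_000L));
    }
}
```

In a real job the folder list would presumably come from a file system listing, and the estimate would be used only when the number of reducers is not set explicitly.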