[ https://issues.apache.org/jira/browse/DATAFU-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Hayes updated DATAFU-71:
--------------------------------
Attachment: DATAFU-71.patch
Attaching an initial patch. It includes three pieces: a modified version of
AvroStorage from Pig 0.14 that allows some of its methods to be overridden; a
DatePartitionedAvroStorage that can read date ranges of yyyy/mm/dd
partitioned input data; and an IncrementalAvroStorage that derives from
DatePartitionedAvroStorage and adds the ability to process the input
incrementally. This is basically the Pig UDF equivalent of some of the
Hourglass functionality.
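As a rough illustration of what reading a date range from yyyy/mm/dd partitioned input involves, the Java sketch below expands an inclusive range of days into one partition path per day. The class and method names (`DatePartitionPaths`, `pathsFor`) are hypothetical and are not taken from the patch; the sketch assumes exactly one directory per day and uses only `java.time`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not the patch's DatePartitionedAvroStorage code):
 * expands an inclusive date range into yyyy/MM/dd partition paths
 * under a base directory.
 */
public class DatePartitionPaths {
    // Matches the /some/input/yyyy/mm/dd layout (MM = month, dd = day of month).
    private static final DateTimeFormatter PARTITION_FORMAT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd");

    public static List<String> pathsFor(String baseDir, LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        List<String> paths = new ArrayList<>();
        // Walk one day at a time, including both endpoints.
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            paths.add(baseDir + "/" + d.format(PARTITION_FORMAT));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Spans a year boundary to show month/day zero-padding and rollover.
        List<String> paths = pathsFor("/some/input",
            LocalDate.of(2014, 12, 30), LocalDate.of(2015, 1, 2));
        paths.forEach(System.out::println);
    }
}
```

Running `main` prints the four paths from `/some/input/2014/12/30` through `/some/input/2015/01/02`. A real loader would additionally need to check which of these directories actually exist before handing them to Pig.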
Initial feedback is welcome. I still need to go through the code and do another
documentation pass. More unit tests would also be good.
> Create IncrementalAvroStorage UDF for incrementally processing date
> partitioned data
> ------------------------------------------------------------------------------------
>
> Key: DATAFU-71
> URL: https://issues.apache.org/jira/browse/DATAFU-71
> Project: DataFu
> Issue Type: New Feature
> Reporter: Matthew Hayes
> Assignee: Matthew Hayes
> Attachments: DATAFU-71.patch
>
>
> Data can sometimes be stored in HDFS in a time-partitioned manner, e.g.
> /some/input/yyyy/mm/dd. You may want to process this data incrementally,
> where the output has a format like /some/output/yyyy/mm/dd. It would be useful
> to have a UDF that handles the incremental processing for you.
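A minimal sketch of the incremental idea, assuming a day counts as already processed when its output partition exists: the pending work is simply the set of input dates minus the set of output dates. `IncrementalDateSelector` and `datesToProcess` are hypothetical names, and listing the actual HDFS directories is left out.

```java
import java.time.LocalDate;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical sketch: given the dates present under /some/input/yyyy/mm/dd
 * and those already present under /some/output/yyyy/mm/dd, return the
 * input dates that still need processing, in chronological order.
 */
public class IncrementalDateSelector {
    public static SortedSet<LocalDate> datesToProcess(Set<LocalDate> inputDates,
                                                      Set<LocalDate> outputDates) {
        // TreeSet keeps the pending dates sorted oldest first.
        SortedSet<LocalDate> pending = new TreeSet<>(inputDates);
        // Skip any day that already has an output partition.
        pending.removeAll(outputDates);
        return pending;
    }

    public static void main(String[] args) {
        Set<LocalDate> input = Set.of(
            LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 2), LocalDate.of(2015, 1, 3));
        Set<LocalDate> output = Set.of(LocalDate.of(2015, 1, 1));
        System.out.println(datesToProcess(input, output));
    }
}
```

Here only 2015-01-02 and 2015-01-03 would be processed on the next run. Keying on output existence keeps the state in the file system itself, so no separate bookkeeping is needed between runs.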
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)