[ https://issues.apache.org/jira/browse/DATAFU-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes updated DATAFU-71:
--------------------------------
    Attachment: DATAFU-71.patch

Attaching an initial patch.  This includes a modified version of AvroStorage 
from Pig 0.14 that allows some of its methods to be overridden, a 
DatePartitionedAvroStorage that can read date ranges of yyyy/mm/dd-partitioned 
input data, and an IncrementalAvroStorage that derives from 
DatePartitionedAvroStorage and adds the ability to process the input 
incrementally.  This is basically the Pig UDF equivalent of some of the 
Hourglass functionality.
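For readers unfamiliar with the layout, the date-range part can be sketched in Java. This is not code from the attached patch: the class DatePartitionPaths and its pathsForRange method are hypothetical names, and the sketch only shows how an inclusive date range maps onto yyyy/mm/dd directories under a base path.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not the DataFu API): expands an inclusive date range
 * into yyyy/MM/dd partition paths under a base directory.
 */
public class DatePartitionPaths {
    // In DateTimeFormatter, MM is month-of-year; lowercase mm would be minute-of-hour.
    private static final DateTimeFormatter PARTITION_FORMAT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd");

    public static List<String> pathsForRange(String baseDir, LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date precedes start date");
        }
        List<String> paths = new ArrayList<>();
        // Walk one day at a time, including both endpoints.
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            paths.add(baseDir + "/" + day.format(PARTITION_FORMAT));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Prints /some/input/2014/12/30 through /some/input/2015/01/02, one per line.
        pathsForRange("/some/input", LocalDate.of(2014, 12, 30), LocalDate.of(2015, 1, 2))
            .forEach(System.out::println);
    }
}
```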

Initial feedback is welcome.  I still need to go through the code and do another 
pass on the documentation.  More unit tests would also be good.

> Create IncrementalAvroStorage UDF for incrementally processing date 
> partitioned data
> ------------------------------------------------------------------------------------
>
>                 Key: DATAFU-71
>                 URL: https://issues.apache.org/jira/browse/DATAFU-71
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Matthew Hayes
>            Assignee: Matthew Hayes
>         Attachments: DATAFU-71.patch
>
>
> Data can sometimes be stored in HDFS in a time-partitioned manner, e.g. 
> /some/input/yyyy/mm/dd.  You may want to process this data incrementally, 
> where the output has a format like /some/output/yyyy/mm/dd.  It would be useful 
> if there were a UDF that handled the incremental processing for you.
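A minimal sketch of the incremental idea, assuming the job can already list which days exist under the input and output directories (how those listings are obtained from HDFS is left out). IncrementalPlanner and daysToProcess are hypothetical names, not the API in the patch: any input day with no matching output day is treated as still to be processed.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not the DataFu API): given the days that have input
 * partitions and the days already written to the output, return the days
 * that still need processing, oldest first.
 */
public class IncrementalPlanner {
    public static List<LocalDate> daysToProcess(Set<LocalDate> inputDays, Set<LocalDate> outputDays) {
        // TreeSet sorts the input days chronologically before filtering.
        return new TreeSet<>(inputDays).stream()
            .filter(day -> !outputDays.contains(day))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<LocalDate> input = Set.of(
            LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 2),
            LocalDate.of(2015, 1, 3), LocalDate.of(2015, 1, 4));
        Set<LocalDate> output = Set.of(LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 2));
        System.out.println(daysToProcess(input, output)); // [2015-01-03, 2015-01-04]
    }
}
```

Comparing parsed dates rather than raw path strings keeps the ordering chronological and makes gaps in the output easy to spot.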



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
