[ https://issues.apache.org/jira/browse/DATAFU-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Hayes updated DATAFU-71:
--------------------------------
    Attachment: DATAFU-71.patch

Attaching an initial patch. It includes:

- a modified version of AvroStorage from Pig 0.14 that allows some of its methods to be overridden,
- a DatePartitionedAvroStorage that reads date ranges of yyyy/mm/dd partitioned input data, and
- an IncrementalAvroStorage that derives from DatePartitionedAvroStorage and adds the ability to process the input incrementally.

This is basically the Pig UDF equivalent of some of the Hourglass functionality.

Initial feedback is welcome. I still need to go through the code and do another pass at the documentation. More unit tests would also be good.

> Create IncrementalAvroStorage UDF for incrementally processing date partitioned data
> ------------------------------------------------------------------------------------
>
>                 Key: DATAFU-71
>                 URL: https://issues.apache.org/jira/browse/DATAFU-71
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Matthew Hayes
>            Assignee: Matthew Hayes
>         Attachments: DATAFU-71.patch
>
>
> Data can sometimes be stored in HDFS in a time-partitioned manner, e.g.
> /some/input/yyyy/mm/dd. You may want to process this data incrementally,
> where the output has a format like /some/output/yyyy/mm/dd. It would be
> useful to have a UDF that handles the incremental processing for you.
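
For illustration only, the idea behind date-partitioned and incremental reads can be sketched in a few lines of Python. This is not code from the patch (which is a Pig UDF built on AvroStorage); the function names `partition_paths` and `pending_days` are hypothetical, and the sketch assumes that "incremental" means processing only those days that exist in the input but not yet in the output.

```python
from datetime import date, timedelta


def partition_paths(root, start, end):
    """Yield root/yyyy/mm/dd for each day from start to end, inclusive."""
    day = start
    while day <= end:
        # date supports strftime-style format specs inside f-strings
        yield f"{root}/{day:%Y/%m/%d}"
        day += timedelta(days=1)


def pending_days(input_days, output_days):
    """Return, in sorted order, the days present in the input but
    missing from the output (assumed meaning of 'incremental' here)."""
    return sorted(set(input_days) - set(output_days))


if __name__ == "__main__":
    for path in partition_paths("/some/input", date(2014, 12, 30), date(2015, 1, 1)):
        print(path)
```

A driver would list the days already under /some/output, pass both sets to `pending_days`, and load only the resulting partitions.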