[ https://issues.apache.org/jira/browse/PIG-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gerrit Jansen van Vuuren updated PIG-1526:
------------------------------------------

    Attachment: TestHiveColumnarLoader.java
                TestPathPartitioner.java
                TestPathPartitionHelper.java

I've attached the 3 test source files.

> HiveColumnarLoader Partitioning Support
> ---------------------------------------
>
>                 Key: PIG-1526
>                 URL: https://issues.apache.org/jira/browse/PIG-1526
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: PIG-1526-2.patch, PIG-1526.patch,
> TestHiveColumnarLoader.java, TestPathPartitioner.java,
> TestPathPartitionHelper.java
>
>
> I've made a lot of improvements to the HiveColumnarLoader:
> -> Added support for LoadMetadata and data path partitioning
> -> Improved and simplified column loading
> Data Path Partitioning:
> Hive stores partitions as folders like
> /mytable/partition1=[value]/partition2=[value]. That is, the table mytable
> contains 2 partitions [partition1, partition2].
> The HiveColumnarLoader will scan the input path /mytable and add the
> columns partition1 and partition2 to the PigSchema.
> These columns can then be used in filtering.
> For example: we've got year, month, day and hour partitions in our data uploads.
> So a table might look like mytable/year=2010/month=02/day=01.
> Loading with the HiveColumnarLoader allows our Pig scripts to filter by date
> using the standard Pig FILTER operator.
> I've added 2 classes for this:
> -> PathPartitioner
> -> PathPartitionHelper
> These classes are not Hive-dependent and could be used by any other loader
> that wants to support partitioning; they also help with implementing the
> LoadMetadata interface.
> For this reason I thought it best to put them into the package
> org.apache.pig.piggybank.storage.partition.
> In the future it would be nice to have PigStorage also use these 2
> classes to provide automatic path partitioning support.
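The Hive directory layout described in the issue can be illustrated with a small Java sketch that reads partition keys and values off a path such as /mytable/year=2010/month=02/day=01. This is not the PathPartitioner or PathPartitionHelper code from the attached patch; the class name PartitionPathParser is hypothetical, and the sketch assumes partition values appear unescaped as plain key=value path segments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not the API from PIG-1526): extracts Hive-style
 * partition keys and values from a path like /mytable/year=2010/month=02.
 * Assumes partition values are not URL-escaped.
 */
public class PartitionPathParser {

    /**
     * Records every path segment of the form key=value, keeping the keys in
     * the order they appear in the path (outermost partition first).
     */
    public static Map<String, String> parse(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Skip empty segments, the table folder and data file names,
            // i.e. anything that is not a key=value partition folder.
            if (eq <= 0) {
                continue;
            }
            partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("/mytable/year=2010/month=02/day=01");
        System.out.println(p.keySet());     // prints [year, month, day]
        System.out.println(p.get("month")); // prints 02
    }
}
```

Because the keys come back in path order, a loader built along these lines could report them as partition keys (for example through LoadMetadata.getPartitionKeys) and append the values as extra columns, so that a Pig FILTER on year or month can be applied to them.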