[ https://issues.apache.org/jira/browse/PIG-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1526:
----------------------------

    Attachment: PIG-1526-fix.patch

Hi Gerrit,

TestHiveColumnarLoader fails with an OOM on my machine; I increased the heap size in build.xml to solve it. I also found that TestPathPartitionHelper and TestPathPartitioner do not work if I have a hadoop-site file in the classpath, so I added the following code to deal with it:

{code}
// Remove any existing hadoop-site.xml from the pigtest conf directory
File oldConf = new File(System.getProperty("user.home")+"/pigtest/conf/hadoop-site.xml");
oldConf.delete();
{code}

Please take a look at the attached patch. If it is OK, I will commit it.

Thanks

> HiveColumnarLoader Partitioning Support
> ---------------------------------------
>
>                 Key: PIG-1526
>                 URL: https://issues.apache.org/jira/browse/PIG-1526
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: PIG-1526-2.patch, PIG-1526-fix.patch, PIG-1526.patch,
> TestHiveColumnarLoader.java, TestPathPartitioner.java,
> TestPathPartitionHelper.java
>
>
> I've made a lot of improvements to the HiveColumnarLoader:
> -> Added support for LoadMetadata and data path Partitioning
> -> Improved and simplified column loading
> Data Path Partitioning:
> Hive stores partitions as folders like
> /mytable/partition1=[value]/partition2=[value]. That is, the table mytable
> contains 2 partitions [partition1, partition2].
> The HiveColumnarLoader will scan the input path /mytable and add to the
> PigSchema the columns partition1 and partition2.
> These columns can then be used in filtering.
> For example: We've got year, month, day, hour partitions in our data uploads.
> So a table might look like mytable/year=2010/month=02/day=01.
> Loading with the HiveColumnarLoader allows our pig scripts to filter by date
> using the standard pig Filter operator.
> I've added 2 classes for this:
> -> PathPartitioner
> -> PathPartitionHelper
> These classes are not Hive dependent and could be used by any other loader
> that wants to support partitioning; they also help with implementing the
> LoadMetadata interface.
> For this reason I thought it best to put them into the package
> org.apache.pig.piggybank.storage.partition.
> What would be nice in the future is to have PigStorage also use these 2
> classes to provide automatic path partitioning support.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
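
For readers unfamiliar with the folder layout described in the issue, the sketch below shows how partition columns could be read out of a path such as /mytable/year=2010/month=02/day=01. It is only an illustration: the class PartitionPathSketch and the method parsePartitions are hypothetical names and are not taken from the attached patches, and the actual PathPartitioner may work differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the PathPartitioner from the patch. */
final class PartitionPathSketch {

    /** Collects key=value path segments in the order they appear in the path. */
    static Map<String, String> parsePartitions(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Only segments of the form key=value are treated as partitions;
            // segments without '=' or with an empty key are skipped.
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, String> p = parsePartitions("/mytable/year=2010/month=02/day=01");
        System.out.println(p); // prints {year=2010, month=02, day=01}
    }
}
```

Each key found this way corresponds to a column that a loader could add to the schema, which is what makes filtering on year, month or day possible.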