Arup Malakar created HCATALOG-490:
-------------------------------------

             Summary: HCatStorer() throws error when the same partition key is 
present in records in more than one task running as part of the same job
                 Key: HCATALOG-490
                 URL: https://issues.apache.org/jira/browse/HCATALOG-490
             Project: HCatalog
          Issue Type: Bug
            Reporter: Arup Malakar
            Assignee: Arup Malakar


I have a file with ~240MB of data. One of the columns in the input data is 
'action', and its value is either 1 or 2.

When I try to load it using the following script:
{code}
in = load '/user/malakar/page_views_20000000_0/part-00000' USING 
PigStorage(',') AS (user:chararray, timespent:int, query_term:chararray, 
ip_addr:int, estimated_revenue:int, page_info:chararray, action:int);

STORE in into 'page_views_20000000_0' USING 
org.apache.hcatalog.pig.HCatStorer();
{code}

It throws the following exception:

{quote}
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1 already exists
 at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
 at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
 at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
 at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
 at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
 at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
{quote}
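
One possible reading of the trace, sketched in Python under stated assumptions; this is not HCatalog's code. It assumes each task runs its own "output directory must not exist" check for every partition value it encounters, and that all tasks of the job share one _DYN scratch directory. The names check_output_specs, run_task, and simulate are hypothetical, and the check and the directory creation are merged into one step for brevity.

```python
import os
import tempfile


class FileAlreadyExistsError(Exception):
    """Stand-in for org.apache.hadoop.mapred.FileAlreadyExistsException."""


def check_output_specs(path):
    # Simplification (assumption): check and create in a single step.
    # The real check only verifies that the directory is absent.
    if os.path.exists(path):
        raise FileAlreadyExistsError(f"Output directory {path} already exists")
    os.makedirs(path)


def run_task(dyn_root, actions):
    # Hypothetical task: checks each partition directory once per task,
    # but knows nothing about directories created by other tasks.
    seen = set()
    for action in actions:
        part_dir = os.path.join(dyn_root, f"action={action}")
        if part_dir not in seen:
            check_output_specs(part_dir)
            seen.add(part_dir)


def simulate():
    # One scratch directory shared by all tasks of the job (assumption).
    dyn_root = tempfile.mkdtemp(prefix="_DYN")
    run_task(dyn_root, [1, 2, 1])  # first task creates action=1 and action=2
    try:
        run_task(dyn_root, [1, 2])  # second task meets action=1 again
    except FileAlreadyExistsError as err:
        return str(err)
    return None
```

Under these assumptions, a single task handling both values succeeds, while a second task that sees a value already written by the first fails on its first record. That would be consistent with the error appearing only when the input is large enough to be split across several map tasks.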

