[jira] [Created] (HCATALOG-490) HCatStorer() throws error when the same partition key is present in records in more than one tasks running as part of the same job

Arup Malakar (JIRA) Fri, 31 Aug 2012 10:47:09 -0700

Arup Malakar created HCATALOG-490:
-------------------------------------

             Summary: HCatStorer() throws error when the same partition key is present in records in more than one tasks running as part of the same job
                 Key: HCATALOG-490
                 URL: https://issues.apache.org/jira/browse/HCATALOG-490
             Project: HCatalog
          Issue Type: Bug
            Reporter: Arup Malakar
            Assignee: Arup Malakar



I have a file with ~240 MB of data. One of the columns in the input data is 
'action', and its value is either 1 or 2.

When I try to load it using the following script:
{code}
in = load '/user/malakar/page_views_20000000_0/part-00000'
    USING PigStorage(',')
    AS (user:chararray, timespent:int, query_term:chararray,
        ip_addr:int, estimated_revenue:int, page_info:chararray, action:int);

STORE in into 'page_views_20000000_0'
    USING org.apache.hcatalog.pig.HCatStorer();
{code}

It throws the following exception:

{quote}
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1 already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
    at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{quote}
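
The trace shows the "output directory already exists" check being reached from the record writer's write() call inside a map task. The sketch below is only a toy model of how that could collide when two tasks see the same 'action' value. It is not HCatalog code: the names check_output_specs, run_task and simulate are hypothetical, a local temp directory stands in for the _DYN... directory on HDFS, and it assumes each task runs the check the first time it meets a partition value.

```python
import os
import tempfile


class FileAlreadyExistsError(Exception):
    """Stand-in for org.apache.hadoop.mapred.FileAlreadyExistsException."""


def check_output_specs(path):
    # Toy version of a "fail if the output directory exists" check.
    if os.path.exists(path):
        raise FileAlreadyExistsError("Output directory %s already exists" % path)


def run_task(job_dir, actions):
    # Assumption: a task opens one writer per distinct partition value,
    # running the check the first time it sees that value.
    seen = set()
    for action in actions:
        if action in seen:
            continue
        seen.add(action)
        part_dir = os.path.join(job_dir, "action=%d" % action)
        check_output_specs(part_dir)
        os.makedirs(part_dir)


def simulate():
    # Both tasks share one job-level scratch directory.
    job_dir = tempfile.mkdtemp(prefix="_DYN")
    run_task(job_dir, [1, 2, 1])   # first task creates action=1 and action=2
    try:
        run_task(job_dir, [2, 1])  # second task sees the same partition keys
    except FileAlreadyExistsError as e:
        return str(e)
    return None
```

In this model, calling simulate() returns the error message raised by the second task, because the directory it checks was already created by the first task in the same job.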

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
