[
https://issues.apache.org/jira/browse/HCATALOG-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449135#comment-13449135
]
Francis Liu commented on HCATALOG-490:
--------------------------------------
Travis, the issue is as Arup says. Take a look at line 200 here:
http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/FileRecordWriterContainer.java?view=markup
If two tasks end up writing to the same partition checkoutputSpecs and setupJob
will be called twice, and the second one will fail trying to create the
partition directory again.
> HCatStorer() throws error when the same partition key is present in records
> in more than one tasks running as part of the same job
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HCATALOG-490
> URL: https://issues.apache.org/jira/browse/HCATALOG-490
> Project: HCatalog
> Issue Type: Bug
> Reporter: Arup Malakar
> Assignee: Arup Malakar
>
> I have a file with ~240MB data. One of the columns in input data was 'action'
> and the value is either 1 or 2.
> When I try to load it using the following script:
> {code}
> in = load '/user/malakar/page_views_20000000_0/part-00000' USING
> PigStorage(',') AS (user:chararray, timespent:int, query_term:chararray,
> ip_addr:int, estimated_revenue:int, page_info:chararray, action:int);
> STORE in into 'page_views_20000000_0' USING
> org.apache.hcatalog.pig.HCatStorer();
> {code}
> It throws the following exception:
> {quote}
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1
> already exists at
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
> at
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
> at
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
> at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
> at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at
> org.apache.hadoop.mapred.Child$4.run(Child.java:255) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:396) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {quote}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira