[jira] [Commented] (HCATALOG-490) HCatStorer() throws error when the same partition key is present in records in more than one tasks running as part of the same job

Francis Liu (JIRA) Wed, 05 Sep 2012 14:09:09 -0700

    [ 
https://issues.apache.org/jira/browse/HCATALOG-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449135#comment-13449135
 ]


Francis Liu commented on HCATALOG-490:
--------------------------------------

Travis, the issue is as Arup says. Take a look at line 200 here:

http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/FileRecordWriterContainer.java?view=markup

If two tasks end up writing to the same partition checkoutputSpecs and setupJob 
will be called twice, and the second one will fail trying to create the 
partition directory again.
                
> HCatStorer()  throws error when the same partition key is present in records 
> in more than one  tasks running as part of the same job
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HCATALOG-490
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-490
>             Project: HCatalog
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Arup Malakar
>
> I have a file with ~240MB data. One of the columns in input data was 'action' 
> and the value is either 1 or 2. 
> When I try to load it using the following script:
> {code}
> in = load '/user/malakar/page_views_20000000_0/part-00000' USING 
> PigStorage(',') AS (user:chararray, timespent:int, query_term:chararray, 
> ip_addr:int, estimated_revenue:int, page_info:chararray, action:int);
> STORE in into 'page_views_20000000_0' USING 
> org.apache.hcatalog.pig.HCatStorer();
> {code}
> It throws the following exception:
> {quote}
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1
>  already exists at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
>  at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
>  at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
>  at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235) 
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>  at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>  at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at 
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at 
> org.apache.hadoop.mapred.Child$4.run(Child.java:255) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:396) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>  at org.apache.hadoop.mapred.Child.main(Child.java:249) 
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HCATALOG-490) HCatStorer() throws error when the same partition key is present in records in more than one tasks running as part of the same job

Reply via email to