Arup Malakar created HCATALOG-490:
-------------------------------------
Summary: HCatStorer() throws an error when the same partition key is
present in records in more than one task running as part of the same job
Key: HCATALOG-490
URL: https://issues.apache.org/jira/browse/HCATALOG-490
Project: HCatalog
Issue Type: Bug
Reporter: Arup Malakar
Assignee: Arup Malakar
I have a file with ~240MB of data. One of the columns in the input data is
'action', and its value is either 1 or 2.
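The original input file is not attached to this issue. For anyone trying to reproduce the problem, a file of the same shape could be generated with something like the sketch below. The column order comes from the AS clause of the script in this report; the helper names (make_row, write_rows) and every generated value other than the 1-or-2 range of 'action' are assumptions made for illustration.

```python
import csv
import random


def make_row(rng):
    """Return one synthetic record in the column order used by the Pig script.

    Only the range of 'action' (1 or 2) comes from the report; all other
    values are placeholders.
    """
    return [
        "user%d" % rng.randrange(1000),   # user:chararray
        rng.randrange(600),               # timespent:int
        "term%d" % rng.randrange(100),    # query_term:chararray
        rng.randrange(2 ** 31),           # ip_addr:int (fits a 32-bit signed int)
        rng.randrange(10000),             # estimated_revenue:int
        "info%d" % rng.randrange(50),     # page_info:chararray
        rng.choice([1, 2]),               # action:int
    ]


def write_rows(out, n_rows, seed=0):
    """Write n_rows comma-separated records to the file-like object 'out'."""
    rng = random.Random(seed)
    writer = csv.writer(out, lineterminator="\n")
    for _ in range(n_rows):
        writer.writerow(make_row(rng))
```

Writing to a local file with write_rows(open("part-00000", "w"), n) and copying it to HDFS should give comparable input. The row count presumably needs to be large enough that the job runs more than one map task, as the summary of this issue suggests.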
When I try to load it using the following script:
{code}
in = load '/user/malakar/page_views_20000000_0/part-00000' USING PigStorage(',')
    AS (user:chararray, timespent:int, query_term:chararray, ip_addr:int,
        estimated_revenue:int, page_info:chararray, action:int);
STORE in into 'page_views_20000000_0' USING org.apache.hcatalog.pig.HCatStorer();
{code}
It throws the following exception:
{quote}
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1 already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
    at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{quote}
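One reading of the trace, consistent with the issue summary, is that each task runs a "directory must not exist" check on a partition directory that is shared by every task in the job. The sketch below is a toy model of that reading, not HCatalog or Hadoop code: write_record and run_tasks are hypothetical names, a local temporary directory stands in for the _DYN scratch directory, and tasks run one after another rather than in parallel.

```python
import os
import tempfile


def write_record(job_dir, partition_value, seen_by_task):
    """Mimic a task that validates a partition directory on first use.

    'seen_by_task' holds the partition values this task has already set up.
    """
    part_dir = os.path.join(job_dir, "action=%d" % partition_value)
    if partition_value not in seen_by_task:
        # Stand-in for an output-spec check that rejects existing directories.
        if os.path.exists(part_dir):
            raise FileExistsError(
                "Output directory %s already exists" % part_dir)
        os.makedirs(part_dir)
        seen_by_task.add(partition_value)


def run_tasks(records_per_task):
    """Run each task against one shared job directory.

    Returns the indices of tasks that failed the existence check.
    """
    failed = []
    with tempfile.TemporaryDirectory() as job_dir:
        for index, records in enumerate(records_per_task):
            seen = set()  # state is per task, the directory is per job
            try:
                for value in records:
                    write_record(job_dir, value, seen)
            except FileExistsError:
                failed.append(index)
    return failed
```

Under this model, run_tasks([[1, 2, 1], [2, 1]]) reports the second task as failed because it meets a partition value whose directory the first task already created, while run_tasks([[1, 1], [2, 2]]) reports no failures because no partition value is shared between tasks.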