[jira] [Commented] (HCATALOG-465) Dynamic Partition in HCatalog 0.4 throws FileAlreadyExists exception

Arup Malakar (JIRA) Tue, 14 Aug 2012 16:23:39 -0700

    [ 
https://issues.apache.org/jira/browse/HCATALOG-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434625#comment-13434625
 ]


Arup Malakar commented on HCATALOG-465:
---------------------------------------

Hi Rajesh, I tried to reproduce this bug but was unable to. Here are the 
exact steps I followed:

{code:title=Create hcatalog tables}
# Pig complained about uppercase column names, so lowercase column names are used everywhere:
# ERROR 1115: Column names should all be in lowercase. Invalid name found: HIT_TIME_GMT

create table table_1 (hit_time_gmt string, service string, accept_language string, date_time string)
partitioned by (load_date string, repo_name string)
row format delimited fields terminated by '\t' stored as textfile;

create table table_2 (hit_time_gmt string, service string, accept_language string, date_time string)
partitioned by (load_date string, repo_name string)
row format delimited fields terminated by '\t' stored as textfile;
{code}


{code:title=Input file}
time1   yahoo   en      Tue Aug_14
time2   yahoo   en      Tue Aug_15
{code}
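
The columns in the input file need to be tab-separated: the tables are declared with fields terminated by '\t', and a Pig LOAD without a USING clause falls back to PigStorage, which splits on tabs by default. A minimal sketch for creating the local file, assuming it is named data (as in the copyFromLocal step) and assuming that "Tue Aug_14" is a single date_time value containing a space:

```shell
# Assumption: exactly one tab between fields, four fields per row.
# The last field (date_time) contains a space, not a tab.
printf 'time1\tyahoo\ten\tTue Aug_14\n' > data
printf 'time2\tyahoo\ten\tTue Aug_15\n' >> data
```

If the file is typed by hand instead, an editor that converts tabs to spaces would leave each row as a single field.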

{code:title=Load Data in Hadoop}
$ hadoop fs -mkdir /user/malakar/j465_input
$ hadoop fs -copyFromLocal data /user/malakar/j465_input/
{code}

{code:title=Load the data in HCat and run the pig script}
D = LOAD '/user/malakar/j465_input' AS (hit_time_gmt:CHARARRAY, service:CHARARRAY, accept_language:CHARARRAY, date_time:CHARARRAY);

STORE D INTO 'table_1' USING org.apache.hcatalog.pig.HCatStorer('load_date=20120101,repo_name=testRepo','hit_time_gmt:CHARARRAY,service:CHARARRAY,accept_language:CHARARRAY,date_time:CHARARRAY');


a = load 'table_1' using org.apache.hcatalog.pig.HCatLoader();
b = filter a by (load_date == '20120101' and repo_name == 'testRepo');
store b into 'table_2' using org.apache.hcatalog.pig.HCatStorer();
{code}

Versions:
Apache Pig version 0.9.3.1204121426 (r1325529) 
Hadoop 1.0.2.1206210100
HCat 0.4

Let me know if I missed any of the steps.
                
> Dynamic Partition in HCatalog 0.4 throws FileAlreadyExists exception
> --------------------------------------------------------------------
>
>                 Key: HCATALOG-465
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-465
>             Project: HCatalog
>          Issue Type: Bug
>          Components: pig
>    Affects Versions: 0.4
>         Environment: Hadoop 0.20.20x, HCatalog 0.4, Pig 0.9.2
>            Reporter: Rajesh Balamohan
>
> Here is a simple use case which can reproduce this error. I have also 
> attached the stacktrace
> 1. In HCat, create 2 tables (table_1 and table_2). Populate some data in 
> table_1.
> 2. Load from table_1 to table_2 with dynamic partition
>  
> hcat -e "create table table_1( HIT_TIME_GMT string,SERVICE 
> string,ACCEPT_LANGUAGE string, DATE_TIME string) 
> partitioned by (load_date string,repo_name string) row format delimited 
> fields terminated by '\t' stored as textfile";
> hcat -e "create table table_2( HIT_TIME_GMT string,SERVICE 
> string,ACCEPT_LANGUAGE string, DATE_TIME string) 
> partitioned by (load_date string,repo_name string) row format delimited 
> fields terminated by '\t' stored as textfile";
> Have some data populated to this with load_date='20120101' and 
> repo_name='testRepo'
> a = load 'table_1' using org.apache.hcatalog.pig.HCatLoader();
> b = filter a by (load_date == '20120101' and repo_name == 'testRepo');
> store b into 'table_2' using org.apache.hcatalog.pig.HCatStorer();
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://cluster:54310/user/hive8/warehouse/db/table_1/_DYN0.4448079902737385/load_date=20120515/repo_name=testRepo
>  already exists
>       at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
>       at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:201)
>       at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
>       at 
> org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>       at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
>       at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:248)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>       at org.apache.hadoop.mapred.Child.main(Child.java:264)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
