[ https://issues.apache.org/jira/browse/HCATALOG-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481987#comment-13481987 ]

Travis Crawford commented on HCATALOG-535:
------------------------------------------


The tests were failing because I was using a MySQL-backed HiveMetaStore. The 
tests use managed tables and run {{drop table if exists}} before running, which 
caused the test data files to be deleted.

For now, I am using a Derby metastore instead, but we should probably use 
unmanaged (external) tables in the tests to avoid this. That could be done in a 
separate patch, though.
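To make the failure mode concrete: in Hive, dropping a managed table deletes its data directory, whereas dropping an external table removes only the metadata. The Java sketch below models just that rule with local temporary directories. `DropTableSketch`, `Table`, `dropTable` and the `part-00000` file name are hypothetical stand-ins for illustration, not the Hive metastore API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical model of Hive DROP TABLE semantics; NOT the metastore API.
final class DropTableSketch {

    // Minimal stand-in for a table: a data location plus a managed/external flag.
    static final class Table {
        final Path location;
        final boolean managed;

        Table(Path location, boolean managed) {
            this.location = location;
            this.managed = managed;
        }
    }

    // Deletes a directory tree, children before parents; no-op if absent.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Managed table: the data directory is deleted along with the metadata.
    // External table: only metadata would be removed (not modelled here).
    static void dropTable(Table t) {
        if (t.managed) {
            deleteRecursively(t.location);
        }
    }

    // Creates a table directory holding one data file, drops the table, and
    // reports whether the data file is still on disk afterwards.
    static boolean dataSurvivesDrop(boolean managed) {
        try {
            Path dir = Files.createTempDirectory("studenttab10k");
            Path data = Files.writeString(dir.resolve("part-00000"), "sample row\n");
            dropTable(new Table(dir, managed));
            boolean survived = Files.exists(data);
            deleteRecursively(dir); // tidy up whatever the drop left behind
            return survived;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("managed, data survives drop:  " + dataSurvivesDrop(true));
        System.out.println("external, data survives drop: " + dataSurvivesDrop(false));
    }
}
```

Running `main` reports that the data file is gone after dropping the managed table and still present after dropping the external one, which is the property that would keep the test data safe if the tests used external tables.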

The only test still failing is:

{code}
ant clean package test-e2e-setup test-e2e-deploy test-e2e -Dtests.to.run="-t Pig_Complex_6"
{code}

Error:

{code}
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/hive/warehouse/pig_complex_6/_TEMP already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
        at org.apache.hcatalog.mapreduce.FileOutputFormatContainer.checkOutputSpecs(FileOutputFormatContainer.java:136)
        at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:71)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:188)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:680)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
{code}


The script has an {{exec}} statement, which forces Pig to run the statements 
before it. I believe this is so that the output can be read back in; otherwise 
the data would not yet be available in the metastore when the later job is set 
up. Because HCat writes its output to {{_TEMP}}, I think the check fails because 
that directory already exists from the first portion of the script.

{code}
                                ,'hcat_prep'=>q\drop table if exists pig_complex_6;
create table pig_complex_6 (a array<string>) STORED AS TEXTFILE;\
                                ,'pig' => q\
a = load 'studenttab10k' using org.apache.hcatalog.pig.HCatLoader();
b = foreach a generate name;
c = distinct b;
d = group c all;
e = foreach d generate $1 as a;
store e into 'pig_complex_6' using org.apache.hcatalog.pig.HCatStorer();
exec;
f = load 'pig_complex_6' using org.apache.hcatalog.pig.HCatLoader();
g = foreach f generate flatten(a);
store g into ':OUTPATH:';\
{code}
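The suspected failure can be reproduced in miniature. Hadoop's `FileOutputFormat.checkOutputSpecs` refuses to submit a job whose output directory already exists; the sketch below imitates that check with `java.nio.file` and assumes, as suspected above, that the portion of the script before {{exec}} leaves `pig_complex_6/_TEMP` behind. `TempDirCheckSketch`, `runJob` and `secondJobRejected` are hypothetical names, and `IllegalStateException` stands in for Hadoop's `FileAlreadyExistsException`; none of this is HCatalog or Pig code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Simplified imitation of FileOutputFormat's existence check; NOT Hadoop code.
final class TempDirCheckSketch {

    // Same rule that raised FileAlreadyExistsException: refuse to start a job
    // whose output directory is already present.
    static void checkOutputSpecs(Path outputDir) {
        if (Files.exists(outputDir)) {
            throw new IllegalStateException(
                "Output directory " + outputDir + " already exists");
        }
    }

    // Hypothetical HCat-style job: validates <table>/_TEMP, then writes into it
    // and (as assumed for the pre-exec portion) leaves it behind.
    static void runJob(Path tableDir) throws IOException {
        Path temp = tableDir.resolve("_TEMP");
        checkOutputSpecs(temp);
        Files.createDirectories(temp);
        Files.writeString(temp.resolve("part-m-00000"), "sample row\n");
    }

    // Deletes a directory tree, children before parents; no-op if absent.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs two jobs against the same table directory and returns true if the
    // second one is rejected by the output check.
    static boolean secondJobRejected(boolean removeTempBetweenJobs) {
        Path tableDir = null;
        try {
            tableDir = Files.createTempDirectory("pig_complex_6");
            runJob(tableDir);                 // portion before exec
            if (removeTempBetweenJobs) {
                deleteRecursively(tableDir.resolve("_TEMP"));
            }
            try {
                runJob(tableDir);             // later job, same _TEMP path
                return false;
            } catch (IllegalStateException alreadyExists) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tableDir != null) {
                deleteRecursively(tableDir);  // clean up the scratch directory
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected, _TEMP left behind: " + secondJobRejected(false));
        System.out.println("rejected, _TEMP removed:     " + secondJobRejected(true));
    }
}
```

With the leftover directory the second job is rejected, and removing `_TEMP` between the two jobs lets the check pass, which is consistent with a stale `_TEMP` being the only thing tripping the check.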
                
> HCatalog e2e tests should run locally with minimal configuration
> ----------------------------------------------------------------
>
>                 Key: HCATALOG-535
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-535
>             Project: HCatalog
>          Issue Type: Improvement
>            Reporter: Travis Crawford
>            Assignee: Travis Crawford
>
> Setting up the environment to run e2e tests is documented here:
> https://cwiki.apache.org/confluence/display/HCATALOG/How+To+Test
> It's extremely time-consuming to set up because there are so many moving 
> parts. Some are very machine-specific, like configuring SSH and installing 
> MySQL for your platform. However, some things we can automate for the 
> developer, like downloading, installing & configuring all the Java 
> dependencies. We should do that to simplify setup.
> Also, tests do not run from a git repo because of the svn:externals 
> dependency. Fixing this would be very helpful. Developing with Git is WAAAAAY 
> nicer because branching is so easy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
