[
https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olga Natkovich resolved PIG-1576.
---------------------------------
Resolution: Invalid
We followed up with the HDFS team, and they confirmed that HDFS does not support
this pattern through its API either. The reason it worked on the command line is
that the Unix shell expands the brace range before the argument ever reaches the
HDFS client.
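A quick sketch of the distinction (assumes bash): the shell expands an unquoted
`{start..end}` range itself, so `hadoop fs -ls` receives already-expanded literal
paths, whereas Pig passes its quoted load string straight to Hadoop's glob
matcher, which supports `{a,b,c}` alternation but has no `{a..b}` range syntax.

```shell
# Unquoted: bash performs brace expansion before the command runs, so each
# directory is passed as a separate, literal argument.
bash -c 'echo /user/viraj/recursive/{20080615..20080617}/'
# prints: /user/viraj/recursive/20080615/ /user/viraj/recursive/20080616/ /user/viraj/recursive/20080617/

# Quoted: expansion is suppressed, and the literal pattern survives -- this
# is the string Pig hands to Hadoop's glob matcher, which cannot match it.
bash -c 'echo "/user/viraj/recursive/{20080615..20080617}/"'
# prints the pattern unchanged
```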
> Difference in Semantics between Load statement in Pig and HDFS client on
> Command line
> -------------------------------------------------------------------------------------
>
> Key: PIG-1576
> URL: https://issues.apache.org/jira/browse/PIG-1576
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Viraj Bhat
> Fix For: 0.9.0
>
>
> Here is my directory structure on HDFS which I want to access using Pig.
> This is a sample, but in real use case I have more than 100 of these
> directories.
> {code}
> $ hadoop fs -ls /user/viraj/recursive/
> Found 3 items
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080615
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080616
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080617
> {code}
> From the command line I can access them using a variety of patterns:
> {code}
> $ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> $ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> {code}
> I have written a Pig script; none of the load statement variants below works:
> {code}
> --A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> I get the following error in Pig 0.8:
> {noformat}
> 2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2010-08-27 16:34:27,711 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.2 0.8.0-SNAPSHOT viraj 2010-08-27 16:34:24 2010-08-27 16:34:27 LIMIT
> Failed!
> Failed Jobs:
> JobId Alias Feature Message Outputs
> N/A A,AL Message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
>     ... 7 more
> hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
> {noformat}
> The following works:
> {code}
> A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> Why is there an inconsistency between the HDFS client and Pig?
> Viraj
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.