[
https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olga Natkovich resolved PIG-1576.
---------------------------------
Resolution: Invalid
We followed up with the HDFS team, and they confirmed that HDFS does not support
this pattern through its API either. The reason it worked on the command line is
that the Unix shell expands the brace range before the argument ever reaches the
HDFS client.
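A quick sketch of the distinction (assumes bash): the shell expands an unquoted
`{start..end}` range itself, so `hadoop fs -ls` receives already-expanded literal
paths, whereas Pig passes its quoted load string straight to Hadoop's glob
matcher, which supports `{a,b,c}` alternation but has no `{a..b}` range syntax.

```shell
# Unquoted: bash performs brace expansion before the command runs, so each
# directory is passed as a separate, literal argument.
bash -c 'echo /user/viraj/recursive/{20080615..20080617}/'
# prints: /user/viraj/recursive/20080615/ /user/viraj/recursive/20080616/ /user/viraj/recursive/20080617/

# Quoted: expansion is suppressed, and the literal pattern survives -- this
# is the string Pig hands to Hadoop's glob matcher, which cannot match it.
bash -c 'echo "/user/viraj/recursive/{20080615..20080617}/"'
# prints the pattern unchanged
```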
> Difference in Semantics between Load statement in Pig and HDFS client on
> Command line
> -------------------------------------------------------------------------------------
>
> Key: PIG-1576
> URL: https://issues.apache.org/jira/browse/PIG-1576
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Viraj Bhat
> Fix For: 0.9.0
>
>
> Here is my directory structure on HDFS which I want to access using Pig.
> This is a sample, but in real use case I have more than 100 of these
> directories.
> {code}
> $ hadoop fs -ls /user/viraj/recursive/
> Found 3 items
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080615
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080616
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080617
> {code}
> From the command line I can access them using a variety of patterns:
> {code}
> $ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> $ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> {code}
> I have written a Pig script; none of the load statement variants below works:
> {code}
> --A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> I get the following error in Pig 0.8:
> {noformat}
> 2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2010-08-27 16:34:27,711 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.2 0.8.0-SNAPSHOT viraj 2010-08-27 16:34:24 2010-08-27 16:34:27 LIMIT
> Failed!
> Failed Jobs:
> JobId Alias Feature Message Outputs
> N/A A,AL Message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
>     ... 7 more
> hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
> {noformat}
> The following works:
> {code}
> A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> Why is there an inconsistency between the HDFS client and Pig?
> Viraj
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.