[ https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1576:
----------------------------
Fix Version/s: 0.9.0
> Difference in Semantics between Load statement in Pig and HDFS client on Command line
> -------------------------------------------------------------------------------------
>
> Key: PIG-1576
> URL: https://issues.apache.org/jira/browse/PIG-1576
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Viraj Bhat
> Fix For: 0.9.0
>
>
> Here is my directory structure on HDFS, which I want to access using Pig.
> This is only a sample; in the real use case I have more than 100 of these
> directories.
> {code}
> $ hadoop fs -ls /user/viraj/recursive/
> Found 3 items
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080615
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080616
> drwxr-xr-x - viraj supergroup 0 2010-08-26 11:25 /user/viraj/recursive/20080617
> {code}
> From the command line I can access them using a variety of patterns:
> {code}
> $ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> $ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r-- 1 viraj supergroup 5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> {code}
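A plausible explanation, assuming the commands above were typed unquoted in bash: `{15..17}` and `{20080615..20080617}` are bash brace-expansion sequences, so the shell rewrites them into separate paths before `hadoop fs -ls` even starts, whereas Pig hands the quoted LOAD string unchanged to Hadoop's glob matcher. The sketch below uses `echo` in place of `hadoop fs -ls` to show what each form becomes; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Show what bash hands to `hadoop fs -ls` for each pattern from the report.
# `echo` stands in for `hadoop fs -ls`, so no HDFS cluster is needed.

# Unquoted: bash brace expansion turns the range into three separate paths
# before hadoop ever runs.
range_unquoted=$(echo /user/viraj/recursive/{20080615..20080617}/)

# {200806} has no comma and no "..", so bash leaves it literal and expands
# only {15..17}; Hadoop's glob then sees {200806} as a one-element group.
mixed_unquoted=$(echo /user/viraj/recursive/{200806}{15..17}/)

# Single-quoted: bash performs no expansion, which mirrors what Pig
# receives from the string in a LOAD statement.
range_quoted=$(echo '/user/viraj/recursive/{20080615..20080617}/')

printf '%s\n' "$range_unquoted" "$mixed_unquoted" "$range_quoted"
```

To see the pattern the way Pig does from the command line, it can be single-quoted, e.g. `hadoop fs -ls '/user/viraj/recursive/{20080615..20080617}/'`. The Hadoop `FileSystem.globStatus` documentation lists `{ab,cd}` alternation but no `{a..b}` range form, which suggests that only the comma form behaves the same in the shell and in Pig.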
> I have written a Pig script, but neither of the following load statements
> works:
> {code}
> --A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> I get the following error in Pig 0.8:
> {noformat}
> 2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2010-08-27 16:34:27,711 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.2 0.8.0-SNAPSHOT viraj 2010-08-27 16:34:24 2010-08-27 16:34:27 LIMIT
> Failed!
> Failed Jobs:
> JobId Alias Feature Message Outputs
> N/A A,AL Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
>     ... 7 more
> hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
> {noformat}
> The following works:
> {code}
> A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
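With more than 100 directories, writing the comma list by hand gets tedious. A minimal sketch of generating it, assuming bash and consecutive integer directory names (it does not understand calendar dates, so a range such as 20080630 to 20080701 would need a different approach); `make_alternation` and `DIRS` are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Hadoop-glob alternation such as
# {20080615,20080616,20080617} from an inclusive numeric range.
# Assumes consecutive integer names; it does NOT understand calendar dates.
make_alternation() {
  local first=$1 last=$2 d out=""
  for (( d = first; d <= last; d++ )); do
    out+="${out:+,}$d"   # add a comma before every element except the first
  done
  printf '{%s}' "$out"
}

DIRS=$(make_alternation 20080615 20080617)
echo "$DIRS"
```

The resulting string could then be passed in through Pig's parameter substitution, e.g. `pig -param DIRS="$DIRS" load_dirs.pig` with `A = load '/user/viraj/recursive/$DIRS/' using PigStorage('\u0001') as (k:int, v:chararray);` in the script. `load_dirs.pig` is a placeholder, and this untested usage assumes the braces and commas pass through substitution unchanged.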
> Why is there an inconsistency between the HDFS client and Pig?
> Viraj
--
This message is automatically generated by JIRA.