[ https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665537#action_12665537 ]
Richard Spring commented on PIG-569:
------------------------------------

I've come across a similar issue when trying to load subdirectories using globs:

The command works fine via the shell:

hadoop dfs -ls /data/2008/{11/13,11/15}/14/video_impressions

But the following exception occurs when run via Pig if one of the globs does not return any files:

impressions = LOAD '/data/2008/{11/13,11/15}/14/video_impressions'....;

Exception in thread "Thread-6" java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:873)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:846)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:215)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:40)
        at org.apache.pig.impl.io.FileLocalizer.globMatchesFiles(FileLocalizer.java:486)
        at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:455)
        at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:114)
        at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
        at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Unknown Source)

> Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-569
>                 URL: https://issues.apache.org/jira/browse/PIG-569
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: FC Linux x86/64, Pig revision 724576
>            Reporter: Kevin Weil
>             Fix For: types_branch
>
>
> Pig cannot handle LOAD statements with Hadoop globs where the globs have subdirectories. For example,
>
> A = LOAD 'dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}' USING ...
>
> A similar statement in Hadoop, hadoop dfs -ls dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}, does work correctly.
>
> The output of running the above load statement in pig, built from svn revision 724576, is:
>
> 2008-12-17 12:02:28,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2008-12-17 12:02:28,480 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
> 2008-12-17 12:02:28,480 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: Unable to get collect for pattern dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}} [Failed to obtain glob for dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}]
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:231)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:40)
>         at org.apache.pig.impl.io.FileLocalizer.globMatchesFiles(FileLocalizer.java:486)
>         at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:455)
>         at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:108)
>         at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>         at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>         at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.pig.backend.datastorage.DataStorageException: Failed to obtain glob for dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}
>         ... 13 more
> Caused by: java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {dir1 at 5
>         at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1084)
>         at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1069)
>         at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:987)
>         at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:953)
>         at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
>         at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
>         at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:902)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:862)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:215)
>         ... 12 more

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
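
A possible client-side workaround for the two failures reported above, sketched under assumptions: expand every brace group into separate, brace-free patterns before globbing, so that no single path component ever contains an unterminated "{dir1". Each expanded pattern could then be passed to FileSystem.globStatus on its own, with the caller checking each result for null or empty instead of dereferencing it. The class and method names below (BraceGlobExpander, expand) are hypothetical and are not part of Pig or Hadoop. Escaped braces are not handled, and this is not necessarily how the issue was or will be fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig or Hadoop.
final class BraceGlobExpander {

    /** Expands every {a,b,...} group so that no returned pattern contains braces. */
    static List<String> expand(String pattern) {
        List<String> out = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = open < 0 ? -1 : matchingBrace(pattern, open);
        if (close < 0) {
            // No brace group, or an unbalanced one: return the pattern unchanged.
            out.add(pattern);
            return out;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        for (String alt : splitTopLevel(pattern.substring(open + 1, close))) {
            // Recurse so nested groups and later groups are expanded too.
            out.addAll(expand(prefix + alt + suffix));
        }
        return out;
    }

    /** Returns the index of the '}' that closes the '{' at position open, or -1. */
    private static int matchingBrace(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Splits a group body on commas that are not inside a nested group. */
    private static List<String> splitTopLevel(String body) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(body.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(body.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Pattern taken from the comment above.
        for (String p : expand("/data/2008/{11/13,11/15}/14/video_impressions")) {
            System.out.println(p);
        }
    }
}
```

For the pattern in the comment this yields /data/2008/11/13/14/video_impressions and /data/2008/11/15/14/video_impressions, which can be globbed independently so that a missing directory for one alternative does not affect the other.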