OK, so the behavior is a little different when using FileInputFormat.addInputPath as opposed to using Pig. I'll try the glob. Thanks.

On 4/6/11 8:41 AM, Robert Evans wrote:
I believe that opening a directory as a file will result in a file-not-found error. You probably need to set the input path to a glob that points to the actual files: something like /user/root/logs/2011/*/*/* for all entries in 2011, or /user/root/logs/2011/01/*/* if you want to restrict it to just January. By default, if you pass in a directory as input, the input format assumes that the directory contains only files, no subdirectories, and that you really want to use each of those files as input.

--Bobby Evans

On 4/6/11 9:53 AM, "Mark"<static.void....@gmail.com> wrote:

How can I tell my job to include all the subdirectories of a certain path, along with their contents? My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY}. I tried setting my input path to 'logs/' using FileInputFormat.addInputPath; however, I keep receiving the following error:

java.io.FileNotFoundException: File does not exist: /user/root/logs/2011/01
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:234)

Do my directories or their contents need to be in any particular format?

Thanks
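
For reference, the glob patterns suggested in the reply above could be built rather than typed by hand. This is only a minimal sketch, assuming the logs/{YEAR}/{MONTH}/{DAY} layout from the question. The LogGlobs class and its method names are hypothetical, and the job variable in the final comment is assumed to be an existing Job in the driver. The code itself is plain Java; the only Hadoop call appears in a comment.

```java
import java.util.Locale;

// Hypothetical helper: builds glob strings for a logs/{YEAR}/{MONTH}/{DAY} layout.
final class LogGlobs {
    private LogGlobs() {}

    // Glob for every file under every month and day of the given year.
    static String forYear(String root, int year) {
        return String.format(Locale.ROOT, "%s/%04d/*/*/*", stripTrailingSlash(root), year);
    }

    // Glob for every file under every day of a single month (zero-padded, e.g. 01).
    static String forMonth(String root, int year, int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1-12: " + month);
        }
        return String.format(Locale.ROOT, "%s/%04d/%02d/*/*",
                stripTrailingSlash(root), year, month);
    }

    // Drops one trailing slash so the pattern does not contain "//".
    private static String stripTrailingSlash(String root) {
        return root.length() > 1 && root.endsWith("/")
                ? root.substring(0, root.length() - 1)
                : root;
    }

    public static void main(String[] args) {
        System.out.println(forYear("/user/root/logs", 2011));     // /user/root/logs/2011/*/*/*
        System.out.println(forMonth("/user/root/logs", 2011, 1)); // /user/root/logs/2011/01/*/*

        // In the job driver (assumes Hadoop on the classpath and an existing Job named job):
        // FileInputFormat.addInputPath(job, new Path(LogGlobs.forMonth("/user/root/logs", 2011, 1)));
    }
}
```

The resulting string is wrapped in a Path and handed to FileInputFormat.addInputPath, so the input format sees the matching files instead of the intermediate year and month directories.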