Thanks for the response. By "uniform view" I meant being able to avoid referencing each individual part-r-xxxx file. It wasn't immediately clear to me that the directory itself could be the input path. That tells me the problem is somewhere in my MR code. Thanks!
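
For anyone finding this thread later, here is a rough sketch of what "the directory is the input" amounts to. In a real driver the same path would go to FileOutputFormat.setOutputPath for the first job and to FileInputFormat.addInputPath for the second. The code below does not use Hadoop at all: it is a plain-Java imitation, on the local filesystem, of how a directory input expands into its part files while names starting with "_" or "." (such as _SUCCESS) are skipped. The class name DirectoryInputSketch and the method names are hypothetical, and the listing is assumed to be non-recursive.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class DirectoryInputSketch {

    // Assumed rule, modelled on the default hidden-file filter:
    // names starting with "_" or "." are not treated as input.
    static boolean isVisible(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Returns the sorted names of the regular files in dir that would count as input.
    static List<String> expandInputDir(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (Files.isRegularFile(p) && isVisible(name)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the first job's output directory (out1 in the thread).
        Path out1 = Files.createTempDirectory("out1");
        Files.createFile(out1.resolve("part-r-00000"));
        Files.createFile(out1.resolve("part-r-00001"));
        Files.createFile(out1.resolve("_SUCCESS"));

        // Only the part files are listed, regardless of how many reducers wrote them.
        System.out.println(expandInputDir(out1));
    }
}
```

The point is that the second job never needs to name part-r-00000 itself; however many reducers the first job ran, pointing at the directory picks up all of their output.
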

On Wed, Mar 2, 2011 at 11:19 PM, Harsh J <[email protected]> wrote:

> Hello,
>
> On Thu, Mar 3, 2011 at 7:51 AM, John Sanda <[email protected]> wrote:
> > The output path created from the first job is a directory, and it is the
> > file in that directory that has a name like part-r-0000 that I want to
> > feed as input into the second job. I am running in pseudo-distributed
> > mode so I know that that file name is going to be the same every run.
> > But in a true distributed mode that file name will be different for each
> > node. Moreover,
>
> The default filename of many OutputFormats starts with "part", and is
> not node dependent. You will get filenames in out1 as part-r-00000
> onwards to part-r-{num. of reduce tasks for your job}.
>
> > when in distributed mode don't I want a uniform view of that output file
> > which will be spread across my cluster? Is there something wrong in my
> > code? Or can someone point me to some examples that do this?
>
> I do not understand what you mean by uniform view. Using a directory
> as an input for a job is very much acceptable and a normal thing to do
> in file-based MR. The directories form the whole input, with files
> containing small "parts" of it. I do not see anything grossly wrong in
> your code provided.
>
> --
> Harsh J
> www.harshj.com

--
- John
