If you have time, please update the wiki FAQ on this so that the next person has an easier time figuring this out.
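
For the FAQ, here is a minimal sketch of the fix Ming and Ted arrive at in the thread below, written against the classic org.apache.hadoop.mapred API (shown in its later generified form). The class names WholeFileInputFormat/WholeFileRecordReader and the NullWritable/BytesWritable key/value choice are illustrative, not taken from the thread:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative sketch: an input format that hands each file to the
    // mapper as a single record. Names and types are one reasonable choice.
    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      // Step 1: never split a file across mappers.
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
      }

      // Step 2: the part Ming was missing -- a record reader that emits
      // the entire file as one record instead of line by line.
      public RecordReader<NullWritable, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
      }
    }

    // Reads the whole file backing a split into one BytesWritable value.
    class WholeFileRecordReader
        implements RecordReader<NullWritable, BytesWritable> {

      private final FileSplit split;
      private final Configuration conf;
      private boolean processed = false;

      WholeFileRecordReader(FileSplit split, Configuration conf) {
        this.split = split;
        this.conf = conf;
      }

      public boolean next(NullWritable key, BytesWritable value) throws IOException {
        if (processed) {
          return false;               // exactly one record per file
        }
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
          in = fs.open(file);
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      public NullWritable createKey() { return NullWritable.get(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return processed ? split.getLength() : 0; }
      public float getProgress() { return processed ? 1.0f : 0.0f; }
      public void close() { }
    }

Note that overriding isSplitable alone (Ming's first attempt) only keeps the file in one split; the record reader still decides what a "record" is, which is why the line-by-line behavior persisted.
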
On 10/15/07 2:22 PM, "Ming Yang" <[EMAIL PROTECTED]> wrote:

> Thank you! After tracing the code I realized that I should override
> getRecordReader(...) as well to return the whole content of the file,
> i.e. to finish the job. :)
>
> 2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
>>
>> You didn't do anything wrong. You just didn't finish the job.
>>
>> You need to override getRecordReader as well so that it returns the
>> contents of the file (or a lazy version of same) as a single record.
>>
>> On 10/15/07 11:00 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>>
>>> I just did a test by extending TextInputFormat and overriding
>>> isSplitable(FileSystem fs, Path file) to always return false.
>>> However, in my mapper I still see the input file get split into
>>> lines. I did set the input format in the JobConf, and
>>> isSplitable(...) -> false did get called during job execution. Is
>>> there anything I did wrong, or is this the behavior I should be
>>> expecting?
>>>
>>> Thanks,
>>>
>>> Ming
>>>
>>> 2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
>>>>
>>>> That doesn't quite do what the poster requested. They wanted to
>>>> pass the entire file to the mapper.
>>>>
>>>> That requires a custom input format or an indirect input approach
>>>> (a list of file names as the input).
>>>>
>>>> On 10/15/07 9:57 AM, "Rick Cox" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You can also gzip each input file. Hadoop will not split a
>>>>> compressed input file (but will automatically decompress it
>>>>> before feeding it to your mapper).
>>>>>
>>>>> rick
>>>>>
>>>>> On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> Use a list of file names as your map input. Then your mapper can
>>>>>> read a line and use it to open and read a file for processing.
>>>>>>
>>>>>> This is similar to web crawling, where the input is a list of
>>>>>> URLs.
>>>>>>
>>>>>> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> I was writing a test MapReduce program and noticed that the
>>>>>>> input file was always broken down into separate lines and fed
>>>>>>> to the mapper. However, in my case I need to process the whole
>>>>>>> file in the mapper, since there are dependencies between lines
>>>>>>> in the input file. Is there any way I can achieve this --
>>>>>>> process the whole input file, either text or binary, in the
>>>>>>> mapper?
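
For completeness, here is a sketch of Ted's indirect approach from earlier in the thread: make the job's input a text file listing one HDFS path per line, and let the mapper open each named file itself. The class name FileNameMapper and its output types are illustrative assumptions:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative sketch: the job's input records are file names. Each
    // input line is a path, and the mapper reads that whole file itself.
    public class FileNameMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private JobConf job;

      public void configure(JobConf job) {
        this.job = job;
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Path file = new Path(value.toString());   // the input line is a path
        FileSystem fs = file.getFileSystem(job);
        FSDataInputStream in = fs.open(file);
        try {
          // Process the whole file here; dependencies between lines are
          // fine because this mapper sees the entire file at once, e.g.
          // output.collect(new Text(file.toString()), new Text(...));
        } finally {
          in.close();
        }
      }
    }

The trade-off with this approach is data locality: the scheduler places map tasks based on the list file, not on the files it names, so the actual reads may go over the network.
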