On Fri, Nov 4, 2011 at 10:04 AM, Pedro Costa <[email protected]> wrote:
> 1 - I think that IFile.Reader can only read the whole map output file. I
> want to read a partition of the map output. How can I do that? How do I set
> the size of a partition in the I

Look at the code for MapOutputServlet - it uses the index mechanism to find
a particular partition.

> 2 - I know that map output is composed of blocks. What is the size of a
> block? Is it 64MB by default?

Nope, it doesn't use blocks. That's HDFS you're thinking of.

-Todd

> 2011/11/4 Todd Lipcon <[email protected]>
>
>> Hi Pedro,
>>
>> The format is called IFile. Check out the source for more info on the
>> format - it's fairly simple. The partition starts are recorded in a
>> separate index file next to the output file.
>>
>> I don't think you'll find significant docs on this format since it's
>> MR-internal - the code is your best resource.
>>
>> -Todd
>>
>> On Fri, Nov 4, 2011 at 8:37 AM, Pedro Costa <[email protected]> wrote:
>> > Hi,
>> >
>> > I'm trying to understand the structure of the map output file. Here's an
>> > example of a map output file that contains 2 partitions:
>> >
>> > [code]
>> > <FF><FF><FF><FF>^@^@716banana banana apple banana carrot carrot apple
>> > banana 0apple carrot carrot carrot banana carrot carrot 5^N4carrot apple
>> > carrot apple apple carrot banana apple ^Mbanana apple <FF><FF><DF>|<8E><B7>
>> > [/code]
>> >
>> > 1 - I would like to understand what the ASCII character parts are. What
>> > do they mean?
>> >
>> > 2 - What type of file is a map output? Is it a SequenceFileOutputFormat,
>> > or a TextOutputFormat?
>> >
>> > 3 - I have a small program that runs independently of MR, whose goal is
>> > to digest each partition and give the corresponding hash. How do I
>> > know where each partition starts?
>> >
>> >
>> > --
>> > Thanks,
>> > PSC
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Thanks,
>

--
Todd Lipcon
Software Engineer, Cloudera
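
For a standalone digest program like the one in the original question, the index lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each index record is three big-endian 8-byte longs (start offset, raw length, on-disk length) and that the index file may end with an 8-byte checksum. Neither detail is confirmed in this thread, so check both against the IFile and MapOutputServlet source for your Hadoop version. The helper names read_index and digest_partitions are hypothetical, and SHA-1 is just an example digest.

```python
import hashlib
import struct

# ASSUMPTION: one index record per partition, made of three big-endian
# 8-byte longs: start offset, raw length, on-disk (possibly compressed) length.
RECORD = struct.Struct(">qqq")


def read_index(index_bytes):
    """Return a list of (start_offset, raw_length, part_length) tuples.

    Any trailing bytes that do not form a whole record (for example an
    assumed 8-byte checksum) are ignored rather than verified.
    """
    usable = len(index_bytes) - len(index_bytes) % RECORD.size
    return [RECORD.unpack_from(index_bytes, pos)
            for pos in range(0, usable, RECORD.size)]


def digest_partitions(output_path, index_path):
    """Hash each partition's on-disk bytes; one hex digest per partition."""
    with open(index_path, "rb") as f:
        records = read_index(f.read())
    digests = []
    with open(output_path, "rb") as out:
        for start, _raw_len, part_len in records:
            out.seek(start)  # jump to where this partition begins
            digests.append(hashlib.sha1(out.read(part_len)).hexdigest())
    return digests
```

Calling digest_partitions with the paths of a map output file and its index file returns one digest per partition, in partition order. The sketch hashes the bytes exactly as stored on disk, so the digests would differ if map output compression were switched on or off.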
