Toby DiPasquale wrote:
> I have a question about the MapReduce and NDFS implementations. When
> writing records into an NDFS file, how does one make sure that records
> terminate cleanly on block boundaries such that a Map job's input does
> not span multiple physical blocks?

We do not currently guarantee that; a task's input may span multiple blocks. We try to split the input into block-sized chunks, but the last few records of a split (those up to the first sync mark past the split point) may fall in the next block. So a small amount of I/O happens over the network, but not the vast majority.
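For illustration, here is a minimal sketch of what a split reader might look like under that scheme. The class and method names here are hypothetical, not the actual Nutch/Hadoop API; the point is the boundary handling.

    import java.io.IOException;

    // Hypothetical record type and seekable record file with sync marks.
    interface Record {}

    interface RecordFile {
      void seek(long offset) throws IOException;
      void skipToNextSyncMark() throws IOException;
      long position() throws IOException;
      Record readRecord() throws IOException;
    }

    // Reads the records belonging to one split [start, end).
    class SplitReader {
      private final RecordFile file;
      private final long end;   // byte offset where this split ends

      SplitReader(RecordFile file, long start, long end)
          throws IOException {
        this.file = file;
        this.end = end;
        file.seek(start);
        if (start > 0) {
          // Skip the partial record at the head of this split; it
          // belongs to the previous split, whose reader runs past its
          // own end to the first sync mark in this one.
          file.skipToNextSyncMark();
        }
      }

      /** Returns the next record, or null when the split is done. */
      Record next() throws IOException {
        // Stop only once we have passed the split's end offset. The
        // last record read may physically live in the next block, so
        // that read can go over the network.
        if (file.position() >= end) return null;
        return file.readRecord();
      }
    }

Each split reader over-reads to the next sync mark and each skips its head partial record, so every record is consumed exactly once even when it straddles a block boundary.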

> It also appears as if NDFS does not have an explicit "record append"
> operation. Is this the case?

Yes. DFS is currently write-once.
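Concretely, that means a file's entire contents must be produced in a single create/write/close pass; there is no reopening for append. A minimal sketch, with a hypothetical client interface standing in for the real NDFS API:

    import java.io.IOException;
    import java.io.OutputStream;

    class WriteOnceExample {

      // Hypothetical DFS client; the actual NDFS API differs.
      interface DfsClient {
        OutputStream create(String path) throws IOException;
      }

      static void writeRecords(DfsClient dfs, String path,
                               byte[][] records) throws IOException {
        OutputStream out = dfs.create(path);  // one-shot creation
        try {
          for (byte[] r : records) {
            out.write(r);    // all writes happen in this one pass...
          }
        } finally {
          out.close();       // ...and closing seals the file for good
        }
        // There is no dfs.append(path); adding records later means
        // writing a new file (or rewriting this one).
      }
    }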

Please note that the MapReduce and DFS code has moved from Nutch to the Hadoop project. Such questions are more appropriately asked there.

Doug

