Hi people,
Could someone help me by explaining some Nutch-specific things?

Imagine this situation: Nutch fetches a page which gets split into two parts
during parsing. The parts are stored in a segment.

Question 1. How will Nutch save it? As a single chunk with some markers
saying that it consists of two parts, or will Nutch save each parsed part
of the original page as a separate block, independent of the other part?

Then one part of the page is updated, and the next time I re-fetch the page
it will again consist of two parts: one old and one new.

How will Nutch merge the previous segment with the new one? Will it keep the
old second segment, or will it purge it and replace it with a new one?

And finally, the page is completely updated. Will the old parsed blocks be
purged and replaced with new content, and at what stage?

-- 
Best Regards
Alexander Aristov
