Hi people,

Could someone help me by explaining some Nutch-specific things? Imagine this situation: Nutch fetches a page which gets split into two parts during parsing. The parts are stored in a segment.
Question 1. How will Nutch save it? As a single chunk with some markers indicating that it consists of two parts, or will Nutch save each parsed part of the original page as a separate block, independent of the other part?

Then one part of the page is updated, and the next time I re-fetch the page it again consists of two parts: one old and one new. How will Nutch merge the previous segment with the new one? Will it keep the old second segment, or purge it and replace it with the new one?

And finally, the page is completely updated. Will the old parsed blocks be purged and replaced with the new content, and at what stage?

--
Best Regards
Alexander Aristov
