I think you underestimate the potential applications of Nutch, because there
can be quite a lot of intelligence ("IP") in the plug-in architecture.
You can choose to focus your crawl and/or your content, you can add
specific fields that fit a vertical search domain, and you may want to adapt
the scoring of URLs and content.

So the limitations are not technical; they come from the fact that search
applications differ, and therefore their data will differ too.

2009/6/22 Paul Jones <[email protected]>

> Hi
>
> A newbie to the world of Lucene, Nutch, and Mahout here. I spent all weekend on
> Mahout, and am now looking at Nutch. So I have a question: it seems (after
> reading the archives) that a lot of people are using Nutch to index the web,
> whether for vertical searches or just the web as a whole. Now, rather than
> everyone starting again from scratch, and since very little (if any) "IP"
> would exist in the index, because nothing clever has been done to it except
> being processed by Nutch, would it not be possible to "share" all these
> indexes with each other? For example, if someone has built an index of all
> blogs, or all car-related websites, or has just indexed 100 million webpages
> at random. Maybe there is some technical reason I am missing.
>
> Paul




-- 
-MilleBii-
