I'd like to crawl the whole web.

By applying my own IndexingFilter plugin, I can select the URLs that are of
interest to me.

Then, using the 'Prune' method, I keep the searcher focused on those pages
that are of interest.

So far so good: it works.

But I would like to recrawl with different strategies, say at different
frequencies:
1. weekly, for the sites that contain interesting data
2. monthly, for the sites that do not contain interesting data

I looked into the scoring plugin, but I couldn't find a way to use it
for that purpose.
Any suggestions?

Basically, a mechanism for feeding information back from the indexing
phase into the crawling phase seems to be missing.
With such a mechanism we could focus the crawling based on content rather
than on URL filtering, which is a poor reflection of the content.
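
To make the idea concrete, here is a minimal sketch of the kind of feedback
I have in mind. It is plain Java and does not use any Nutch API: the class
RecrawlPlanner, its methods, and the 7-day and 30-day intervals are all
hypothetical names and values chosen for illustration only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Nutch API: remembers what the indexing phase
// decided about each URL and turns that into a recrawl interval.
class RecrawlPlanner {
    // Assumed intervals for the two cases (weekly and monthly).
    static final Duration WEEKLY = Duration.ofDays(7);
    static final Duration MONTHLY = Duration.ofDays(30);

    // url -> did the indexing phase find interesting content?
    private final Map<String, Boolean> interesting = new HashMap<>();

    // Called from the indexing side once the content has been inspected.
    void recordIndexingResult(String url, boolean hasInterestingData) {
        interesting.put(url, hasInterestingData);
    }

    // Called from the crawling side; URLs never seen by the indexer
    // fall back to the monthly interval.
    Duration intervalFor(String url) {
        return interesting.getOrDefault(url, false) ? WEEKLY : MONTHLY;
    }

    // Next time the URL should be fetched, given its last fetch time.
    Instant nextFetchTime(String url, Instant lastFetch) {
        return lastFetch.plus(intervalFor(url));
    }

    public static void main(String[] args) {
        RecrawlPlanner planner = new RecrawlPlanner();
        planner.recordIndexingResult("http://example.com/data", true);
        planner.recordIndexingResult("http://example.com/other", false);

        Instant now = Instant.now();
        System.out.println(planner.nextFetchTime("http://example.com/data", now));
        System.out.println(planner.nextFetchTime("http://example.com/other", now));
    }
}
```

The point is only that the decision comes from what the indexing phase saw
in the content, not from the URL. How such a flag could be stored and read
back during the crawl is exactly what I am asking about.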

-Raymond-
