Tomi NA wrote:
> 2006/10/18, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>
>> Btw we have some virtual local hosts, how does
>> db.ignore.external.links deal with that?
>
> Update:
> setting db.ignore.external.links to true in nutch-site (and later also
> in nutch-default as a sanity check) *doesn't work*: I feed the crawl
> process a handful of URLs and can only helplessly watch as the crawl
> spreads to dozens of other sites.

Could you give an example of a root URL that leads to this symptom
(i.e. the crawl leaks outside the original site)?

>
> In answer to your question, it seems pointless to talk about virtual
> host handling if the elementary filtering logic doesn't seem to
> work... :-\

Well, if this logic doesn't work it needs to be fixed, that's all.
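
For reference, here is a minimal sketch of what the filtering is meant to do, assuming the check simply compares the host name of each outlink with the host name of the page it was found on. The class and method names (ExternalLinkFilterSketch, hostOf, keepInternal) and the example URLs are hypothetical and are not taken from the Nutch source.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the actual Nutch implementation.
public class ExternalLinkFilterSketch {

    // Returns the lower-cased host of a URL, or null if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Keeps only outlinks whose host equals the host of the source page.
    // Assumption: "external" means "different host name", nothing more.
    static List<String> keepInternal(String fromUrl, List<String> outlinks) {
        String fromHost = hostOf(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            String toHost = hostOf(link);
            if (fromHost != null && fromHost.equals(toHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
            "http://www.example.com/about.html",
            "http://WWW.EXAMPLE.COM/contact.html",
            "http://intranet.example.com/",
            "http://other.example.org/");
        // Only the two www.example.com links survive the filter.
        System.out.println(keepInternal("http://www.example.com/", links));
    }
}
```

Under that assumption, a virtual host with a different name (intranet.example.com above) would be treated as external even if it is served from the same machine, which bears on the virtual host question.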


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
