Tomi NA wrote:
> 2006/10/18, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>
>> Btw we have some virtual local hosts, how does the
>> db.ignore.external.links
>> deal with that ?
>
> Update:
> setting db.ignore.external.links to true in nutch-site (and later also
> in nutch-default as a sanity check) *doesn't work*: I feed the crawl
> process a handful of URLs and can only helplessly watch as the crawl
> spreads to dozens of other sites.

Could you give an example of a root URL which leads to this symptom
(i.e. leaks outside the original site)?

>
> In answer to your question, it seems pointless to talk about virtual
> host handling if the elementary filtering logic doesn't seem to
> work... :-\

Well, if this logic doesn't work it needs to be fixed, that's all.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
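
For readers following the thread, here is a minimal sketch of what an "ignore external links" check could look like, assuming it works by comparing the host name of the page being parsed with the host name of each outlink. This is not Nutch's code: the class `ExternalLinkSketch`, the helpers `hostOf` and `keepInternal`, and the example URLs are all hypothetical. Under that assumption, two virtual hosts with different names count as different sites, even when they are served from the same machine.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not taken from the Nutch source tree.
class ExternalLinkSketch {

    // Returns the lower-cased host of a URL, or null if it cannot be determined.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null; // malformed URL: treat as having no host
        }
    }

    // Keeps only the outlinks whose host equals the host of the source page.
    static List<String> keepInternal(String fromUrl, List<String> outlinks) {
        String fromHost = hostOf(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            String toHost = hostOf(link);
            if (fromHost != null && fromHost.equals(toHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> outlinks = List.of(
            "http://intranet.example/docs/a.html", // same host: kept
            "http://wiki.intranet.example/Home",   // different virtual host: dropped
            "http://www.example.org/");            // other site: dropped
        System.out.println(keepInternal("http://intranet.example/start", outlinks));
    }
}
```

If the real check behaves like this sketch, a root URL plus one outlink that escapes the filter would be enough to reproduce the leak described above.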
