Hi, sorry for my previous mail; I unintentionally hit the Enter key. Again:
I've already written a crawler for HTTP and the filesystem, with various include and exclude options. It is based on OROMatcher (thank God there is open source software!). We needed it for importing web sites into our product, a content management system. I suggest developing it into a relatively autonomous library, which could be used by Lucene and other packages for retrieving large numbers of HTML pages.

Regards,
Manfred