To answer my own question: crawl-urlfilter.txt contains a default entry
that makes Nutch skip URLs with query strings.  I commented that entry
out, and Nutch now follows those links.
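
For anyone searching the archives later, the change looks roughly like
the fragment below. This is a sketch from a stock crawl-urlfilter.txt;
the exact comment text and character class are an assumption and may
differ between Nutch versions, so check your own copy. A leading "-"
marks an exclusion rule, and prefixing the line with "#" disables it.

```
# skip URLs containing certain characters as probable queries, etc.
# -[?*!@=]
```

Note that disabling the whole rule also lets through URLs containing
the other listed characters, not only "?" and "=". A narrower
character class may be preferable if those should still be skipped.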

Chris Stephens wrote:
> How do I get Nutch to follow URLs that contain a query string such as 
> ?blah=something at the end of the url?  Nutch seems to ignore these 
> and I didn't find any configuration option to enable this.  Does a 
> plugin or some such exist to facilitate following these types of links?


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
