To answer my own question: crawl-urlfilter.txt contains a rule that, by default, skips URLs containing query strings. I commented that rule out, and Nutch now follows those links.
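For anyone who hits the same thing, here is a rough Python sketch of why that one rule blocks query-string URLs. It is not Nutch code. The rule text is what I believe the stock crawl-urlfilter.txt contains, so check your own copy. The first-match-wins behaviour and the rejection of URLs that match no rule are my understanding of the regex URL filter, not something I have verified against the source. The names parse_rules and accepts and the example.com URLs are made up for illustration.

```python
import re

# Assumed rule text, modelled on the stock crawl-urlfilter.txt.
# Verify against the file shipped with your Nutch version.
DEFAULT_RULES = """
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything else
+.
"""

# The same rules with the query-string rule commented out.
EDITED_RULES = DEFAULT_RULES.replace("-[?*!@=]", "#-[?*!@=]")


def parse_rules(text):
    """Return (accept, compiled_regex) pairs, skipping blank and comment lines."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule found anywhere in the URL decides.

    A URL that matches no rule is treated as rejected.
    """
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


if __name__ == "__main__":
    url = "http://example.com/page?blah=something"  # hypothetical URL
    print("default rules:", accepts(parse_rules(DEFAULT_RULES), url))
    print("edited rules: ", accepts(parse_rules(EDITED_RULES), url))
```

With the default rules the "?" and "=" in the URL hit the "-" rule before the catch-all "+." is reached, so the link is dropped. Once that rule is commented out, the catch-all is the first match and the link goes through.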
Chris Stephens wrote:
> How do I get Nutch to follow URLs that contain a query string such as
> ?blah=something at the end of the url? Nutch seems to ignore these
> and I didn't find any configuration option to enable this. Does a
> plugin or some such exist to facilitate following these types of links?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
