Nutch spidered one of our sites last night. When it encountered a URL containing a space character, it ignored everything after the space, and our application failed on the truncated URL that Nutch then requested.

Example URL that should have been requested:
  http://www.apache.org/cgi-bin/view?status=A%20&id=1

What Nutch then tried to access:
  http://www.apache.org/cgi-bin/view?status=A
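To make the symptom concrete, here is a hypothetical Java sketch. It is not Nutch's actual code, and the cause inside Nutch is not known from this report. It assumes the link appeared in the page with a literal space ("status=A &id=1"). Cutting the URL at its first space yields exactly the URL Nutch requested. Percent-encoding the space instead yields the URL that should have been requested. The class and method names are illustrative only.

```java
public class SpaceTruncationDemo {

    // Hypothetical faulty handling: keep only the text before the first space.
    static String truncateAtSpace(String url) {
        int idx = url.indexOf(' ');
        return idx < 0 ? url : url.substring(0, idx);
    }

    // Safer handling: percent-encode literal spaces instead of dropping text.
    static String encodeSpaces(String url) {
        return url.replace(" ", "%20");
    }

    public static void main(String[] args) {
        // Assumption: the link as it appeared in the page, with a literal space.
        String raw = "http://www.apache.org/cgi-bin/view?status=A &id=1";
        System.out.println(truncateAtSpace(raw)); // http://www.apache.org/cgi-bin/view?status=A
        System.out.println(encodeSpaces(raw));    // http://www.apache.org/cgi-bin/view?status=A%20&id=1
    }
}
```

Only the second form keeps the id parameter, which is what the application needs.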

Please investigate.

Thanks,
Rick Flosi
