Thanks for these various responses.

I agree that I should be checking input more carefully and will do so.
In my experience most developers find it useful to allow both GET and POST
input, so I would prefer not to deny GET requests.
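For what it's worth, the kind of input handling I have in mind looks
roughly like this (a minimal sketch using the JDK's built-in
com.sun.net.httpserver, not my actual application code; the /search
endpoint and the "q" parameter are just for illustration). The idea is
that a side-effect-free read accepts either method, but nothing gets used
before it is validated:

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SafeParamServer {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/search", exchange -> {
            // Take parameters from the query string (GET) or the body (POST).
            String raw = "GET".equals(exchange.getRequestMethod())
                    ? exchange.getRequestURI().getRawQuery()
                    : new String(exchange.getRequestBody().readAllBytes(),
                                 StandardCharsets.UTF_8);
            String q = parse(raw).getOrDefault("q", "");
            // Validate against a strict whitelist before doing anything with it.
            if (!q.matches("[\\w ]{0,100}")) {
                exchange.sendResponseHeaders(400, -1);
                exchange.close();
                return;
            }
            byte[] body = ("Searched for: " + q).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
    }

    private static Map<String, String> parse(String raw) {
        Map<String, String> params = new HashMap<>();
        if (raw == null) return params;
        for (String pair : raw.split("&")) {
            String[] kv = pair.split("=", 2);
            params.put(URLDecoder.decode(kv[0], StandardCharsets.UTF_8),
                       kv.length > 1
                           ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8)
                           : "");
        }
        return params;
    }
}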

But I do agree with Doug's fix to stop the crawler following POST links,
as the recommendation is that POST requests be used where side effects are
likely (see http://www.w3.org/2001/tag/doc/whenToUseGet.html#checklist).  I
assume this fix will make it into 0.7.2 at some point, if I don't want to
build from CVS.
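I haven't looked at the patch itself, but I imagine the check amounts to
something like the following (a rough sketch only, not the real Nutch code;
safeToSubmit is a name I've made up here). Per the HTML spec, a form with
no method attribute defaults to GET, so only an explicit non-GET method
rules a form out:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormFilter {
    private static final Pattern METHOD =
            Pattern.compile("method\\s*=\\s*[\"']?(\\w+)", Pattern.CASE_INSENSITIVE);

    // A form with no method attribute defaults to GET per the HTML spec,
    // so only an explicit POST (or other non-GET method) rules it out.
    static boolean safeToSubmit(String formTag) {
        Matcher m = METHOD.matcher(formTag);
        if (!m.find()) return true;
        return m.group(1).equalsIgnoreCase("get");
    }

    public static void main(String[] args) {
        System.out.println(safeToSubmit("<form action=\"/search\">"));              // true
        System.out.println(safeToSubmit("<form method=\"POST\" action=\"/del\">")); // false
    }
}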

I'm not quite sure Jack's response about Stanford's HiWE search engine was a
direct answer to my question, but it does raise the issue of whether some
applications will always see valid reasons to submit form POSTs in an
effort to discover "the hidden web".

This seems very reminiscent of the Google Web Accelerator saga earlier this
year (e.g. see
http://www.sitepoint.com/newsletter/viewissue.php?id=3&issue=113&format=html
), although in that case the problems arose from prefetching plain hrefs
that had side effects (a bad idea in itself!), and usually only when users
were logged in.
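To illustrate why those hrefs were the real culprit (a hypothetical sketch,
again using the JDK's HttpServer; the /delete endpoint is made up): any
prefetcher or crawler will happily follow <a href="/delete?id=42">, so the
defence on the application side is simply to refuse to do anything
destructive on GET:

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

public class NoSideEffectGets {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/delete", exchange -> {
            // Anything that prefetches links will follow a plain href here,
            // so a destructive GET would silently wipe data. Require POST.
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.getResponseHeaders().set("Allow", "POST");
                exchange.sendResponseHeaders(405, -1); // Method Not Allowed
            } else {
                // ... perform the deletion here ...
                exchange.sendResponseHeaders(204, -1); // No Content
            }
            exchange.close();
        });
        server.start();
    }
}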

Andy Read

www.azurite.co.uk


