And please note the mail from Doug on Nov 23.

---------------------------------------------------------------------------------------------
Title: [Fwd: Spider Causing Contact Form Submissions]
Body: It looks as though Nutch is inadvertently submitting forms.

At DOMContentUtils.java:58 we specify that the "action" parameter of an
HTML form should be extracted as a link.  Yet we ignore the "method"
parameter of the form.  I think we should only follow these when the
method is "get", not when it is "post".

Do others agree?

Doug
-------------------------------------------------------------------------------------------
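A minimal sketch of the rule Doug proposes, namely extracting a form's "action" as an outlink only when the form's method is GET. This is not Nutch's actual DOMContentUtils code. The class name FormLinkFilter and the methods isGetForm, followableFormActions and parse are hypothetical. The sketch assumes well-formed XHTML so that the standard JDK DOM parser can be used, and it treats a missing or empty method attribute as GET, which is the HTML default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: follow form actions only for GET forms.
public class FormLinkFilter {

    /** True if the form submits via GET; a missing or empty method defaults to GET in HTML. */
    static boolean isGetForm(Element form) {
        // DOM getAttribute returns "" when the attribute is absent.
        String method = form.getAttribute("method").trim();
        return method.isEmpty() || method.equalsIgnoreCase("get");
    }

    /** Collects the action URLs of GET forms; POST (and other) forms are skipped. */
    static List<String> followableFormActions(Document doc) {
        List<String> actions = new ArrayList<>();
        NodeList forms = doc.getElementsByTagName("form");
        for (int i = 0; i < forms.getLength(); i++) {
            Element form = (Element) forms.item(i);
            String action = form.getAttribute("action");
            if (!action.isEmpty() && isGetForm(form)) {
                actions.add(action);
            }
        }
        return actions;
    }

    /** Parses well-formed XHTML with the JDK's DOM parser (an assumption; real pages need an HTML parser). */
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<form action=\"/search\" method=\"get\"></form>"
            + "<form action=\"/register\" method=\"post\"></form>"
            + "<form action=\"/browse\"></form>"
            + "</body></html>";
        // The registration form (POST) is not followed.
        System.out.println(followableFormActions(parse(page)));
    }
}
```

Running main prints `[/search, /browse]`: the POST registration form, the kind that was creating blank users, is never requested by the crawler.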

I think the source code in SVN ignores POST URLs now.

/Jack


On 12/14/05, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi
>
> You can read the article about Stanford's HiWE search engine on www10.org.
> And it is easy to extend Nutch if you are using the http-client protocol.
>
> http://www10.org/cdrom/posters/p1049/
>
> Good luck:)
>
> /Jack
>
> On 12/14/05, Andy Read <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm using Nutch to create a site search facility for a couple of sites.
> >
> > I upgraded from 0.6 to 0.7.1 a few days ago and have just noticed that blank
> > users are being registered on my site at the exact times the cron job runs
> > the crawl tool to re-index the site.  This means that the crawler is now
> > submitting a POST request from the registration form!  Is this a new
> > 'feature' of 0.7 or 0.7.1?  I can't find any mention in changes.txt and I
> > can't find any config option referring to it.  Surely the crawler should
> > never submit form input?
> >
> > Any help appreciated.
> >
> > Thanks,
> >
> > Andy Read
> >
> > www.azurite.co.uk
>
>
> --
> Keep Discovering ... ...
> http://www.jroller.com/page/jmars
>


