Ken Krugler wrote:
Hi Oleg,

On Dec 3, 2009, at 2:40am, Oleg Kalnichevski wrote:

On Wed, 2009-12-02 at 19:15 -0800, Ken Krugler wrote:
Below is an email from August 7th, which I'm reviving due to this
becoming a bigger issue over in Bixo-land.

I've continued to run into this issue with my crawls, but so far I'm
not doing anything with cookies, so it hasn't been a priority to track
down.

However, another Bixo user also ran into it, and he noticed that by
switching back to HttpClient 4.0-beta3, the warnings went away.

I believe he just opened HTTPCLIENT-896 as a clone of HTTPCLIENT-773,
which seemed to be this exact same bug (fixed by Oleg around 17/May/08).

I'm wondering if the bug crept back into the code sometime between
then and the final release.

Thanks,

-- Ken


Hi Ken

The cookie in question violates the format of the 'expires' attribute
expected by the Netscape policy. One can configure the policy to be more
lenient about the format of the 'expires' attribute by using a special HTTP
parameter. For details see HTTPCLIENT-896.

It is not really a regression. I think the Netscape cookie policy was
made stricter at some point after 4.0-beta1.

Hope this clarifies the situation.

Thanks for the clarification, and the example code you added in a comment to HTTPCLIENT-896.

Given the number of invalid cookies with this issue that I see during a crawl, would it make sense for the "best match" policy to select a more lenient Netscape format?

Or maybe add a "best match-lenient" policy that does this?

I haven't had to do much in the way of cookie processing in the past, so I'll confess up front that I'm ignorant about the potential issues that could arise from using a more lenient policy.

Thanks again,

-- Ken



Ken

I am somewhat reluctant to optimize HttpClient for just one particular use case, such as web crawling. Not only does the cookie in question violate the HTTP state management standards, it also violates the Netscape Draft spec. I do not think HttpClient should accept such cookies as valid by default. At the same time it is really easy to override the default behavior with just one parameter.
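
To make the strict versus lenient distinction concrete, here is a minimal sketch of how an 'expires' value can pass or fail depending on which date patterns are accepted. It uses only java.text.SimpleDateFormat and does not call HttpClient; the actual parameter is the one described in HTTPCLIENT-896 and is not reproduced here. The class name ExpiresParsing, the helper parseExpires, and both pattern lists are assumptions made for illustration, not HttpClient's real defaults.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

class ExpiresParsing {
    // Assumed strict pattern, modeled on the Netscape draft layout
    // "Wdy, DD-Mon-YYYY HH:MM:SS GMT" (hypothetical, for illustration).
    static final List<String> STRICT =
            Arrays.asList("EEE, dd-MMM-yyyy HH:mm:ss z");

    // Assumed lenient list: the strict pattern plus a space-separated
    // variant that is commonly seen in the wild.
    static final List<String> LENIENT = Arrays.asList(
            "EEE, dd-MMM-yyyy HH:mm:ss z",
            "EEE, dd MMM yyyy HH:mm:ss z");

    // Tries each pattern in order; returns null if none of them matches.
    static Date parseExpires(String value, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
            fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
            fmt.setLenient(false);
            try {
                return fmt.parse(value);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String offending = "Thu, 03 Dec 2009 10:40:00 GMT";
        System.out.println("strict accepts:  " + (parseExpires(offending, STRICT) != null));
        System.out.println("lenient accepts: " + (parseExpires(offending, LENIENT) != null));
    }
}
```

The design point is that leniency here is only a longer list of accepted patterns, which is why a single parameter can switch it on without changing the default behavior for everyone else.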

Cheers

Oleg


