Doğacan Güney wrote:
> On 6/7/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
>> Hi Guys,
>>
>> I have several websites that set a session cookie and then let the
>> user browse the site.
>> I would like to crawl those sites, but I don't know whether Nutch can
>> manage session cookies.
>> Could you confirm?
>>
>> I'm completely lost among the different plugins that are used to
>> crawl over the HTTP protocol.
>> Is it lib-http, protocol-http or protocol-httpclient?
>> What is the difference between them?
>>
>> I would appreciate your view; it will help me implement cookie
>> management in Nutch.
> 
> I forgot to answer your question:)
> 
> If you only need to remember a session cookie during one round of
> fetching, it is pretty simple. In lib-http, when you get a cookie, put
> it in a Map (from hosts to strings); then, when you fetch the next URL
> from the same host, get the cookie and add it to your request.
> 
> If you want to remember cookies across fetcher runs, well... I am not
> sure how to do it :) Perhaps you can write an extra job that puts the
> cookie into every datum from that host, then pick it up in the
> fetcher. Or perhaps someone has a better idea :)

Actually, if you use protocol-httpclient, it handles cookies properly 
without any additional configuration.

However, they are not stored anywhere, so they will be valid only for 
the duration of a single fetch.
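
For anyone who wants to try the per-host Map idea quoted above instead,
here is a rough sketch in plain Java. It is only an illustration under
stated assumptions: the class name HostCookieStore and its two methods
are hypothetical and not part of Nutch or lib-http, it keeps a single
"name=value" pair per host, and it ignores cookie attributes such as
Path, Domain and expiry. You would still have to call it yourself from
wherever your protocol plugin reads response headers and builds requests.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a Nutch class): remembers one session cookie
// per host, in memory only, for the lifetime of the object.
class HostCookieStore {
    // Lower-cased host name -> "name=value" pair to send back in a Cookie header.
    // ConcurrentHashMap because several fetcher threads may share the store.
    private final Map<String, String> cookiesByHost = new ConcurrentHashMap<>();

    // Call with the raw value of a Set-Cookie response header.
    void remember(String host, String setCookieHeader) {
        if (host == null || setCookieHeader == null) {
            return;
        }
        // Keep only the leading "name=value"; attributes after ';' are dropped.
        int semicolon = setCookieHeader.indexOf(';');
        String pair = (semicolon >= 0
                ? setCookieHeader.substring(0, semicolon)
                : setCookieHeader).trim();
        if (!pair.isEmpty()) {
            cookiesByHost.put(host.toLowerCase(Locale.ROOT), pair);
        }
    }

    // Returns the value for the Cookie request header, or null if none is known.
    String cookieFor(String host) {
        if (host == null) {
            return null;
        }
        return cookiesByHost.get(host.toLowerCase(Locale.ROOT));
    }
}
```

Like the httpclient behaviour described above, nothing here is
persisted, so the cookies disappear when the fetcher process exits.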

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
