Hi,

On 6/7/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> I have several websites that set a session cookie and then allow the user to
> surf the site.
> I would like to crawl those sites, but I don't know whether Nutch knows how to
> manage session cookies.
> Could you confirm?

AFAIK, there is no support for cookies.

>
> I'm completely lost with the different plugins that are used to crawl with the
> HTTP protocol.
> Is it lib-http, protocol-http or protocol-httpclient?
> What is the difference between them?

lib-http is the base of both protocol plugins. It handles stuff like
parsing robots.txt, making sure that the fetcher is polite, etc., but it
doesn't fetch pages itself. It delegates fetching to one of the
protocol-(http|httpclient) plugins. Since lib-http is a dependency of
both plugins, it gets loaded when either of them gets loaded.
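
To make the delegation concrete, here is a rough sketch of the pattern. It is
not Nutch's actual code: every class and method name below (HttpBaseSketch,
SimpleFetcherSketch, ClientLibFetcherSketch, getContent, fetchPage) is made up
for illustration, the "robots" check is just a prefix match, and the "fetch"
only returns a string. The shared base does the robots and politeness work and
leaves the actual fetch to a subclass, which is the role the two protocol
plugins play.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared base (the role lib-http plays).
abstract class HttpBaseSketch {
    private final List<String> disallowedPrefixes = new ArrayList<>();
    private final long delayMillis;
    private long lastFetchTime = 0L;

    HttpBaseSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Simplified replacement for parsed robots.txt rules: a list of path prefixes.
    void disallow(String pathPrefix) {
        disallowedPrefixes.add(pathPrefix);
    }

    private boolean allowedByRobots(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    // Politeness: wait until at least delayMillis has passed since the last fetch.
    private void waitPolitely() {
        long wait = lastFetchTime + delayMillis - System.currentTimeMillis();
        if (wait > 0) {
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        lastFetchTime = System.currentTimeMillis();
    }

    // Shared logic lives here; returns null when the path is disallowed.
    final String getContent(String path) {
        if (!allowedByRobots(path)) {
            return null;
        }
        waitPolitely();
        return fetchPage(path); // delegated to the concrete subclass
    }

    // The only part each concrete "protocol plugin" has to supply.
    protected abstract String fetchPage(String path);
}

// Hypothetical counterpart of a plugin with its own simple fetch code.
class SimpleFetcherSketch extends HttpBaseSketch {
    SimpleFetcherSketch() {
        super(10);
    }

    @Override
    protected String fetchPage(String path) {
        return "simple:" + path; // placeholder instead of a real HTTP request
    }
}

// Hypothetical counterpart of a plugin that would wrap an HTTP client library.
class ClientLibFetcherSketch extends HttpBaseSketch {
    ClientLibFetcherSketch() {
        super(10);
    }

    @Override
    protected String fetchPage(String path) {
        return "clientlib:" + path; // placeholder instead of a real HTTP request
    }
}

class DelegationSketch {
    public static void main(String[] args) {
        HttpBaseSketch fetcher = new SimpleFetcherSketch();
        fetcher.disallow("/private");
        System.out.println(fetcher.getContent("/index.html"));
        System.out.println(fetcher.getContent("/private/page.html"));
    }
}
```

In a layout like this, anything cookie-related would go into the fetchPage
side, since that is where the request is actually made.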

>
> I would appreciate your view; it will help me implement cookie management
> in Nutch.
>
> Thanks
>


-- 
Doğacan Güney