Hi,

On 6/7/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> I've different website which set a cookie session and then allow the user to
> surf on the site.
> I would like to crawl those site but I don't know if Nutch know how to
> manage cookie session.
> Could you confirm ?

AFAIK, there is no support for cookies.

> I'm completly lost with the different plugin which are use to crawl with the
> HTTP protocol.
> Is it lib-http, protocol-http or protocol-httpclient ?
> What is the difference between all of them ?

lib-http is the base of both protocol plugins. It handles stuff like parsing
robots.txt and making sure that the fetcher is polite, but it doesn't fetch
pages itself. It delegates fetching to one of the protocol-(http|httpclient)
plugins. Since lib-http is a dependency of both plugins, it gets loaded
whenever either of them gets loaded.

> I would appreciate your view, it will help me to implement the management
> of cookie in Nutch.
>
> Thanks

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
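
To make the lib-http / protocol-* split a bit more concrete, here is a rough sketch of the delegation pattern described above. It is not Nutch's actual code: every class and method name below (HttpBaseSketch, PlainHttpSketch, HttpClientSketch, getContent, fetch) is hypothetical, the "robots.txt" check is reduced to a set of disallowed path prefixes, and no real network access takes place.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the shared lib-http layer: it owns the common
// checks (here only a simplified robots.txt rule set) and leaves the
// actual page fetch to a subclass.
abstract class HttpBaseSketch {
    private final Set<String> disallowedPrefixes = new HashSet<>();

    // Register a path prefix that a robots.txt file would disallow.
    void disallow(String pathPrefix) {
        disallowedPrefixes.add(pathPrefix);
    }

    // Shared entry point: apply the common checks, then delegate.
    final String getContent(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return "BLOCKED_BY_ROBOTS";
            }
        }
        return fetch(path);
    }

    // Each protocol plugin supplies its own way of fetching a page.
    protected abstract String fetch(String path);
}

// Hypothetical stand-in for the protocol-http plugin.
class PlainHttpSketch extends HttpBaseSketch {
    @Override
    protected String fetch(String path) {
        return "protocol-http fetched " + path;
    }
}

// Hypothetical stand-in for the protocol-httpclient plugin.
class HttpClientSketch extends HttpBaseSketch {
    @Override
    protected String fetch(String path) {
        return "protocol-httpclient fetched " + path;
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        HttpBaseSketch plugin = new HttpClientSketch();
        plugin.disallow("/private/");
        System.out.println(plugin.getContent("/index.html"));
        System.out.println(plugin.getContent("/private/a.html"));
    }
}
```

In a layout like this, cookie handling would most naturally live inside one of the fetch() implementations rather than in the shared base, since that is where the HTTP exchange itself happens. Whether that maps cleanly onto the real plugins is an assumption to check against the Nutch source.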
