On 6/7/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> I have several websites that set a session cookie and then allow the user to
> surf the site.
> I would like to crawl those sites, but I don't know if Nutch knows how to
> manage session cookies.
> Could you confirm?
>
> I'm completely lost with the different plugins that are used to crawl with the
> HTTP protocol.
> Is it lib-http, protocol-http or protocol-httpclient?
> What is the difference between all of them?
>
> I would appreciate your view; it will help me implement the management
> of cookies in Nutch.

I forgot to answer your question :)

If you only need to remember the session cookie during one round of fetching, it is pretty simple. In lib-http, when you receive a cookie, put it in a Map (from hosts to cookie strings). Then, when you fetch the next URL from the same host, look up the cookie and add it to your request.

If you want to remember cookies across fetcher runs, well... I am not sure how to do it :) Perhaps you can write an extra job that attaches the cookie to every datum from that host, and then pick it up in the fetcher. Or perhaps someone has a better idea :)

>
> Thanks
>

--
Doğacan Güney
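
The one-round approach described above (a Map from hosts to cookie strings) could be sketched roughly as follows. This is only an illustration, not code from Nutch: the class HostCookieStore and its methods remember() and cookieFor() are hypothetical names, and where exactly you would call them inside lib-http is left open. It keeps a single "name=value" pair per host and ignores cookie attributes such as Path or Expires, which is a simplification.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of Nutch): remembers one cookie string
// per host for the lifetime of a single fetch round.
class HostCookieStore {
    // host -> "name=value" string, ready to send in a Cookie request header.
    // ConcurrentHashMap is used on the assumption that several fetcher
    // threads may share one store.
    private final Map<String, String> cookiesByHost = new ConcurrentHashMap<>();

    // Call with the raw value of a Set-Cookie response header.
    // Only the leading "name=value" part is kept; a later cookie from the
    // same host replaces the earlier one.
    void remember(String host, String setCookieHeader) {
        if (host == null || setCookieHeader == null) {
            return;
        }
        int semi = setCookieHeader.indexOf(';');
        String pair = (semi >= 0
                ? setCookieHeader.substring(0, semi)
                : setCookieHeader).trim();
        if (!pair.isEmpty()) {
            cookiesByHost.put(host.toLowerCase(Locale.ROOT), pair);
        }
    }

    // Returns the value to put in the Cookie request header for this host,
    // or null if no cookie has been seen for it yet.
    String cookieFor(String host) {
        if (host == null) {
            return null;
        }
        return cookiesByHost.get(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        HostCookieStore store = new HostCookieStore();
        store.remember("example.com", "JSESSIONID=abc123; Path=/");
        System.out.println(store.cookieFor("example.com"));
    }
}
```

After each response you would call remember() with the host and the Set-Cookie value, and before each request call cookieFor() and, if it is not null, send the result as the Cookie header. The map lives only in memory, so it does not address the harder case of keeping cookies across fetcher runs.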
