Doğacan Güney wrote:
> On 6/7/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
>> Hi guys,
>>
>> I have several websites that set a session cookie and then allow the
>> user to browse the site. I would like to crawl those sites, but I
>> don't know whether Nutch can manage session cookies.
>> Could you confirm?
>>
>> I'm completely lost among the different plugins used to crawl over
>> HTTP. Is it lib-http, protocol-http or protocol-httpclient?
>> What is the difference between them?
>>
>> I would appreciate your view; it will help me implement cookie
>> management in Nutch.
>
> I forgot to answer your question :)
>
> If you only need to remember the session cookie during one round of
> fetching, it is pretty simple. In lib-http, when you get a cookie, put
> it in a Map (from hosts to strings); then, when you fetch the next URL
> from the same host, get the cookie and add it to your request.
>
> If you want to remember cookies across fetchers, well... I am not sure
> how to do it :) Perhaps you can write an extra job that puts the
> cookie into every datum from that host, then pick it up in the fetcher.
> Or perhaps someone has a better idea :)
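
For reference, the per-host Map idea from the quoted message could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name HostCookieCache and its methods are hypothetical and not part of Nutch or lib-http, it keeps only the leading name=value pair of each Set-Cookie header, and it ignores attributes such as Path, Domain and Expires.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a Nutch class): remembers cookies per host for one fetch round. */
class HostCookieCache {
    // host -> (cookie name -> cookie value); concurrent because fetcher threads share it
    private final Map<String, Map<String, String>> cookiesByHost = new ConcurrentHashMap<>();

    /** Record one Set-Cookie header value received in a response from the given URL. */
    void remember(String url, String setCookieHeader) {
        String host = hostOf(url);
        if (host == null || setCookieHeader == null) {
            return;
        }
        // Keep only the leading "name=value" pair; cookie attributes are ignored in this sketch.
        String pair = setCookieHeader.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return;
        }
        cookiesByHost
            .computeIfAbsent(host, h -> new ConcurrentHashMap<>())
            .put(pair.substring(0, eq), pair.substring(eq + 1));
    }

    /** Build the Cookie request header value for the URL, or return null if none is known. */
    String cookieFor(String url) {
        String host = hostOf(url);
        Map<String, String> cookies = (host == null) ? null : cookiesByHost.get(host);
        if (cookies == null || cookies.isEmpty()) {
            return null;
        }
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(e.getKey()).append('=').append(e.getValue());
        }
        return header.toString();
    }

    /** Extract the lower-cased host from a URL, or null if the URL cannot be parsed. */
    private static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return (host == null) ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

The protocol code would call remember() for each Set-Cookie header it receives, and cookieFor() before sending the next request to the same host. Because the map lives in memory, the cookies are lost when the fetch round ends, which matches the limitation described in this thread.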
Actually, if you use protocol-httpclient, it handles cookies properly without any additional configuration. However, they are not stored anywhere, so they will be valid only for the duration of a single fetch.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
