Hi dear nutchers, I have implemented HTTP session support for Nutch. A patch will be released as soon as I have switched to MapReduce. I am crawling an intranet CMS. I was successful in indexing the PDFs. If I follow the link in the search results pane, the PDFs are not retrieved by the client's browser, because no session cookie is set. I need some kind of metadata in the PDF referring to the original HTML URL, where this session cookie is set before the page is redirected to the URL of the PDF. This information is only available when this HTML URL is parsed.
Any ideas? Thanks for your help. Marcel Schnippe
_______________________________________________
Nutch-developers mailing list Nutch-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nutch-developers