Hi dear nutchers,

I have implemented HTTP session support for Nutch. A patch will be
released as soon as I have switched to MapReduce.
I am crawling an intranet CMS and have successfully indexed its PDFs.
However, when a user follows a link in the search results, the PDF is not
retrieved by the client's browser, because the session cookie is not set. I
need some kind of metadata on the PDF referring to the original HTML URL, where
this session cookie is set before the page is redirected to the URL of the PDF.
This information is only available while this HTML URL is being parsed.
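
To illustrate the kind of metadata meant here, a minimal sketch in plain Java
follows. It only shows the idea of remembering the referring HTML URL for each
linked document at parse time. The names ReferrerIndex, recordOutlink and
linkFor are hypothetical and not part of Nutch, and the example URLs are
made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Nutch class: remembers which HTML page
// linked to which document, so the search result can point at the
// HTML page (where the session cookie is set) instead of the PDF.
class ReferrerIndex {
    private final Map<String, String> referrerByTarget = new HashMap<>();

    // Assumed to be called once per outlink while the HTML page is parsed.
    // The first referrer seen for a target is kept.
    void recordOutlink(String htmlUrl, String targetUrl) {
        referrerByTarget.putIfAbsent(targetUrl, htmlUrl);
    }

    // Returns the referring HTML URL if one was recorded,
    // otherwise the document URL itself.
    String linkFor(String documentUrl) {
        return referrerByTarget.getOrDefault(documentUrl, documentUrl);
    }
}
```

With such a mapping stored alongside the indexed PDF, the search result could
link to the HTML URL, which sets the cookie and then redirects to the PDF.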

Any ideas?

Thanks for your help.

Marcel Schnippe



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
