Hi Marcel,

For version 0.7.x you can use a patch I uploaded to JIRA:
http://issues.apache.org/jira/browse/NUTCH-59

For version 0.8 this will not work anymore.
I have already discussed the metadata issue with Doug, including how we could solve it in 0.8, but I haven't found the time to write anything yet. It is definitely on my to-do list.

Stefan




On 28.11.2005 at 10:37, [EMAIL PROTECTED] wrote:


Hi dear nutchers,

I have implemented HTTP session support for Nutch. I will release a patch
as soon as I have switched to MapReduce.
I am crawling an intranet CMS and have successfully indexed its PDFs.
However, when I follow a link in the search results pane, the client's browser cannot retrieve the PDF because the session cookie is not set. I need some kind of metadata on the PDF that refers to the original HTML URL, where the session cookie is set before the page redirects to the PDF's URL.
This information is only available when that HTML URL is parsed.

Any ideas?

Thanks for your help.

Marcel Schnippe





_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
