Hi Marcel,
For version 0.7.x you can use a patch I uploaded to JIRA:
http://issues.apache.org/jira/browse/NUTCH-59
For version 0.8 this will not work anymore.
I have already discussed the metadata issue with Doug and how we can
solve it in 0.8. I haven't found time to implement anything yet, but
it is definitely on my to-do list.
Stefan
On 28 Nov 2005, at 10:37, [EMAIL PROTECTED] wrote:
Hi dear nutchers,
I have implemented HTTP session support for Nutch. A patch will be
released as soon as I have switched to MapReduce.
I am crawling an intranet CMS and was successful in indexing the PDFs.
However, if I follow a link in the search results pane, the PDFs are
not retrieved by the client's browser, because a session cookie is not
set. I need some kind of metadata on the PDF referring to the original
HTML URL, where this session cookie is set before the page is
redirected to the URL of the PDF. This information is only available
when this HTML URL is parsed.
Any ideas?
Thanks for your help.
Marcel Schnippe