Look at the creativecommons plugin built into Nutch. It shows how to
do this by implementing an HtmlParseFilter. I created a custom plugin
the same way a few moments ago and it works great.
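
For anyone who wants the general shape before opening the plugin source, here is a rough, self-contained sketch of the parse-filter idea. It does not use Nutch's real classes: PageFilter, AuthorFilter and extractFields are hypothetical names made up for illustration, and the regex stands in for the DOM walk a real plugin would do. The actual HtmlParseFilter signature should be taken from the creativecommons plugin itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParseFilterSketch {

    /** Hypothetical stand-in for a parse-time extension point; not Nutch's actual interface. */
    interface PageFilter {
        /** Inspects the raw page and adds any extra fields to the metadata map. */
        void filter(String html, Map<String, String> metadata);
    }

    /** Example filter: copies <meta name="author" content="..."> into an "author" field. */
    static final class AuthorFilter implements PageFilter {
        // Assumes name comes before content; a real plugin would walk the parsed DOM instead.
        private static final Pattern AUTHOR = Pattern.compile(
            "<meta\\s+name=[\"']author[\"']\\s+content=[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

        @Override
        public void filter(String html, Map<String, String> metadata) {
            Matcher m = AUTHOR.matcher(html);
            if (m.find()) {
                metadata.put("author", m.group(1).trim());
            }
        }
    }

    /** Runs every filter over the page and returns the collected fields. */
    static Map<String, String> extractFields(String html, PageFilter... filters) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (PageFilter f : filters) {
            f.filter(html, metadata);
        }
        return metadata;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"author\" content=\"Jane Doe\"></head></html>";
        System.out.println(extractFields(html, new AuthorFilter()));
    }
}
```

The point of the pattern is that each filter only adds its own fields to a shared map, so the extra fields can later be copied into the Document at indexing time without touching the core parser.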
Erik
On Aug 10, 2005, at 1:39 PM, Fuad Efendi wrote:
I need specific pre-processing of an HTML page, to add more fields to
the Document before storing it in the index, and to modify the web
interface accordingly.
Where is the extension point for this?
Thanks!
-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general