Hi Thorsten,

On 11/27/06 4:00 AM, "Thorsten Scherler" <[EMAIL PROTECTED]> wrote:
> Reading the wiki and the docu I get the impression I need to write my
> own implementation of an indexer/searcher plugin, which is able to
> filter/index crucial filter information such as <summary year="2006"
> number="209" date="27-10-2006" section="1">, <organisation
> name="Consejería de Economia y Hacienda"> and <disposition
> type="Resolución" >.

Yes, you may need to write your own parser, indexer, and searcher plugins. However, I am currently working on getting the parse-xml plugin into the Nutch sources. The parse-xml plugin includes an indexing filter for the fields that are extracted by the XML parser. The XML parser can be configured for custom schemas and for the fields that need to be extracted. The plugin is currently available in JIRA, attached to this issue:

http://issues.apache.org/jira/browse/NUTCH-185

I am working hard to get this plugin ported to the latest trunk source and ready to be committed. I hope to attach a patch within the next week that brings the plugin up to date and gets the code ready for prime time (formatting, public javadocs, etc.). Once I attach the patch, you may find that you only need to write your searcher plugin.

Then again, in the interest of time, you may go the route of writing your own set of plugins. In that case, you can find examples of how to write the parse/index/query plugins by looking at the Nutch source in subversion, available here:

Parse plugins:
http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/parse-*

Index plugins:
http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/index-*

Query plugins:
http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/query-*

> Still being a newbie to nutch I would appreciate the opinion of
> experienced devs whether nutch is the right choice and if so how I
> should start.
I think that you could do this with Nutch, and if you do, for free, you get:

  Crawling
  Parsing/Indexing
  Search Webapp, and RSS based OpenSearch servlet

You could also do this with Lucene, but I think you may find that you end up maintaining more code, and having to rewrite existing functionality available within Nutch.

Just my 2 cents...

Cheers,
  Chris

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
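
As a rough illustration of the kind of field extraction discussed above, the sketch below pulls the attributes of the <summary>, <organisation> and <disposition> elements into a flat map of field names to values. It is not the parse-xml plugin's code and does not use the Nutch plugin API; it relies only on the JDK's DOM parser. The class name GazetteFieldExtractor, the <entry> wrapper element, and the "element.attribute" field-naming scheme are assumptions made for illustration, since the real document schema is not shown in this thread.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: NOT part of Nutch or of the parse-xml plugin.
class GazetteFieldExtractor {

    // Element names come from the question; which ones matter is an assumption.
    private static final String[] ELEMENTS = {"summary", "organisation", "disposition"};

    // Returns fields such as "summary.year" -> "2006" for the first occurrence
    // of each configured element. Parsing failures are rethrown unchecked.
    static Map<String, String> extractFields(String xml) {
        Document doc;
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : ELEMENTS) {
            NodeList nodes = doc.getElementsByTagName(name);
            if (nodes.getLength() == 0) {
                continue; // element absent from this document
            }
            NamedNodeMap attrs = nodes.item(0).getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                fields.put(name + "." + attr.getNodeName(), attr.getNodeValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The <entry> wrapper is a made-up root element for this example.
        String xml = "<entry>"
            + "<summary year=\"2006\" number=\"209\" date=\"27-10-2006\" section=\"1\"/>"
            + "<organisation name=\"Consejer\u00eda de Economia y Hacienda\"/>"
            + "<disposition type=\"Resoluci\u00f3n\"/>"
            + "</entry>";
        extractFields(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real parse or index plugin, a map like this would presumably be handed to whatever metadata or document-field mechanism the Nutch version in use provides; the existing parse-* and index-* plugins in the source tree are the place to check how that is done.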
