On Mon, 2006-11-27 at 09:34 -0800, Chris Mattmann wrote:
> Hi Thorsten
>
> On 11/27/06 4:00 AM, "Thorsten Scherler"
> <[EMAIL PROTECTED]> wrote:
>
> > Reading the wiki and the docu I get the impression I need to write my
> > own implementation of an indexer/searcher plugin, which is able to
> > filter/index crucial filter information such as <summary year="2006"
> > number="209" date="27-10-2006" section="1">, <organisation
> > name="Consejería de Economia y Hacienda"> and <disposition
> > type="Resolución" >.
>
> Yes, you may need to write your own parse, indexer and searcher plugins,
> however, I am currently working on getting the parse-xml plugin into the
> Nutch sources. The parse-xml plugin includes an indexing filter for the
> fields that are extracted by the xml parser. The XML parser is configurable
> to custom schemas and fields that need to be extracted.
>
> This plugin is available currently in JIRA, attached to this issue:
>
> http://issues.apache.org/jira/browse/NUTCH-185
>
Nice. I need to finish another part of the project at the moment, but as
soon as I come back to the engine I will have a closer look at the
issue. Thanks.

> I am working hard to get this plugin ported to the latest trunk source, and
> ready to be committed to the sources. I hope to attach a patch within the
> next week that brings this plugin up to date, and gets the code ready for
> prime-time (formatting, public javadocs, etc.). Once I attach the patch, you
> may find that you only need to write your searcher plugin. Then again, in
> the interest of time, you may go the route for writing your own set of
> plugins. In that case, you can find examples of how to write the
> parse/index/query plugins, by looking at the Nutch source, in subversion,
> available here:
>
> Parse plugins:
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/parse-*
> Index plugins:
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/index-*
> Query plugins:
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/query-*
>

Yeah, thanks very much.

> > Still being a newbie to nutch I would appreciate the opinion of
> > experienced devs whether nutch is the right choice and if so how I
> > should start.
>
> I think that you could do this with Nutch, and if you do, for free, you get:
>
> Crawling
> Parsing/Indexing
> Search Webapp, and RSS based OpenSearch servlet
>
> You could also do this with Lucene, but I think you may find that you end up
> maintaining more code, and having to rewrite existing functionality
> available within Nutch.
>

I understand. What about http://incubator.apache.org/solr/? That seems
to be a good alternative as well. I still need to evaluate Nutch and
Solr more to come to a final decision.

> Just my 2 cents...
>
> Cheers,
> Chris
>

Thank you very much Chris for your feedback.

salu2

> ______________________________________________
> Chris A. Mattmann
> [EMAIL PROTECTED]
> Staff Member
> Modeling and Data Management Systems Section (387)
> Data Management Systems and Technologies Group
>
> _________________________________________________
> Jet Propulsion Laboratory Pasadena, CA
> Office: 171-266B Mailstop: 171-246
> _______________________________________________________
>
> Disclaimer: The opinions presented within are my own and do not reflect
> those of either NASA, JPL, or the California Institute of Technology.

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
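
The field extraction discussed in this thread (pulling attributes such as
the year, organisation name and disposition type out of each XML document
so that they can be indexed and filtered on) can be sketched in plain Java.
This is only a minimal illustration using the JDK's built-in DOM parser. It
is not the parse-xml plugin from NUTCH-185 and it does not use any Nutch
plugin API. The class name XmlFieldSketch, the method extractFields, the
"element.attribute" key format and the <bulletin> root element are all
assumptions made up for the example; only the three elements and their
attributes come from the original question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Nutch or parse-xml.
class XmlFieldSketch {

    // Collects the attributes of the first occurrence of each named element
    // into a flat map keyed as "element.attribute" (an assumed convention).
    static Map<String, String> extractFields(String xml, String... elementNames) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> fields = new LinkedHashMap<>();
            for (String name : elementNames) {
                NodeList nodes = doc.getElementsByTagName(name);
                if (nodes.getLength() == 0) {
                    continue; // element not present in this document
                }
                Element element = (Element) nodes.item(0);
                NamedNodeMap attrs = element.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    fields.put(name + "." + attr.getNodeName(), attr.getNodeValue());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The <bulletin> wrapper is assumed; the thread does not show the
        // enclosing document. Accented characters are written as escapes.
        String xml = "<bulletin>"
            + "<summary year=\"2006\" number=\"209\" date=\"27-10-2006\" section=\"1\"/>"
            + "<organisation name=\"Consejer\u00eda de Economia y Hacienda\"/>"
            + "<disposition type=\"Resoluci\u00f3n\"/>"
            + "</bulletin>";

        Map<String, String> fields =
            extractFields(xml, "summary", "organisation", "disposition");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

In a real setup, a map like this would be the kind of data a parse plugin
hands on to an indexing filter, which in turn would add each entry as a
searchable field. How that hand-off works in Nutch is best taken from the
parse-*, index-* and query-* plugin sources linked above rather than from
this sketch.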
