On Mon, 2006-11-27 at 09:34 -0800, Chris Mattmann wrote:
> Hi Thorsten
> 
> On 11/27/06 4:00 AM, "Thorsten Scherler"
> <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Reading the wiki and the documentation, I get the impression I need to write my
> > own implementation of an indexer/searcher plugin, which is able to
> > filter/index crucial filter information such as <summary year="2006"
> > number="209" date="27-10-2006" section="1">, <organisation
> > name="Consejería de Economia y Hacienda"> and <disposition
> > type="Resolución" >.
> 
>  Yes, you may need to write your own parse, indexer and searcher plugins;
> however, I am currently working on getting the parse-xml plugin into the
> Nutch sources. The parse-xml plugin includes an indexing filter for the
> fields that are extracted by the xml parser. The XML parser is configurable
> to custom schemas and fields that need to be extracted.
> 
>  This plugin is currently available in JIRA, attached to this issue:
> 
> http://issues.apache.org/jira/browse/NUTCH-185
> 

Nice. At the moment I need to finish another part of the project, but as
soon as I come back to the engine I will have a closer look at the issue.
Thanks.
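
To make the field extraction discussed above concrete, here is a minimal sketch that pulls the attributes of <summary>, <organisation> and <disposition> out of a document using only the JDK's built-in DOM parser. It does not use the Nutch plugin API or the parse-xml plugin from NUTCH-185; the class name XmlFieldExtractor, the "element.attribute" key format and the <doc> wrapper element are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper; not part of Nutch or of the parse-xml plugin. */
final class XmlFieldExtractor {

    // Element names come from the example in this thread.
    private static final String[] ELEMENTS = {"summary", "organisation", "disposition"};

    /** Returns attribute values keyed as "element.attribute", e.g. "summary.year". */
    static Map<String, String> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : ELEMENTS) {
            NodeList nodes = doc.getElementsByTagName(name);
            if (nodes.getLength() == 0) {
                continue; // element not present in this document
            }
            // Assumption: only the first occurrence of each element matters.
            NamedNodeMap attrs = nodes.item(0).getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                fields.put(name + "." + attr.getNodeName(), attr.getNodeValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // The <doc> wrapper is made up; non-ASCII letters are written as
        // Unicode escapes so the source compiles under any file encoding.
        String xml = "<doc>"
            + "<summary year=\"2006\" number=\"209\" date=\"27-10-2006\" section=\"1\"/>"
            + "<organisation name=\"Consejer\u00eda de Economia y Hacienda\"/>"
            + "<disposition type=\"Resoluci\u00f3n\"/>"
            + "</doc>";
        extract(xml).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real setup the resulting key/value pairs would be handed to whatever indexing step is chosen (a Nutch indexing filter, plain Lucene, or Solr); that wiring is deliberately left out here.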

> I am working hard to get this plugin ported to the latest trunk source, and
> ready to be committed to the sources. I hope to attach a patch within the
> next week that brings this plugin up to date, and gets the code ready for
> prime-time (formatting, public javadocs, etc.). Once I attach the patch, you
> may find that you only need to write your searcher plugin. Then again, in
> the interest of time, you may go the route of writing your own set of
> plugins. In that case, you can find examples of how to write the
> parse/index/query plugins by looking at the Nutch source in Subversion,
> available here:
> 
> Parse plugins: 
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/parse-*
> Index plugins: 
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/index-*
> Query plugins: 
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/plugin/query-*
> 
> 


Yeah, thanks very much.

> > 
> > Still being a newbie to Nutch, I would appreciate the opinion of
> > experienced devs on whether Nutch is the right choice and, if so, how I
> > should start.
> 
> I think that you could do this with Nutch, and if you do, you get for free:
> 
> Crawling
> Parsing/Indexing
> Search Webapp, and RSS based OpenSearch servlet
> 
> You could also do this with Lucene, but I think you may find that you end up
> maintaining more code, and having to rewrite existing functionality
> available within Nutch.
> 

I understand. What about http://incubator.apache.org/solr/? That seems
to be a good alternative as well. I still need to evaluate Nutch and
Solr more before coming to a final decision.

> Just my 2 cents...
> 
> Cheers,
>   Chris
> 

Thank you very much Chris for your feedback.

Regards

> 
> ______________________________________________
> Chris A. Mattmann
> [EMAIL PROTECTED]
> Staff Member
> Modeling and Data Management Systems Section (387)
> Data Management Systems and Technologies Group
> 
> _________________________________________________
> Jet Propulsion Laboratory            Pasadena, CA
> Office: 171-266B                        Mailstop:  171-246
> _______________________________________________________
> 
> Disclaimer:  The opinions presented within are my own and do not reflect
> those of either NASA, JPL, or the California Institute of Technology.
> 
> 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general