[ http://issues.apache.org/jira/browse/NUTCH-74?page=comments#action_12316094 ]
Jerome Charron commented on NUTCH-74:
-------------------------------------

Christophe,

I have already written such plugins for French and German in order to test the Analyzer Factory. The difference from your approach is that, instead of copying Lucene's analyzer code, I added dependencies on the Lucene libraries. I think this is the better approach, since it avoids duplicating code.

I also added an analysis extension point so that analysis plugins can be plugged in. For now, though, these plugins are called by the AnalysisFactory depending on the result of the language identifier. And, as I explained in a previous mail, the language identifier currently fails (it identifies the wrong language) because of an encoding problem in Nutch. I'm working on that issue now, so I can't submit my code in its current state. But if you want, I can send you some parts of it.

Regards

Jerome

> French Analyzer Plugin
> ----------------------
>
>          Key: NUTCH-74
>          URL: http://issues.apache.org/jira/browse/NUTCH-74
>      Project: Nutch
>         Type: New Feature
>  Environment: Nutch
>     Reporter: Christophe Noel
>  Attachments: analyze-french.zip
>
> This is a DRAFT of a new plugin for French analysis (all Java files come from
> the Lucene project sandbox). It includes an ISO Latin-1 accent filter, plural
> form removal, ...
> analyze-french should be used instead of NutchDocumentAnalysis, as described by
> Jerome Charron in the New Language Identifier project. It should also be used as
> a query parser in the Nutch searcher.
> We are missing an EXTENSION-POINT to include this kind of plugin in Nutch. Could
> anyone help me build this new extension point, please?
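To make the dispatch described in the comment more concrete (an AnalysisFactory that picks a per-language analysis plugin from the language identifier's result), here is a minimal Java sketch. Only the name AnalysisFactory comes from the comment; the LanguageAnalyzer interface, the FrenchAnalyzer and DefaultAnalyzer classes, and the register/get methods are hypothetical and are not the real Nutch or Lucene API. The normalize() method is a toy stand-in for a real Lucene analyzer chain, and accent stripping via java.text.Normalizer is only a rough substitute for the ISO Latin-1 accent filter mentioned in the issue (no plural removal is attempted).

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// HYPOTHETICAL contract for an analysis plugin; not a real Nutch or Lucene interface.
interface LanguageAnalyzer {
    String language();              // language code this plugin handles, e.g. "fr"
    String normalize(String token); // toy stand-in for a real Lucene token-filter chain
}

// Fallback used when no plugin matches the identified language: lowercase only.
class DefaultAnalyzer implements LanguageAnalyzer {
    public String language() { return "default"; }
    public String normalize(String token) { return token.toLowerCase(Locale.ROOT); }
}

// Toy French plugin: lowercases, then strips accents (rough stand-in for an
// ISO Latin-1 accent filter; no stemming or plural removal here).
class FrenchAnalyzer implements LanguageAnalyzer {
    public String language() { return "fr"; }
    public String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // NFD splits an accented letter into base letter + combining mark;
        // \p{M} then removes the combining marks.
        return Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }
}

// Picks an analyzer based on the language identifier's result.
class AnalysisFactory {
    private final Map<String, LanguageAnalyzer> byLanguage = new HashMap<>();
    private final LanguageAnalyzer fallback;

    AnalysisFactory(LanguageAnalyzer fallback) { this.fallback = fallback; }

    // Assumption: in a real plugin system, registration would presumably be
    // driven by the analysis extension point rather than called by hand.
    void register(LanguageAnalyzer analyzer) {
        byLanguage.put(analyzer.language(), analyzer);
    }

    LanguageAnalyzer get(String identifiedLanguage) {
        if (identifiedLanguage == null) {
            return fallback; // identification failed or was unavailable
        }
        return byLanguage.getOrDefault(identifiedLanguage.toLowerCase(Locale.ROOT), fallback);
    }
}

class AnalysisFactoryDemo {
    public static void main(String[] args) {
        AnalysisFactory factory = new AnalysisFactory(new DefaultAnalyzer());
        factory.register(new FrenchAnalyzer());
        System.out.println(factory.get("fr").normalize("\u00C9l\u00E8ve")); // eleve
        System.out.println(factory.get("de").normalize("Haus"));            // haus (fallback)
        System.out.println(factory.get(null).language());                   // default
    }
}
```

The fallback path matters for the problem raised in the comment: if the language identifier returns a wrong or missing result, a document silently gets the default analysis instead of the language-specific one, so identification errors propagate directly into indexing.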