[ http://issues.apache.org/jira/browse/NUTCH-74?page=comments#action_12316094 ]

Jerome Charron commented on NUTCH-74:
-------------------------------------

Christophe,

I have already written such plugins for French and German in order to test the Analyzer
Factory. The difference from your approach is that instead of copying the
Lucene analyzer code, I added dependencies on the Lucene libraries. I think
this is a better approach since it avoids duplicating code.
I also added an analysis extension point so that the analysis plugins can be plugged in.
For now, however, these plugins are selected by the AnalysisFactory based on the
language identifier's result. And as I explained in a previous mail, the language
identifier currently misidentifies languages because of an encoding problem in
Nutch. I'm working on this issue, and I can't submit my code in its
current state.
But if you want, I can send you some parts of the code.
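To make the selection step concrete, here is a minimal Java sketch of a factory that picks an analyzer from the language identifier's result and falls back to a default analyzer when the language is unknown or identification fails. All names (LanguageAnalyzer, DefaultAnalyzer, FrenchAnalyzerStub, AnalyzerFactory) are hypothetical and are not the actual Nutch or Lucene API; the French stub only drops a few stop words, where a real plugin would delegate to the Lucene analyzer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical contract for a language-specific analysis plugin.
// NOT an actual Nutch or Lucene interface; all names are illustrative.
interface LanguageAnalyzer {
    String language();             // language code handled, e.g. "fr"
    String[] analyze(String text); // raw text -> index terms
}

// Fallback analyzer used when no plugin matches the identified language.
class DefaultAnalyzer implements LanguageAnalyzer {
    public String language() { return "default"; }

    public String[] analyze(String text) { return tokenize(text); }

    // Lower-cases, splits on runs of non-letter/non-digit characters,
    // and drops the empty token a leading separator would produce.
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }
}

// Toy French plugin: tokenizes, then removes a handful of French stop words.
// A real plugin would delegate to Lucene's French analysis code instead.
class FrenchAnalyzerStub implements LanguageAnalyzer {
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("le", "la", "les", "et", "de", "des", "un", "une"));

    public String language() { return "fr"; }

    public String[] analyze(String text) {
        return Arrays.stream(DefaultAnalyzer.tokenize(text))
                .filter(t -> !STOP_WORDS.contains(t))
                .toArray(String[]::new);
    }
}

// Chooses an analyzer from the language identifier's result.
class AnalyzerFactory {
    private final Map<String, LanguageAnalyzer> byLanguage = new HashMap<>();
    private final LanguageAnalyzer fallback = new DefaultAnalyzer();

    // In Nutch the plugins would come from an extension point; here they
    // are registered by hand so the sketch stays self-contained.
    void register(LanguageAnalyzer analyzer) {
        byLanguage.put(analyzer.language(), analyzer);
    }

    LanguageAnalyzer get(String detectedLanguage) {
        if (detectedLanguage == null) {
            return fallback; // identification failed: use the default analyzer
        }
        return byLanguage.getOrDefault(detectedLanguage.toLowerCase(Locale.ROOT), fallback);
    }
}
```

Keying the lookup on the language code keeps the factory unaware of individual plugins, so adding German would mean registering one more analyzer rather than changing the factory. It also shows why a wrong identification matters: a misdetected page silently gets the wrong analyzer.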

Regards

Jerome


> French Analyzer Plugin
> ----------------------
>
>          Key: NUTCH-74
>          URL: http://issues.apache.org/jira/browse/NUTCH-74
>      Project: Nutch
>         Type: New Feature
>  Environment: Nutch
>     Reporter: Christophe Noel
>  Attachments: analyze-french.zip
>
> This is a DRAFT of a new plugin for French analysis (all Java files come from
> the Lucene project sandbox). It includes an ISO Latin-1 accent filter, plural
> form removal, ...
> analyze-french should be used instead of NutchDocumentAnalysis, as described by
> Jerome Charron in the New Language Identifier project. It should also be used as
> a query parser in the Nutch searcher.
> We lack an EXTENSION-POINT to include this kind of plugin in Nutch. Could
> anyone help me build this new Extension Point, please?
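For illustration only, the two filtering steps named in the issue description (accent folding and plural removal) might look like this in Java. This is a toy sketch, not the Lucene sandbox code attached to the issue: accent folding is swapped in via java.text.Normalizer (Unicode decomposition), which is not necessarily how the attached filter works, and the plural rule is a deliberately naive assumption. The class name FrenchTermFilters is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Toy illustration of accent folding and plural-form removal.
// NOT Lucene's code; FrenchTermFilters is a hypothetical name.
final class FrenchTermFilters {
    private FrenchTermFilters() {}

    // Folds accented letters to their base letter (e-acute -> e,
    // c-cedilla -> c) by decomposing to NFD and dropping combining marks.
    static String foldAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Naive plural stripping: removes a final "s" or "x" from words longer
    // than three letters. Real French stemmers apply far richer rules.
    static String stripPlural(String term) {
        if (term.length() > 3 && (term.endsWith("s") || term.endsWith("x"))) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Lower-case, fold accents, then strip a plural ending.
    static String normalize(String term) {
        return stripPlural(foldAccents(term.toLowerCase(Locale.ROOT)));
    }
}
```

For example, FrenchTermFilters.normalize("Élèves") yields "eleve", while "chevaux" becomes "chevau" rather than "cheval", which shows why a real French stemmer needs more than suffix stripping.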

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
