Doğacan Güney wrote:
> On 6/28/07, Robert Young <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Are the Nutch Stemming modifications available as a patch? I can't
>> seem to find anything on issue.apache.org
>
> There is some sort of stemming for German and French languages
> (available as plugin analysis-de and analysis-fr). I don't know how
> well they work (or if they work). AFAIK, there is no support for
> stemming English.

There is a PorterStemmer in Lucene, but it is not used in Nutch. You can easily add it by overriding NutchDocumentAnalyzer.
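
To illustrate the "override the analyzer" idea, here is a minimal, self-contained sketch. It does not use the real Nutch or Lucene classes: SketchAnalyzer, PlainAnalyzer, StemmingAnalyzer and toyStem are hypothetical names made up for this mail, and toyStem is a deliberately naive suffix stripper, not the Porter algorithm. In a real patch you would presumably plug Lucene's Porter stemming into the overridden analyzer instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for an analyzer: turns field text into index terms. */
interface SketchAnalyzer {
    List<String> analyze(String fieldName, String text);
}

/** Baseline behaviour: lowercase, then split on anything that is not a letter or digit. */
class PlainAnalyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String fieldName, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }
}

/** The override: reuse the baseline tokenization, then stem every term. */
class StemmingAnalyzer extends PlainAnalyzer {
    @Override
    public List<String> analyze(String fieldName, String text) {
        List<String> stemmed = new ArrayList<>();
        for (String term : super.analyze(fieldName, text)) {
            stemmed.add(toyStem(term));
        }
        return stemmed;
    }

    /** ASSUMPTION: toy rules for illustration only, NOT the Porter algorithm. */
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 4 && term.endsWith("ed")) {
            return term.substring(0, term.length() - 2);
        }
        if (term.length() > 3 && term.endsWith("s") && !term.endsWith("ss")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        SketchAnalyzer analyzer = new StemmingAnalyzer();
        System.out.println(analyzer.analyze("content", "Crawling indexed pages"));
    }
}
```

The point is only the shape: the subclass keeps the existing tokenization and adds one extra step, so nothing else in the indexing path has to know about stemming.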
>
> Btw, I think we should revise nutch's document analysis system. For
> example, analyzers for index-basic's fields are hard-coded in analysis
> package (what happens if I don't use index-basic and use my own
> index-mind-blowingly-awesome plugin?) . You either have to use all of
> it or completely override it and use none of it. We should allow index
> plugins to specify their analyzers per field. There are analysis-*
> plugins but they work for documents in specific languages (what if I
> don't want to use language identification? what if nutch can't figure
> out what the language is?)

I strongly agree. The index-* plugins and the analysis-* plugins are cross-dependent. For every new field added by an indexing plugin, ALL the analysis plugins have to be changed to analyze that field, which breaks the golden rule. I agree with the idea that index plugins should specify their own analyzers.

>
> Index plugins should also be able control how stuff like their field's
> length norm is calculated (which currently is hard coded too and can't
> be changed).
>
> Oh and, if you are feeling up to it, any help in this area would be
> much appreciated :).
>
>>
>> Thanks
>> Rob
>>
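
To make the per-field proposal a bit more concrete, here is a rough sketch of what "index plugins specify their analyzers" could look like. Everything in it is an assumption for discussion: FieldAnalyzerRegistry, RegistryDemo and the register/analyze methods are hypothetical and do not exist in Nutch, and the analyzers are modelled as plain functions from text to terms rather than Lucene Analyzer objects. Lucene's PerFieldAnalyzerWrapper follows a similar idea, a default analyzer plus per-field overrides.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry: each index plugin registers an analyzer for the fields it adds. */
class FieldAnalyzerRegistry {
    private final Map<String, Function<String, String[]>> byField = new HashMap<>();
    private final Function<String, String[]> fallback;

    FieldAnalyzerRegistry(Function<String, String[]> fallback) {
        this.fallback = fallback;
    }

    /** Would be called by an index plugin once per field it contributes. */
    void register(String fieldName, Function<String, String[]> analyzer) {
        byField.put(fieldName, analyzer);
    }

    /** Uses the field's own analyzer if one was registered, otherwise the fallback. */
    String[] analyze(String fieldName, String text) {
        return byField.getOrDefault(fieldName, fallback).apply(text);
    }
}

class RegistryDemo {
    /** Fallback: lowercase and split on whitespace. */
    static String[] whitespaceTerms(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Keyword-style analyzer: the whole value becomes a single lowercase term. */
    static String[] singleTerm(String text) {
        return new String[] { text.trim().toLowerCase(Locale.ROOT) };
    }

    static FieldAnalyzerRegistry buildRegistry() {
        FieldAnalyzerRegistry registry = new FieldAnalyzerRegistry(RegistryDemo::whitespaceTerms);
        // ASSUMPTION: "host" stands in for a field added by some index plugin.
        registry.register("host", RegistryDemo::singleTerm);
        return registry;
    }

    public static void main(String[] args) {
        FieldAnalyzerRegistry registry = buildRegistry();
        System.out.println(String.join("|", registry.analyze("host", "Lucene.Apache.org")));
        System.out.println(String.join("|", registry.analyze("content", "Hello  Nutch world")));
    }
}
```

With something along these lines, adding a new field touches only the plugin that introduces it, and the analysis-* plugins would no longer need to know every field name in advance.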
