On 6/28/07, Robert Young <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Are the Nutch Stemming modifications available as a patch? I can't
> seem to find anything on issue.apache.org

There is some form of stemming for German and French (available as the analysis-de and analysis-fr plugins). I don't know how well they work (or whether they work at all). AFAIK, there is no support for stemming English.

Btw, I think we should revise Nutch's document analysis system. For example, the analyzers for index-basic's fields are hard-coded in the analysis package (what happens if I don't use index-basic and use my own index-mind-blowingly-awesome plugin?). You either have to use all of it, or completely override it and use none of it. We should allow index plugins to specify their analyzers per field.

There are analysis-* plugins, but they only work for documents in specific languages (what if I don't want to use language identification? what if Nutch can't figure out what the language is?).

Index plugins should also be able to control how things like their fields' length norm are calculated (this is currently hard-coded too and can't be changed).

Oh, and if you are feeling up to it, any help in this area would be much appreciated :).

>
> Thanks
> Rob
>

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
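
To make the "analyzers per field" idea a bit more concrete, here is a rough sketch of what such a registry could look like. None of this is existing Nutch or Lucene API: the class FieldAnalyzerRegistry, its methods, and the field names are all made up for illustration, and an "analyzer" is reduced to a plain function from text to tokens. It uses only the Java standard library (Java 8 or later).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch only: NOT part of Nutch or Lucene.
 * Lets each index plugin register an analyzer for the fields it owns,
 * with a fallback for fields nobody has claimed.
 */
public class FieldAnalyzerRegistry {

    // field name -> analyzer (here simply: text in, tokens out)
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    public FieldAnalyzerRegistry(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    /** An index plugin would call this once for each field it adds. */
    public void register(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    /** Uses the field's own analyzer if one was registered, else the fallback. */
    public List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    /** Example fallback: lowercase the text and split it on whitespace. */
    public static List<String> lowercaseWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Example per-field analyzer: keep the whole value as a single token. */
    public static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    public static void main(String[] args) {
        FieldAnalyzerRegistry registry =
                new FieldAnalyzerRegistry(FieldAnalyzerRegistry::lowercaseWhitespace);
        // A hypothetical plugin claims the "url" field; "title" is left to the fallback.
        registry.register("url", FieldAnalyzerRegistry::keyword);

        System.out.println(registry.analyze("title", "Nutch Stemming Patch"));
        System.out.println(registry.analyze("url", "http://example.org/Some Page"));
    }
}
```

The point of the fallback is that a plugin only has to override the fields it cares about, instead of the current all-or-nothing choice. The same registration approach could in principle be extended to per-field length norms, but that is not shown here.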
