Re: [PATCH] Bug on BrazilianAnalyzer

Michael McCandless Tue, 02 Dec 2008 04:06:04 -0800


Rafael,

Could you work these changes into a patch, add a test case, and open a Jira issue? Maybe first make the simple fixes (removing final, moving LowerCaseFilter up in the chain), and then, as a second issue, this deeper refactoring of all StemFilters? Thanks.

I agree the original issue (LowerCaseFilter coming after StopFilter) is a bug, though does the BrazilianStemFilter mind if all tokens coming into it are now lowercased (I would assume not)?
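
To make the ordering problem concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: the two static methods are simplified stand-ins for LowerCaseFilter and StopFilter, and the stop set (apart from "se", which comes from this thread) and the sample tokens are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for a token filter chain; this is NOT the Lucene API.
class FilterOrderSketch {
    // Hypothetical lowercase-only stop set ("se" is the example from this thread).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("se", "de", "a"));

    // Stand-in for LowerCaseFilter: lowercases every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Stand-in for StopFilter: drops tokens that exactly match a stop word.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Se", "chover", "De", "manha");
        // Stop filter first: "Se" and "De" do not match the lowercase stop set and survive.
        System.out.println(lowerCase(removeStopWords(tokens))); // [se, chover, de, manha]
        // Lowercase first: the stop words are matched and removed.
        System.out.println(removeStopWords(lowerCase(tokens))); // [chover, manha]
    }
}
```

With the stop filter first, capitalized stop words at the start of a sentence leak into the index; lowercasing first removes them, at the cost of also dropping tokens such as "SE" mentioned further down the thread.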


Mike

Adriano Crestani wrote:

Hi Rafael,
I kind of agree with you. Practically all the StemFilters have the same logic, so they could be combined into a single class. All StemFilters seem to have a setStemmer already; we could keep that and also allow passing the stemmer as a constructor parameter, like you said. I think you can create a JIRA issue and submit a patch for that. Let's see what the Lucene members think about it :)
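
A rough sketch of what that proposal could look like, in plain Java rather than against the real Lucene classes. Every name here (Stemmer, GenericStemFilter, ToyPluralStemmer) is hypothetical, and the toy stemmer only strips a trailing "s" to keep the example short; it is not a real Portuguese stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical common interface, as proposed in this thread.
interface Stemmer {
    String stem(String term);
}

// Toy stemmer for illustration only: strips a single trailing "s".
class ToyPluralStemmer implements Stemmer {
    public String stem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}

// One filter for every language: the stemmer is passed in through the constructor.
class GenericStemFilter {
    private final Stemmer stemmer;

    GenericStemFilter(Stemmer stemmer) {
        this.stemmer = stemmer;
    }

    // Applies the injected stemmer to each token, preserving order.
    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(stemmer.stem(t));
        }
        return out;
    }
}

class StemFilterSketch {
    public static void main(String[] args) {
        GenericStemFilter filter = new GenericStemFilter(new ToyPluralStemmer());
        System.out.println(filter.filter(Arrays.asList("casas", "livro"))); // [casa, livro]
    }
}
```

Under this design, supporting a new language would mean writing only a new Stemmer implementation; the filter class itself would stay unchanged.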
Now, about the BrazilianAnalyzer being final: it's probably only because they wanted to improve runtime performance, since final classes can be faster when the JVM does not need to check for subclassing.
Best Regards,
Adriano Crestani Campos
On Fri, Nov 21, 2008 at 2:15 PM, Rafael Cunha de Almeida <[EMAIL PROTECTED]> wrote:
On Fri, 21 Nov 2008 16:46:30 -0200
Rafael Cunha de Almeida <[EMAIL PROTECTED]> wrote:

> On Mon, 17 Nov 2008 19:58:47 -0800
> "Adriano Crestani" <[EMAIL PROTECTED]> wrote:
>
> > Hi Rafael,
> >
> > What is your scenario?
> >
> > Maybe it was defined this way so it does not filter uppercased stop words.
> > Like, for example, the downcased word "se" is a stopword, but the uppercased
> > "SE" stands for "Sergipe", a Brazilian state, so it should not be filtered.
>
> Suppose you are right, but passing it through the LowerCaseFilter can
> be useful too, especially if you don't care much about those corner
> cases (the GermanAnalyzer, for instance, passes through
> LowerCaseFilter first). The class being final doesn't allow one to inherit
> from it and make the changes if one needs to, which is unfortunate :-(.
>
> I would like to see a change in this whole stemmer and language
> analyzer API in order to make it more flexible and extensible. The
> way it is, you have to use them in that predetermined way.
>
> It would be nice if there was only one StemFilter, a Stemmer interface,
> and all Stemmers implemented it. Then, the StemFilter should
> get its Stemmer as a constructor parameter. I see no reason for
> BrazilianAnalyzer to be public.

To be final, sorry. I was a bit tired when I wrote all that.

> Are you interested in these kinds of changes? Do you agree with them?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


