Re: [PATCH] Bug on BrazilianAnalyzer

Adriano Crestani Mon, 17 Nov 2008 19:59:19 -0800

Hi Rafael,

What is your scenario?


Maybe it was defined this way so that it does not filter uppercase stop words.
For example, the lowercase word "se" is a stop word, but the uppercase
"SE" stands for "Sergipe", a Brazilian state, so it should not be filtered.
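To make the case-sensitivity point concrete, here is a minimal, Lucene-free sketch of the current order, where the stop-word check runs before lowercasing. The class name, the three-word stop set and the sample tokens are hypothetical, and the stemming step is left out. This is only an illustration, not the analyzer's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical simulation of the current order: stop check first, lowercase afterwards. */
public class StopBeforeLowercase {
    // Tiny stand-in for the all-lowercase default stop table (not the real list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("se", "em", "de"));

    static List<String> analyze(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (STOP_WORDS.contains(t)) {
                continue;                          // case-sensitive: only exact lowercase matches are removed
            }
            out.add(t.toLowerCase(Locale.ROOT));   // lowercasing happens after the stop check
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze(Arrays.asList("Se", "chover", "em", "Aracaju", "SE")));
    }
}
```

Run as written, it prints [se, chover, aracaju, se]: "em" is removed, but both "SE" and the sentence-initial "Se" get through the stop check, and because lowercasing runs afterwards, "SE" is indexed as "se".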

Best Regards,
Adriano Crestani

On Mon, Nov 17, 2008 at 3:39 PM, Rafael Cunha de Almeida <[EMAIL PROTECTED]> wrote:

> Following is the patch for what I think is a bug in the
> BrazilianAnalyzer. The default stop word list is all lowercase, so
> it only works if the LowerCaseFilter comes first or if the
> StopFilter is set to ignore case.
>
> Since the LowerCaseFilter is instantiated anyway, I just changed the
> order. If there is some problem with that order, please consider
> setting the StopFilter to ignore case instead.
>
> Index: BrazilianAnalyzer.java
> ===================================================================
> --- BrazilianAnalyzer.java      (revision 718407)
> +++ BrazilianAnalyzer.java      (working copy)
> @@ -131,10 +131,9 @@
>        public final TokenStream tokenStream(String fieldName, Reader reader) {
>                TokenStream result = new StandardTokenizer( reader );
>                result = new StandardFilter( result );
> +               result = new LowerCaseFilter( result );
>                result = new StopFilter( result, stoptable );
>                result = new BrazilianStemFilter( result, excltable );
> -               // Convert to lowercase after stemming!
> -               result = new LowerCaseFilter( result );
>                return result;
>        }
>  }
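The two fixes proposed above (lowercase before the stop filter, or keep the old order with a case-insensitive stop check) can be compared with a small, Lucene-free sketch. The class and method names, the stop set and the tokens are hypothetical, and stemming is omitted, so this says nothing about how BrazilianStemFilter reacts to the different order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical comparison of the two proposed fixes (stemming omitted). */
public class StopFilterOrderFixes {
    // Tiny stand-in for the all-lowercase default stop table (not the real list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("se", "em", "de"));

    /** Option 1 (the patch): lowercase first, then a case-sensitive stop check. */
    static List<String> lowercaseFirst(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    /** Option 2: keep the old order, but compare against the stop set ignoring case. */
    static List<String> ignoreCaseStop(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (STOP_WORDS.contains(t.toLowerCase(Locale.ROOT))) {
                continue;                          // case-insensitive stop check
            }
            out.add(t.toLowerCase(Locale.ROOT));   // lowercasing still runs last
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Se", "chover", "em", "Aracaju", "SE");
        System.out.println(lowercaseFirst(tokens));
        System.out.println(ignoreCaseStop(tokens));
    }
}
```

Both methods print [chover, aracaju]: either fix removes "Se" and "SE" along with "se", which is the trade-off raised about "SE" meaning Sergipe.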