LowerCaseFilter should be able to be configured to use a specific locale.
-------------------------------------------------------------------------
Key: LUCENE-1581
URL: https://issues.apache.org/jira/browse/LUCENE-1581
Project: Lucene - Java
Issue Type: Improvement
Reporter: Digy
// Since I am a .NET programmer, the sample code is in C#, but I don't think
// that will make it hard to understand.
//
Assume an input text such as "İ" and an analyzer like the one below:
{code}
public class SomeAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        TokenStream t = new SomeTokenizer(reader);
        t = new Lucene.Net.Analysis.ASCIIFoldingFilter(t);
        t = new LowerCaseFilter(t);
        return t;
    }
}
{code}
ASCIIFoldingFilter will return "I", and LowerCaseFilter will then return
"i" (if the locale is "en-US")
or
"ı" (if the locale is "tr-TR"), which means this token would have to be fed into
another instance of ASCIIFoldingFilter.
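To illustrate the culture dependence described above, here is a minimal C# sketch that lowercases the "I" produced by ASCIIFoldingFilter with System.Char.ToLower under different cultures. The class and method names are hypothetical, for illustration only, and not part of Lucene.Net. It assumes the runtime has culture data available; under .NET's invariant globalization mode, creating "tr-TR" may fail or fall back to invariant casing.

{code}
using System.Globalization;

// Hypothetical helper for illustration only; not a Lucene.Net API.
public static class TurkishIDemo
{
    // Lowercases one char using the named culture ("" = invariant culture).
    // This is the same System.Char.ToLower(char, CultureInfo) overload the
    // proposed LowerCaseFilter constructor relies on.
    public static char Lower(char c, string cultureName)
    {
        return char.ToLower(c, CultureInfo.GetCultureInfo(cultureName));
    }
}
{code}

With culture data present, TurkishIDemo.Lower('I', "en-US") and TurkishIDemo.Lower('I', "") should both give 'i' (U+0069), while TurkishIDemo.Lower('I', "tr-TR") should give 'ı' (U+0131), the dotless i that ASCIIFoldingFilter would then have to fold again.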
So calling LowerCaseFilter before ASCIIFoldingFilter would be one solution, but
a better approach would be to add a new constructor to LowerCaseFilter that
forces it to use a specific locale.
{code}
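public sealed class LowerCaseFilter : TokenFilter
{
    // Culture used for lowercasing. Defaults to the current culture, which
    // keeps the existing behavior for callers of the one-argument constructor.
/* +++ */    private readonly System.Globalization.CultureInfo cultureInfo =
        System.Globalization.CultureInfo.CurrentCulture;

    public LowerCaseFilter(TokenStream input) : base(input)
    {
    }

    // Proposed overload: lowercase with an explicitly supplied culture,
    // e.g. System.Globalization.CultureInfo.InvariantCulture.
/* +++ */    public LowerCaseFilter(TokenStream input,
        System.Globalization.CultureInfo cultureInfo) : base(input)
/* +++ */    {
/* +++ */        this.cultureInfo = cultureInfo;
/* +++ */    }

    public override Token Next(Token result)
    {
        result = Input.Next(result);
        if (result != null)
        {
            char[] buffer = result.TermBuffer();
            int length = result.termLength;
            // Lowercase the term in place using the configured culture.
            for (int i = 0; i < length; i++)
/* +++ */            buffer[i] = System.Char.ToLower(buffer[i], cultureInfo);
            return result;
        }
        else
            return null;
    }
}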
{code}
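A sketch of why an explicit culture helps: the hypothetical helpers below (illustrative names, not Lucene.Net APIs) copy the lowercasing loop from the proposed Next() and run it on the same text while the thread's CurrentCulture is temporarily set to "tr-TR". The first pass uses CurrentCulture, which is the existing default; the second uses CultureInfo.InvariantCulture, which the new constructor would let an analyzer pass in. It assumes culture data for "tr-TR" is available at runtime.

{code}
using System.Globalization;

// Hypothetical helpers for illustration only; not Lucene.Net APIs.
public static class ExplicitCultureLowering
{
    // Mirrors the loop in the proposed Next(): lowercase the first `length`
    // chars of `buffer` in place using the supplied culture.
    public static void LowerInPlace(char[] buffer, int length, CultureInfo culture)
    {
        for (int i = 0; i < length; i++)
            buffer[i] = char.ToLower(buffer[i], culture);
    }

    // Lowercases `text` twice while the current culture is tr-TR: once with
    // CurrentCulture and once with InvariantCulture. Restores the caller's
    // culture afterwards.
    public static (string current, string invariant) CompareUnderTurkish(string text)
    {
        CultureInfo saved = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
            char[] a = text.ToCharArray();
            char[] b = text.ToCharArray();
            LowerInPlace(a, a.Length, CultureInfo.CurrentCulture);
            LowerInPlace(b, b.Length, CultureInfo.InvariantCulture);
            return (new string(a), new string(b));
        }
        finally
        {
            CultureInfo.CurrentCulture = saved;
        }
    }
}
{code}

Under that assumption, CompareUnderTurkish("INDEX") should return "ındex" for the current-culture pass and "index" for the invariant pass, so a filter built with an explicit culture produces the same tokens regardless of the machine's regional settings.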
DIGY