Re: search with accent not match

Mark Miller Wed, 06 Aug 2008 06:30:32 -0700

You certainly can - just create your own Analyzer, starting with a copy of the French one you are using.


Then you just plug in the filter in the order you want it applied:


result = new ISOLatin1AccentFilter(result);

You have to decide for yourself where it goes. If you put it before the stopword step, more stop words might be removed than if it came after - that type of thing usually comes down to individual requirements and filter limitations. If your stopword list has diacriticals and you run the accent filter before applying the stopword list, some expected stopwords will never be removed, and so on.
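To make the ordering point concrete, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is an assumption made for illustration: `fold` is a stand-in built on `java.text.Normalizer` (it is not ISOLatin1AccentFilter and may differ in which characters it maps), and the stop list and tokens are hypothetical. Accented letters are written as `\u` escapes so the source stays ASCII.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilterOrderSketch {
    // Stand-in for an accent filter (NOT Lucene code): decompose each
    // character, then drop the combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    static List<String> foldAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(fold(t));
        return out;
    }

    // Stand-in for a stop filter: drop tokens found in the stop set.
    static List<String> removeStops(List<String> tokens, Set<String> stops) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) if (!stops.contains(t)) out.add(t);
        return out;
    }

    // Hypothetical stop list: plain "a", and "ete" spelled with acute accents.
    static final Set<String> STOPS =
            new HashSet<>(Arrays.asList("a", "\u00e9t\u00e9"));

    // Tokens: "lumiere" with e-grave, "a" with grave, "ete" with acute accents.
    static final List<String> TOKENS =
            Arrays.asList("lumi\u00e8re", "\u00e0", "\u00e9t\u00e9");

    // Stop filter first: only exact matches against the stop list are removed.
    static List<String> stopThenFold() {
        return foldAll(removeStops(TOKENS, STOPS));
    }

    // Accent folding first: tokens reach the stop filter already unaccented.
    static List<String> foldThenStop() {
        return removeStops(foldAll(TOKENS), STOPS);
    }

    public static void main(String[] args) {
        System.out.println(stopThenFold()); // [lumiere, a]
        System.out.println(foldThenStop()); // [lumiere, ete]
    }
}
```

With the stop filter first, the accented stop word is removed but the accented "a" survives and is folded afterwards. With folding first, the accented "a" becomes a plain "a" and is removed as an extra stop word, while the accented stop-list entry no longer matches anything.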



Christophe from paris wrote:

Actually, in my FrenchAnalyzer I have:

    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new StopFilter(result, stoptable);
    result = new FrenchStemFilter(result, excltable);
    result = new LowerCaseFilter(result);


Can I use ISOLatin1AccentFilter in this class for both indexing and search?
And if so, where should it go?


markrmiller wrote:

Check out org.apache.lucene.analysis.ISOLatin1AccentFilter

It will strip diacritics - just be sure to use it at index time and query time to get what you want. Also, you will no longer be able to differentiate between the two in your searching (rarely that important in my opinion, but others certainly disagree).
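As a rough illustration of why the filter has to run on both sides, here is a small sketch in plain Java. It is not Lucene code: the "index" is a toy map from term to document ids, `fold` is an assumed stand-in for the accent filter built on `java.text.Normalizer`, and each document is a single word, as in the original question.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FoldBothSidesSketch {
    // Stand-in for an accent filter (NOT Lucene code).
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Toy inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs, boolean foldTerms) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            String term = foldTerms ? fold(docs.get(id)) : docs.get(id);
            idx.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return idx;
    }

    static Set<Integer> search(Map<String, Set<Integer>> idx, String query, boolean foldQuery) {
        String term = foldQuery ? fold(query) : query;
        return idx.getOrDefault(term, Collections.<Integer>emptySet());
    }

    // Doc 0 is "lumiere" without an accent, doc 1 is "lumiere" with e-grave.
    static final List<String> DOCS = Arrays.asList("lumiere", "lumi\u00e8re");
    static final String ACCENTED_QUERY = "lumi\u00e8re";

    public static void main(String[] args) {
        Map<String, Set<Integer>> folded = index(DOCS, true);
        Map<String, Set<Integer>> raw = index(DOCS, false);

        // Folded at index time and at query time: both documents match.
        System.out.println(search(folded, ACCENTED_QUERY, true));  // [0, 1]
        // Folded at index time only: the accented query term is not in the index.
        System.out.println(search(folded, ACCENTED_QUERY, false)); // []
        // Folded at query time only: the accented document is missed.
        System.out.println(search(raw, ACCENTED_QUERY, true));     // [0]
    }
}
```

The first case also shows the trade-off: once both spellings are folded to the same term, a query can no longer tell them apart.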


- Mark

Christophe from paris wrote:

Hello

I use FrenchAnalyzer for indexing:

IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("TXT_CHARACT_VALUE", word.toLowerCase(),
        Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);

And for searching:

IndexReader reader = IndexReader.open(pathOfIndex);
Searcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new FrenchAnalyzer();

QueryParser parser = new QueryParser(field, analyzer);

Query query = parser.parse(motRecherche);
Hits hits = searcher.search(query);

My documents contain the words "lumiere" and "lumière".

When I search for "lumière", only documents containing "lumière" match; "lumiere" is not
returned.

If I search for "lumiere", the results are lumiere, lumieres, lumiére, lumiéres,
but not lumière.

For a complete match I have to search for "lumiere OR lumière",

but that is not the best solution.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


