Re: ASCIIFoldingFilterFactory

Steve Rowe Thu, 05 Jun 2014 17:49:22 -0700

Hi Michael,

Questions about Solr should go to the Solr user mailing list, rather than this 
list, which is for Lucene users - see 
<http://lucene.apache.org/solr/discussion.html> for how to subscribe.


I’ve never heard of ASCIIFoldingExpansionFilterFactory, but 
ASCIIFoldingFilterFactory has a new option “preserveOriginal”, introduced in 
Lucene/Solr 4.7 by LUCENE-5437 
<https://issues.apache.org/jira/browse/LUCENE-5437>, that should do the trick.

Just add preserveOriginal="true" - see the example in the javadocs (if you 
copy/paste it, make sure you change the attribute value from "false", as it is 
in the example, to "true"): 
<http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilterFactory.html>

Note, though, that as Ahmet Arslan points out on LUCENE-5437, queries that 
generate multiple terms (e.g. prefix and regex queries) will trigger a failure. 
You can work around this problem by defining both "index" and "query" analyzer 
types for the fieldtype you use with this field, and using 
preserveOriginal="true" only on the "index" analyzer type.
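
As a rough sketch of that workaround, a fieldtype along the following lines 
might do what you want. The fieldtype name, the tokenizer, the lowercase 
filter and the choice of phonetic encoder are all assumptions made for 
illustration, so substitute whatever your schema already uses:

```xml
<!-- Sketch only: the name, tokenizer, and phonetic encoder below are
     illustrative assumptions, not taken from the original schema. -->
<fieldType name="text_phonetic_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: emit the folded token and keep the original token too -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <!-- inject="false" replaces each token with its phonetic code -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: fold only (preserveOriginal defaults to false) -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>
```

With a setup like this, the index should hold phonetic codes for both the 
diacritic and the folded form of each word, while queries are always folded 
before phonetic encoding and so match the folded form.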

See this page in the Solr Reference Guide for more info about analyzers in 
Solr: <https://cwiki.apache.org/confluence/display/solr/What+Is+An+Analyzer%3F>.

Steve

On Jun 5, 2014, at 8:05 PM, Michael Tobias <[email protected]> wrote:

> Hi there
> 
> I am a relative newbie Solr user so please be gentle with me.
> 
> I am experimenting with various phonetic filters and the tokens created can
> vary depending on whether the words contain diacritical characters.
> 
> My problem is that the documents being indexed are not always consistent in
> terms of the use of diacritics (sometimes the same word can have diacritics
> and sometimes not) and of course when users submit  queries they may or may
> not use diacritics properly.
> 
> If I wasn't trying to use phonetic matching I would simply use the
> ASCIIFoldingFilterFactory to remove any problem characters and match on
> that.
> 
> What I would like to do is create phonetic tokens for both the
> diacritic-version of the word and the folded-version of the word - but I
> would like to store the tokens in a single phonetic field for querying
> purposes.....
> 
> How can I achieve that????
> 
> I did find a few references online to "ASCIIFoldingExpansionFilterFactory"
> which appears to do what I want - when creating the 'folded' version of a
> word it appears to keep the diacritic version too. I could then apply my
> phonetic filter to those expanded tokens.
> 
> Is there any other way to do this?  Or if ASCIIFoldingExpansionFilterFactory
> is the only or easiest way to do the job can somebody tell me HOW to
> incorporate that into my Solr setup????
> 
> Many thanks!!
> 
> Michael
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 


