Hi Michael,

Questions about Solr should go to the Solr user mailing list, rather than this list, which is for Lucene users - see <http://lucene.apache.org/solr/discussion.html> for how to subscribe.

I've never heard of ASCIIFoldingExpansionFilterFactory, but ASCIIFoldingFilterFactory has a new option "preserveOriginal", introduced in Lucene/Solr 4.7 by LUCENE-5437 <https://issues.apache.org/jira/browse/LUCENE-5437>, that should do the trick: it keeps the original token alongside the folded one.

Just add preserveOriginal="true" - see the example in the javadocs (if you copy/paste it, make sure you change the attribute value from "false", as it is in the example, to "true"): <http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilterFactory.html>

Note, though, that as Ahmet Arslan points out on LUCENE-5437, queries that generate multiple terms (e.g. prefix and regex queries) will trigger a failure. You can work around this problem by defining both "index" and "query" analyzer types for the fieldtype you use with this field, and using preserveOriginal="true" only on the "index" analyzer type.

See this page in the Solr Reference Guide for more info about analyzers in Solr: <https://cwiki.apache.org/confluence/display/solr/What+Is+An+Analyzer%3F>.

Steve

On Jun 5, 2014, at 8:05 PM, Michael Tobias <mich...@tobias.org.uk> wrote:

> Hi there
>
> I am a relative newbie Solr user so please be gentle with me.
>
> I am experimenting with various phonetic filters and the tokens created can
> vary depending on whether the words contain diacritical characters.
>
> My problem is that the documents being indexed are not always consistent in
> terms of the use of diacritics (sometimes the same word can have diacritics
> and sometimes not) and of course when users submit queries they may or may
> not use diacritics properly.
>
> If I wasn't trying to use phonetic matching I would simply use the
> ASCIIFoldingFilterFactory to remove any problem characters and match on
> that.
>
> What I would like to do is create phonetic tokens for both the
> diacritic-version of the word and the folded-version of the word - but I
> would like to store the tokens in a single phonetic field for querying
> purposes.....
>
> How can I achieve that????
>
> I did find a few references online to "ASCIIFoldingExpansionFilterFactory"
> which appears to do what I want - when creating the 'folded' version of a
> word it appears to keep the diacritic version too. I could then apply my
> phonetic filter to those expanded tokens.
>
> Is there any other way to do this? Or if ASCIIFoldingExpansionFilterFactory
> is the only or easiest way to do the job can somebody tell me HOW to
> incorporate that into my Solr setup????
>
> Many thanks!!
>
> Michael
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
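
For reference, the index/query analyzer split suggested in the reply above might look roughly like the following in schema.xml. This is only an untested sketch: the field type name "text_phonetic_folded", the choice of tokenizer and lowercase filter, and the use of DoubleMetaphoneFilterFactory as the phonetic filter are all assumptions for illustration and do not come from the original thread. Only the preserveOriginal attribute on ASCIIFoldingFilterFactory is taken from the discussion.

```xml
<!-- Hypothetical field type: name, tokenizer, and phonetic encoder are assumptions. -->
<fieldType name="text_phonetic_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: emit both the original token and its ASCII-folded form. -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <!-- Replace each token with its phonetic code, so both variants are encoded. -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: fold only, so multi-term queries see a single token per term. -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
```

With a setup along these lines, both the diacritic and the folded variants are phonetically encoded into the same field at index time, while queries are folded before encoding. Whether the phonetic codes for the two variants actually differ depends on the encoder chosen, so it is worth checking the output on the Solr admin Analysis page before relying on it.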