[ https://issues.apache.org/jira/browse/SOLR-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johannes Brucher updated SOLR-3235:
-----------------------------------

    Fix Version/s: 3.4
       Issue Type: New Feature  (was: Bug)
      Description:

If you use the following schema.xml entry:

    <fieldType name="contenttype" class="solr.TextField" multiValued="true" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

with a synonym list containing an entry such as:

    text/html;\ charset=ISO-8859-1 => html

Solr 3.4 and 3.5 can't handle the whitespace between "html;" and "charset", and no synonym substitution is performed. The same config works fine in Solr 3.3.
No exception or error is thrown.

This is my first JIRA ticket, so if I missed something, let me know...

Regards,
Johannes

Edit: OK, found the solution to that problem. Provide the following:

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" tokenizerFactory="solr.KeywordTokenizerFactory" />

Set tokenizerFactory to "solr.KeywordTokenizerFactory" rather than "solr.WhitespaceTokenizerFactory", so that the synonym entry is not split at the whitespace.
See the javadocs for more details:
https://builds.apache.org/job/Solr-trunk/javadoc/org/apache/solr/analysis/SynonymFilterFactory.html

> Whitespace issue in synonym list
> --------------------------------
>
>                 Key: SOLR-3235
>                 URL: https://issues.apache.org/jira/browse/SOLR-3235
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>    Affects Versions: 3.4, 3.5
>         Environment: Windows 7
>                      Firefox 10.0.2
>                      Solr example (start.jar)
>            Reporter: Johannes Brucher
>              Labels: synonyms
>             Fix For: 3.4

--
This message is automatically generated by JIRA.
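For reference, a minimal sketch of the complete field type from SOLR-3235 with the fix from the "Edit" paragraph applied. It only combines the fieldType and filter snippets quoted in the issue. The explanation in the comments is an assumption, not something the ticket states: when tokenizerFactory is omitted, the synonyms file is presumably parsed with a whitespace tokenizer, so the escaped entry would become two tokens and never match the single token that KeywordTokenizerFactory emits.

```xml
<!-- Sketch only: the fieldType from SOLR-3235 with the reporter's fix applied. -->
<fieldType name="contenttype" class="solr.TextField" multiValued="true" omitNorms="true">
  <analyzer type="index">
    <!-- Emits the whole field value as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- tokenizerFactory controls how entries in synonyms.txt are parsed.
         Assumption: when it is omitted, entries are split on whitespace,
         so "text/html;\ charset=ISO-8859-1" would not match the single
         token produced by the tokenizer above. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The synonyms.txt entry itself (text/html;\ charset=ISO-8859-1 => html) stays unchanged. Because the filter sits in the index analyzer, documents indexed before the change would most likely need to be re-indexed for the substitution to take effect.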