[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234855#comment-13234855 ]
Robert Muir commented on SOLR-2921:
-----------------------------------

Patch looks good: I think you should commit it and I'll follow up with the other ones.

Only one nitpick:
{noformat}
-/**
+/**
  * Factory for {@link TurkishLowerCaseFilter}.
  * <pre class="prettyprint" >
  * <fieldType name="text_trlwr" class="solr.TextField" positionIncrementGap="100">
- *   <analyzer>
- *     <tokenizer class="solr.StandardTokenizerFactory"/>
- *     <filter class="solr.TurkishLowerCaseFilterFactory"/>
- *   </analyzer>
- * </fieldType></pre>
+ * <analyzer>
+ * <tokenizer class="solr.StandardTokenizerFactory"/>
+ * <filter class="solr.TurkishLowerCaseFilterFactory"/>
+ * </analyzer>
+ * </fieldType></pre>
+ *
{noformat}

Did your IDE do this? I don't think we should lose the indentation of the example there.

> Make any Filters, Tokenizers and CharFilters implement
> MultiTermAwareComponent if they should
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2921
>                 URL: https://issues.apache.org/jira/browse/SOLR-2921
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>         Environment: All
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch
>
>
> SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr
> to automatically assemble a "multiterm" analyzer that does the right thing
> vis-a-vis transforming the individual terms of a multi-term query at query
> time. Examples are: lower casing, folding accents, etc.
> Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent:
> * ASCIIFoldingFilterFactory
> * LowerCaseFilterFactory
> * LowerCaseTokenizerFactory
> * MappingCharFilterFactory
> * PersianCharFilterFactory
> When users put any of the above in their query analyzer, Solr will "do the
> right thing" at query time and the perennial question users have, "why didn't
> my wildcard query automatically lower-case (or accent fold or....) my terms?"
> will be gone. Die question die!
> But taking a quick look, for instance, at the various FilterFactories that
> exist, there are a number of possibilities that *might* be good candidates
> for implementing MultiTermAwareComponent. But I really don't understand the
> correct behavior here well enough to know whether these should implement the
> interface or not. And this doesn't include other CharFilters or Tokenizers.
> Actually implementing the interface is often trivial, see the classes above
> for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which
> is the right thing in this case.
> Here is a quick cull of the Filters that, just from their names, might be
> candidates. If anyone wants to take any of them on, that would be great. If
> all you can do is provide test cases, I could probably do the code part, just
> let me know.
> ArabicNormalizationFilterFactory
> GreekLowerCaseFilterFactory
> HindiNormalizationFilterFactory
> ICUFoldingFilterFactory
> ICUNormalizer2FilterFactory
> ICUTransformFilterFactory
> IndicNormalizationFilterFactory
> ISOLatin1AccentFilterFactory
> PersianNormalizationFilterFactory
> RussianLowerCaseFilterFactory
> TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.