[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

Robert Muir (Commented) (JIRA) Wed, 21 Mar 2012 11:54:02 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234855#comment-13234855 ]


Robert Muir commented on SOLR-2921:
-----------------------------------

Patch looks good: I think you should commit it, and I'll follow up with the 
other ones.

Only one nitpick:
{noformat}
-/** 
+/**
  * Factory for {@link TurkishLowerCaseFilter}.
  * <pre class="prettyprint" >
  * &lt;fieldType name="text_trlwr" class="solr.TextField" positionIncrementGap="100"&gt;
- *   &lt;analyzer&gt;
- *     &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
- *     &lt;filter class="solr.TurkishLowerCaseFilterFactory"/&gt;
- *   &lt;/analyzer&gt;
- * &lt;/fieldType&gt;</pre> 
+ * &lt;analyzer&gt;
+ * &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
+ * &lt;filter class="solr.TurkishLowerCaseFilterFactory"/&gt;
+ * &lt;/analyzer&gt;
+ * &lt;/fieldType&gt;</pre>
+ *
{noformat}

Did your IDE do this? I don't think we should lose the indentation of the 
example there.

                
> Make any Filters, Tokenizers and CharFilters implement 
> MultiTermAwareComponent if they should
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2921
>                 URL: https://issues.apache.org/jira/browse/SOLR-2921
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>         Environment: All
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch
>
>
> SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
> to automatically assemble a "multiterm" analyzer that does the right thing 
> vis-a-vis transforming the individual terms of a multi-term query at query 
> time. Examples are: lower casing, folding accents, etc. Currently 
> (27-Nov-2011), the following classes implement MultiTermAwareComponent:
>  * ASCIIFoldingFilterFactory
>  * LowerCaseFilterFactory
>  * LowerCaseTokenizerFactory
>  * MappingCharFilterFactory
>  * PersianCharFilterFactory
> When users put any of the above in their query analyzer, Solr will "do the 
> right thing" at query time and the perennial question users have, "why didn't 
> my wildcard query automatically lower-case (or accent fold or....) my terms?" 
> will be gone. Die question die!
> But taking a quick look, for instance, at the various FilterFactories that 
> exist, there are a number of possibilities that *might* be good candidates 
> for implementing MultiTermAwareComponent. But I really don't understand the 
> correct behavior here well enough to know whether these should implement the 
> interface or not. And this doesn't include other CharFilters or Tokenizers.
> Actually implementing the interface is often trivial, see the classes above 
> for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
> is the right thing in this case.
> Here is a quick cull of the Filters that, just from their names, might be 
> candidates. If anyone wants to take any of them on, that would be great. If 
> all you can do is provide test cases, I could probably do the code part, just 
> let me know.
> ArabicNormalizationFilterFactory
> GreekLowerCaseFilterFactory
> HindiNormalizationFilterFactory
> ICUFoldingFilterFactory
> ICUNormalizer2FilterFactory
> ICUTransformFilterFactory
> IndicNormalizationFilterFactory
> ISOLatin1AccentFilterFactory
> PersianNormalizationFilterFactory
> RussianLowerCaseFilterFactory
> TurkishLowerCaseFilterFactory
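
To make the idea in the issue description concrete, here is a minimal, self-contained sketch of how a "multiterm" chain might be assembled from a query chain. It does not use the Solr API: TermFilter, MultiTermAware, LowerCase, NaivePluralStemmer, multiTermChain and analyze are all hypothetical stand-ins invented for illustration, and the real MultiTermAwareComponent interface may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class MultiTermSketch {
    /** Hypothetical stand-in for an analysis component that rewrites one term. */
    interface TermFilter {
        String apply(String term);
    }

    /** Hypothetical stand-in for the marker interface described in the issue. */
    interface MultiTermAware {
        /** The component that is safe to run on a wildcard/prefix fragment. */
        TermFilter multiTermComponent();
    }

    /** Lower-casing is safe on term fragments, so it opts in. */
    static class LowerCase implements TermFilter, MultiTermAware {
        public String apply(String term) {
            return term.toLowerCase(Locale.ROOT);
        }
        public TermFilter multiTermComponent() {
            return this;
        }
    }

    /** A toy stemmer; it changes term length, so it does not opt in. */
    static class NaivePluralStemmer implements TermFilter {
        public String apply(String term) {
            return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
        }
    }

    /** Keep only the components that declare themselves multi-term aware. */
    static List<TermFilter> multiTermChain(List<TermFilter> queryChain) {
        List<TermFilter> out = new ArrayList<>();
        for (TermFilter f : queryChain) {
            if (f instanceof MultiTermAware) {
                out.add(((MultiTermAware) f).multiTermComponent());
            }
        }
        return out;
    }

    /** Run a term through every filter in the chain, in order. */
    static String analyze(List<TermFilter> chain, String term) {
        String result = term;
        for (TermFilter f : chain) {
            result = f.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TermFilter> queryChain =
            Arrays.asList(new LowerCase(), new NaivePluralStemmer());
        // Prefix of a wildcard query such as BUS*
        String prefix = "BUS";
        System.out.println(analyze(queryChain, prefix));                 // bu
        System.out.println(analyze(multiTermChain(queryChain), prefix)); // bus
    }
}
```

In this sketch the full query chain mangles the prefix of BUS* into "bu", while the filtered chain only lower-cases it to "bus". That is the distinction the candidate factories listed above would have to be checked against one by one.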

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
