Re: New Lucene features and Solr indexes

Jack Krupansky Sat, 16 Feb 2013 11:59:30 -0800

It seems as if you are using the text field analyzer to "clean up" or "normalize" the values for that field, but generally an analyzer is mapping from source terms to index terms, with the expectation that the index term(s) may be radically different from the source terms, and generally, tokenizing the input stream as well.

Maybe this is simply a question of best practices for using analyzers for "analysis" as opposed to the cleanup/normalization that an update processor would normally do. In other words, situations where the analyzer is used as a poor man's update processor for what otherwise would/should be simple string fields.
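
As a rough sketch of that distinction, and not a tested configuration: the cleanup could move into an update processor chain, leaving the field itself a plain string type. The field name "id" and the chain name "normalize-id" are made up for illustration, and docValues="true" assumes a Solr build that includes the SOLR-3855 work. Only trimming is shown; case folding would need an additional step that is not sketched here.

```xml
<!-- schema.xml: a plain string field instead of an analyzed TextField.
     "id" is a hypothetical field name; docValues="true" assumes
     docvalues support (SOLR-3855) is available in the Solr version used. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true"
       required="true" docValues="true"/>

<!-- solrconfig.xml: "normalize-id" is a hypothetical chain name. -->
<updateRequestProcessorChain name="normalize-id">
  <!-- Trims leading/trailing whitespace from the "id" value before it is indexed. -->
  <processor class="solr.TrimFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the document is actually indexed. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note that an update processor only touches documents at index time. Unlike an analyzer, it does nothing to query input, so any normalization of what the user types would have to happen in the client.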


-- Jack Krupansky

-----Original Message-----
From: Shawn Heisey
Sent: Saturday, February 16, 2013 11:43 AM
To: dev@lucene.apache.org
Subject: Re: New Lucene features and Solr indexes

On 2/14/2013 8:26 AM, Adrien Grand wrote:

This suggests that adding docvalues to the uniqueKey field would be a good
idea for distributed searching in general, since the first phase of a
distributed search only retrieves that field and score.  That assumes of
course that the docvalues are fully utilized for retrieving fields during
that initial phase.


Right, this would likely improve performance given that doc values
(even if disk-based) are more likely to be in memory than stored
fields. Another (better?) approach would be to use the internal Lucene
doc IDs for distributed search (I assumed there was an open JIRA issue
to do that but I can't find it).


Related to this ... I have been watching SOLR-3855.  I notice that
TextField is not listed on the supported types.  Is that likely to
change in the future, or is there a fundamental issue there?

My uniqueKey field uses the following fieldType definition:

    <!-- lowercases the entire field value -->
    <fieldType name="lowercase" class="solr.TextField"
sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldType>

I'm about 95% sure that the source value from MySQL will never contain
lowercase characters and probably does not actually need to be trimmed,
but we want to be able to search when an uppercase value is entered.
Would I have to give up that capability to get docvalues on this field?
 Does the current SOLR-3855 patch take advantage of docvalues for the
first phase of a distributed search when they are present, as we
discussed earlier?

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org

For additional commands, e-mail: dev-h...@lucene.apache.org

