Chaiyasit (Sit) Manovit created SOLR-5362:
---------------------------------------------

             Summary: SolrCell's order of field operation with lowernames=true
                 Key: SOLR-5362
                 URL: https://issues.apache.org/jira/browse/SOLR-5362
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Solr Cell (Tika extraction)
            Reporter: Chaiyasit (Sit) Manovit


This follows from SOLR-1634.

I am not sure if SOLR-1856 completely fixes SOLR-1634, particularly when 
{{lowernames=true}} comes into the picture. Consider a case where:

1. Tika generates the field {{Category=Foo}} for a doc (e.g., this comes from 
user-defined document properties).

2. {{literalsOverride=true}}.

3. {{lowernames=true}}.

4. The user supplies {{literal.category=bar}}.

According to the 
[rules|http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations],
 {{literalsOverride}} is applied before {{lowernames}} and, thus, will have no 
effect here since the field {{Category}} from Tika and {{literal.category}} are 
considered different fields at this stage before {{lowernames=true}} kicks in. 
And when {{lowernames=true}} kicks in, it has the effect of merging 
{{Category}} into {{category}}, giving it both values {{Foo}} and {{bar}}.

Adding {{fmap.Category=tika_category}} does not help because {{fmap}} is 
applied even later; by that time, {{category}} already contains both {{Foo}} 
and {{bar}}.
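
To make the ordering problem concrete, here is a minimal Python sketch of the 
static order described above ({{literalsOverride}}, then {{lowernames}}, then 
{{fmap}}). It is a toy model, not SolrCell's actual code: the names 
{{static_order}}, {{lower_name}} and {{merge}} are hypothetical, and the 
lowercasing rule is a simplified stand-in for what {{lowernames=true}} does.

```python
import re


def lower_name(name):
    # Simplified stand-in for lowernames=true (assumption): lowercase the
    # name and turn every non-alphanumeric character into an underscore.
    return re.sub(r"[^a-z0-9]", "_", name.lower())


def merge(doc, name, values):
    # Append values to a (possibly new) multi-valued field.
    doc.setdefault(name, []).extend(values)


def static_order(tika_fields, literals, literals_override, lowernames, fmap):
    doc = {}
    # Step 1: literalsOverride compares field names exactly as they are,
    # so "Category" (Tika) and "category" (literal) do not collide here.
    for name, values in tika_fields.items():
        if literals_override and name in literals:
            continue
        merge(doc, name, values)
    for name, values in literals.items():
        merge(doc, name, values)
    # Step 2: lowernames merges fields whose names differ only by case.
    if lowernames:
        lowered = {}
        for name, values in doc.items():
            merge(lowered, lower_name(name), values)
        doc = lowered
    # Step 3: fmap runs last and only sees the names left after step 2.
    mapped = {}
    for name, values in doc.items():
        merge(mapped, fmap.get(name, name), values)
    return mapped


result = static_order(
    tika_fields={"Category": ["Foo"]},
    literals={"category": ["bar"]},
    literals_override=True,
    lowernames=True,
    fmap={"Category": "tika_category"},
)
print(result)  # {'category': ['Foo', 'bar']}
```

In this model the {{fmap}} entry for {{Category}} never matches, because the 
field has already been renamed to {{category}} by the time {{fmap}} runs.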

Adding {{fmap.Category=tika_category}} *together with* {{lowernames=false}} 
would work (regardless of {{literalsOverride}}), but what if we need 
{{lowernames=true}}, and what if the capitalization of {{Category}} can vary 
(e.g., {{CATEGORY}})?

Would it make sense to have an option to apply the rules in the order that they 
are specified in the config file or URL params rather than always in a static 
order?

Thanks.

PS. Marking this as Major because there seems to be no easy workaround 
(condition for Minor).

------------------------

Response from Jan Høydahl 
([link|https://issues.apache.org/jira/browse/SOLR-1634?focusedCommentId=13797273&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13797273]):

bq. To me it sounds like a potential, very simple solution would be to apply 
lowercasing at several places if {{lowernames=true}}

Agreed. In particular, {{lowernames=true}} should be applied as soon as Tika 
has extracted a field, before {{literalsOverride}} is even considered.
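
A rough Python sketch of that idea, under the same toy assumptions (this is not 
SolrCell code; {{lowercase_first}} and {{lower_name}} are hypothetical names, 
and literal names are assumed to be used exactly as supplied): Tika field names 
are lowercased first, so that {{literalsOverride}} compares names that already 
agree in case.

```python
import re


def lower_name(name):
    # Simplified stand-in for lowernames=true (assumption): lowercase the
    # name and turn every non-alphanumeric character into an underscore.
    return re.sub(r"[^a-z0-9]", "_", name.lower())


def lowercase_first(tika_fields, literals, literals_override, lowernames):
    doc = {}
    for name, values in tika_fields.items():
        # Lowercase the Tika field name before any other rule sees it.
        key = lower_name(name) if lowernames else name
        # literalsOverride now compares against the lowercased name.
        if literals_override and key in literals:
            continue
        doc.setdefault(key, []).extend(values)
    for name, values in literals.items():
        doc.setdefault(name, []).extend(values)
    return doc


# The capitalization used by Tika no longer matters in this model.
for tika_name in ("Category", "CATEGORY"):
    print(lowercase_first({tika_name: ["Foo"]}, {"category": ["bar"]}, True, True))
```

With {{literalsOverride=false}} the same function keeps both values, which 
matches the merge behaviour described in the report.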



