Re: Ability to specify 2 different query analyzers for same indexed field in Solr

Jan Høydahl Tue, 05 Mar 2013 14:40:12 -0800

Hi,

Please have a look at 
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ and at a 
working Solr plugin that deboosts the expanded synonyms. The plugin code 
currently lacks the ability to configure a different dictionary for each field, 
but that could be added. Also see SOLR-4381 for eventual inclusion in Solr.
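
For the Traditional/Simplified Chinese case quoted below, the app-layer fallback Tom mentions could look roughly like the following. This is only a sketch under assumptions: the field name `ocr` is hypothetical, the two converters are toy one-character mappings standing in for a real conversion library, and each variant is sent as a quoted phrase, which may not be the matching behaviour you want. It builds a single query string against the same indexed field, with the query as entered boosted above the converted variants.

```python
def escape_phrase(text):
    """Escape backslashes and double quotes so text is safe inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_boosted_query(field, as_entered, to_traditional, to_simplified,
                        entered_boost=10, variant_boost=1):
    """Return one query string searching `field` with up to three query variants."""
    clauses = [(as_entered, entered_boost)]
    for convert in (to_traditional, to_simplified):
        variant = convert(as_entered)
        # Skip a converted variant that is identical to one already present.
        if all(variant != text for text, _ in clauses):
            clauses.append((variant, variant_boost))
    return " OR ".join(
        f'{field}:"{escape_phrase(text)}"^{boost}' for text, boost in clauses
    )


# Toy converters (assumption): real code would call a proper conversion library.
TRAD_TO_SIMP = str.maketrans({"學": "学"})
SIMP_TO_TRAD = str.maketrans({"学": "學"})

query = build_boosted_query(
    "ocr",                                   # hypothetical field name
    "大學",                                   # query as entered (Traditional)
    lambda s: s.translate(SIMP_TO_TRAD),
    lambda s: s.translate(TRAD_TO_SIMP),
)
print(query)  # ocr:"大學"^10 OR ocr:"大学"^1
```

The resulting string can be passed as the `q` parameter. Because duplicates are dropped, a query that is already in one script produces two clauses rather than three.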


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5 March 2013 at 17:26, Tom Burton-West <[email protected]> wrote:

> Thanks Erick,
> 
> Payloads might work, but I'm looking at a more general problem.
> 
> Here is another use case:
> 
> We have a mix of Traditional and Simplified Chinese documents indexed in the 
> same OCR field.  
>  When a user searches using Traditional Chinese, I would like to also search 
> in Simplified Chinese, but rank the results matching Traditional Chinese 
> higher.   Similarly, if a user enters a query in Simplified Chinese, I want 
> to also search in Traditional Chinese but rank matches of the Simplified 
> Chinese query terms higher.
> 
> Since it is not always possible to determine whether a short query is in 
> Simplified or Traditional Chinese, here is what I would like to do:
> 
> 1) Convert the query to Traditional Chinese
> 2) Convert the query to Simplified Chinese
> (One of these two steps would not be necessary if I could reliably determine 
> the nature of the query)
> 
> q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1.
> 
> Again, this could be done with copy fields, but that would increase my index 
> size too much.  What I really want to be able to do is to query the same 
> index (i.e. the document as created) with the user's query processed/analyzed in 
> 3 different ways.
> 
> I could do this myself in the app layer, but I would really like to be able 
> to use Solr.
> 
> 
> Tom
> 
> 
> 
> On Mon, Mar 4, 2013 at 8:19 PM, Erick Erickson <[email protected]> 
> wrote:
> Tom:
> 
> I wonder if you could do something with payloads here. Index all terms with 
> payloads of 10, but synonyms with 1?
> 
> Random thought off the top of my head.
> 
> Erick
> 
> 
> <fieldType name="plain" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> <fieldType name="syn" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> <copyField source="plain" dest="syn"/>
> 
> On Mon, Mar 4, 2013 at 4:43 PM, Jack Krupansky <[email protected]> 
> wrote:
> Please clarify, and try providing a couple more use cases. I mean, the case 
> you provided suggests that the contents of the index will be different 
> between the two fields, while you told us that you wanted to share the same 
> indexed field. In other words, it sounds like you will have two copies of 
> similar data anyway.
>  
> Maybe you simply want one copy of the stored value for the field and then 
> have one or more copyfields that index the same source data differently, but 
> don’t re-store the copied source data.
> 
> -- Jack Krupansky
>  
> From: Tom Burton-West
> Sent: Monday, March 04, 2013 3:57 PM
> To: [email protected]
> Subject: Ability to specify 2 different query analyzers for same indexed 
> field in Solr
>  
> Hello,
>  
> We would like to be able to specify two different fields that both use the 
> same indexed field but use different analyzers.   An example use-case for 
> this might be doing query-time synonym expansion with the synonyms weighted 
> lower than an exact match.  
>  
> q=exact_field^10 OR synonyms^1
>  
> The normal way to do this in Solr, which is just to set up separate analyzer 
> chains and use a copyfield, will not work for us because the field in 
> question is huge.  It is about 7 TB of OCR.
>  
> Is there a way to do this currently in Solr? If not:
>  
> 1) should I open a JIRA issue?
> 2) can someone point me towards the part of the code I might need to modify?
>  
> Tom
>  
> Tom Burton-West
> Information Retrieval Programmer
> Digital Library Production Service
> University of Michigan Library
> http://www.hathitrust.org/blogs/large-scale-search
>  
>  
> 
> 
>
