Hi,

Please have a look at http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ and the working Solr plugin for deboosting the expanded synonyms. The plugin code currently lacks the ability to configure different dictionaries for each field, but that could be added. Also see SOLR-4381 for eventual inclusion in Solr.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5 March 2013 at 17:26, Tom Burton-West <tburt...@umich.edu> wrote:

> Thanks Erick,
>
> Payloads might work, but I'm looking at a more general problem.
>
> Here is another use case:
>
> We have a mix of Traditional and Simplified Chinese documents indexed in the
> same OCR field.
> When a user searches using Traditional Chinese, I would like to also search
> in Simplified Chinese, but rank the results matching Traditional Chinese
> higher. Similarly, if a user enters a query in Simplified Chinese, I want
> to also search in Traditional Chinese but rank matches of the Simplified
> Chinese query terms higher.
>
> Since it is not always possible to determine whether a short query is in
> Simplified or Traditional Chinese, here is what I would like to do:
>
> 1) Convert the query to Traditional Chinese
> 2) Convert the query to Simplified Chinese
> (One of these two steps would not be necessary if I could reliably determine
> the nature of the query)
>
> q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1
>
> Again, this could be done with copy fields, but that would increase my index
> size too much. What I really want to be able to do is to query the same
> index (i.e. the document as created) with the user's query processed/analyzed in
> 3 different ways.
>
> I could do this myself in the app layer, but I would really like to be able
> to use Solr.
>
> Tom
>
> On Mon, Mar 4, 2013 at 8:19 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> Tom:
>
> I wonder if you could do something with payloads here. Index all terms with
> payloads of 10, but synonyms with 1?
>
> Random thought off the top of my head.
>
> Erick
>
> <fieldType name="plain">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="syn">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <copyField source="plain" dest="syn"/>
>
> On Mon, Mar 4, 2013 at 4:43 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
> Please clarify, and try providing a couple more use cases. I mean, the case
> you provided suggests that the contents of the index will be different
> between the two fields, while you told us that you wanted to share the same
> indexed field. In other words, it sounds like you will have two copies of
> similar data anyway.
>
> Maybe you simply want one copy of the stored value for the field and then
> have one or more copyFields that index the same source data differently, but
> don't re-store the copied source data.
>
> -- Jack Krupansky
>
> From: Tom Burton-West
> Sent: Monday, March 04, 2013 3:57 PM
> To: dev@lucene.apache.org
> Subject: Ability to specify 2 different query analyzers for same indexed
> field in Solr
>
> Hello,
>
> We would like to be able to specify two different fields that both use the
> same indexed field but use different analyzers. An example use-case for
> this might be doing query-time synonym expansion with the synonyms weighted
> lower than an exact match.
>
> q=exact_field^10 OR synonyms^1
>
> The normal way to do this in Solr, which is just to set up separate analyzer
> chains and use a copyField, will not work for us because the field in
> question is huge. It is about 7 TB of OCR.
>
> Is there a way to do this currently in Solr? If not:
>
> 1) should I open a JIRA issue?
> 2) can someone point me towards the part of the code I might need to modify?
>
> Tom
>
> Tom Burton-West
> Information Retrieval Programmer
> Digital Library Production Service
> University of Michigan Library
> http://www.hathitrust.org/blogs/large-scale-search
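
A footnote on the app-layer fallback mentioned in the Chinese use case above: here is a minimal sketch of how the three-way boosted query string could be assembled before it is sent to Solr. Everything in it is an assumption made for illustration. The function names are hypothetical, the two converters are passed in as plain callables because the thread does not say which Traditional/Simplified conversion would be used, and quoting each variant as a phrase is just one simple way to keep the boost attached to the whole query text.

```python
from typing import Callable


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes so the text can sit inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_boosted_query(
    entered: str,
    to_traditional: Callable[[str], str],  # hypothetical converter supplied by the app
    to_simplified: Callable[[str], str],   # hypothetical converter supplied by the app
    entered_boost: int = 10,
    variant_boost: int = 1,
) -> str:
    """Build '"entered"^10 OR "traditional"^1 OR "simplified"^1'.

    A converted variant identical to one already present is skipped, so a
    query that is already Traditional (or Simplified) is not repeated.
    """
    variants = [(entered, entered_boost)]
    for convert in (to_traditional, to_simplified):
        converted = convert(entered)
        if all(converted != seen for seen, _ in variants):
            variants.append((converted, variant_boost))
    return " OR ".join(
        f'"{escape_phrase(text)}"^{boost}' for text, boost in variants
    )
```

With toy converters that only know the pair 發/发, build_boosted_query("發展", lambda s: s.replace("发", "發"), lambda s: s.replace("發", "发")) yields "發展"^10 OR "发展"^1. This only covers the query side; it does not address the wish to have Solr apply several query analyzers to the same indexed field.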