Thanks Jan,

The blog post is very good; I hadn't quite realized all those various pitfalls with synonyms.
I would still like the ability to specify two different query analysis chains for one indexed field, rather than having to write a custom query parser for each use case. For example, the Traditional/Simplified Chinese use case in my previous message could probably be solved with a custom query parser along the lines of the synonym solution in the blog post. But if there were a way to specify two different query analysis chains for the same indexed field, I would not have to write one.

Tom

On Tue, Mar 5, 2013 at 5:39 PM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> Please have a look at
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ and a
> working plugin to Solr to deboost the expanded synonyms. The plugin code
> currently lacks ability to configure different dictionaries for each field,
> but that could be added. Also see SOLR-4381 for eventual inclusion in Solr.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 5 March 2013 at 17:26, Tom Burton-West <tburt...@umich.edu> wrote:
>
> Thanks Erick,
>
> Payloads might work but I'm looking at a more general problem
>
> Here is another use case:
>
> We have a mix of Traditional and Simplified Chinese documents indexed in
> the same OCR field.
> When a user searches using Traditional Chinese, I would like to also
> search in Simplified Chinese, but rank the results matching Traditional
> Chinese higher. Similarly, if a user enters a query in Simplified
> Chinese, I want to also search in Traditional Chinese but rank matches of
> the Simplified Chinese query terms higher.
>
> Since it is not always possible to determine whether a short query is in
> Simplified or Traditional Chinese here is what I would like to do.
>
> 1) Convert the query to Traditional Chinese
> 2) Convert the query to Simplified Chinese
> (One of these two steps would not be necessary if I could reliably
> determine the nature of the query)
>
> q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1.
>
> Again, this could be done with copy fields, but that would increase my
> index size too much. What I really want to be able to do is to query the
> same index (i.e. the document as created) with the user's query
> processed/analyzed in 3 different ways.
>
> I could do this myself in the app layer, but I would really like to be
> able to use Solr.
>
> Tom
>
> On Mon, Mar 4, 2013 at 8:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Tom:
>>
>> I wonder if you could do something with payloads here. Index all terms
>> with payloads of 10, but synonyms with 1?
>>
>> Random thought off the top of my head.
>>
>> Erick
>>
>>> <fieldType name="plain" class="solr.TextField">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> <fieldType name="syn" class="solr.TextField">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>             ignoreCase="true" expand="true"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> <copyField source="plain" dest="syn"/>
>>>
>>> On Mon, Mar 4, 2013 at 4:43 PM, Jack Krupansky
>>> <j...@basetechnology.com> wrote:
>>>
>>>> Please clarify, and try providing a couple more use cases.
>>>> I mean, the case you provided suggests that the contents of the index will be
>>>> different between the two fields, while you told us that you wanted to
>>>> share the same indexed field. In other words, it sounds like you will have
>>>> two copies of similar data anyway.
>>>>
>>>> Maybe you simply want one copy of the stored value for the field and
>>>> then have one or more copyfields that index the same source data
>>>> differently, but don’t re-store the copied source data.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> *From:* Tom Burton-West <tburt...@umich.edu>
>>>> *Sent:* Monday, March 04, 2013 3:57 PM
>>>> *To:* dev@lucene.apache.org
>>>> *Subject:* Ability to specify 2 different query analyzers for same
>>>> indexed field in Solr
>>>>
>>>> Hello,
>>>>
>>>> We would like to be able to specify two different fields that both use
>>>> the same indexed field but use different analyzers. An example use-case
>>>> for this might be doing query-time synonym expansion with the synonyms
>>>> weighted lower than an exact match.
>>>>
>>>> q=exact_field^10 OR synonyms^1
>>>>
>>>> The normal way to do this in Solr, which is just to set up separate
>>>> analyzer chains and use a copyfield, will not work for us because the field
>>>> in question is huge. It is about 7 TB of OCR.
>>>>
>>>> Is there a way to do this currently in Solr? If not,
>>>>
>>>> 1) should I open a JIRA issue?
>>>> 2) can someone point me towards the part of the code I might need to
>>>> modify?
>>>>
>>>> Tom
>>>>
>>>> Tom Burton-West
>>>> Information Retrieval Programmer
>>>> Digital Library Production Service
>>>> University of Michigan Library
>>>> http://www.hathitrust.org/blogs/large-scale-search
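
For reference, the app-layer fallback described in the thread (q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1) might look roughly like the sketch below. It is only an illustration under stated assumptions: the field name `ocr`, the helper names, and the five-character conversion table are all hypothetical, and a real application would need a complete Traditional/Simplified conversion resource. A converted variant that is identical to a clause already present is skipped, which covers the case where one of the two conversion steps turns out to be unnecessary.

```python
# Hypothetical, deliberately tiny conversion tables (an assumption for this
# sketch). A real system would use a full Traditional/Simplified resource.
TRAD_TO_SIMP = {"漢": "汉", "語": "语", "學": "学", "書": "书", "國": "国"}
SIMP_TO_TRAD = {simp: trad for trad, simp in TRAD_TO_SIMP.items()}


def convert(text, table):
    """Map each character through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def quote(text):
    """Wrap text as a phrase, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(user_query, field="ocr", entered_boost=10, variant_boost=1):
    """Build: entered^10 OR traditional^1 OR simplified^1 against one field.

    The field name "ocr" and the default boosts are assumptions that mirror
    the q1 example in the thread.
    """
    clauses = [f"{field}:{quote(user_query)}^{entered_boost}"]
    seen = {user_query}
    variants = (
        convert(user_query, SIMP_TO_TRAD),  # query as Traditional Chinese
        convert(user_query, TRAD_TO_SIMP),  # query as Simplified Chinese
    )
    for variant in variants:
        if variant not in seen:  # skip variants identical to an existing clause
            seen.add(variant)
            clauses.append(f"{field}:{quote(variant)}^{variant_boost}")
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_query("漢語"))
```

The resulting string would be sent as the `q` parameter against the single indexed field, so no copy field is needed; the cost is that the conversion logic lives outside Solr, which is exactly what the original request hopes to avoid.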