Update: I have implemented my own subclasses of QueryParser, BooleanQuery, BooleanScorer and Similarity to deal with this.
I have been successful in getting the exact behaviour I want, but only when calling the .explain() method: the scores for some documents often differ between IndexSearcher.search() and IndexSearcher.explain(), which confuses me. coord() seems to be one of the things I still need to change, but it is not the only element of the formula that I have evidently changed for the .explain() pipeline but not for .search(). The implementation of BulkScorer remains perplexing to me, and I suspect that is where I have missed something.

Any pointers?

Thanks!
Daniel

On 15 January 2015 at 23:00, Jack Krupansky-3 [via Lucene] <ml-node+s472066n4179925...@n3.nabble.com> wrote:

> File a Jira for this particular doc fix since it is significant and not
> just mere wordsmithing. Better yet, submit a patch since that's Javadoc,
> although the exact form of the doc fix might be debatable, so a general
> description of the problem should be sufficient, unless you feel
> motivated.
>
> -- Jack Krupansky
>
> On Thu, Jan 15, 2015 at 11:23 AM, danield <[hidden email]> wrote:
>
> > Hi Mike,
> >
> > Thank you for your reply. Yes, I had thought of this, but it is not a
> > solution to my problem, because the term frequency, and therefore the
> > results, will still be wrong: prepending or appending a string to the
> > term still makes it a different term.
> >
> > Similarly, I could use regex queries, but again that doesn't fix the TF
> > issue. I am not speaking hypothetically here; I have experimental proof
> > that this doesn't work (i.e. the precision for my task goes down in my
> > experiments).
> >
> > Also, I agree that when your fields are essentially different, as in
> > /title/, /author/ and /text/, normalizing by field length makes sense.
> > But in my case my fields are many and are all chunks of a larger text
> > (extracted sentences that have been labelled with a number of different
> > classes), and in the experiments I am running I am trying to establish
> > whether weighting sentences in different classes differently will lead
> > to increased relevance of results.
> >
> > This also doesn't change the fact that the documentation is wrong! Any
> > ideas how to fix it?
> >
> > Daniel
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Similarity-formula-documentation-is-misleading-how-to-make-field-agnostic-queries-tp4179307p4179834.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
View this message in context: http://lucene.472066.n3.nabble.com/Similarity-formula-documentation-is-misleading-how-to-make-field-agnostic-queries-tp4179307p4180529.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
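
To make the kind of search()/explain() mismatch described above concrete, here is a toy model in plain Java. It is not Lucene code and does not show where the discrepancy actually comes from; it only illustrates one way such a gap could arise, assuming one scoring path picks up a customised coord() while the other still uses the default one (which, as far as I understand the default TF-IDF similarity, is overlap / maxOverlap). The class name, the Coord interface and the score() helper are all hypothetical names introduced for this sketch.

```java
public class CoordMismatchSketch {

    /** Hypothetical stand-in for a Similarity's coord(overlap, maxOverlap) hook. */
    public interface Coord {
        float coord(int overlap, int maxOverlap);
    }

    /** Assumed default behaviour: the fraction of query clauses that matched. */
    public static final Coord DEFAULT_COORD =
            (overlap, maxOverlap) -> overlap / (float) maxOverlap;

    /** A customised variant that switches the coordination factor off. */
    public static final Coord NO_COORD = (overlap, maxOverlap) -> 1.0f;

    /**
     * Toy boolean score: the coord factor multiplied by the sum of the
     * scores of the clauses that matched. Real scoring has more terms.
     */
    public static float score(float[] matchedClauseScores, int maxOverlap, Coord coord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        return coord.coord(matchedClauseScores.length, maxOverlap) * sum;
    }

    public static void main(String[] args) {
        int maxOverlap = 3;                         // the query has three clauses
        float[] allMatch = {0.5f, 0.25f, 0.25f};    // document matching every clause
        float[] partialMatch = {0.5f, 0.25f};       // document matching two of three

        // Both paths agree when every clause matches (the default coord is 1).
        System.out.println("all clauses, default coord: " + score(allMatch, maxOverlap, DEFAULT_COORD));
        System.out.println("all clauses, custom coord:  " + score(allMatch, maxOverlap, NO_COORD));

        // They disagree only for documents that match a subset of the clauses.
        System.out.println("partial, default coord:     " + score(partialMatch, maxOverlap, DEFAULT_COORD));
        System.out.println("partial, custom coord:      " + score(partialMatch, maxOverlap, NO_COORD));
    }
}
```

In this model the two paths give identical scores for documents that match every clause and different scores for partial matches, which would be consistent with only some documents differing. Whether that is what happens in the real pipeline is exactly the open question.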