Thank you both, Umesh Prasad and Anshum. You've been a great help.
-----Original Message-----
From: Anshum [mailto:[email protected]]
Sent: Sunday, January 16, 2011 5:26 PM
To: [email protected]
Subject: Re: Result ordering

Hi Pelit,

Firstly, the number of words that match a query in a document is not term frequency. You can get a better idea of the terminology used in search at
http://www.miislita.com/term-vector/term-vector-3.html

Looking at what you're trying to achieve, a few possible solutions are below. I am assuming your query is for the terms "A B":

1. Look at phrase queries and set a high slop value, so that matches like "A B" are weighted more than "A .... B" (separated by a few positions).
2. You could also use a custom similarity (advanced)
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/SimilarityDelegator.html
and use the coord value.
3. For wildcard searches, a brute-force mechanism would be an OR query (the wildcard query is eventually expanded to an OR query anyway) for RING OR RING*, boosting the former part.

Looking at your current query, it seems you'd need a better understanding of Lucene; getting a copy of "Lucene In Action 2nd Ed<http://www.manning.com/hatcher3/>." would be a good idea for you and everyone in your position.

Hope that helps.

--
Anshum Gupta
http://ai-cafe.blogspot.com

On Sun, Jan 16, 2011 at 8:03 PM, Pelit Mamani <[email protected]> wrote:
> Hi,
>
> I'm maintaining some Lucene-based code, and we're trying to get
> control over result ordering (users aren't happy with the default).
> I know how to boost a Field or Document (very useful).
> But:
>
>
> 1) Is there a way to boost "OR" queries, based on the number of
> matched terms?
> So the OR query "lord rings" will first show the document "LORD of the
> RINGS" (which holds both words), and only later "selected jewels and RINGS"
> (which only holds one word).
> Is that what you call "Term Frequency"? And how do you boost it further?
> I did a bit of tinkering and got the impression Lucene would boost it
> by default, but not enough - it's sometimes overridden by other
> boosting factors (maybe the boost for short expressions).
>
>
> 2) Is there a way to boost based on "positions"?
> So "LORD of the RINGS" gets precedence over "LORD of the funny golden
> RINGS", because the search words are positioned closer to each other?
>
>
> 3) With wildcard searches, is there a way to boost documents that hold
> an exact match?
> So if I search for "ring*", I first see the exact match "story of a
> RING", and only later "a RINGING failure"
>
> Thanx a bunch.
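
For reference, suggestions 1 and 3 in the reply above can be written directly in the classic Lucene QueryParser syntax: a quoted phrase followed by ~N is a phrase query with slop N, and term^N boosts a term. The following is only a minimal sketch, not code from this thread. It uses plain Java string building with no Lucene dependency, and the class name, method names, slop of 10 and boost of 4 are all assumptions chosen for illustration. The resulting strings would still need to be passed to a QueryParser, and the actual ranking depends on the analyzer and similarity in use.

```java
// Sketch only: builds query strings in the classic Lucene QueryParser syntax.
// QuerySketch, sloppyPhrase and exactOverPrefix are hypothetical names;
// the slop and boost values are arbitrary and would need tuning.
final class QuerySketch {

    // Suggestion 1: a phrase query with slop, e.g. "lord rings"~10.
    // Documents where the two terms sit closer together score higher.
    static String sloppyPhrase(String first, String second, int slop) {
        return "\"" + first + " " + second + "\"~" + slop;
    }

    // Suggestion 3: the exact term, boosted, OR-ed with the prefix query,
    // e.g. ring^4 OR ring*, so exact matches tend to rank first.
    static String exactOverPrefix(String term, int boost) {
        return term + "^" + boost + " OR " + term + "*";
    }

    public static void main(String[] args) {
        System.out.println(sloppyPhrase("lord", "rings", 10)); // prints "lord rings"~10
        System.out.println(exactOverPrefix("ring", 4));        // prints ring^4 OR ring*
    }
}
```

Suggestion 2 (a custom similarity that changes the coord value) is left out here because it has to be done against the Lucene API itself rather than in the query string.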
