Boosting Search Results
Hi, new here. I recently started using Lucene and have encountered a problem. I crawl and index a number of documents. When I perform a search, let's say "tall fat", I would expect the results that match all the keywords to be at the top and displayed first. But in my search results, some documents that match only one of the keywords, like "tall", are displayed first. Why is that? What have I done wrong? Can anyone advise me on this?

Thanks
-- 
View this message in context: http://www.nabble.com/Boosting-Search-Results-tp24753954p24753954.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
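One common cause, sketched in a simplified model: if the query is built with QueryParser's default OR operator, "tall fat" means tall OR fat, and Lucene 2.x's DefaultSimilarity scores term frequency as sqrt(freq) and multiplies the sum by a coord factor of overlap / maxOverlap. The toy below keeps only those two ingredients (idf, length norms, boosts and queryNorm are deliberately left out, which is an assumption of the model) to show how a document that repeats one term can outscore a document that contains both.

```java
// Toy model of Lucene 2.x DefaultSimilarity scoring for an OR query.
// Keeps only tf = sqrt(freq) and coord = overlap / maxOverlap;
// idf, length norms, boosts and queryNorm are deliberately ignored.
final class CoordModel {
    // freqs[i] = number of times query term i occurs in the document
    static double score(int[] freqs) {
        int overlap = 0;
        double tfSum = 0.0;
        for (int f : freqs) {
            if (f > 0) {
                overlap++;              // this query term matched
                tfSum += Math.sqrt(f);  // tf(freq) = sqrt(freq)
            }
        }
        double coord = (double) overlap / freqs.length; // fraction of query terms matched
        return coord * tfSum;
    }
}
```

With this model, a document containing "tall" 25 times and "fat" never scores 0.5 × 5 = 2.5, while a document containing each word once scores 1.0 × (1 + 1) = 2.0, so the single-term document ranks first. Real scores also depend on idf and field length, so the exact crossover point will differ.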
Re: Boosting Search Results
Thanks for all the replies. They helped me understand the problem better, but is it possible to create a query that gives an additional boost to a result if and only if both of the words are found in it? That would make sure such results end up higher in the list. Can this type of query be created?
-- 
View this message in context: http://www.nabble.com/Boosting-Search-Results-tp24753954p24784708.html
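In query-parser syntax, one way to express "boost only when both words are present" is to keep the optional terms and add a grouped clause in which every term is required (prefixed with +), with its own boost. The helper below only builds such a string to hand to QueryParser.parse(); the class name, method name and the boost value are illustrative assumptions, and the terms are assumed to be plain words (raw user input may need QueryParser.escape first).

```java
import java.util.List;

// Builds a query-parser string such as "tall fat (+tall +fat)^5.0":
// the plain terms keep single-term matches in the results, while the
// parenthesised group (every term required) only matches documents
// containing all terms and adds a boosted bonus to their score.
final class AllTermsBoost {
    static String build(List<String> terms, double allTermsBoost) {
        StringBuilder optional = new StringBuilder();
        StringBuilder required = new StringBuilder();
        for (String t : terms) {
            if (optional.length() > 0) {
                optional.append(' ');
                required.append(' ');
            }
            optional.append(t);
            required.append('+').append(t);
        }
        return optional + " (" + required + ")^" + allTermsBoost;
    }
}
```

For example, build(Arrays.asList("tall", "fat"), 5.0) yields tall fat (+tall +fat)^5.0. How large the boost must be to reliably lift the both-term documents depends on your data, so it is worth checking the resulting scores rather than trusting a fixed value.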
Re: Boosting Search Results
Hey, thanks for the suggestion. I thought of performing two searches as well. Unfortunately, I don't know how to perform a search on the results returned by the first one. Could you guide me a little? I tried to look around for information but found none.

Thanks

Ian Lea wrote:
> 
> You could write your own Similarity, extending DefaultSimilarity and
> overriding whichever methods will help you achieve your aims.
> 
> Or how about running 2 searches, the first with both words required
> (+word1 +word2) and then a second search where they aren't both
> required (word1 word2). Then merge/dedup the two lists of hits,
> keeping the ones from the first search at the top.
> 
> 
> -- 
> Ian.
> 
> On Mon, Aug 3, 2009 at 4:14 AM, bourne71 wrote:
>> [...]
-- 
View this message in context: http://www.nabble.com/Boosting-Search-Results-tp24753954p24788000.html
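Ian's first suggestion, a custom Similarity, would typically mean overriding coord(int overlap, int maxOverlap) in a DefaultSimilarity subclass (and, if I recall the 2.x API correctly, registering it with Searcher.setSimilarity) so that partial matches are penalised more steeply. Since that needs the Lucene jar, the sketch below only models the arithmetic in plain Java: it compares the default linear coord with a hypothetical cubed curve on two toy documents, using tf = sqrt(freq) and ignoring idf, norms and boosts.

```java
import java.util.function.DoubleUnaryOperator;

// Plain-Java model of the effect of overriding coord(); this is not Lucene code.
// The CUBED curve is a hypothetical choice used only for illustration.
final class CoordCurves {
    static final DoubleUnaryOperator LINEAR = r -> r;         // default: overlap / maxOverlap
    static final DoubleUnaryOperator CUBED = r -> r * r * r;  // hypothetical steeper penalty

    // freqs[i] = occurrences of query term i; coordCurve maps overlap/maxOverlap to a factor
    static double score(int[] freqs, DoubleUnaryOperator coordCurve) {
        int overlap = 0;
        double tfSum = 0.0;
        for (int f : freqs) {
            if (f > 0) {
                overlap++;
                tfSum += Math.sqrt(f);
            }
        }
        return coordCurve.applyAsDouble((double) overlap / freqs.length) * tfSum;
    }
}
```

Under LINEAR, a document with one term 25 times (2.5) still beats one with each term once (2.0); under CUBED the single-term document drops to 0.125 × 5 = 0.625. The cube is arbitrary; the point is only that the coord curve controls how much a missing term costs.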
Re: Boosting Search Results
Sorry... I mean the double-search part. That is the part I don't understand how to do: after retrieving the first results, I am not sure how to search them again.

Ian Lea wrote:
> 
> Sorry, I'm not clear what you don't know how to do.
> 
> 
> To spell out the double search suggestion a bit more:
> 
> QueryParser qp = new QueryParser(...);
> 
> Query q1 = qp.parse("+word1 +word2");
> TopDocs td1 = searcher.search(q1, ...);
> 
> Query q2 = qp.parse("word1 word2");
> TopDocs td2 = searcher.search(q2, ...);
> 
> ScoreDoc[] sd1 = td1.scoreDocs;
> ScoreDoc[] sd2 = td2.scoreDocs;
> 
> // Grab all docids from first search
> List<Integer> docidl = new ArrayList<Integer>();
> for (int i1 = 0; i1 < sd1.length; i1++) {
>   docidl.add(sd1[i1].doc);
> }
> 
> // Add any docids from second search that are not already on the list
> for (int i2 = 0; i2 < sd2.length; i2++) {
>   int docid = sd2[i2].doc;
>   if (!docidl.contains(docid)) {
>     docidl.add(docid);
>   }
> }
> 
> (code just a suggestion, off the top of my head, may not work, may be
> full of bugs, there will be other maybe better ways to do it).
> 
> If that doesn't help, perhaps you could rephrase the question.
> 
> 
> -- 
> Ian.
> 
> 
> On Mon, Aug 3, 2009 at 10:51 AM, bourne71 wrote:
>> [...]
-- 
View this message in context: http://www.nabble.com/Boosting-Search-Results-tp24753954p24800970.html
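Note that in Ian's suggestion nothing is searched "inside" the first result set: both queries run against the whole index, and only the two hit lists are combined. His loop calls List.contains for every hit of the second search, which is a linear scan each time; a LinkedHashSet keeps first-search hits first and makes the duplicate check constant-time on average. A plain-Java sketch over the doc ids taken from each TopDocs.scoreDocs array (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Merges two ranked lists of Lucene doc ids: every id from the first
// ("all terms required") search comes first, in its original order,
// followed by ids from the second (OR) search that were not already seen.
final class HitMerger {
    static List<Integer> merge(int[] firstSearchIds, int[] secondSearchIds) {
        Set<Integer> merged = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (int id : firstSearchIds) merged.add(id);
        for (int id : secondSearchIds) merged.add(id); // no-op for ids already present
        return new ArrayList<>(merged);
    }
}
```

The ids would be filled from sd1[i].doc and sd2[i].doc as in Ian's code. The second search's scores are discarded, so the tail of the list simply follows the order of the OR search.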
Query Boosting
Hi,

I am fairly new to Lucene and have encountered a problem with the search function I am trying to build with it. When I search for, let's say, "news sharing", the results are returned and displayed.

Everything is fine up to this point, until I check the ranking. Some results that match only one of the two keywords are ranked higher. The problem looks like this:

Page 1
news - Total found 23
sharing - Total found 0

Page 2
news - Total found 1
sharing - Total found 21

I can understand why Page 1 got a better ranking: it has more keyword occurrences. But this makes the returned results less relevant.

My current query is like the following:
(url:sharing^2.0 content:sharing title:sharing^1.5) (url:news^2.0 content:news title:news^1.5) url:"sharing news"~2147483647^2.0 content:"sharing news"~2147483647 title:"sharing news"~2147483647^1.5

Is there any way I can add an additional query that gives an additional boost to results that have both keywords in them?
-- 
View this message in context: http://www.nabble.com/Query-Boosting-tp24913967p24913967.html
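Worth noting about the query above: 2147483647 is Integer.MAX_VALUE, so each "sharing news"~2147483647 clause matches any field containing both words at any distance, i.e. it already acts as a both-terms bonus. Its weight is small when the words are far apart, though: if I read Lucene 2.x's DefaultSimilarity correctly, each sloppy match adds sloppyFreq(distance) = 1 / (distance + 1) to the phrase frequency, which then goes through tf = sqrt(freq). The plain-Java sketch below just evaluates those two formulas; real phrase scores also include idf, norms and boosts.

```java
// Evaluates the sloppy-phrase weighting of Lucene 2.x DefaultSimilarity in plain Java:
// sloppyFreq(distance) = 1 / (distance + 1), then tf(freq) = sqrt(freq).
// Illustration only; idf, norms and boosts are not modelled.
final class SloppyPhraseModel {
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Phrase tf for a field where the phrase matched once at each given distance.
    static double phraseTf(int... matchDistances) {
        float freq = 0.0f;
        for (int d : matchDistances) freq += sloppyFreq(d);
        return Math.sqrt(freq);
    }
}
```

Adjacent words give a phrase tf of 1.0, while words 500 positions apart give about 0.045, so on long pages the phrase clauses may add almost nothing, which would be consistent with single-term pages still winning.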
Re: Query Boosting
Thanks, I understand how boosting works. What I need is a boost in the query that increases the score of a page when all the keywords of the query are found in it, so that its ranking goes up. I have tried all sorts of combinations and none of them worked. Can anyone provide a suggestion?

Simon Willnauer wrote:
> 
> Hi there,
> 
> well, where to start from... I would suggest you look at the output
> of Searcher#explain(Query, int) first to see how the score is calculated.
> You might use a simpler query to get started with it as this might be
> quite cryptic if you see it the first time.
> To completely understand what the output means have a closer look at
> the javadoc of the class Similarity
> (http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html)
> this will explain how the score is calculated in great detail.
> Once you understand what is going on during the scoring process I
> would suggest you revise your boosting. I don't know if you have field
> boosts set but it seems they would make more sense in your use case as
> far as I can tell.
> In general make sure you understand what the different boosts are used
> for - this snippet from the wiki might help you:
> 
> What is the difference between field (or document) boosting and query
> boosting?
> 
> Index time field boosts (field.setBoost(boost)) are a way to express
> things like "this document's title is worth twice as much as the title
> of most documents". Query time boosts (query.setBoost(boost)) are a
> way to express "I care about matches on this clause of my query twice
> as much as I do about matches on other clauses of my query".
> 
> Index time field boosts are worthless if you set them on every document.
> 
> Index time document boosts (doc.setBoost(float)) are equivalent to
> setting a field boost on every field in that document.
> 
> (http://wiki.apache.org/lucene-java/LuceneFAQ#head-246300129b9d3bf73f597facec54ac2ee54e15d7)
> 
> hope that helps to get started with scoring etc.
> 
> simon
> 
> 
> On Tue, Aug 11, 2009 at 10:50 AM, bourne71 wrote:
>> [...]
-- 
View this message in context: http://www.nabble.com/Query-Boosting-tp24913967p24928789.html
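The FAQ's remark that index-time boosts are worthless when the same boost is set on every document comes down to ranking being relative: multiplying every document's score by the same factor leaves the order unchanged. A toy check in plain Java (the scores are made-up numbers, and the model ignores that real Lucene norms are stored lossily in a single byte):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Returns document indexes sorted by descending score after multiplying
// every score by the same factor (a uniform, "set on every document" boost).
final class UniformBoost {
    static List<Integer> rank(double[] scores, double uniformBoost) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) order.add(i);
        order.sort(Comparator.comparingDouble((Integer doc) -> scores[doc] * uniformBoost).reversed());
        return order;
    }
}
```

Ranking the scores {0.8, 2.5, 1.1} gives the order 1, 2, 0 whether the uniform boost is 1.0 or 3.0; only boosts that differ between documents or fields can change the ranking.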
Generating Query
Hi,

I am trying to build a query that looks like the following:
url:(+news +politics)^1.5 content:(+news +politics)^2.0

But I can't seem to find any reference for it. I tried hardcoding it like the following:

BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term(field, "+news +politics")), BooleanClause.Occur.SHOULD);

But with this, the query doesn't seem to have any effect. It is supposed to boost the field of a page that contains both words. Can anyone advise me on how to create this type of query?

Thanks
-- 
View this message in context: http://www.nabble.com/Generating-Query-tp24931880p24931880.html
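The TermQuery attempt cannot work because a TermQuery is not parsed: it looks for one indexed term that is literally "+news +politics", spaces and plus signs included, which a typical analyzer such as StandardAnalyzer will never have produced. The target syntax can instead be given to QueryParser as a string. The plain-Java helper below (its name and the field-to-boost map are illustrative assumptions) assembles that string from a list of terms.

```java
import java.util.List;
import java.util.Map;

// Builds "field:(+t1 +t2)^boost" clauses, one per field, separated by spaces
// (so they are OR-ed together under QueryParser's default operator).
// Pass a LinkedHashMap if the clause order matters to you.
final class PerFieldAllTerms {
    static String build(List<String> terms, Map<String, Double> fieldBoosts) {
        StringBuilder required = new StringBuilder();
        for (String t : terms) {
            if (required.length() > 0) required.append(' ');
            required.append('+').append(t); // '+' marks each term as required
        }
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            if (query.length() > 0) query.append(' ');
            query.append(e.getKey()).append(":(").append(required).append(")^").append(e.getValue());
        }
        return query.toString();
    }
}
```

With terms news, politics and the boosts url=1.5, content=2.0 (in that order), this yields url:(+news +politics)^1.5 content:(+news +politics)^2.0, which QueryParser should turn into nested BooleanQuerys equivalent to building them by hand.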
Re: Generating Query
Thanks for the suggestion, but unfortunately it does not work. ><

Ahmet Arslan wrote:
> 
>> [...]
> 
> // Term clauses for the url field
> Query t1 = new TermQuery(new Term("url", "news"));
> Query t2 = new TermQuery(new Term("url", "politics"));
> 
> // Term clauses for the content field
> Query t3 = new TermQuery(new Term("content", "news"));
> Query t4 = new TermQuery(new Term("content", "politics"));
> 
> // url:(+news +politics)^1.5
> BooleanQuery b1 = new BooleanQuery();
> b1.add(t1, BooleanClause.Occur.MUST);
> b1.add(t2, BooleanClause.Occur.MUST);
> b1.setBoost(1.5f);
> 
> // content:(+news +politics)^2.0
> BooleanQuery b2 = new BooleanQuery();
> b2.add(t3, BooleanClause.Occur.MUST);
> b2.add(t4, BooleanClause.Occur.MUST);
> b2.setBoost(2.0f);
> 
> // Either group may match
> BooleanQuery finalQuery = new BooleanQuery();
> finalQuery.add(b1, BooleanClause.Occur.SHOULD);
> finalQuery.add(b2, BooleanClause.Occur.SHOULD);
-- 
View this message in context: http://www.nabble.com/Generating-Query-tp24931880p24943981.html
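One possible reason the query seemed not to work: finalQuery consists only of the two all-terms-required groups, so run on its own it drops every document lacking both words in url or content instead of boosting the ones that have them. To boost rather than filter, those groups would be added as extra SHOULD clauses next to the original single-term clauses. A plain-Java toy model of that difference (documents as field-to-term-set maps; +1 per matching term and the group boost per matching group are simplifications, not Lucene's real arithmetic):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Lucene: a document maps field names to the set of terms it contains.
final class BoostVsFilter {
    // A group of MUST clauses in one field: contributes its boost only if every term is present.
    static double allTermsGroup(Map<String, Set<String>> doc, String field, List<String> terms, double boost) {
        Set<String> present = doc.getOrDefault(field, Collections.emptySet());
        return present.containsAll(terms) ? boost : 0.0;
    }

    // Plain SHOULD term clauses in one field: +1 per query term present.
    static double anyTerms(Map<String, Set<String>> doc, String field, List<String> terms) {
        Set<String> present = doc.getOrDefault(field, Collections.emptySet());
        double score = 0.0;
        for (String t : terms) {
            if (present.contains(t)) score += 1.0;
        }
        return score;
    }

    // finalQuery alone: only the two boosted groups (0.0 means "not in the results").
    static double groupsOnly(Map<String, Set<String>> doc, List<String> terms) {
        return allTermsGroup(doc, "url", terms, 1.5) + allTermsGroup(doc, "content", terms, 2.0);
    }

    // The same groups added alongside the ordinary single-term clauses.
    static double groupsPlusTerms(Map<String, Set<String>> doc, List<String> terms) {
        return groupsOnly(doc, terms) + anyTerms(doc, "url", terms) + anyTerms(doc, "content", terms);
    }
}
```

In this model a page whose content has only "news" is absent under groupsOnly but still returned (score 1.0) under groupsPlusTerms, while a page with both words scores 4.0 and ranks above it.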
Re: Generating Query
I am trying to boost results that contain all the query terms so that they rank higher. But unfortunately, neither query seems to have any effect on the ranking.

Ahmet Arslan wrote:
> 
>> [...]
> 
> What are you trying to do? Both Adriano's and my query satisfy what you
> were asking for. What didn't work?
-- 
View this message in context: http://www.nabble.com/Generating-Query-tp24931880p24951573.html
Re: Generating Query
Hmm... I tried that, but it doesn't seem to work for me.

Ahmet Arslan wrote:
> 
>> [...]
> 
> Did you read the last two messages on this thread?
> 
> http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html
> 
> 
> And do not forget to use your new similarity class in both indexing and
> searching. IndexSearcher.setSimilarity and IndexWriter.setSimilarity.
-- 
View this message in context: http://www.nabble.com/Generating-Query-tp24931880p24952356.html