Boosting Search Results

2009-07-31 Thread bourne71

Hi, new here.

I recently started using Lucene and have encountered a problem. I crawl and
index a number of documents.
When I perform a search, let's say "tall fat", the results that
match all the keywords should by rights be on top and displayed first.

But in my search results, some documents that match only one of the
keywords, like 'tall', are displayed first. Why is that? What have I done wrong?

Can anyone advise me on this? Thanks
-- 
View this message in context: 
http://www.nabble.com/Boosting-Search-Results-tp24753954p24753954.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Boosting Search Results

2009-08-02 Thread bourne71

Thanks for all the replies. They helped me understand the problem better, but is it
possible to create a query that gives an additional boost to a result if
and only if both of the words are found in it? This should
push such results higher up the list.

Can this type of query be created?
-- 
View this message in context: 
http://www.nabble.com/Boosting-Search-Results-tp24753954p24784708.html



Re: Boosting Search Results

2009-08-03 Thread bourne71

Hey, thanks for the suggestion.
I thought of performing 2 searches as well. Unfortunately, I don't know how to
perform a search on the results returned by the first search. Could you guide me a little? I
tried to look around for the information but found none.

Thanks

Ian Lea wrote:
> 
> You could write your own Similarity, extending DefaultSimilarity and
> overriding whichever methods will help you achieve your aims.
> 
> Or how about running 2 searches, the first with both words required
> (+word1 +word2) and then a second search where they aren't both
> required (word1 word2).  Then merge/dedup the two lists of hits,
> keeping the ones from the first search at the top.
> 
> 
> --
> Ian.

-- 
View this message in context: 
http://www.nabble.com/Boosting-Search-Results-tp24753954p24788000.html



Re: Boosting Search Results

2009-08-03 Thread bourne71

Sorry... I mean the double searching part. That is the part I don't understand
how to do: after retrieving the first set of results, I am not sure how to
search it again.


Ian Lea wrote:
> 
> Sorry, I'm not clear what you don't know how to do.
> 
> 
> To spell out the double search suggestion a bit more:
> 
> import java.util.ArrayList;
> import java.util.List;
> 
> QueryParser qp = new QueryParser(...);
> 
> // First search: both words required
> Query q1 = qp.parse("+word1 +word2");
> TopDocs td1 = searcher.search(q1, ...);
> 
> // Second search: either word is enough
> Query q2 = qp.parse("word1 word2");
> TopDocs td2 = searcher.search(q2, ...);
> 
> ScoreDoc[] sd1 = td1.scoreDocs;
> ScoreDoc[] sd2 = td2.scoreDocs;
> 
> // Grab all docids from first search
> List<Integer> docidl = new ArrayList<Integer>();
> for (int i1 = 0; i1 < sd1.length; i1++) {
>   docidl.add(sd1[i1].doc);
> }
> 
> // Add any docids from second search that are not already on the list
> for (int i2 = 0; i2 < sd2.length; i2++) {
>   int docid = sd2[i2].doc;
>   if (!docidl.contains(docid)) {
>     docidl.add(docid);
>   }
> }
> 
> (code just a suggestion, off the top of my head, may not work, may be
> full of bugs, there will be other maybe better ways to do it).
> 
> If that doesn't help, perhaps you could rephrase the question.
> 
> 
> --
> Ian.

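The merge/dedup step in Ian's suggestion can also be sketched with plain Java
collections. This is only an illustration under assumptions: the class name
HitMerger and the doc ids are made up, and the int arrays stand in for the
ScoreDoc.doc values collected from the two searches.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: merges two ranked lists of doc ids.
final class HitMerger {

    // Returns the ids from 'strict' (both words required) first, followed by
    // the ids from 'loose' (either word) that were not already present.
    static List<Integer> merge(int[] strict, int[] loose) {
        // LinkedHashSet keeps insertion order and ignores repeated ids.
        Set<Integer> ordered = new LinkedHashSet<>();
        for (int id : strict) {
            ordered.add(id);
        }
        for (int id : loose) {
            ordered.add(id);
        }
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        int[] strict = {12, 4};        // made-up ids: docs containing both words
        int[] loose = {7, 12, 4, 30};  // made-up ids: docs containing either word
        System.out.println(merge(strict, loose)); // prints [12, 4, 7, 30]
    }
}
```

Compared with List.contains inside a loop, the set lookup avoids rescanning
the list for every hit, which may matter once the result lists get long.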
-- 
View this message in context: 
http://www.nabble.com/Boosting-Search-Results-tp24753954p24800970.html



Query Boosting

2009-08-11 Thread bourne71

Hi,

I am fairly new to Lucene and have encountered a problem with the search
function I am trying to create using Lucene. When I search for, let's say, "news
sharing", the results are returned and displayed.

It's fine up to this point, until I check the ranking. Some results that
match only 1 of the 2 keywords get a higher ranking. The problem is
as described below:

Page 1
news - Total found 23
sharing - Total found 0

Page 2
news - Total found 1
sharing - Total found 21

It is understandable that Page 1 gets a better ranking, because it has more
keyword occurrences in total. But this makes the returned results less relevant.

My current query looks like the following:
(url:sharing^2.0 content:sharing title:sharing^1.5) (url:news^2.0
content:news title:news^1.5) url:"sharing news"~2147483647^2.0
content:"sharing news"~2147483647 title:"sharing news"~2147483647^1.5 

Is there any way I can add an additional query clause that gives an additional
boost to results that contain both keywords?
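
One possible way to express such a bonus, sketched as a query string in the
classic QueryParser syntax: add one more optional clause that can only match
when every keyword appears in at least one field, and boost that clause. The
helper name AllTermsBonus and the boost value 10.0 are illustrative, the
keywords are assumed to be plain single words that need no escaping, and this
has not been run against any particular Lucene version.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds a boosted "all keywords
// present" clause in query-parser syntax.
final class AllTermsBonus {

    // For each term, emits a required group "+(f1:term f2:term ...)", then
    // wraps all groups in parentheses and appends the boost.
    static String bonusClause(List<String> fields, List<String> terms, float boost) {
        StringBuilder sb = new StringBuilder("(");
        for (int t = 0; t < terms.size(); t++) {
            if (t > 0) {
                sb.append(' ');
            }
            sb.append("+(");
            for (int f = 0; f < fields.size(); f++) {
                if (f > 0) {
                    sb.append(' ');
                }
                sb.append(fields.get(f)).append(':').append(terms.get(t));
            }
            sb.append(')');
        }
        return sb.append(")^").append(boost).toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("url", "content", "title");
        List<String> terms = Arrays.asList("sharing", "news");
        // The boost of 10.0 is an arbitrary placeholder to be tuned.
        System.out.println(bonusClause(fields, terms, 10.0f));
        // prints (+(url:sharing content:sharing title:sharing) +(url:news content:news title:news))^10.0
    }
}
```

The returned clause would be appended, separated by a space, to the existing
query string before parsing. A page containing only one keyword cannot match
it, so only pages with both keywords receive the extra score.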
-- 
View this message in context: 
http://www.nabble.com/Query-Boosting-tp24913967p24913967.html



Re: Query Boosting

2009-08-11 Thread bourne71

Thanks. I understand how boosting works; what I need is a boost in the
query that increases the score of a page, and therefore its ranking, if all
of the query keywords are found in the page.

I tried all sorts of combinations and none of them worked. Can anyone provide any
suggestions?


Simon Willnauer wrote:
> 
> Hi there,
> 
> Well, as a starting point I would suggest you look at the output
> of Searcher#explain(Query, int) first to see how the score is calculated. You might
> use a simpler query to get started, as the output can be quite
> cryptic the first time you see it.
> To fully understand what the output means, have a closer look at
> the javadoc of the class Similarity
> (http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html),
> which explains in detail how the score is calculated.
> Once you understand what is going on during the scoring process, I
> would suggest you revise your boosting. I don't know if you have field
> boosts set, but it seems they would make more sense in your use case as far
> as I can tell.
> In general, make sure you understand what the different boosts are used
> for. This snippet from the wiki might help you:
> 
> What is the difference between field (or document) boosting and query
> boosting?
> 
> Index time field boosts (field.setBoost(boost)) are a way to express
> things like "this document's title is worth twice as much as the title
> of most documents". Query time boosts (query.setBoost(boost)) are a
> way to express "I care about matches on this clause of my query twice
> as much as I do about matches on other clauses of my query".
> 
> Index time field boosts are worthless if you set them on every document.
> 
> Index time document boosts (doc.setBoost(float)) are equivalent to
> setting a field boost on every field in that document.
> 
> (http://wiki.apache.org/lucene-java/LuceneFAQ#head-246300129b9d3bf73f597facec54ac2ee54e15d7)
> 
> hope that helps to get started with scoring etc.
> 
> simon

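To get a feel for what the explain output is adding up, here is a toy model of
the documented DefaultSimilarity factors: tf as the square root of the term
frequency, idf squared per term, and coord as the fraction of query terms
matched. It deliberately ignores norms, queryNorm and boosts, and the idf
values are invented, so it is an illustration rather than a reproduction of
Lucene's real scores. The frequencies are those of Page 1 and Page 2 from the
original question.

```java
// Hypothetical toy model, not Lucene code: a cut-down version of the classic
// tf-idf formula with the coord factor.
final class ToyScore {

    // freqs[i] = occurrences of query term i in the document,
    // idfs[i]  = assumed idf of query term i.
    static double score(int[] freqs, double[] idfs) {
        int matched = 0;
        double sum = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] > 0) {
                matched++;
                sum += Math.sqrt(freqs[i]) * idfs[i] * idfs[i]; // tf * idf^2
            }
        }
        double coord = (double) matched / freqs.length; // matched terms / query terms
        return coord * sum;
    }

    public static void main(String[] args) {
        int[] page1 = {23, 0}; // news, sharing
        int[] page2 = {1, 21}; // news, sharing

        // With equal idf, coord lets the page containing both words win.
        double[] equalIdf = {1.0, 1.0};
        System.out.println(score(page1, equalIdf) + " vs " + score(page2, equalIdf));

        // If "news" were much rarer in the index than "sharing" (invented idf
        // values), the page with many "news" hits comes out ahead instead.
        double[] rareNews = {3.0, 1.0};
        System.out.println(score(page1, rareNews) + " vs " + score(page2, rareNews));
    }
}
```

In this model a page with only one keyword can outrank a page with both when
one term dominates through idf; norms and field boosts, which the model leaves
out, can have a similar effect.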
-- 
View this message in context: 
http://www.nabble.com/Query-Boosting-tp24913967p24928789.html


Generating Query

2009-08-12 Thread bourne71

Hi,

I am trying to build a query that looks like the following:
url:(+news +politics)^1.5 content:(+news +politics)^2.0

But I can't seem to find any reference for it. I tried hardcoding it like the
following:
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term(field, "+news +politics")),
BooleanClause.Occur.SHOULD);

But with this, the query doesn't seem to have any effect. By
rights it is supposed to boost pages whose field contains both of the
words.

Can anyone advise me on how to create this type of query? Thanks
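
A TermQuery does not parse its text, so new Term(field, "+news +politics") is
looked up as one literal term, which an analyzed field normally does not
contain. One alternative is to generate the query string and hand it to
QueryParser. A minimal sketch, assuming plain single-word terms that need no
escaping; the class name FieldGroupQuery is hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds one boosted
// "field:(+term1 +term2)" group per field.
final class FieldGroupQuery {

    static String build(Map<String, Float> fieldBoosts, List<String> terms) {
        // "+news +politics": every term is required within the group.
        StringBuilder required = new StringBuilder();
        for (String term : terms) {
            if (required.length() > 0) {
                required.append(' ');
            }
            required.append('+').append(term);
        }
        // One group per field, in the map's iteration order.
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(e.getKey()).append(":(").append(required)
                 .append(")^").append(e.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps field order
        boosts.put("url", 1.5f);
        boosts.put("content", 2.0f);
        System.out.println(build(boosts, Arrays.asList("news", "politics")));
        // prints url:(+news +politics)^1.5 content:(+news +politics)^2.0
    }
}
```

The resulting string is meant to be passed to QueryParser.parse, which builds
the nested boolean clauses; the analyzer given to the parser should match the
one used at index time.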
-- 
View this message in context: 
http://www.nabble.com/Generating-Query-tp24931880p24931880.html



Re: Generating Query

2009-08-12 Thread bourne71

Thanks for the suggestion, but unfortunately it does not work.


Ahmet Arslan wrote:
> 
>> I am trying to build a query that looks like the
>> following:
>> url:(+news +politics)^1.5 content:(+news +politics)^2.0
>> 
>> But I can't seems to find any reference to it. I try
>> hardcoding it like the
>> following:
>> BooleanQuery query = new BooleanQuery();
>> query.add(new TermQuery(new Term(field, "+news
>> +politics")),
>> BooleanClause.Occur.SHOULD);
> 
> // Term queries for the url field
> Query t1 = new TermQuery(new Term("url", "news"));
> Query t2 = new TermQuery(new Term("url", "politics"));
> 
> // Term queries for the content field
> Query t3 = new TermQuery(new Term("content", "news"));
> Query t4 = new TermQuery(new Term("content", "politics"));
> 
> // url:(+news +politics)^1.5
> BooleanQuery b1 = new BooleanQuery();
> b1.add(t1, BooleanClause.Occur.MUST);
> b1.add(t2, BooleanClause.Occur.MUST);
> b1.setBoost(1.5f);
> 
> // content:(+news +politics)^2.0
> BooleanQuery b2 = new BooleanQuery();
> b2.add(t3, BooleanClause.Occur.MUST);
> b2.add(t4, BooleanClause.Occur.MUST);
> b2.setBoost(2.0f);
> 
> // A document may match either field group, or both
> BooleanQuery finalQuery = new BooleanQuery();
> finalQuery.add(b1, BooleanClause.Occur.SHOULD);
> finalQuery.add(b2, BooleanClause.Occur.SHOULD);

-- 
View this message in context: 
http://www.nabble.com/Generating-Query-tp24931880p24943981.html



Re: Generating Query

2009-08-13 Thread bourne71

I am trying to boost results that contain all of the query terms, to increase their
ranking. But unfortunately neither query seems to affect it.

Ahmet Arslan wrote:
> 
>> thanks for the suggestion, but unfortunately it does not
>> work.
> 
> What are you trying to do? Both Adriano's query and mine satisfy what you
> were asking for. What didn't work?

-- 
View this message in context: 
http://www.nabble.com/Generating-Query-tp24931880p24951573.html



Re: Generating Query

2009-08-13 Thread bourne71

Hm... I tried that, but it doesn't seem to be working for me.

Ahmet Arslan wrote:
> 
>> I am trying to boost results that have all the query
>> in it to increase its ranking. But both the query unfortunately does not
>> seems to effect it
> 
> Did you read last two messages on this thread?
> 
> http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html
>  
> 
> And do not forget to use your new similarity class in both indexing and
> searching. IndexSearcher.setSimilarity and IndexWriter.setSimilarity.

-- 
View this message in context: 
http://www.nabble.com/Generating-Query-tp24931880p24952356.html