Yes, you can do that.
On Jul 25, 2007, at 12:31 PM, Askar Zaidi wrote:
Heres what I mean:
http://lucene.apache.org/java/docs/queryparsersyntax.html#Fields
title:"The Right Way" AND text:go
Although, I am not searching for the title "the right way" , I am
looking
for the score by specifying a unique field (itemID).
when I do System.out.println(query);
I get:
+contents:Harvard +contents:Business + contents: Review
Can I just add:
+contents:Harvard +contents:Business + contents: Review
+itemID=id ??
That query would just return one document.
On 7/25/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
Instead of refactoring the code, would there be a way to just
modify the
query in each search routine ?
Such as, "search contents:<text> and item:<itemID>"; This means it
would
just collect the score of that one document whose itemID field =
itemID
passed from while( rs.next()).
I just need to collect the score of the <itemID> already in the
index.
Would there be a way to modify the query ? Add a clause ?
thanks,
Askar
On 7/25/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
So, you really want a single Lucene score (based on the scores of
your 4 fields) for every itemID, correct? And this score
consists of
scoring the title, tag, summary and body against some keywords
correct?
Here's what I would do:
while (rs.next())
{
doc = getDocument(itemId); // Get your document, including
contents from your database, no need even to put them in Lucene,
although you could
add the doc to a MemoryIndex (see contrib/memory)
Run your 4 searches against that memory index to get your
score. Even better, combine your query into a single query that
searches all 4 fields at once, then Lucene will combine the score
for
you
}
MemoryIndex info can be found at http://lucene.zones.apache.org:
8080/
hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/memory/
package-summary.html
-Grant
On Jul 25, 2007, at 11:45 AM, Askar Zaidi wrote:
Hi Grant,
Thanks for the response. Heres what I am trying to accomplish:
1. Iterate over itemID (unique) in the database using one SQL
query.
2. For every itemID found, run 4 searches on Lucene Index.
3. doTagSearch(itemID....) ; collect score
4. doTitleSearch(itemID...) ; collect score
5. doSummarySearch(itemID...) ; collect score
6. doBodySearch(itemID....) ; collect score
These scores are then added and I get a total score for each unique
item in
the database.
Lucene Index has: <itemID><tags><title><summary><contents>
So if I am running a body search, I have 92 hits from over 300
documents for
a query. I already know my hit with the <itemID> .
For instance, from step (1) if itemID 16 is passed to all the 4
searches, I
just need to get the score of the document which has itemID field =
16. I
don't have to iterate over all the hits.
I suppose I have to change my query to look for <contents> where
itemID=16.
Can you guide me as to how to do it ?
thanks a ton,
Askar
On 7/25/07, Grant Ingersoll <[EMAIL PROTECTED] > wrote:
Hi Askar,
I suggest we take a step back, and ask the question, what are you
trying to accomplish? That is, what is your application trying to
do? Forget the code, etc. just explain what you want the end
result
to be and we can work from there. Based on what you have
described,
I am not sure you need access to the hits. It seems like you just
need to make better queries.
Is your itemID a unique identifier? If yes, then you shouldn't
need
to loop over hits at all, as you should only ever have one
result IF
your query contains a required term. Also, if this is the
case, why
do you need to do a search at all? Haven't you already identified
the items of interest when you did your select query in the
database? Or is it that you want to score the item based on some
terms as well. If that is the case, there are other ways of doing
this and we can discuss them.
-Grant
On Jul 25, 2007, at 10:10 AM, Askar Zaidi wrote:
Hey Guys,
I need to know how I can use the HitCollector class ? I am using
Hits and
looping over all the possible document hits (turns out its 92
times
I am
looping; for 300 searches, its 300*92 !!). Can I avoid this using
HitCollector ? I can't seem to understand how its used.
thanks a lot,
Askar
On 7/25/07, Dmitry <[EMAIL PROTECTED]> wrote:
Askar,
why do you need to add +id:<idWeCareAbout>?
thanks,
dt,
www.ejinz.com
search engine news forms
----- Original Message -----
From: "Askar Zaidi" <[EMAIL PROTECTED] >
To: <java-user@lucene.apache.org>; <[EMAIL PROTECTED]>
Sent: Wednesday, July 25, 2007 12:39 AM
Subject: Re: Fine Tuning Lucene implementation
Hey Hira ,
Thanks so much for the reply. Much appreciate it.
Quote:
Would it be possible to just include a query clause?
- i.e., instead of just contents:<userQuery>, also add
+id:<idWeCareAbout>
How can I do that ?
I see my query as :
+contents:harvard +contents:business +contents:review
where the search phrase was: harvard business review
Now how can I add +id:<idWeCareAbout> ??
This would give me that one exact document I am looking
for , for
that
id.
I
don't have to iterate through hits.
thanks,
Askar
On 7/24/07, N. Hira < [EMAIL PROTECTED]> wrote:
I'm no expert on this (so please accept the comments in that
context)
but 2 things seem weird to me:
1. Iterating over each hit is an expensive proposition. I've
often
seen people recommending a HitCollector.
2. It seems that doBodySearch() is essentially saying, do
this
search
and return the score pertinent to this ID (using an exhaustive
loop).
Would it be possible to just include a query clause?
- i.e., instead of just contents:<userQuery>, also add
+id:<idWeCareAbout>
In general though, I think your algorithm seems inefficient
(if I
understand it correctly):-- if I want to search for one term
among 3 in
a "collection" of 300 documents (as defined by some external
attribute),
I will wind up executing 300 x 3 searches, and for each search
that is
executed, I will iterate over every Hit, even if I've already
found the
one that I "care about".
What would break if you:
1. Included "creator" in the Lucene index (or, filtered
out the
Hits
using a BitSet or something like it)
2. Executed 1 search
3. Collected the results of the first N Hits (where N is some
reasonable limit, like 100 or 500)
-h
On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:
Sure.
public float doBodySearch(Searcher searcher,String query,
int
id){
try{
score = search(searcher,
query,id);
}
catch(IOException io){}
catch(ParseException pe){}
return score;
}
private float search(Searcher searcher, String queryString,
int id)
throws ParseException, IOException {
// Build a Query object
QueryParser queryParser = new QueryParser("contents",
new
KeywordAnalyzer());
queryParser.setDefaultOperator
( QueryParser.Operator.AND);
Query query = queryParser.parse(queryString);
// Search for the query
Hits hits = searcher.search(query);
Document doc = null;
// Examine the Hits object to see if there were any
matches
int hitCount = hits.length();
for(int i=0;i<hitCount;i++){
doc = hits.doc(i);
String str = doc.get("item");
int tmp = Integer.parseInt (str);
if(tmp==id)
score = hits.score(i);
}
return score;
}
I really need to optimize doBodySearch(...) as this takes the
most
time.
thanks guys,
Askar
On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
Could you show us the relevant source from
doBodySearch()?
-h
On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
I ran some tests and it seems that the slowness is from
Lucene calls when I
do "doBodySearch", if I remove that call, Lucene gives me
results in 5
seconds. otherwise it takes about 50 seconds.
But I need to do Body search and that field contains lots
of
text. The field
is <contents>. How can I optimize that ?
thanks,
Askar
----------------------------------------------------------------
---
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: java-user-
[EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/
LuceneFAQ
------------------------------------------------------------------
---
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/
--------------------------------------------------------------------
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]