I'm no expert on this (so please take these comments in that context),
but two things seem odd to me:
1. Iterating over every hit is expensive. I've often seen people
recommend a HitCollector instead.
2. doBodySearch() is essentially saying "run this search and return the
score for this ID", using an exhaustive loop over all the hits.
Would it be possible to just add a query clause instead?
- i.e., instead of just contents:<userQuery>, also add
+id:<idWeCareAbout> (the "item" field in your code)
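
To make the query-clause idea concrete, here is a minimal sketch in plain
Java that only builds the query string. The class and method names
(ScopedQuery, forItem) are made up for illustration; the "item" field name
comes from your code. It assumes "item" is indexed as a single untokenized
term, that ids are non-negative, and that the user query needs no escaping.

```java
public class ScopedQuery {

    /**
     * Builds a query string that requires both the user's query and an
     * exact match on the "item" field, so the search itself narrows the
     * result to the one document instead of a loop over every hit.
     * Assumption: the string is then handed to queryParser.parse(...)
     * with "contents" as the default field, as in the quoted code.
     */
    public static String forItem(String userQuery, int id) {
        if (userQuery == null || userQuery.trim().isEmpty()) {
            throw new IllegalArgumentException("userQuery must not be empty");
        }
        // '+' marks a clause as required in Lucene query syntax.
        return "+(" + userQuery.trim() + ") +item:" + id;
    }
}
```

If "item" is unique per document, the search should return at most one
hit, so you could read hits.score(0) when hits.length() > 0. The extra
clause takes part in scoring, so the number may not match what the
unrestricted search reported.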
In general, though, I think your algorithm is inefficient (if I
understand it correctly). If I want to search for one term among 3 in
a "collection" of 300 documents (as defined by some external attribute),
I wind up executing 300 x 3 = 900 searches, and for each of those
searches I iterate over every Hit (the loop has no break), even after
I've already found the one that I "care about".
What would break if you instead:
1. Included "creator" in the Lucene index (or filtered the Hits
using a BitSet or something like it)
2. Executed 1 search
3. Collected the results of the first N Hits (where N is some
reasonable limit, like 100 or 500)
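
The three steps above can be sketched roughly as follows. This is plain
Java, not the Lucene API: FilteredTopN and ScoredDoc are hypothetical
names, and collect(int doc, float score) is only shaped like the
HitCollector callback. It assumes the BitSet of allowed documents (for
example, those belonging to one creator) has been built elsewhere.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

public class FilteredTopN {

    /** One collected hit: a document number and its score. */
    public static final class ScoredDoc {
        public final int doc;
        public final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final BitSet allowed;   // documents we are willing to keep
    private final int limit;        // N, the maximum number of hits kept
    // Min-heap on score: the weakest kept hit is always at the head.
    private final PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    public FilteredTopN(BitSet allowed, int limit) {
        this.allowed = allowed;
        this.limit = limit;
    }

    /** Called once per matching document during the single search. */
    public void collect(int doc, float score) {
        if (!allowed.get(doc)) {
            return;                 // outside the collection: skip
        }
        heap.add(new ScoredDoc(doc, score));
        if (heap.size() > limit) {
            heap.poll();            // drop the lowest-scoring hit
        }
    }

    /** Returns the kept hits, best score first. */
    public List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

The point of the design is that one pass over the matches replaces the
300 x 3 separate searches, and memory stays bounded by N no matter how
many documents match.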
-h
On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:
> Sure.
>
> public float doBodySearch(Searcher searcher, String query, int id) {
>
>     try {
>         score = search(searcher, query, id);
>     }
>     catch (IOException io) {}
>     catch (ParseException pe) {}
>
>     return score;
>
> }
>
> private float search(Searcher searcher, String queryString, int id)
>         throws ParseException, IOException {
>
>     // Build a Query object
>
>     QueryParser queryParser =
>             new QueryParser("contents", new KeywordAnalyzer());
>
>     queryParser.setDefaultOperator(QueryParser.Operator.AND);
>
>     Query query = queryParser.parse(queryString);
>
>     // Search for the query
>
>     Hits hits = searcher.search(query);
>     Document doc = null;
>
>     // Examine the Hits object to see if there were any matches
>     int hitCount = hits.length();
>
>     for (int i = 0; i < hitCount; i++) {
>         doc = hits.doc(i);
>         String str = doc.get("item");
>         int tmp = Integer.parseInt(str);
>         if (tmp == id)
>             score = hits.score(i);
>     }
>
>     return score;
> }
>
> I really need to optimize doBodySearch(...) as this takes the most
> time.
>
> thanks guys,
> Askar
>
>
> On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
>
> Could you show us the relevant source from doBodySearch()?
>
> -h
>
> On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
> > I ran some tests and it seems that the slowness is from Lucene calls
> > when I do "doBodySearch", if I remove that call, Lucene gives me
> > results in 5 seconds. otherwise it takes about 50 seconds.
> >
> > But I need to do Body search and that field contains lots of text.
> > The field is <contents>. How can I optimize that ?
> >
> > thanks,
> > Askar
> >
> >