After some more testing, it seems the special characters in the url are
causing problems.

The url field is stored un_tokenized, so the analyzer in use shouldn't
matter?

And in my original question, "contentlength" and "length" were indeed
only typos in that post; the actual code contains the correct field
names.

Thanks for the ideas so far.

At this point, I've tried using QueryParser.Parse and
QueryParser.Escape.
I've also tried adding quotes around the url:
url:"http://some.website.com/"

Neither method gives the results I expect. Is there some way to have a
look at what Luke does to a query under the hood?
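
One way to compare the two might be to print each query with ToString()
and run it against a throwaway index. The sketch below is only an
illustration under assumptions: it targets the Lucene.Net 2.x API (Hits,
Field.Index.UN_TOKENIZED), it uses KeywordAnalyzer for the parser (Luke may
be configured with a different analyzer), and the names UrlQueryCheck,
BuildSearcher and CountTermHits are made up for the example. It contrasts a
TermQuery whose term text contains literal quote characters with one built
from the bare URL, and with the same query string run through QueryParser,
where the quotes are parser syntax rather than part of the term.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper class, not part of Lucene.Net or of the original code.
public static class UrlQueryCheck
{
    // Builds an in-memory index with one document whose "url" field is
    // indexed as a single, un-tokenized term (assumes the 2.x Field API).
    public static IndexSearcher BuildSearcher(string url)
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true);
        Document doc = new Document();
        doc.Add(new Field("url", url, Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(doc);
        writer.Close();
        return new IndexSearcher(dir);
    }

    // Looks up termText verbatim in the "url" field and returns the hit count.
    // A TermQuery does no parsing or analysis of its term text.
    public static int CountTermHits(IndexSearcher searcher, string termText)
    {
        Query query = new TermQuery(new Term("url", termText));
        Console.WriteLine("TermQuery: " + query.ToString());
        return searcher.Search(query).Length();
    }

    public static void Main()
    {
        const string url = "http://some.website.com/";
        IndexSearcher searcher = BuildSearcher(url);

        // Term text with literal quote characters, as in IsIndexed.
        Console.WriteLine("quoted term hits: " + CountTermHits(searcher, "\"" + url + "\""));

        // Term text without the surrounding quotes.
        Console.WriteLine("bare term hits:   " + CountTermHits(searcher, url));

        // The same text as typed into a query box, handed to QueryParser.
        // KeywordAnalyzer is an assumption; substitute the analyzer in use.
        QueryParser parser = new QueryParser("url", new KeywordAnalyzer());
        Query parsed = parser.Parse("url:\"" + url + "\"");
        Console.WriteLine("Parsed query: " + parsed.ToString());
        Console.WriteLine("parsed hits:      " + searcher.Search(parsed).Length());

        searcher.Close();
    }
}
```

If the quoted TermQuery reports no hits while the bare one finds the
document, the extra quote characters in the term text would be a likely
cause of the difference.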

-----Original Message-----
From: Mark Cottman-Fields [mailto:mar...@qpac.com.au]
Sent: Sunday, January 11, 2009 11:57 PM
To: lucene-net-user@incubator.apache.org
Subject: difference between Luke search and Lucene.Net search?

Hi All

I'm currently testing a Lucene.Net index for a website.

The Luke search utility and my C# code give me different results for
searches. There's probably a simple explanation, but I can't find it.

The idea here is to check for changes using the url and the content
length.

C# code:

(searcher is a Lucene.Net.Search.IndexSearcher instance)

public bool IsIndexed(Uri url, int stringcontent) {
    int foundCount = 0;
    Lucene.Net.Search.BooleanQuery bq = new Lucene.Net.Search.BooleanQuery();
    bq.Add(
        new Lucene.Net.Search.TermQuery(
            new Lucene.Net.Index.Term("url", "\"" + url.AbsoluteUri + "\"")),
        Lucene.Net.Search.BooleanClause.Occur.MUST);
    bq.Add(
        new Lucene.Net.Search.TermQuery(
            new Lucene.Net.Index.Term("length", stringcontent.ToString())),
        Lucene.Net.Search.BooleanClause.Occur.MUST);
    Lucene.Net.Search.Hits hits = searcher.Search(bq);
    foundCount = hits.Length();
    return foundCount > 0;
}

A search like:
+url:"http://some.website.com/" +contentlength:1234

Finds exactly one item in Luke (the expected behaviour, if the url is
indexed and has that length exactly), but returns nothing in my code
above. They are both using the same index.

Any ideas?

BTW, great project, thank you!

Mark

