On Saturday, October 11, 2003, at 09:44 AM, Michael Giles wrote:
He is probably using the StandardAnalyzer. I was about to write the exact same email (but using Wal-Mart as an example on this page - http://www.benchmark.com/cgi-bin/suid/~bcmlp/ newsletter.cgi?mode=show&year=2003&date=2003-10-07). I index and search with the same analyzer (Standard), but when I search for Wal-Mart, I don't find a match. I DO find a match if I search for "Wal-Mart" or Wal Mart (no hyphen). This seems like a bug.

Sorry for the delay. I've been meaning to reply to this.


When you index using StandardAnalyzer, you are indexing it to two terms "wal" and "mart" (without the quotes). QueryParser does its own (weird?) stuff to
strings passed to it. Here's how it breaks down:


String[] queries = {"Wal-Mart", "\"Wal-Mart\"", "Wal Mart"};
for (int i = 0; i < queries.length; i++) {
String query = queries[i];
Query q = QueryParser.parse(query, "contents", new StandardAnalyzer());
System.out.println(query + " = " + q);
}


Wal-Mart = contents:wal -contents:mart
"Wal-Mart" = contents:"wal mart"
Wal Mart = contents:wal contents:mart

Notice all three are completely different queries. The Wal-Mart one is excluding "mart" making it miss documents you expect. The second one is a phrase query, which is basically what you're after. The third one is matching any documents with "wal" or "mart" in them regardless of whether they are side-by-side.

Is this a bug? Nah... just the nature of the QueryParser beast. It would be a non-backwards-compatible change to change how QueryParser deals with a dash. That is the main issue here with it interpreting it as a NOT operator. But it seems logical to me that it shouldn't do so when its mashed against a word like this and leave it to the analyzer to deal with.

Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to