I'm confused about how to use escape characters in Lucene.  I'm running Lucene 
1.3-dev1 with the StandardAnalyzer and QueryParser.  

My documents have a field called 'path' with a value like "1102/a55407-2002nov2.xml".  
This field is indexed but not tokenized.  Here are the various queries I've tried and 
their results:

1) When a dash is included in the query, Lucene appears to split the term at the dash 
and treat the remainder as a prohibited clause on the default field. 
("path:1102/a55402-2002nov2.xml" is interpreted as "path:1102/a55402 
-body:2002nov2.xml")

2) When a backslash is inserted before the dash (and the query does *not* contain a 
wildcard), Lucene replaces the escaped dash with a space and treats the result as a 
phrase. ('path:1102/a55402\-2002nov2.xml' is interpreted as 
'path:"1102/a55402 2002nov2.xml"'; note the space where the dash was)

3) When a backslash is inserted before the dash (and the query *does* contain a 
wildcard), Lucene interprets this literally, without any conversion. 
("path:1102/55407\-2002nov*" is interpreted literally).

4) When the backslash-escaped dash is immediately followed by a wildcard, Lucene 
reports an error. ('path:1102/a55407\-*' causes a lexical error: 
Encountered <EOF> after :"")

Overall, it appears that a dash cannot be escaped.  Is this true?

A previous post (yesterday) suggests that it is also not possible to escape a 
backslash.  If that's also true, what characters can be escaped?
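
To make the question concrete, here is a minimal sketch of the client-side escaping 
I would expect to work. QueryEscaper and its escape method are hypothetical names, 
not part of Lucene. The character set is taken from the query syntax documentation of 
later Lucene releases and is only an assumption for 1.3-dev1; given observations 2) 
through 4) above, this version's QueryParser may not honor the result.

```java
public class QueryEscaper {
    // Assumed set of query-syntax characters (from later Lucene documentation;
    // not confirmed for 1.3-dev1).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Prefix every character in SPECIAL with a backslash; copy all others unchanged.
    public static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape only the field value, then prepend the field name by hand.
        System.out.println("path:" + escape("1102/a55407-2002nov2.xml"));
    }
}
```

For the path value above, only the dash falls in the assumed set, so the helper 
produces the same string as the query in 2).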


Regards,

Terry


