Correct me if I'm wrong but the WhiteSpace Analyzer doesn't
lowercase.
As I mentioned in my previous email, that's the one I'm using. When
I
don't use the wildcard everything works fine:
IndexSearcher is = new
IndexSearcher("C:/J2EE_Projects/Lucene/indexDirCompound");
PerFieldAnalyzerWrapper pw = new PerFieldAnalyzerWrapper(new
StandardAnalyzer());
pw.addAnalyzer("molecular_formula", new
WhitespaceAnalyzer());
String sr = "C9H10O5";
Query q = QueryParser.parse(sr, "molecular_formula", pw);
System.out.println(q.toString());
The last line generates the following:
molecular_formula:C9H10O5
Now if I change the sr String:
IndexSearcher is = new
IndexSearcher("C:/J2EE_Projects/Lucene/indexDirCompound");
PerFieldAnalyzerWrapper pw = new PerFieldAnalyzerWrapper(new
StandardAnalyzer());
pw.addAnalyzer("molecular_formula", new
WhitespaceAnalyzer());
String sr = "C9H10O5*";//only change is adding the *
Query q = QueryParser.parse(sr, "molecular_formula", pw);
System.out.println(q.toString());
I get this:
molecular_formula:c9h10o5*
As you can see it's been lower cased and I get no hits. Looks like
something is lowercasing the wildcard query. How can I make it not
do
that?
Thanks,
Peter
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 12, 2004 3:03 PM
To: Lucene Users List
Subject: Re: wildcard uppercase
Just use an Analyzer that doesn't lowercase. That FAQ entry assumes
that the Analyzer does lowercase its input.
Searching IS case sensitive, it's just that people often use an
Analyzer that lowercases everything (at indexing and at query time),
so
the search appears not to be case sensitive, and that is just what
most
people want.
Otis
--- "Kipping, Peter" <[EMAIL PROTECTED]> wrote:
I'm doing wildcard searches on molecular formulas where case is
critical. For instance Co = Cobalt, CO = Carbon Monoxide. I've
read
the faq on this:
Yes, unlike other types of Lucene queries, Wildcard, Prefix, and
Fuzzy
queries are case sensitive.
That is because those types of queries are not passed through the
Analyzer, which is the component that performs operations such as
stemming and lowercasing.
The reason for skipping the Analyzer is that if you were searching
for
"dogs*" you would not want "dogs" first stemmed to "dog", since
that
would then match "dog*", which is not the intended query.
A workaround for this is simply to lowercase the entire query
before
passing it to the query parser.
But it makes no sense. First most analyzers don't even do
stemming.
I'm using the whitespace analyzer which doesn't. Second
lowercasing
is
a completely separate issue from stemming, I see no reason why the
a
wildcard query has to be lowercased. Is there any way to prevent
my
wildcard queries from being lowercased? Example: String input
"C9H10O5*", resulting query "c9h10o5*"
Thanks,
Peter
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:
[EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]