RE: Wildcard query with untokenized punctuation

McGuigan, Colin Sat, 10 Mar 2007 11:05:02 -0800

Doron;

You're entirely correct about the analyzer (I'm using one that breaks on
non-alphanumeric characters, so all punctuation is ignored).  To be
honest, I hadn't thought about altering this, but I guess I could; I'm just
wary that there might be unforeseen consequences.


But I'm still curious about the original solution.  Shouldn't it be
possible to take "pagefile.*" and tokenize it (essentially throwing away
the wildcard at the end)?  
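
A rough sketch of that idea, written in plain Python rather than against the Lucene API. The names letter_tokenize and rewrite_wildcard are hypothetical, and the tokenizer only approximates an analyzer that breaks on non-letters. The query text is analyzed like any other text, and a trailing wildcard is kept only when it directly follows a letter; otherwise it is thrown away.

```python
import re

def letter_tokenize(text):
    """Split on non-letters and lowercase (an approximation of a
    letter-based analyzer, not Lucene's actual implementation)."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

def rewrite_wildcard(query_text):
    """Return (exact_terms, prefix) for a query with an optional trailing '*'.

    Hypothetical helper: the wildcard survives only if it directly
    follows a letter; after punctuation it is dropped.
    """
    has_wildcard = query_text.endswith("*")
    body = query_text.rstrip("*")
    tokens = letter_tokenize(body)
    if has_wildcard and re.search(r"[A-Za-z]$", body):
        # The last token becomes the prefix to expand.
        return tokens[:-1], tokens[-1]
    # No wildcard, or the wildcard followed punctuation: plain terms only.
    return tokens, None

print(rewrite_wildcard("pagefile.*"))    # wildcard dropped
print(rewrite_wildcard("pagefile.sy*"))  # wildcard kept on "sy"
```

With this rewrite, "pagefile.*" becomes a plain search for the term "pagefile", which is what the index actually contains.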

--Colin McGuigan

-----Original Message-----
From: Doron Cohen [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 10, 2007 2:08 AM
To: [email protected]
Subject: Re: Wildcard query with untokenized punctuation

Hi Colin,

Is it possible that you are using an analyzer that breaks words on
non-letters, for instance SimpleAnalyzer? If so, the doc text:
   pagefile.sys
is indexed as two words:
  pagefile sys
At search time, the query text:
  pagefile.sys
is also parsed and tokenized into a two-word query:
  pagefile sys
but the query text:
  pagefile.sys*
is not analyzed (by design) and matches only words that start with:
  pagefile.sys
But there are no such words in the index, because the analyzer broke
the text on non-letters at indexing time...
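
The mismatch can be reproduced with a toy model in plain Python. This is only a sketch: letter_tokenize approximates an analyzer that breaks on non-letters, and term_query and prefix_query are hypothetical stand-ins, not Lucene calls.

```python
import re

def letter_tokenize(text):
    """Split on non-letters and lowercase (an approximation of a
    letter-based analyzer, not Lucene's actual implementation)."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

# Index time: the document text is analyzed, so the period is lost.
index_terms = set(letter_tokenize("pagefile.sys"))

def term_query(text):
    """Analyzed query: the text is tokenized, then each token is looked up."""
    return all(tok in index_terms for tok in letter_tokenize(text))

def prefix_query(text):
    """Wildcard query: the prefix is used as typed, without analysis."""
    prefix = text.rstrip("*")
    return any(term.startswith(prefix) for term in index_terms)

print(sorted(index_terms))
print(term_query("pagefile.sys"))     # analyzed, so it finds both terms
print(prefix_query("pagefile.sys*"))  # no indexed term starts with "pagefile.sys"
print(prefix_query("pagefile*"))      # matches the indexed term "pagefile"
```

The analyzed query and the index agree because both went through the same tokenizer; the wildcard query bypasses it and so looks for a term that was never indexed.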

Hopefully this gets you started... If this is the reason, you may want to
use a different analyzer (see the Wiki page "AnalysisParalysis").

Otherwise, make sure you use the same analyzers at indexing and search...
and see the Lucene FAQ entry "Why am I getting no hits / incorrect hits?".

If all this still fails, try to post here a simple code snippet showing how
you index and how you search.


