Re: using different analyzer for searching

Erik Hatcher Fri, 01 Apr 2005 06:23:42 -0800

On Mar 31, 2005, at 10:49 PM, pashupathinath wrote:

   I should do more analysis, as you suggested, before
deciding which analyzer to use for this. What about
writing a custom analyzer? How can I go about
implementing the logic in a custom analyzer so that it
returns all the documents that contain even a part of
the search string?
   Any insight into this would be very helpful,
especially in terms of performance.

This is an involved topic, and one that is covered in great detail in the analysis chapter of Lucene in Action (shameless plug, yes, I know!).

I recommend you analyze the types of queries that need to be made and what type of user interface you will present for this - then determine what makes the most sense analysis-wise. WhitespaceAnalyzer is not going to be good enough, as I suspect you'll want case-insensitive searches at least.

        Erik


thanks,
pashupathinath.k

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:


On Mar 31, 2005, at 11:44 AM, pashupathinath wrote:

Is it possible to index using a predefined analyzer
and search using a custom analyzer?


Yes, it's perfectly fine to do so, with the caveat that you end up
searching for the terms exactly as they were indexed.

I end up doing this in most applications, actually, primarily because
untokenized fields need to use the KeywordAnalyzer during searching.
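
A small illustration of why an untokenized field needs keyword-style
analysis at search time, sketched in plain Java with no Lucene calls.
The class and method names are hypothetical, and the "index" here is
just a set of exact terms, so treat it as a picture of the idea rather
than how Lucene does it:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an index and two search-time analyzers; not the Lucene API.
final class KeywordFieldSketch {
    /** Search-time analysis that splits the query on whitespace. */
    static List<String> whitespaceTerms(String query) {
        return Arrays.asList(query.trim().split("\\s+"));
    }

    /** Search-time analysis that keeps the whole query as one term (the keyword approach). */
    static List<String> keywordTerms(String query) {
        return Collections.singletonList(query);
    }

    /** True if any query term equals an indexed term exactly. */
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An untokenized field is indexed as a single term, spaces included.
        Set<String> indexed = new HashSet<>(Collections.singletonList("Lucene in Action"));
        System.out.println(matches(indexed, whitespaceTerms("Lucene in Action"))); // false
        System.out.println(matches(indexed, keywordTerms("Lucene in Action")));    // true
    }
}
```

Splitting the query produces three terms, none of which equals the
single indexed term, so only the keyword-style analysis finds it.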

  I'm searching using the built-in WhitespaceAnalyzer. The
problem is that when I search for part of a string, the
search returns zero results. For example, if the statement
is "my name is abc123", a search for abc or 123 doesn't
return any hits.
  Any insight into this?


The exact terms indexed using WhitespaceAnalyzer are like this (using
the Lucene in Action AnalyzerDemo - "ant AnalyzerDemo"):

     [input] String to analyze: [This string will be analyzed.]
my name is abc123
      [echo] Running lia.analysis.AnalyzerDemo...
      [java] Analyzing "my name is abc123"
      [java]   WhitespaceAnalyzer:
      [java]     [my] [name] [is] [abc123]

      [java]   SimpleAnalyzer:
      [java]     [my] [name] [is] [abc]

      [java]   StopAnalyzer:
      [java]     [my] [name] [abc]

      [java]   StandardAnalyzer:
      [java]     [my] [name] [abc123]
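
To make the first two lines of that output concrete, here is a rough
sketch in plain Java of the two tokenization rules. It does not call
Lucene; TokenizeSketch, whitespace and letters are names invented for
illustration, and the real analyzers handle more cases than this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of two tokenization rules; not the Lucene API.
final class TokenizeSketch {
    /** Splits on whitespace only, keeping case and digits (roughly the WhitespaceAnalyzer rule). */
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Keeps runs of letters, lowercased; any other character ends the token (roughly the SimpleAnalyzer rule). */
    static List<String> letters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespace("my name is abc123")); // [my, name, is, abc123]
        System.out.println(letters("my name is abc123"));    // [my, name, is, abc]
    }
}
```

Neither rule produces "123" or a standalone "abc" next to "abc123",
which is why those searches come back empty.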

So you indexed "abc123" and searches must search for that term
*exactly*.  You can search for "abc*" as a PrefixQuery or WildcardQuery
and find "abc123".  "*123" will also find it, though QueryParser does
not support leading wildcard characters (but the API does).  Wildcard
queries are not ideally what you want, as they tend to be much slower
for large indexes.
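
One way to picture the cost difference, as a simplified sketch in plain
Java: with terms kept in sorted order, a prefix lookup can jump to the
first candidate and stop early, while a leading wildcard has nothing to
seek to and must look at every term. The TreeSet standing in for the
term dictionary and the names below are assumptions for illustration,
not Lucene's internals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a sorted term dictionary; not the Lucene API.
final class TermLookupSketch {
    /** Prefix lookup: seek to the first term >= prefix, then walk forward while terms still match. */
    static List<String> byPrefix(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            hits.add(term);
        }
        return hits;
    }

    /** Suffix lookup (like "*123"): there is no place to seek to, so every term is inspected. */
    static List<String> bySuffix(TreeSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList("my", "name", "is", "abc123"));
        System.out.println(byPrefix(terms, "abc")); // [abc123]
        System.out.println(bySuffix(terms, "123")); // [abc123]
    }
}
```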

You may need to do specialized analysis.  Perhaps you could share your
real needs with the list and we could offer recommendations.  It is
possible to index "abc123", "abc", and "123" all within the same
position in the index if you do some clever analysis and that meshes
with what you're after.
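
A minimal sketch of that kind of analysis, in plain Java rather than
against the Lucene API. Lucene tokens carry a position increment, and an
increment of 0 places a token at the same position as the one before
it; the Tok class, the analyze method and the letter/digit splitting
rule below are all assumptions made up for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token model; a real implementation would be a Lucene TokenFilter.
final class SamePositionSketch {
    /** A term plus its distance from the previous term; 0 means "same position". */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Emits each whitespace token, then its letter/digit pieces at the same position. */
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(new Tok(word, 1));
            // Split wherever a letter meets a digit, e.g. "abc123" -> "abc", "123".
            String[] parts = word.split("(?<=\\p{L})(?=\\p{Nd})|(?<=\\p{Nd})(?=\\p{L})");
            if (parts.length > 1) {
                for (String part : parts) {
                    out.add(new Tok(part, 0)); // increment 0: stacked on the full word
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("my name is abc123"));
        // [my(+1), name(+1), is(+1), abc123(+1), abc(+0), 123(+0)]
    }
}
```

With terms stacked like this, exact searches for "abc", "123" or
"abc123" would all hit the same position without any wildcard, at the
price of a larger index.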

        Erik

---------------------------------------------------------------------

To unsubscribe, e-mail:
[EMAIL PROTECTED]
For additional commands, e-mail:
[EMAIL PROTECTED]
