Copy the current StandardAnalyzer and add '-' to the definition of a LETTER. You might want to work with the StandardAnalyzer on the trunk, which uses JFlex rather than the current JavaCC flavor; the new one is something like 6-10 times faster.
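
The effect of that change can be illustrated with a plain-Java sketch that has no Lucene dependency. It is a hand-written stand-in, not the JFlex or JavaCC grammar. The class `HyphenAsLetterTokenizer` is hypothetical, and it only models "runs of letters, digits and hyphens become tokens". The real StandardTokenizer also recognizes e-mail addresses, acronyms and numbers, and StandardAnalyzer additionally removes stop words.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: mimics the effect of adding '-' to the LETTER class.
 * This is NOT the StandardTokenizer grammar, only an illustration of the idea.
 */
final class HyphenAsLetterTokenizer {

    // '-' is treated like a letter, so it no longer ends a token.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    // Collects maximal runs of token characters, lowercased as StandardAnalyzer does.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [soft-wash, cycle, e-mail]
        System.out.println(tokenize("Soft-Wash cycle, e-mail"));
    }
}
```

With '-' counted as a letter, "soft-wash" survives as a single term, which is what a prefix search for soft-wa* needs.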

- Mark

Shakti_Sareen wrote:
Hi
Can anyone help me with a code having a class which extends the
StandardAnalyzer and that analyzer should not tokenize the word across
hyphen

SHAKTI SAREEN

-----Original Message-----
From: Shai Erera [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 9:53 PM
To: java-user@lucene.apache.org
Subject: Re: help required urgent!!!!!!!!!!!

The thing is, StandardAnalyzer breaks on hyphens. You'll need to work
around this, for example by extending StandardAnalyzer.

From the documentation of StandardTokenizer (which StandardAnalyzer uses):

  * Splits words at hyphens, unless there's a number in the token, in
    which case the whole token is interpreted as a product number and is
    not split.

I've investigated StandardAnalyzer's tokenization, and it doesn't look
simple to disable that behavior. What you can do is extend
StandardAnalyzer and override its tokenStream method to create a
TokenStream of your own. If you know your text is space separated, you
can use StringTokenizer to split the text on spaces. If a token contains
'-', don't break it; otherwise pass it on to the TokenStream returned by
StandardAnalyzer.
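
Here is a minimal sketch of this approach in plain Java, assuming whitespace-separated text. `HyphenPreservingSplitter` and `standardLikeTokens` are hypothetical names. `standardLikeTokens` only approximates StandardAnalyzer (lowercase, then split on anything that is not a letter or digit) and does not remove stop words. Wrapping the result in a Lucene TokenStream returned from an overridden tokenStream method is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Hypothetical sketch of "split on spaces, keep hyphenated words whole".
 * standardLikeTokens() is a rough stand-in for StandardAnalyzer, not its grammar.
 */
final class HyphenPreservingSplitter {

    // Stand-in for StandardAnalyzer: lowercase, split on non-letters/digits.
    static List<String> standardLikeTokens(String word) {
        List<String> out = new ArrayList<>();
        for (String part : word.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // Strips leading/trailing punctuation, such as the comma in "soft-wash,".
    static String trimEdges(String word) {
        int start = 0;
        int end = word.length();
        while (start < end && !Character.isLetterOrDigit(word.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(word.charAt(end - 1))) {
            end--;
        }
        return word.substring(start, end);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // StringTokenizer's default delimiters are space, tab, newline, CR, form feed.
        StringTokenizer words = new StringTokenizer(text);
        while (words.hasMoreTokens()) {
            String word = words.nextToken();
            if (word.indexOf('-') >= 0) {
                // Hyphenated: keep whole (lowercased, edge punctuation removed).
                String kept = trimEdges(word).toLowerCase();
                if (!kept.isEmpty()) {
                    tokens.add(kept);
                }
            } else {
                // Not hyphenated: hand off to the standard-like path.
                tokens.addAll(standardLikeTokens(word));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [use, the, soft-wash, not, rinse]
        System.out.println(tokenize("Use the soft-wash, not Rinse."));
    }
}
```

Edge punctuation is stripped before a hyphenated word is kept, because splitting on spaces alone would otherwise index "soft-wash," as a term distinct from "soft-wash".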

Maybe someone else has a better answer, but if you insist on using
StandardAnalyzer, I have a feeling it will be problematic.

On Nov 22, 2007 6:02 PM, Shakti_Sareen <[EMAIL PROTECTED]> wrote:

Hi

But the file I am indexing is very big, and I don't know which words
will contain a hyphen. Your suggestion works only if the file contains
specific, known words.

I have no option other than StandardAnalyzer.

Thanks a lot for your reply.

Please suggest how I can proceed.


SHAKTI SAREEN
GE-GDC
STC HYDERABAD
9948777794

-----Original Message-----
From: Shai Erera [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 9:25 PM
To: java-user@lucene.apache.org
Subject: Re: help required urgent!!!!!!!!!!!

Hi

You can simply create a PrefixQuery. However, if you're using
StandardAnalyzer and the word is added as Index.TOKENIZED,
soft-wa<something> will be broken into 'soft' and 'wa<something>'.
Therefore you'll need to add the word as Index.UN_TOKENIZED, or use a
different Analyzer when you index the data (for this field at least).

Here's some sample code:

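       import org.apache.lucene.document.Document;
       import org.apache.lucene.document.Field;
       import org.apache.lucene.document.Field.Index;
       import org.apache.lucene.document.Field.Store;
       import org.apache.lucene.index.Term;
       import org.apache.lucene.search.PrefixQuery;
       import org.apache.lucene.search.Query;

       // Indexing: UN_TOKENIZED stores "soft-wash" as one term, with no
       // analysis or lowercasing, so the prefix case must match exactly.
       Document doc = new Document();
       doc.add(new Field("field", "soft-wash", Store.NO, Index.UN_TOKENIZED));

       // Search: PrefixQuery is not analyzed; it matches terms starting with "soft-wa".
       Query q = new PrefixQuery(new Term("field", "soft-wa"));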
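
Why the tokenized field gets no hits can be modeled without Lucene: a PrefixQuery matches indexed terms that start with the prefix. The sketch below (hypothetical class `PrefixMatchDemo`) checks only that predicate against two hand-written term lists; it does not reproduce how Lucene enumerates its term dictionary.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration (no Lucene): does any indexed term start with the prefix?
 * Models only the matching predicate of a prefix query.
 */
final class PrefixMatchDemo {

    static boolean anyTermHasPrefix(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms as StandardAnalyzer would produce them: split at the hyphen.
        List<String> tokenized = Arrays.asList("soft", "wash");
        // Terms as stored with Index.UN_TOKENIZED: the whole value is one term.
        List<String> untokenized = Arrays.asList("soft-wash");

        System.out.println(anyTermHasPrefix(tokenized, "soft-wa"));   // false
        System.out.println(anyTermHasPrefix(untokenized, "soft-wa")); // true
    }
}
```

Neither "soft" nor "wash" starts with "soft-wa", so only the single untokenized term can match.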

Does that help?

On Nov 22, 2007 5:46 PM, Shakti_Sareen <[EMAIL PROTECTED]> wrote:
Hi
I am using StandardAnalyzer to index the data, but I want to do a LIKE
(wildcard) search on a word containing a hyphen. For example, I want to
search for "soft-wa*".

I am getting no hits for that. I have read that if a word contains a
hyphen, it should be enclosed in double quotes ("), but enclosing the
word in double quotes means an exact-word search. How can I perform a
LIKE search on a word containing a hyphen?

Please help.

Regards,
Shakti Sareen





DISCLAIMER:
This email (including any attachments) is intended for the sole use of
the intended recipient/s and may contain material that is CONFIDENTIAL
AND PRIVATE COMPANY INFORMATION. Any review or reliance by others or
copying or distribution or forwarding of any or all of the contents in
this message is STRICTLY PROHIBITED. If you are not the intended
recipient, please contact the sender by email and delete all copies;
your cooperation in this regard is appreciated.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
Regards,

Shai Erera

