Re: help required

Mark Miller Fri, 23 Nov 2007 05:29:09 -0800

Copy the current StandardAnalyzer and add '-' to the definition of a LETTER. You might work with the StandardAnalyzer on the trunk, which uses JFlex rather than the current JavaCC flavor...the new one is something like 6-10 times faster.
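
To make the effect of that change concrete, here is a minimal stand-alone Java sketch. It is not the JFlex or JavaCC grammar and uses no Lucene classes; the class and method names (HyphenLetterSketch, isTokenChar, tokenize) are made up for illustration. It only shows what happens once '-' counts as a token character alongside letters and digits:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HyphenLetterSketch {
    // Hypothetical rule: a character belongs to a token if it is a letter,
    // a digit, or (the one change being illustrated) a hyphen.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    // Collects maximal runs of token characters, lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [use, soft-wash, then, rinse]
        System.out.println(tokenize("Use soft-wash, then rinse."));
    }
}
```

With the hyphen treated as a letter, "soft-wash" survives as one term, so a prefix search for "soft-wa" has something to match. The real change would of course be made in the tokenizer grammar itself.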


- Mark


Shakti_Sareen wrote:

Hi

Can anyone help me with code for a class that extends StandardAnalyzer
so that the analyzer does not split words at hyphens?

SHAKTI SAREEN


-----Original Message-----

From: Shai Erera [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 9:53 PM

To: [email protected]
Subject: Re: help required urgent!!!!!!!!!!!

The thing is, StandardAnalyzer breaks on hyphens. You'll need to work
around this, for example by extending StandardAnalyzer.

From StandardTokenizer's documentation (which is used by
StandardAnalyzer):
 * <li>Splits words at hyphens, unless there's a number in the token, in which case
 *     the whole token is interpreted as a product number and is not split.

I've investigated StandardAnalyzer's tokenization and it doesn't look
simple to disable that behavior. What you can do is extend
StandardAnalyzer and override its tokenStream method to create a
TokenStream of your own. If you know your text is space separated, you
can use StringTokenizer to split the text on spaces. If a token contains
'-', don't break it; otherwise pass it forward to the TokenStream
returned by StandardAnalyzer.
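
The splitting logic described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene, the class name HyphenPreservingSplitter is made up, and the delegate method is a crude stand-in for whatever StandardAnalyzer would do with a chunk (here it just splits on non-alphanumeric characters and lowercases):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

class HyphenPreservingSplitter {
    // Stand-in for the analyzer the chunk would normally be passed to
    // (an assumption for this sketch, not Lucene's actual behavior).
    static List<String> delegate(String chunk) {
        List<String> out = new ArrayList<>();
        for (String part : chunk.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // StringTokenizer's default delimiters are whitespace characters.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String chunk = st.nextToken();
            if (chunk.indexOf('-') >= 0) {
                // Contains a hyphen: keep the chunk as one token.
                tokens.add(chunk.toLowerCase(Locale.ROOT));
            } else {
                // No hyphen: let the delegate split it as usual.
                tokens.addAll(delegate(chunk));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [try, soft-wash, today, please]
        System.out.println(tokenize("Try soft-wash today, please"));
    }
}
```

One limitation of this approach shows up immediately: a hyphenated chunk keeps any punctuation attached to it, so "soft-wash," would be indexed with its trailing comma unless that is stripped separately.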

Maybe someone else has a better answer, but if you insist on using
StandardAnalyzer, I have a feeling it will be problematic.

On Nov 22, 2007 6:02 PM, Shakti_Sareen <[EMAIL PROTECTED]> wrote:

Hi

But the file I am indexing is very big and I don't know which words
will contain a hyphen. What you suggest can be implemented only if
there are some specific words in the file.

Apart from StandardAnalyzer I have no other option.

Thanks a lot for your reply.

Please suggest how I can go ahead.


SHAKTI SAREEN
GE-GDC
STC HYDERABAD
9948777794

-----Original Message-----
From: Shai Erera [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 9:25 PM
To: [email protected]
Subject: Re: help required urgent!!!!!!!!!!!

Hi

You can simply create a PrefixQuery. However, if you're using
StandardAnalyzer, and the word is added as Index.TOKENIZED,
soft-wa<something> will be broken into 'soft' and 'wa<something>'.
Therefore you'll need to add the word as Index.UN_TOKENIZED, or use a
different Analyzer when you index the data (for this field at least).

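Here's some sample code:

       // Imports needed by the snippet.
       import org.apache.lucene.document.Document;
       import org.apache.lucene.document.Field;
       import org.apache.lucene.index.Term;
       import org.apache.lucene.search.PrefixQuery;
       import org.apache.lucene.search.Query;

       // Indexing: add the whole value as a single, unanalyzed term.
       Document doc = new Document();
       doc.add(new Field("field", "soft-wash", Field.Store.NO, Field.Index.UN_TOKENIZED));

       // Search: matches terms in "field" that start with "soft-wa".
       Query q = new PrefixQuery(new Term("field", "soft-wa"));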
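
To see why the tokenized form gets no hits, here is a small stand-alone Java sketch. It assumes, as a simplification, that a prefix search just walks a sorted set of indexed terms; the class and method names (PrefixLookupSketch, termsWithPrefix) are made up and no Lucene code is involved:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PrefixLookupSketch {
    // Returns every term that starts with the prefix, scanning the sorted
    // set from the first term that is >= the prefix.
    static List<String> termsWithPrefix(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match either
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        // What the index holds if "soft-wash" was split at the hyphen.
        TreeSet<String> tokenized = new TreeSet<>(Arrays.asList("soft", "wash"));
        // What the index holds if "soft-wash" was added as a single term.
        TreeSet<String> untokenized = new TreeSet<>(Arrays.asList("soft-wash"));

        System.out.println(termsWithPrefix(tokenized, "soft-wa"));   // prints []
        System.out.println(termsWithPrefix(untokenized, "soft-wa")); // prints [soft-wash]
    }
}
```

Neither 'soft' nor 'wash' starts with "soft-wa", so the prefix has nothing to match once the word has been split; only the unsplit term can be found.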

Does that help?

On Nov 22, 2007 5:46 PM, Shakti_Sareen <[EMAIL PROTECTED]> wrote:

Hi
I am using StandardAnalyzer to index the data.
But I want to do a "like" search on a word containing a hyphen.
For example, I want to search for the word "soft-wa*".

I am getting no hits for that. It is said that if there is a hyphen in
the word, then we should enclose that word in double quotes ("). But
enclosing the word in double quotes (") means an exact-word search.

How can I perform a "like" search on a word containing a hyphen?

Please help.

Regards,
Shakti Sareen





DISCLAIMER:
This email (including any attachments) is intended for the sole use of the
intended recipient/s and may contain material that is CONFIDENTIAL AND
PRIVATE COMPANY INFORMATION. Any review or reliance by others or copying or
distribution or forwarding of any or all of the contents in this message is
STRICTLY PROHIBITED. If you are not the intended recipient, please contact
the sender by email and delete all copies; your cooperation in this regard
is appreciated.

---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
Regards,

Shai Erera


