help required

Shakti_Sareen Fri, 23 Nov 2007 03:40:57 -0800

Hi 

Can anyone help me with a code having a class which extends the
StandardAnalyzer and that analyzer should not tokenize the word across
hyphen


 
SHAKTI SAREEN

-----Original Message-----
From: Shai Erera [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 22, 2007 9:53 PM
To: java-user@lucene.apache.org
Subject: Re: help required urgent!!!!!!!!!!!

The thing is - StandardAnalyzer breaks on hyphen. You'll need to work
around
this by either extend StandardAnalyzer

>From StandardTokenizer's documentation (which is used by
StandardAnalyzer):
*   <li> *Splits words at hyphens, unless there's a number in the token,
in
which case
 *     the whole token is interpreted as a product number and is not
split.*

I've investigated StandardAnalyzer's tokenization and it doesn't look
simple
to disable that behavior. What you can do is extend StandardAnalyzer and
override its tokenStream method to create a TokenStream of your own. If
you
know your text is space separated, you can use StringTokenizer to split
the
text on spaces. If a token contains '-', don't break it, otherwise pass
it
forward the the TokenStream returned by StandardAnalyzer.

Maybe someone else has a better answer, but if you insist on using
StandardAnalyzer, I have a feeling it will be problematic.

On Nov 22, 2007 6:02 PM, Shakti_Sareen < [EMAIL PROTECTED]>
wrote:

> Hi
>
> But the file I am indexing is very big and I don't know which word
will
> contain the hyphen. The thing you suggest can be implemented only if
> there are some specific words in the file.
>
> Apart from StandardAnalyzer I have got no option.
>
> Thanks a lot for your reply.
>
> Please suggest me how can I go ahead.
>
>
> SHAKTI SAREEN
> GE-GDC
> STC HYDERABAD
> 9948777794
>
> -----Original Message-----
> From: Shai Erera [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 22, 2007 9:25 PM
> To: java-user@lucene.apache.org
> Subject: Re: help required urgent!!!!!!!!!!!
>
> Hi
>
> You can simply create a PrefixQuery. However, if you're using
> StandardAnalyzer, and the word is added as Index.TOKENIZED,
> sotf-wa<something> will be broken to 'soft' and 'wa<something>'.
> Therefore
> you'll need to add the word as Index.UN_TOKENIZED, or use a different
> Analyzer when you index the data (for this field at least).
>
> Here's a sample code:
>
>        // Indexing.
>        Document doc = new Document();
>        doc.add(new Field("field", "soft-wash", Store.NO,
> Index.UN_TOKENIZED
> ));
>
>        // Search
>        Query q = new PrefixQuery(new Term("field", "soft-wa"));
>
> Does that help?
>
> On Nov 22, 2007 5:46 PM, Shakti_Sareen < [EMAIL PROTECTED]>
wrote:
>
> > Hi
> > I am using StandardAnalyser() to index the data.
> > But I want to do a like search on a word containing Hyphen
> > For example it want to search a word "soft-wa*"
> >
> > I am getting no hits for that. It is said that if the hyphen is
there
> in
> > the word, then we should include that word in the double quotes (").
> But
> > enclosing the word in a double quotes (") means the exact word
search.
> >
> > How can I perform the like search on a word containing hyphen???????
> >
> > Please help.
> >
> > Regards,
> > Shakti Sareen
> >
> >
> >
> >
> >
> > DISCLAIMER:
> > This email (including any attachments) is intended for the sole use
of
> the
> > intended recipient/s and may contain material that is CONFIDENTIAL
AND
> > PRIVATE COMPANY INFORMATION. Any review or reliance by others or
> copying or
> > distribution or forwarding of any or all of the contents in this
> message is
> > STRICTLY PROHIBITED. If you are not the intended recipient, please
> contact
> > the sender by email and delete all copies; your cooperation in this
> regard
> > is appreciated.
> >
> >
---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>
> --
> Regards,
>
> Shai Erera
>
>
> DISCLAIMER:
> This email (including any attachments) is intended for the sole use of
the
> intended recipient/s and may contain material that is CONFIDENTIAL AND
> PRIVATE COMPANY INFORMATION. Any review or reliance by others or
copying or
> distribution or forwarding of any or all of the contents in this
message is
> STRICTLY PROHIBITED. If you are not the intended recipient, please
contact
> the sender by email and delete all copies; your cooperation in this
regard
> is appreciated.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>



-- 
Regards,

Shai Erera


DISCLAIMER:
This email (including any attachments) is intended for the sole use of the 
intended recipient/s and may contain material that is CONFIDENTIAL AND PRIVATE 
COMPANY INFORMATION. Any review or reliance by others or copying or 
distribution or forwarding of any or all of the contents in this message is 
STRICTLY PROHIBITED. If you are not the intended recipient, please contact the 
sender by email and delete all copies; your cooperation in this regard is 
appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

help required

Reply via email to