Hi,

Can anyone help me with code for a class that extends StandardAnalyzer, where the analyzer does not tokenize a word across a hyphen?

SHAKTI SAREEN

-----Original Message-----
From: Shai Erera [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 9:53 PM
To: [email protected]
Subject: Re: help required urgent!!!!!!!!!!!

The thing is, StandardAnalyzer breaks on hyphens. You'll need to work around this by extending StandardAnalyzer.

From StandardTokenizer's documentation (StandardTokenizer is used by StandardAnalyzer):

    Splits words at hyphens, unless there's a number in the token, in which case
    the whole token is interpreted as a product number and is not split.

I've investigated StandardAnalyzer's tokenization and it doesn't look simple to disable that behavior. What you can do is extend StandardAnalyzer and override its tokenStream method to create a TokenStream of your own. If you know your text is space separated, you can use StringTokenizer to split the text on spaces. If a token contains '-', don't break it; otherwise pass it on to the TokenStream returned by StandardAnalyzer.

Maybe someone else has a better answer, but if you insist on using StandardAnalyzer, I have a feeling it will be problematic.

On Nov 22, 2007 6:02 PM, Shakti_Sareen <[EMAIL PROTECTED]> wrote:

> Hi
>
> But the file I am indexing is very big and I don't know which words will
> contain a hyphen. What you suggest can be implemented only if
> there are some specific words in the file.
>
> Apart from StandardAnalyzer I have no other option.
>
> Thanks a lot for your reply.
>
> Please suggest how I can go ahead.
>
> SHAKTI SAREEN
> GE-GDC
> STC HYDERABAD
> 9948777794
>
> -----Original Message-----
> From: Shai Erera [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 22, 2007 9:25 PM
> To: [email protected]
> Subject: Re: help required urgent!!!!!!!!!!!
>
> Hi
>
> You can simply create a PrefixQuery. However, if you're using
> StandardAnalyzer, and the word is added as Index.TOKENIZED,
> soft-wa<something> will be broken into 'soft' and 'wa<something>'.
> Therefore you'll need to add the word as Index.UN_TOKENIZED, or use a
> different Analyzer when you index the data (for this field at least).
>
> Here's some sample code:
>
>   // Indexing.
>   Document doc = new Document();
>   doc.add(new Field("field", "soft-wash", Store.NO, Index.UN_TOKENIZED));
>
>   // Search
>   Query q = new PrefixQuery(new Term("field", "soft-wa"));
>
> Does that help?
>
> On Nov 22, 2007 5:46 PM, Shakti_Sareen <[EMAIL PROTECTED]> wrote:
>
> > Hi
> > I am using StandardAnalyzer to index the data.
> > But I want to do a "like" search on a word containing a hyphen.
> > For example, I want to search for "soft-wa*".
> >
> > I am getting no hits for that. It is said that if a word contains a
> > hyphen, we should enclose the word in double quotes ("). But
> > enclosing the word in double quotes means an exact word search.
> >
> > How can I perform a "like" search on a word containing a hyphen?
> >
> > Please help.
> >
> > Regards,
> > Shakti Sareen
> >
> > DISCLAIMER:
> > This email (including any attachments) is intended for the sole use of the
> > intended recipient/s and may contain material that is CONFIDENTIAL AND
> > PRIVATE COMPANY INFORMATION. Any review or reliance by others or copying or
> > distribution or forwarding of any or all of the contents in this message is
> > STRICTLY PROHIBITED. If you are not the intended recipient, please contact
> > the sender by email and delete all copies; your cooperation in this regard
> > is appreciated.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> --
> Regards,
>
> Shai Erera

--
Regards,

Shai Erera
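
The approach suggested in the thread (split the text on spaces and leave any token containing '-' unbroken) can be sketched in plain Java, independent of Lucene. This is only an illustration under stated assumptions: HyphenAwareSplitter, split and prefixMatches are hypothetical names, not Lucene API. The sketch lower-cases every token itself instead of handing hyphen-free tokens on to StandardAnalyzer's TokenStream, and prefixMatches merely imitates what a prefix search such as soft-wa* would match once hyphenated words are kept whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

/** Standalone sketch (hypothetical, not Lucene API): keeps hyphenated words as one token. */
final class HyphenAwareSplitter {

    private HyphenAwareSplitter() {}

    /** Splits text on whitespace only, so "soft-wash" survives as a single lower-cased token. */
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        // Default StringTokenizer delimiters: space, tab, newline, carriage return, form feed.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String token = trimPunctuation(st.nextToken()).toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Strips leading and trailing non-alphanumeric characters; inner hyphens are left alone. */
    private static String trimPunctuation(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && !Character.isLetterOrDigit(s.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    /** Imitates a prefix ("like") search such as soft-wa* over the produced tokens. */
    static List<String> prefixMatches(List<String> tokens, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith(wanted)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokens = split("Use the Soft-Wash cycle, not soft water.");
        System.out.println(tokens);                           // [use, the, soft-wash, cycle, not, soft, water]
        System.out.println(prefixMatches(tokens, "soft-wa")); // [soft-wash]
    }
}
```

In a real analyzer this splitting logic would have to live inside a TokenStream returned from the overridden tokenStream method, and the same analyzer would need to be used at both index and query time so that "soft-wash" is stored and searched as one term.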
