I added the following to both TestStandardAnalyzer and TestClassicAnalyzer in
branches/lucene_solr_3_6/, and it passed in both cases:
public void testWhitespaceHyphenWhitespace() throws Exception {
  BaseTokenStreamTestCase.assertAnalyzesTo(a, "drinks - water",
      new String[] {"drinks", "water"});
}
So I'm not seeing the same behavior as you guys - the hyphen is not part of any
emitted token.
Steve
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, June 25, 2012 11:33 AM
To: [email protected]
Subject: Re: how to remove the dash
On Monday, 25 June 2012 16:10:38, Ian Lea wrote:
> My apologies - you are right.
>
> With both ClassicAnalyzer and StandardAnalyzer, "drinks - water" comes
> out as "drinks -water" whereas "drinks-water" comes out as "drinks
> water", as I'd expected.
>
> I guess this is fixable in JFlex, or I think there is some replace
> tokenizer somewhere that can replace character X with character Y, e.g.
> "-" with " ". Or pre-process your text/queries with a regexp. Maybe
> someone else has better ideas.
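A minimal sketch of the regexp pre-processing Ian mentions, using only java.util.regex. The class and method names are hypothetical (nothing here is part of Lucene), and the pattern assumes a "standalone" hyphen is one with whitespace on both sides. The same step would have to be applied to both the indexed text and the queries before they reach the analyzer.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: cleans text before it is
// handed to StandardAnalyzer/ClassicAnalyzer.
class HyphenPreprocessor {

    // One or more hyphens with whitespace on both sides, e.g. the " - " in
    // "drinks - water". Intra-word hyphens ("drinks-water") and hyphens at
    // the very start or end of the text are deliberately not matched.
    private static final Pattern STANDALONE_HYPHEN = Pattern.compile("\\s+-+\\s+");

    // Collapses each standalone hyphen and its surrounding whitespace to one space.
    static String removeStandaloneHyphens(String text) {
        return STANDALONE_HYPHEN.matcher(text).replaceAll(" ");
    }

    // The blunter option: replace every '-' with ' ', wherever it occurs.
    static String replaceAllHyphens(String text) {
        return text.replace('-', ' ');
    }

    public static void main(String[] args) {
        System.out.println(removeStandaloneHyphens("drinks - water")); // drinks water
        System.out.println(removeStandaloneHyphens("drinks-water"));   // drinks-water
        System.out.println(replaceAllHyphens("drinks-water"));         // drinks water
    }
}
```

Collapsing only whitespace-delimited hyphens leaves intra-word hyphens such as "drinks-water" for the tokenizer to split as usual; replaceAllHyphens is the character-for-character substitution ("-" with " ") described above.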
I guess the same... I'm already using my own Tokenizer (based on
StandardTokenizer) to mark some strings for replacement or removal, and I'm
using one filter to replace them and another to remove them... I tried to do
the same with the "-", but it didn't work... I can't even mark the "-".
I'm avoiding pre-processing...
I'm hoping that somebody can tell me what I can change in the StandardTokenizer
JFlex grammar to change this behavior.
Thanks
>
>
> --
> Ian.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]