RE: jaspq: dashed numerical values tokenized differently

Daniel Taurat Wed, 03 Nov 2004 05:52:12 -0800


> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 3 November 2004 13:39
> To: Lucene Users List
> Subject: Re: jaspq: dashed numerical values tokenized differently
> 
> 
> On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote:
> >> Query parser was changed to treat '-' within words as part of the
> >> word.
> >> Before that change a query 'dash-test' was parsed as 'dash AND NOT
> >> test'.
> >> Now QP reads one word 'dash-test' which is analyzed. If the analyzer
> >> splits that to more than one token (standard analyzer does) a phrase
> >> query is created.
> >> The difference you see comes from standard analyzer which tokenizes
> >> dash-test dash-123 to tokens dash, test and dash-123.
> >> Prefix queries aren't analyzed.
> >
> > So you say that dash-123 is a prefix query whereas dash-test is not?
> > I found also (with Luke) that dash-anystring123 is not tokenized as
> > well.
> > What exactly are the criteria for Lucene to decide what a prefix is
> > and what not?
> 
> Anything that ends with an asterisk is parsed as a PrefixQuery, as
> long as it does not have other wildcard characters.  If it has other
> wildcard characters or the asterisk is not at the end, then it is
> parsed as a WildcardQuery.
> 
>       Erik
> 
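
The rule Erik describes can be sketched as a small classifier. This is only
an illustration in Python, not Lucene's QueryParser code: the function name
classify_term is hypothetical, and it assumes '*' and '?' are the only
wildcard characters.

```python
def classify_term(term):
    """Classify one query term following the rule quoted above.

    Hypothetical helper for illustration; not part of Lucene.
    Assumes '*' and '?' are the only wildcard characters.
    """
    wildcard_count = term.count("*") + term.count("?")
    if wildcard_count == 0:
        # No wildcard at all: an ordinary term that goes through the analyzer.
        return "term"
    if wildcard_count == 1 and term.endswith("*"):
        # A single trailing asterisk and nothing else: prefix query.
        return "prefix"
    # Any other wildcard, or an asterisk that is not at the end.
    return "wildcard"


print(classify_term("dash*"))     # trailing asterisk only
print(classify_term("da*sh"))     # asterisk not at the end
print(classify_term("dash-1?3*")) # extra wildcard character
print(classify_term("dash-123"))  # no wildcard
```

By this rule neither 'dash-123' nor 'dash-test' is a prefix query, so both
are analyzed; the difference comes from the analyzer, not from the parser.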


Okay, got that.
My only remaining question is why tokenizing works differently for
strings with numerical components, and whether there is a way to make
StandardAnalyzer treat those dashed mixed-character strings the same
way as plain letter strings.

Daniel
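
The behaviour reported in this thread is consistent with a tokenizer that
keeps a dashed string together when one of its segments contains a digit
(as in a product or serial number) and splits it otherwise. The Python
sketch below only mimics that observed behaviour. It is a simplified
assumption, not the StandardAnalyzer grammar, and split_dashed is a
hypothetical name.

```python
import re


def split_dashed(text):
    """Mimic the tokenization observed in this thread (simplified sketch).

    Assumption: a dashed word stays one token if any dash-separated
    segment contains a digit; otherwise it is split at the dashes.
    """
    tokens = []
    for word in text.lower().split():
        parts = word.split("-")
        has_digit = any(re.search(r"\d", part) for part in parts)
        if len(parts) > 1 and has_digit:
            # Number-like dashed string: keep it whole.
            tokens.append(word)
        else:
            # Plain letter segments: one token per non-empty segment.
            tokens.extend(part for part in parts if part)
    return tokens


print(split_dashed("dash-test dash-123"))
print(split_dashed("dash-anystring123"))
```

Under that assumption, getting uniform treatment would mean splitting on
the dash before this number-like rule applies, for example in a custom
analyzer used at both index and query time.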
