Re: Number range search through Query subclass

2003-02-15 Thread Tatu Saloranta
On Friday 14 February 2003 02:58, Volker Luedeling wrote:
> Hi,
>  
> I am writing an application that constructs Lucene searches from XML
> queries. Each item from the XML is represented by a Query of the
> corresponding type. I have a problem when I try to search for number
> ranges, since RangeQuery compares strings, not numbers, so 15 < 155 < 20.
> What I need is a subclass of Query that evaluates numbers correctly. I have
> tried subclassing RangeQuery, MultiTermQuery or Query directly, but each
> time I have run into problems with inheritance and access rights to various
> methods or inner classes. 
> Does anyone know of a solution to this problem? If there is none, the only
> way I can think of would be indexing numbers as something like "#15#". But
> it's not a very elegant solution when all I need is a slight variation of
> one existing class. 
> Thanks for any help you can offer,

Actually the problem is not (just) the query; it's the tokenizer/analyzer/indexer 
as well. For a range query to work, tokens have to be correctly ordered 
lexicographically (~= in alphabetic order). I don't think using #s as markers 
would work, as they do not make the tokens sort properly (plus, most analyzers 
would just remove those chars).

The usual way to do this is to use a suitable numeric format for the indexed data; 
for dates a format like YYYY-MM-DD works ok (i.e. dates are correctly ordered 
when ordering date tokens alphabetically), for other numbers (like 
timestamps) what is usually done is padding, so that the numbers in your case
would be "015", "155" and "020" (instead of a leading 0, any other character 
that sorts before '1' would do). So, you need to know the biggest number 
you'd need to index and use the appropriate amount of zero padding.
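
A minimal sketch of the padding idea in plain Java, with no Lucene calls. The class name PaddedNumbers, the helper pad() and the width of 3 are made up for illustration; the same padding would have to be applied both when indexing values and when building the range bounds.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of Lucene.
final class PaddedNumbers {

    // Left-pads a non-negative number with zeros to a fixed width so that
    // string order matches numeric order.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value has more than " + width + " digits");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        String[] raw = {"15", "155", "20"};
        String[] padded = {pad(15, 3), pad(155, 3), pad(20, 3)};
        Arrays.sort(raw);     // plain string comparison, as a range query sees terms
        Arrays.sort(padded);
        System.out.println(Arrays.toString(raw));    // [15, 155, 20]
        System.out.println(Arrays.toString(padded)); // [015, 020, 155]
    }
}
```

The width has to be chosen from the biggest number you expect; anything wider is rejected here rather than silently indexed in the wrong order.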

Now, if you store these numbers as single values in a separate index, padding is 
easy to do. If you are trying to pick up arbitrary numeric data contained in 
otherwise plain text content, things are a bit more complicated.

Hope this helps,

-+ Tatu +-


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Syntax Problem

2003-02-15 Thread Terry Steichen
Christoph,

Same basic result:

+(cloning clone) +animal yields 1072 hits
(cloning OR clone) AND animal yields 19 hits.
(cloning clone) AND animal yields 19 hits.

Regards,

Terry

- Original Message -
From: "Christoph Kiehl" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, February 15, 2003 7:41 PM
Subject: Re: Syntax Problem


> Terry Steichen wrote:
> > I have an index which, when searched with this query ("cloning clone
> > animal") produces 1103 hits.  A different, more narrow query
> > ("(cloning clone) AND animal") produces only 19 hits.
>
> AFAIK the terms in your queries are by default concatenated by OR. This
> means "cloning clone animal" == "cloning OR clone OR animal".
>
> > What's puzzling to me is that if I try a different (but supposedly
> > identical) form of the more narrow query ("+(cloning clone)
> > +animal"), it produces 1103 hits rather than the 19 that I expect.
> >
> > In other words, "+(cloning clone) +animal" appears to be the
> > equivalent of "cloning OR clone OR animal" rather than "(cloning OR
> > clone) AND animal".
>
> Hm, strange. I would expect "+(cloning clone) +animal" being translated to
> "(cloning OR clone) AND animal". I just tried it here. The translation is
> done as I expected. Perhaps you could try the last query ("(cloning OR
> clone) AND animal") and compare the resultsize with the one from
> "+(cloning clone) +animal" (even if both seem to be the same as
> "(cloning clone) AND animal" ;)?
>
> Christoph






Re: Syntax Problem

2003-02-15 Thread Christoph Kiehl
Terry Steichen wrote:
> I have an index which, when searched with this query ("cloning clone
> animal") produces 1103 hits.  A different, more narrow query
> ("(cloning clone) AND animal") produces only 19 hits.

AFAIK the terms in your queries are by default concatenated by OR. This
means "cloning clone animal" == "cloning OR clone OR animal".

> What's puzzling to me is that if I try a different (but supposedly
> identical) form of the more narrow query ("+(cloning clone)
> +animal"), it produces 1103 hits rather than the 19 that I expect.
>
> In other words, "+(cloning clone) +animal" appears to be the
> equivalent of "cloning OR clone OR animal" rather than "(cloning OR
> clone) AND animal".

Hm, strange. I would expect "+(cloning clone) +animal" being translated to
"(cloning OR clone) AND animal". I just tried it here. The translation is
done as I expected. Perhaps you could try the last query ("(cloning OR
clone) AND animal") and compare the resultsize with the one from "+(cloning
clone) +animal" (even if both seem to be the same as "(cloning clone) AND
animal" ;)?
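
To make the expected difference concrete, here is a rough sketch that models each term as a set of matching document ids and uses plain set operations instead of Lucene. The class name BooleanClauses and the ids are invented for illustration, and scoring is ignored completely; it only shows which documents should match.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of query matching, not Lucene code.
final class BooleanClauses {

    // OR: a document matches if it is in either set.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // AND (both clauses required, i.e. "+x +y"): a document must be in both sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        // Made-up document ids for each term.
        Set<Integer> cloning = docs(1, 2);
        Set<Integer> clone = docs(2, 3);
        Set<Integer> animal = docs(3, 4, 5);

        // "cloning clone animal" with OR as the default operator
        System.out.println(or(or(cloning, clone), animal));  // [1, 2, 3, 4, 5]

        // "+(cloning clone) +animal" == "(cloning OR clone) AND animal"
        System.out.println(and(or(cloning, clone), animal)); // [3]
    }
}
```

If "+(cloning clone) +animal" returns as many hits as the all-OR form, the "+" markers are apparently not being applied as required clauses on your side.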

Christoph










Syntax Problem

2003-02-15 Thread Terry Steichen
I have an index which, when searched with this query ("cloning clone animal") produces 
1103 hits.  A different, more narrow query ("(cloning clone) AND animal") produces 
only 19 hits.

What's puzzling to me is that if I try a different (but supposedly identical) form of 
the more narrow query ("+(cloning clone) +animal"), it produces 1103 hits rather than 
the 19 that I expect.

In other words, "+(cloning clone) +animal" appears to be the equivalent of "cloning OR 
clone OR animal" rather than "(cloning OR clone) AND animal".

Am I misunderstanding something about the "+ -" syntax, or is this some kind of bug?

Regards,

Terry