Re: Number range search through Query subclass
On Friday 14 February 2003 02:58, Volker Luedeling wrote:
> Hi,
>
> I am writing an application that constructs Lucene searches from XML
> queries. Each item from the XML is represented by a Query of the
> corresponding type. I have a problem when I try to search for number
> ranges, since RangeQuery compares strings, not numbers, so 15 < 155 < 20.
> What I need is a subclass of Query that evaluates numbers correctly. I have
> tried subclassing RangeQuery, MultiTermQuery or Query directly, but each
> time I have run into problems with inheritance and access rights to various
> methods or inner classes.
> Does anyone know of a solution to this problem? If there is none, the only
> way I can think of would be indexing numbers as something like "#15#". But
> it's not a very elegant solution when all I need is a slight variation of
> one existing class.
> Thanks for any help you can offer,

Actually, the problem is not (just) the query; it's the tokenizer/analyzer/indexer as well. For a range query to work, tokens have to be correctly ordered lexically (roughly, in alphabetical order). I don't think using #s as markers would work, as they do not make the tokens sort properly (plus, most analyzers would just remove those characters).

The usual way to do this is to use a suitable format for the indexed data. For dates, a format like YYYY-MM-DD works well (i.e. dates are correctly ordered when the date tokens are sorted alphabetically). For other numbers (like timestamps), what is usually done is padding, so that the numbers in your case would be "015", "155" and "020" (instead of a leading 0, any other character that sorts before '1' would do). So you need to know the biggest number you will need to index and use the appropriate amount of zero padding.

Now, if you store these numbers as single values in a separate index, padding is easy to do. If you are trying to handle arbitrary numeric data contained in otherwise plain-text content, things are a bit more complicated.
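The padding scheme above can be sketched in plain Java, without any Lucene classes. `NumericPadding`, `pad`, `inRange` and `sorted` are hypothetical names; `inRange` only approximates the inclusive comparison of term text that a range query performs, it is not Lucene's implementation. The sketch assumes non-negative integers and a known maximum width.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Lucene): zero-pads non-negative integers
// so that plain String order agrees with numeric order.
class NumericPadding {

    // 'width' must cover the largest value you will ever index,
    // e.g. 3 digits for values up to 999.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Approximates an inclusive lexicographic range check on term text.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Returns a copy sorted in plain String order.
    static List<String> sorted(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("20", "155", "15");
        List<String> padded = new ArrayList<>();
        for (String s : raw) {
            padded.add(pad(Long.parseLong(s), 3));
        }
        System.out.println(sorted(raw));    // [15, 155, 20]
        System.out.println(sorted(padded)); // [015, 020, 155]
        // Is 20 between 15 and 155?
        System.out.println(inRange("20", "15", "155"));   // false
        System.out.println(inRange("020", "015", "155")); // true
    }
}
```

Dates follow the same principle: a string such as 2003-02-14 sorts chronologically because every field has a fixed width and the most significant field comes first.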
Hope this helps,

-+ Tatu +-

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Syntax Problem
Christoph,

Same basic result:

+(cloning clone) +animal yields 1072 hits.
(cloning OR clone) AND animal yields 19 hits.
(cloning clone) AND animal yields 19 hits.

Regards,
Terry

----- Original Message -----
From: "Christoph Kiehl" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, February 15, 2003 7:41 PM
Subject: Re: Syntax Problem
Re: Syntax Problem
Terry Steichen wrote:
> I have an index which, when searched with this query ("cloning clone
> animal") produces 1103 hits. A different, more narrow query
> ("(cloning clone) AND animal") produces only 19 hits.

AFAIK the terms in your queries are combined with OR by default. This means "cloning clone animal" == "cloning OR clone OR animal".

> What's puzzling to me is that if I try a different (but supposedly
> identical) form of the more narrow query ("+(cloning clone)
> +animal"), it produces 1103 hits rather than the 19 that I expect.
>
> In other words, "+(cloning clone) +animal" appears to be the
> equivalent of "cloning OR clone OR animal" rather than "(cloning OR
> clone) AND animal".

Hm, strange. I would expect "+(cloning clone) +animal" to be translated to "(cloning OR clone) AND animal". I just tried it here, and the translation is done as I expected. Perhaps you could try the last query ("(cloning OR clone) AND animal") and compare the result size with the one from "+(cloning clone) +animal" (even if both seem to be the same as "(cloning clone) AND animal" ;)?

Christoph
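The semantics Christoph describes can be checked against a small in-memory model. The sketch below is plain Java with no Lucene classes; `ToyBoolean`, its clause helpers and the three sample documents are hypothetical. It assumes the usual reading of the syntax: unmarked terms are OR'ed, `+` marks a required clause, `-` a prohibited one, and a parenthesised group behaves as a single clause. It models matching only (no analyzer, no QueryParser, no scoring), so it shows what the queries should return, not why the real index returned 1103 hits.

```java
import java.util.List;
import java.util.Set;

// Toy model (not Lucene code): documents are sets of terms, and a boolean
// query combines required (+), prohibited (-) and unmarked (optional) clauses.
class ToyBoolean {

    interface Q {
        boolean matches(Set<String> doc);
    }

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Q query;
        final Occur occur;
        Clause(Q query, Occur occur) { this.query = query; this.occur = occur; }
    }

    static Q term(String t) { return doc -> doc.contains(t); }
    static Clause must(Q q) { return new Clause(q, Occur.MUST); }        // "+x"
    static Clause should(Q q) { return new Clause(q, Occur.SHOULD); }    // "x"
    static Clause mustNot(Q q) { return new Clause(q, Occur.MUST_NOT); } // "-x"

    // Matches when every MUST clause matches, no MUST_NOT clause matches and,
    // if there are no MUST clauses, at least one SHOULD clause matches.
    static Q bool(Clause... clauses) {
        return doc -> {
            boolean hasMust = false;
            boolean anyShould = false;
            for (Clause c : clauses) {
                boolean m = c.query.matches(doc);
                switch (c.occur) {
                    case MUST:
                        hasMust = true;
                        if (!m) return false;
                        break;
                    case MUST_NOT:
                        if (m) return false;
                        break;
                    default: // SHOULD
                        anyShould |= m;
                }
            }
            return hasMust || anyShould;
        };
    }

    static int count(Q q, List<Set<String>> docs) {
        int n = 0;
        for (Set<String> d : docs) {
            if (q.matches(d)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical three-document collection
        List<Set<String>> docs = List.of(
                Set.of("cloning", "animal"),
                Set.of("clone"),
                Set.of("animal"));

        // "cloning clone animal": all clauses optional, i.e. an OR
        Q anyTerm = bool(should(term("cloning")), should(term("clone")), should(term("animal")));
        // "+(cloning clone) +animal", i.e. (cloning OR clone) AND animal
        Q grouped = bool(must(bool(should(term("cloning")), should(term("clone")))),
                         must(term("animal")));
        // "+(cloning clone) -animal", showing the "-" operator
        Q excluded = bool(must(bool(should(term("cloning")), should(term("clone")))),
                          mustNot(term("animal")));

        System.out.println(count(anyTerm, docs));  // 3
        System.out.println(count(grouped, docs));  // 1
        System.out.println(count(excluded, docs)); // 1
    }
}
```

Under this model the grouped "+" form can never match more documents than the plain OR query, which is why a result equal to the OR count points at how the query string was parsed or analyzed rather than at the operator semantics themselves.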
Syntax Problem
I have an index which, when searched with this query ("cloning clone animal"), produces 1103 hits. A different, more narrow query ("(cloning clone) AND animal") produces only 19 hits.

What's puzzling to me is that if I try a different (but supposedly identical) form of the more narrow query ("+(cloning clone) +animal"), it produces 1103 hits rather than the 19 that I expect. In other words, "+(cloning clone) +animal" appears to be equivalent to "cloning OR clone OR animal" rather than "(cloning OR clone) AND animal".

Am I misunderstanding something about the "+ -" syntax, or is this some kind of bug?

Regards,
Terry