Re: indexing numbers in texts for range queries
Hi Mikhail,

Range queries are allowed inside phrases with ComplexPhraseQParser, but I think string (lexicographic) order is used rather than numeric order. Also, LUCENE-5205 / SOLR-5410 is meant to supersede complex phrase. It might have that functionality too.

Ahmet

On Tuesday, December 2, 2014 10:43 PM, Mikhail Khludnev wrote:

Hello Michael,

On Tue, Dec 2, 2014 at 11:15 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:

> Mikhail - I can imagine a filter that strips out everything but numbers
> and then indexes those with a (separate) numeric (trie) field. But I don't
> believe you can do phrase or other proximity queries across multiple
> fields.

Technically it's not a big deal. I used FieldMaskingSpanQuery before.

> As long as an or-query is good enough, I think this problem is not too
> hard? But if you need proximity it becomes more complicated. Once in the
> distant past we coded a numeric range query using a complicated set of
> wildcard queries that could handle large numbers efficiently - this search
> index (Verity) had no range capability, so we had to mock it up using
> text. The way this worked was something along these lines:
>
> 1) transform all the numbers into their fixed-width binary encoding
>    (8 = 0b00001000, e.g.)
> 2) write queries by encoding the range as a set of bitmasks represented by
>    wildcard queries:
>    [8 TO 20] becomes (0b00001??? 0b000100?? 0b00010100)
>
> I know you said you cannot use [0-9]* terms, but you will not see terrible
> term explosion with this. What's your concern there?

It's not terrible but it is significant. I'd like to try the trie magic, which reduces query-time processing.

Thanks for the suggestions. Do I remember correctly that you ignored the last Lucene Revolution?

> -Mike
>
> On 12/02/2014 02:59 PM, Mikhail Khludnev wrote:
>
>> Hello Searchers,
>>
>> Do you remember any examples of indexing numbers inside plain text?
>> E.g., if I have the text "foo and 10 bars", I want to find it with a
>> query like foo [8 TO 20] bars.
>> Question no. 1 is whether to put the trie terms into a separate field, or
>> whether they can reside in the same text field. Note that enumerating
>> [0-9]* terms in a MultiTermQuery is not an option for me; I definitely
>> need the trie field magic!
>> Perhaps you can point me to a blog post or book chapter; anything along
>> those lines would make me happy.
>>
>> Thanks a lot!

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
<http://www.griddynamics.com>
Re: indexing numbers in texts for range queries
On 12/02/2014 03:41 PM, Mikhail Khludnev wrote:

> Thanks for suggestions. Do I remember correctly that you ignored last
> Lucene Revolution?

I wouldn't say I ignored it, but it's true I wasn't there in DC. I'm excited to catch up on the presentations as the videos become available, though.

-Mike
Re: indexing numbers in texts for range queries
Hello Michael,

On Tue, Dec 2, 2014 at 11:15 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:

> Mikhail - I can imagine a filter that strips out everything but numbers
> and then indexes those with a (separate) numeric (trie) field. But I don't
> believe you can do phrase or other proximity queries across multiple
> fields.

Technically it's not a big deal. I used FieldMaskingSpanQuery before.

> As long as an or-query is good enough, I think this problem is not too
> hard? But if you need proximity it becomes more complicated. Once in the
> distant past we coded a numeric range query using a complicated set of
> wildcard queries that could handle large numbers efficiently - this search
> index (Verity) had no range capability, so we had to mock it up using
> text. The way this worked was something along these lines:
>
> 1) transform all the numbers into their fixed-width binary encoding
>    (8 = 0b00001000, e.g.)
> 2) write queries by encoding the range as a set of bitmasks represented by
>    wildcard queries:
>    [8 TO 20] becomes (0b00001??? 0b000100?? 0b00010100)
>
> I know you said you cannot use [0-9]* terms, but you will not see terrible
> term explosion with this. What's your concern there?

It's not terrible but it is significant. I'd like to try the trie magic, which reduces query-time processing.

Thanks for the suggestions. Do I remember correctly that you ignored the last Lucene Revolution?

> -Mike
>
> On 12/02/2014 02:59 PM, Mikhail Khludnev wrote:
>
>> Hello Searchers,
>>
>> Do you remember any examples of indexing numbers inside plain text?
>> E.g., if I have the text "foo and 10 bars", I want to find it with a
>> query like foo [8 TO 20] bars.
>> Question no. 1 is whether to put the trie terms into a separate field, or
>> whether they can reside in the same text field. Note that enumerating
>> [0-9]* terms in a MultiTermQuery is not an option for me; I definitely
>> need the trie field magic!
>> Perhaps you can point me to a blog post or book chapter; anything along
>> those lines would make me happy.
>>
>> Thanks a lot!

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
<http://www.griddynamics.com>
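For readers unfamiliar with the "trie magic" mentioned in this message, here is a rough conceptual sketch in Python. It assumes non-negative integers and a hypothetical helper named trie_terms; Lucene's real trie fields use their own term encoding and handle signed values, so treat this only as an illustration of the idea of indexing each number at several precisions.

```python
def trie_terms(value: int, precision_step: int = 4, bits: int = 32) -> list[tuple[int, int]]:
    """Return (shift, prefix) pairs for one value.

    Each pair is the value with its lowest `shift` bits dropped, so higher
    shifts give coarser terms shared by many neighbouring values.
    This is a hypothetical illustration, not Lucene's actual term format.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value must fit in the given number of bits")
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


# Small 8-bit example with a precision step of 4 bits.
print(trie_terms(10, precision_step=4, bits=8))  # [(0, 10), (4, 0)]
print(trie_terms(20, precision_step=4, bits=8))  # [(0, 20), (4, 1)]
```

With terms like these, a query over [16 TO 31] could match the single coarse term (4, 1) instead of sixteen exact terms, which is where the saving in query-time processing comes from.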
Re: indexing numbers in texts for range queries
Mikhail - I can imagine a filter that strips out everything but numbers and then indexes those with a (separate) numeric (trie) field. But I don't believe you can do phrase or other proximity queries across multiple fields. As long as an or-query is good enough, I think this problem is not too hard? But if you need proximity it becomes more complicated.

Once in the distant past we coded a numeric range query using a complicated set of wildcard queries that could handle large numbers efficiently - this search index (Verity) had no range capability, so we had to mock it up using text. The way this worked was something along these lines:

1) transform all the numbers into their fixed-width binary encoding (8 = 0b00001000, e.g.)
2) write queries by encoding the range as a set of bitmasks represented by wildcard queries:
   [8 TO 20] becomes (0b00001??? 0b000100?? 0b00010100)

I know you said you cannot use [0-9]* terms, but you will not see terrible term explosion with this. What's your concern there?

-Mike

On 12/02/2014 02:59 PM, Mikhail Khludnev wrote:

> Hello Searchers,
>
> Do you remember any examples of indexing numbers inside plain text?
> E.g., if I have the text "foo and 10 bars", I want to find it with a
> query like foo [8 TO 20] bars.
> Question no. 1 is whether to put the trie terms into a separate field, or
> whether they can reside in the same text field. Note that enumerating
> [0-9]* terms in a MultiTermQuery is not an option for me; I definitely
> need the trie field magic!
> Perhaps you can point me to a blog post or book chapter; anything along
> those lines would make me happy.
>
> Thanks a lot!
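The bitmask encoding Mike describes can be sketched roughly as follows. This is a minimal Python illustration, not the original Verity-era code: the function names range_to_wildcards and matches, the 8-bit default width, and the use of "?" as a single-bit wildcard are all assumptions made for the example.

```python
import fnmatch


def range_to_wildcards(lo: int, hi: int, width: int = 8) -> list[str]:
    """Cover the inclusive range [lo, hi] with aligned power-of-two blocks.

    Each block becomes one fixed-width binary pattern in which '?' stands
    for a single free bit.
    """
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("need 0 <= lo <= hi < 2**width")
    patterns = []
    while lo <= hi:
        free = 0  # number of trailing bits left as wildcards
        # Grow the block while it stays aligned at lo and inside the range.
        while (free < width
               and lo % (1 << (free + 1)) == 0
               and lo + (1 << (free + 1)) - 1 <= hi):
            free += 1
        prefix = format(lo >> free, "b").zfill(width - free) if free < width else ""
        patterns.append("0b" + prefix + "?" * free)
        lo += 1 << free  # continue after the block just covered
    return patterns


def matches(value: int, patterns: list[str], width: int = 8) -> bool:
    """Check a value against the patterns the way a wildcard query would."""
    term = "0b" + format(value, "b").zfill(width)
    return any(fnmatch.fnmatchcase(term, p) for p in patterns)


print(range_to_wildcards(8, 20))
# ['0b00001???', '0b000100??', '0b00010100']
```

Padding every number to the same width is what makes the patterns safe: without it, a short term such as 0b1000 would only ever match the single value 8. The number of patterns grows with the bit width rather than with the size of the range, which is why the term explosion stays modest.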
indexing numbers in texts for range queries
Hello Searchers,

Do you remember any examples of indexing numbers inside plain text? E.g., if I have the text "foo and 10 bars", I want to find it with a query like foo [8 TO 20] bars.

Question no. 1 is whether to put the trie terms into a separate field, or whether they can reside in the same text field. Note that enumerating [0-9]* terms in a MultiTermQuery is not an option for me; I definitely need the trie field magic!

Perhaps you can point me to a blog post or book chapter; anything along those lines would make me happy.

Thanks a lot!

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
<http://www.griddynamics.com>
Re: indexing numbers
The default schema.xml provided in the Solr distribution is well documented and a good place to get started (including numeric fieldTypes):
http://wiki.apache.org/solr/SchemaXml

Lucid Imagination also provides a nice reference guide:
http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide

hope that helps,
rob

On Wed, May 25, 2011 at 6:20 PM, antoniosi wrote:

> Hi,
>
> How does solr index a numeric value? Does it index it as a string or does it
> keep it as a numeric value?
>
> Thanks.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-numbers-tp2986424p2986424.html
> Sent from the Solr - User mailing list archive at Nabble.com.
indexing numbers
Hi,

How does Solr index a numeric value? Does it index it as a string or does it keep it as a numeric value?

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-numbers-tp2986424p2986424.html
Sent from the Solr - User mailing list archive at Nabble.com.