Re: Trie Based field (long) value parsing on query time
: q=reference:4-1.2
:
: the value is a text, but the following is indexed as a number (e.g.:
: 004001002, where 4 becomes 004, and 1 becomes 001, and 2 002),

Depending on how you look at it, you could implement this as one of two plugins:

1) if you consider this a special form of query syntax, then you should implement it as a QParser -- ie: a ReferenceNumberQParser that could be used with any field/fieldType, regardless of whether it's ultimately backed by a TrieLongField or an IntField, etc...

2) if you consider this to be a special format of the underlying data, then you would implement it in the FieldType -- ie: by subclassing TrieLongField, calling it ReferenceNumberField, and overriding methods like readableToIndexed (and optionally: things like indexedToReadable and toObject if you'd like to format the resulting data as 4-1.2 when you return it to the client)

Practically i think the key decision maker is what sort of behavior you want to support if someone asks for something like reference:4 ... should that be equivalent to 4-0.0 (in which case you could implement this easily as a FieldType) or should it be a range query for everything in article 4, regardless of section and subsection? (in which case i would implement it in a QParser)

-Hoss
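To make the two behaviours concrete, here is a minimal Python sketch of the number mapping itself, independent of Solr. It assumes three decimal digits each for article, section and subsection, as in the 004001002 example; the function names are hypothetical and are not part of any Solr API.

```python
import re
from typing import Tuple

# Assumed layout: article, section and subsection each fit in three digits.
_REFERENCE = re.compile(r"(\d{1,3})(?:-(\d{1,3})(?:\.(\d{1,3}))?)?")


def encode_reference(ref: str) -> int:
    """Map '4-1.2' to 4001002 (004001002 when zero-padded to nine digits)."""
    match = _REFERENCE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a reference number: {ref!r}")
    # Missing section/subsection default to 0, so '4' behaves like '4-0.0'.
    article, section, subsection = (int(g) if g else 0 for g in match.groups())
    return article * 1_000_000 + section * 1_000 + subsection


def decode_reference(value: int) -> str:
    """Inverse mapping, e.g. for formatting values returned to the client."""
    article, rest = divmod(value, 1_000_000)
    section, subsection = divmod(rest, 1_000)
    return f"{article}-{section}.{subsection}"


def article_range(article: int) -> Tuple[int, int]:
    """Inclusive numeric bounds covering every section of one article."""
    low = article * 1_000_000
    return low, low + 999_999
```

With this mapping, the FieldType-style reading of reference:4 is the single value encode_reference("4"), while the QParser-style reading is a range query over article_range(4).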
Re: trie
2010/9/21 Péter Király kirun...@gmail.com:
> You can read about it in Lucene in Action second edition.

Have a look at http://www.lucidimagination.com/developer/whitepaper/Whats-New-in-Apache-Lucene-3-0
Pages 4 to 8 should give you a good intro to the topic.

simon

> Péter
>
> 2010/9/21 Papp Richard ccode...@gmail.com:
>> is there any good tutorial how to use and what is trie? what I found on the net is really blurry.
>>
>> regards,
>> Rich
RE: trie
thank you guys for the answers... now I have to check / read some docs ;)

Rich
Re: trie fields and sortMissingLast
On Mon, Dec 21, 2009 at 5:37 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
> Should sortMissingLast param be working on trie-fields?

Nope, trie fields do not support sortMissingFirst or sortMissingLast.

--
Regards,
Shalin Shekhar Mangar.
Re: trie fields and sortMissingLast
On Thu, Oct 1, 2009 at 2:54 PM, Lance Norskog goks...@gmail.com wrote:
> Trie fields also do not support faceting.

Only those that index multiple tokens per value to speed up range queries.

> They also take more ram in some operations.

Should be less memory on average.

-Yonik
http://www.lucidimagination.com

> Given these defects, I'm not sure that promoting tries as the default is appropriate at this time. (I'm sure this is an old argument. :)
Re: trie fields and sortMissingLast
I just noticed this comment in the default schema:

<!-- These types should only be used for back compatibility with existing indexes, or if sortMissingLast functionality is needed. Use Trie based fields instead. -->

Does that mean TrieFields are never going to get sortMissingLast?

Do you all think that a reasonable strategy is to use a copyField and use "s" fields for sorting (only), and trie for everything else?

On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote:
> Am I correct in thinking that trie fields don't support sortMissingLast (my tests show that they don't). If not, is there any plan for adding it in?
>
> Regards,
> Steve
Re: trie fields and sortMissingLast
Trie fields also do not support faceting. They also take more ram in some operations.

Given these defects, I'm not sure that promoting tries as the default is appropriate at this time. (I'm sure this is an old argument. :)

--
Lance Norskog
goks...@gmail.com
Re: trie fields and sortMissingLast
On Thu, Oct 1, 2009 at 10:39 AM, Steve Conover scono...@gmail.com wrote:
> I just noticed this comment in the default schema:
> <!-- These types should only be used for back compatibility with existing indexes, or if sortMissingLast functionality is needed. Use Trie based fields instead. -->
> Does that mean TrieFields are never going to get sortMissingLast?

Not in time for 1.4, but yes they will eventually get it. It has to do with the representation... currently we can't tell between a 0 and missing.

> Do you all think that a reasonable strategy is to use a copyField and use "s" fields for sorting (only), and trie for everything else?

If you don't need the fast range queries, use the "s" fields only.

-Yonik
http://www.lucidimagination.com
Re: trie fields and sortMissingLast
> Not in time for 1.4, but yes they will eventually get it. It has to do with the representation... currently we can't tell between a 0 and missing.

Hmm. So does that mean that a query for latitudes, stored as trie floats, from -10 to +10 matches documents with no (i.e. null) latitude value?
Re: trie fields and sortMissingLast
On Thu, Oct 1, 2009 at 11:09 PM, Steve Conover scono...@gmail.com wrote:
> Hmm. So does that mean that a query for latitudes, stored as trie floats, from -10 to +10 matches documents with no (i.e. null) latitude value?

No, because normal queries work off of the inverted index (term -> docids_that_match), and there won't be any values indexed for that document. Sorting and function queries work off of a non-inverted index (docid -> value), which, depending on the representation, can't tell a missing value from the default value.

-Yonik
http://www.lucidimagination.com
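The distinction between the two structures can be illustrated with a toy model in Python. This is only a sketch of the idea described above, not how Lucene stores either structure; the document set, the 0.0 default and the names are all made up for illustration.

```python
from collections import defaultdict

# Toy documents: docid -> latitude, where None means "no value in this doc".
docs = {0: 5.0, 1: None, 2: 0.0, 3: -20.0}

# Inverted index (term -> docids). A document without a value adds no entry,
# so it can never be matched by a term or range query.
inverted = defaultdict(set)
for docid, latitude in docs.items():
    if latitude is not None:
        inverted[latitude].add(docid)


def range_query(low, high):
    """Docids whose indexed value lies in [low, high]."""
    return sorted(
        docid
        for value, docids in inverted.items()
        if low <= value <= high
        for docid in docids
    )


# Non-inverted view (docid -> value), as used for sorting. It is a dense
# array, so every slot holds *some* number; here the assumed default is 0.0.
field_cache = [0.0] * len(docs)
for docid, latitude in docs.items():
    if latitude is not None:
        field_cache[docid] = latitude


def sort_by_value():
    """Docids ordered by the per-document value."""
    return sorted(docs, key=lambda docid: field_cache[docid])
```

Here range_query(-10, 10) never returns document 1, but in field_cache document 1 (missing) and document 2 (a real 0.0) look identical, which is why a sortMissingLast-style option has nothing to go on with this representation.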
Re: Trie Date question
Thanks for the reply Yonik!

I'm using the nightly from 2009-08-20, so it's a rather fresh build. Comparing its example schema with the one I'm using now, I see I had made a mistake when defining the field. Examining the most recent build, I noticed that the normal date field is defined as follows:

<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

(It's actually a TrieDateField? Does this mean that we are moving away from the standard SolrDateField?)

and that the tdate is specified as follows:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

I'll update my schema definitions and reindex :) Guess that pretty much will solve my problems.

Thanks!
Aleks
Re: Trie Date question
Hmm, seems I was one day too early with my nightly then :p

Quote from Chris (2009-08-20 17:04):
> i changed it to be manufacturedate_dt since that fits with the existing scheme ... the data is all made up, but so is all the rest of our data.

Seems like lucene.apache.org is down at the moment, but I will try out the new example data once it's back up again, because even though I changed my schema definitions, the two fields still give back different results... :(

I'll keep you updated.

- Aleks

--
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
www.integrasco.com
Re: Trie Date question
I can't reproduce any problem. Are you using a recent nightly build? See the example schema of a recent nightly build for the correct way to define a Trie based field - the article / blog may be out of date.

Here's what I used to test the example data:
http://localhost:8983/solr/select?q=manufacturedate_dt:[NOW/DAY-4YEAR%20TO%20NOW/DAY]

-Yonik
http://www.lucidimagination.com

On Thu, Aug 27, 2009 at 3:49 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:
> Hello everyone,
> after reading Grant's article about TrieRange capabilities on the lucid blog I did some experimenting, but I have some trouble with the tdate type and I was hoping that you guys could point me in the right direction.
>
> So, basically I index a regular solr date field and use that for sorting and range queries today. For experimenting I added a tdate field, indexing it with the same data as in my other date field, but I'm obviously doing something wrong here, because the results coming back are completely different...
>
> The definitions in my schema:
> <field name="datetime" type="date" indexed="true" stored="false" omitNorms="true"/>
> <field name="tdatetime" type="tdate" indexed="true" stored="false"/>
>
> So if I do a query on my test index:
> q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]
> I get numFound=1031524 (don't worry about the ordering yet)..
> Then, if I do the following on my trie date field:
> q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]
> I get numFound=0
>
> Where did I go wrong? (And yes, both fields are indexed with exactly the same data...)
> Thanks for any guidance here!
>
> Cheers,
> Aleks
>
> --
> Aleksander M. Stensby
> Lead Software Developer and System Architect
> Integrasco A/S
> www.integrasco.com
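For comparing the two fields side by side, the range queries from this thread can be built and URL-encoded with a few lines of Python. This is just a convenience sketch: the helper name is made up, the base URL is the localhost example from the message above, and rows=0 is added on the assumption that only numFound is of interest.

```python
from urllib.parse import urlencode


def select_url(field, start="NOW/DAY-1YEAR", end="NOW/DAY",
               base="http://localhost:8983/solr/select"):
    """Build a select URL for a date-math range query on one field."""
    query = f"{field}:[{start} TO {end}]"
    # urlencode escapes the brackets, slashes, colon and spaces.
    return base + "?" + urlencode({"q": query, "rows": 0})


# The same range against the plain date field and the trie date field.
urls = [select_url(name) for name in ("datetime", "tdatetime")]
```

Fetching both URLs and comparing numFound in the responses is then a quick check that the two field definitions index the same data.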
Re: Trie vs long string for sorting
: My data are library call numbers, normalized to be comparable, resulting in
: (maximum) 21-character strings of the form RK 052180H359~999~999
:
: Now, these are fine -- they work for sorting and ranges and the whole thing,
: but right now I can't use them because I've got two or three for each of my
: 6M documents and on a 32-bit machine I run out of heap.
:
: Another option would be to turn them into longs (using roughly 56 bits of
: the 64 bit space) and use a trie type. Is there any sort of a win involved
: there?

I don't think Trie fields can be used for sorting (because they result in multiple terms per doc) but i could be wrong about that, smarter people than me may have done something cool with the TrieField that i'm not aware of.

As a general rule: if you have character data that fits a rigid enough set of constraints that you can encode any legal value into a single numeric value (either int, or long) such that they still sort properly, sorting on those encoded values is going to be more memory efficient than (and probably just as fast as) sorting on the string values.

-Hoss
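As an illustration of the general rule, here is a small Python sketch that packs a fixed-width key into one integer while preserving sort order. The key layout is a deliberately simplified, hypothetical one (two characters from space/A-Z followed by six digits), not the 21-character call number format from the question; a real encoding would need one bit field per component of the normalized string.

```python
# 5-bit codes; the index is the code. Space comes first, as it does in ASCII,
# so numeric order matches string order for this alphabet.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_key(key: str) -> int:
    """Pack 2 letters (5 bits each) + 6 digits (20 bits) into one integer."""
    if len(key) != 8:
        raise ValueError(f"expected 8 characters, got {len(key)}")
    letters, digits = key[:2], key[2:]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected six digits, got {digits!r}")
    value = 0
    for ch in letters:
        # str.index raises ValueError for characters outside the alphabet.
        value = (value << 5) | ALPHABET.index(ch)
    # 999999 < 2**20, so the number fits in the low 20 bits.
    return (value << 20) | int(digits)


keys = ["RK052180", "R 000001", "QA999999", "RK052179", " A000000"]
by_string = sorted(keys)
by_number = sorted(keys, key=encode_key)
```

Because the most significant component occupies the highest bits, by_string and by_number come out in the same order, and each key costs one 30-bit number instead of an 8-character string.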
Re: Trie vs long string for sorting
Trie has a custom parser that can load the FieldCache for sorting. It's basically a built-in type now that supports FieldCache, sorting, stored fields, etc.

On Sat, Jul 4, 2009 at 3:27 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
> I don't think Trie fields can be used for sorting (because they result in multiple terms per doc) but i could be wrong about that, smarter people than me may have done something cool with the TrieField that i'm not aware of.

--
- Mark
http://www.lucidimagination.com
Re: Trie Patches- Backportable?
I take it by the deafening silence that this is not possible? :-)

On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian anith...@gmail.com wrote:
> Hi,
> I am still using Solr 1.2 with the Lucene 2.2 that came with that version of Solr. I am interested in taking advantage of the trie filtering to alleviate some performance problems and was wondering how back-portable these patches are?
>
> I am also trying to understand how the Trie algorithm cuts down the number of term queries compared to a normal range query. I was at the recent Bay Area lucene/solr meetup where this was covered but missed some of the details.
>
> I know the ideal case is to upgrade to a newer Solr/Lucene but we are resource constrained and can't devote the time right now to test and upgrade our production systems to a newer Solr.
>
> Thanks!
> Amit
Re: Trie Patches- Backportable?
On Tue, Jun 9, 2009 at 10:19 PM, Amit Nithian anith...@gmail.com wrote:
> I take it by the deafening silence that this is not possible? :-)

Anything is possible :) However, it might be easier to upgrade to 1.4 instead.

> On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian anith...@gmail.com wrote:
>> Hi, I am still using Solr 1.2 with the Lucene 2.2 that came with that version of Solr. I am interested in taking advantage of the trie filtering to alleviate some performance problems and was wondering how back-portable these patches are?

Trie is new functionality. It does have a few dependencies on new Lucene APIs (TokenStream/TermAttribute etc.). On the Solr side I think it'd be easier.

>> I am also trying to understand how the Trie algorithm cuts down the number of term queries compared to a normal range query. I was at the recent Bay Area lucene/solr meetup where this was covered but missed some of the details.

See the javadocs. They have the link to the paper in which it is described in more detail.
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-queries/org/apache/lucene/search/trie/package-summary.html

--
Regards,
Shalin Shekhar Mangar.
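As a rough illustration of why a trie range query touches fewer terms, the Python sketch below splits an integer range into prefix ranges at several precisions. It is loosely modelled on the range-splitting idea described in the linked javadocs, simplified to non-negative integers; the 16-bit width, the precision step of 4 and all function names are assumptions chosen for readability, not Lucene's actual code or defaults.

```python
def split_range(lo, hi, precision_step=4, value_bits=16):
    """Cover [lo, hi] with (shift, first_prefix, last_prefix) sub-ranges.

    A term at a given shift is a value with its low `shift` bits dropped, so
    one such term stands for 2**shift consecutive values.
    """
    ranges = []
    shift = 0
    while True:
        diff = 1 << (shift + precision_step)
        mask = ((1 << precision_step) - 1) << shift
        has_lower = (lo & mask) != 0      # ragged edge at the low end
        has_upper = (hi & mask) != mask   # ragged edge at the high end
        next_lo = ((lo + diff) if has_lower else lo) & ~mask
        next_hi = ((hi - diff) if has_upper else hi) & ~mask
        if shift + precision_step >= value_bits or next_lo > next_hi:
            # No coarser level fits inside the range: finish at this one.
            ranges.append((shift, lo >> shift, hi >> shift))
            return ranges
        if has_lower:
            ranges.append((shift, lo >> shift, (lo | mask) >> shift))
        if has_upper:
            ranges.append((shift, (hi & ~mask) >> shift, hi >> shift))
        lo, hi = next_lo, next_hi
        shift += precision_step


def term_count(ranges):
    """Number of index terms the sub-ranges would have to visit."""
    return sum(last - first + 1 for _, first, last in ranges)


def covered_values(ranges):
    """Expand the sub-ranges back into the full set of matching values."""
    values = set()
    for shift, first, last in ranges:
        values.update(range(first << shift, (last + 1) << shift))
    return values
```

For the range 3 to 1000, a plain range query has up to 998 distinct terms to enumerate, while term_count(split_range(3, 1000)) is 53: only the ragged edges are matched at full precision and the middle is covered by a handful of coarse prefixes.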