Re: Trie Based field (long) value parsing on query time

2012-09-28 Thread Chris Hostetter
: q=reference:4-1.2
: 
: the value is a text, but the following is indexed as a number (e.g.:
: 004001002, where 4 becomes 004, and 1 becomes 001, and 2 002),

Depending on how you look at it, you could implement this as 
one of two plugins:

1) if you consider this a special form of query syntax, then you should 
implement it as a QParser -- ie: a ReferenceNumberQParser that could be 
used with any field/fieldType/FieldType, regardless of whether it's 
ultimately backed by a TrieLongField or an IntField, etc...

2) if you consider this to be a special format of the underlying data, 
then you would implement it in the FieldType -- ie: by subclassing 
TrieLongField and calling it ReferenceNumberField and overriding methods 
like readableToIndexed (and optionally: things like indexedToReadable and 
toObject if you'd like to format the resulting data as 4-1.2 when you 
return it to the client)

Practically I think the key decision maker is what sort of behavior you 
want to support if someone asks for something like reference:4 ... should 
that be equivalent to 4-0.0 (in which case you could implement this 
easily as a FieldType) or should it be a range query for 
everything in article 4, regardless of section and subsection? (in which 
case I would implement it in a QParser)
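
A minimal sketch of the two interpretations, written in plain Python rather than as a Solr plugin. The function names are hypothetical and not part of any Solr API; the three-digits-per-component packing is taken from the example in the question (4-1.2 becomes 004001002).

```python
import re

# Hypothetical helpers for illustration only; none of this is Solr API.
# Assumption from the question: each component is at most three digits.
_REF = re.compile(r"(\d{1,3})(?:-(\d{1,3})(?:\.(\d{1,3}))?)?")


def encode_reference(text):
    """Pack 'article-section.subsection' into one integer (4-1.2 -> 4001002)."""
    m = _REF.fullmatch(text)
    if m is None:
        raise ValueError("not a reference number: %r" % text)
    article, section, subsection = (int(g) if g else 0 for g in m.groups())
    return article * 1_000_000 + section * 1_000 + subsection


def decode_reference(value):
    """Inverse of encode_reference, e.g. for formatting results for the client."""
    article, rest = divmod(value, 1_000_000)
    section, subsection = divmod(rest, 1_000)
    return "%d-%d.%d" % (article, section, subsection)


def reference_range(text):
    """QParser-style reading: a partial reference becomes an inclusive range."""
    m = _REF.fullmatch(text)
    if m is None:
        raise ValueError("not a reference number: %r" % text)
    low = encode_reference(text)
    if m.group(2) is None:      # only an article was given, e.g. "4"
        return low, low + 999_999
    if m.group(3) is None:      # article and section, e.g. "4-1"
        return low, low + 999
    return low, low             # fully specified reference


exact = encode_reference("4")          # FieldType-style: "4" means 4-0.0
low, high = reference_range("4")       # QParser-style: everything in article 4
```

With the FieldType-style reading, reference:4 is a single term lookup; with the QParser-style reading it becomes a numeric range query between low and high on the same long field.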


-Hoss


Re: trie

2010-09-21 Thread Simon Willnauer
2010/9/21 Péter Király kirun...@gmail.com:
 You can read about it in Lucene in Action second edition.
have a look at 
http://www.lucidimagination.com/developer/whitepaper/Whats-New-in-Apache-Lucene-3-0

page 4 to 8 should give you a good intro to the topic

simon

 Péter

 2010/9/21 Papp Richard ccode...@gmail.com:
  Is there any good tutorial on what a trie is and how to use it? What I found on the
 net is really blurry.

 Regards,
  Rich


 __ Information from ESET NOD32 Antivirus, version of virus signature
 database 5419 (20100902) __

 The message was checked by ESET NOD32 Antivirus.

 http://www.eset.com






RE: trie

2010-09-21 Thread Papp Richard
thank you guys for the answers... now I have to check / read some docs ;)

Rich



 



Re: trie fields and sortMissingLast

2009-12-21 Thread Shalin Shekhar Mangar
On Mon, Dec 21, 2009 at 5:37 PM, Marc Sturlese marc.sturl...@gmail.com wrote:


 Should sortMissingLast param be working on trie-fields?


Nope, trie fields do not support sortMissingFirst or sortMissingLast.

-- 
Regards,
Shalin Shekhar Mangar.


Re: trie fields and sortMissingLast

2009-10-02 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 2:54 PM, Lance Norskog goks...@gmail.com wrote:
 Trie fields also do not support faceting.

Only those that index multiple tokens per value to speed up range queries.

 They also take more ram in
 some operations.

Should be less memory on average.

-Yonik
http://www.lucidimagination.com




Re: trie fields and sortMissingLast

2009-10-01 Thread Steve Conover
I just noticed this comment in the default schema:

<!--
   These types should only be used for back compatibility with existing
   indexes, or if sortMissingLast functionality is needed. Use
   Trie based fields instead.
-->

Does that mean TrieFields are never going to get sortMissingLast?

Do you all think that a reasonable strategy is to use a copyField and
use s fields for sorting (only), and trie for everything else?

On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote:
 Am I correct in thinking that trie fields don't support
 sortMissingLast (my tests show that they don't).  If not, is there any
 plan for adding it in?

 Regards,
 Steve



Re: trie fields and sortMissingLast

2009-10-01 Thread Lance Norskog
Trie fields also do not support faceting. They also take more ram in
some operations.

Given these defects, I'm not sure that promoting tries as the default
is appropriate at this time. (I'm sure this is an old argument.:)






-- 
Lance Norskog
goks...@gmail.com


Re: trie fields and sortMissingLast

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 10:39 AM, Steve Conover scono...@gmail.com wrote:
 I just noticed this comment in the default schema:

 <!--
       These types should only be used for back compatibility with existing
       indexes, or if sortMissingLast functionality is needed. Use
       Trie based fields instead.
 -->

 Does that mean TrieFields are never going to get sortMissingLast?

Not in time for 1.4, but yes they will eventually get it.
It has to do with the representation... currently we can't tell
between a 0 and missing.

 Do you all think that a reasonable strategy is to use a copyField and
 use s fields for sorting (only), and trie for everything else?

If you don't need the fast range queries, use the s fields only.

-Yonik
http://www.lucidimagination.com


 On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote:
 Am I correct in thinking that trie fields don't support
 sortMissingLast (my tests show that they don't).  If not, is there any
 plan for adding it in?

 Regards,
 Steve




Re: trie fields and sortMissingLast

2009-10-01 Thread Steve Conover
 Not in time for 1.4, but yes they will eventually get it.
 It has to do with the representation... currently we can't tell
 between a 0 and missing.

Hmm.  So does that mean that a query for latitudes, stored as trie
floats, from -10 to +10 matches documents with no (i.e. null) latitude
value?


Re: trie fields and sortMissingLast

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 11:09 PM, Steve Conover scono...@gmail.com wrote:
 Not in time for 1.4, but yes they will eventually get it.
 It has to do with the representation... currently we can't tell
 between a 0 and missing.

 Hmm.  So does that mean that a query for latitudes, stored as trie
 floats, from -10 to +10 matches documents with no (i.e. null) latitude
 value?

No, because normal queries work off of the inverted index
(term->docids_that_match), and there won't be any values indexed for
that document.
Sorting and function queries work off of a non-inverted index
(docid->value), which, depending on the representation, can't tell
a missing value from the default value.
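
A toy illustration of that difference, using plain Python structures instead of Lucene's index and FieldCache. The data and names are made up for the example, and the "array of primitives defaulting to 0" is an assumption about the representation being described.

```python
# docid -> latitude; document 2 has no latitude at all (hypothetical data).
docs = {0: 5, 1: 0, 2: None, 3: -7}

# Inverted index: term -> set of docids. A document without a value
# contributes no term, so it can never match a query on this field.
inverted = {}
for docid, value in docs.items():
    if value is not None:
        inverted.setdefault(value, set()).add(docid)


def range_query(low, high):
    """Collect the docids of every indexed term inside [low, high]."""
    hits = set()
    for term, docids in inverted.items():
        if low <= term <= high:
            hits |= docids
    return sorted(hits)


# Non-inverted view (docid -> value) built for sorting, modelled here as a
# list pre-filled with the default 0, the way an array of primitives would be.
uninverted = [0] * len(docs)
for term, docids in inverted.items():
    for docid in docids:
        uninverted[docid] = term

matches = range_query(-10, 10)
```

The range query never returns document 2, but in the non-inverted view documents 1 and 2 both hold 0, which is why sortMissingLast cannot be decided from that view alone.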

-Yonik
http://www.lucidimagination.com


Re: Trie Date question

2009-08-28 Thread Aleksander Stensby
Thanks for the reply Yonik!
I'm using the nightly from 2009-08-20, so it's a rather fresh build. By
comparing its schema with the one I'm using now, I found I had made a mistake when
defining the field.
By examining the most recent build, I noticed that the normal date field is
defined as follows:
<fieldType name="date" class="solr.TrieDateField" omitNorms="true"
precisionStep="0" positionIncrementGap="0"/>
(it's actually a TrieDateField? Does this mean that we are moving away from
the standard SolrDateField?)
and that the tdate is specified as follows:
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>
I'll update my schema definitions and reindex :) Guess that pretty much will
solve my problems.
Thanks!
 Aleks

 




-- 
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Trie Date question

2009-08-28 Thread Aleksander Stensby
Hmm, seems I was one day too early with my nightly then :p
Quote from Chris (2009-08-20 17:04):
"i changed it to be manufacturedate_dt since that fits with the existing
scheme ... the data is all made up, but so is all the rest of our data."

Seems like lucene.apache.org is down at the moment, but I will try out the new
example data once it's back up again, because even though I changed my
schema definitions, the two fields still give back different results... :(
I'll keep you updated.
- Aleks




-- 
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
E-mail: aleksander.sten...@integrasco.com
Tel.: +47 41 22 82 72
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Trie Date question

2009-08-27 Thread Yonik Seeley
I can't reproduce any problem.

Are you using a recent nightly build?
See the example schema of a recent nightly build for the correct way
to define a Trie based field - the article / blog may be out of date.

Here's what I used to test the example data:
http://localhost:8983/solr/select?q=manufacturedate_dt:[NOW/DAY-4YEAR%20TO%20NOW/DAY]
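
For anyone building such a request from a script, a small sketch using only the Python standard library. The host, port and field name are copied from the URL above; the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def range_query_url(field, low, high,
                    base="http://localhost:8983/solr/select"):
    """Build a select URL for an inclusive range query on one field."""
    query = "%s:[%s TO %s]" % (field, low, high)
    # urlencode escapes the brackets, slashes, colon and spaces of the query.
    return base + "?" + urlencode({"q": query})


url = range_query_url("manufacturedate_dt", "NOW/DAY-4YEAR", "NOW/DAY")

# Decoding the URL again gives back the query string Solr would receive.
decoded = parse_qs(urlsplit(url).query)["q"][0]
```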

-Yonik
http://www.lucidimagination.com



On Thu, Aug 27, 2009 at 3:49 AM, Aleksander
Stensby aleksander.sten...@integrasco.com wrote:
 Hello everyone,
 after reading Grant's article about TrieRange capabilities on the lucid blog
 I did some experimenting, but I have some trouble with the tdate type and I
 was hoping that you guys could point me in the right direction.
 So, basically I index a regular solr date field and use that for sorting and
 range queries today. For experimenting I added tdate field, indexing it with
 the same data as in my other date field, but I'm obviously doing something
 wrong here, because the results coming back are completely different...
 the definitions in my schema:
 <field name="datetime" type="date" indexed="true" stored="false"
 omitNorms="true"/>
 <field name="tdatetime" type="tdate" indexed="true" stored="false"/>

 so if I do a query on my test index:
 q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]
 i get numFound=1031524 (don't worry about the ordering yet)..
 then, if I do the following on my trie date field:
 q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]
 i get numFound=0
 Where did I go wrong? (And yes, both fields are indexed with exactly the
 same data...)
 Thanks for any guidance here!
 Cheers,
  Aleks

 --
 Aleksander M. Stensby
 Lead Software Developer and System Architect
 Integrasco A/S
 www.integrasco.com
 http://twitter.com/Integrasco
 http://facebook.com/Integrasco

 Please consider the environment before printing all or any of this e-mail



Re: Trie vs long string for sorting

2009-07-04 Thread Chris Hostetter

: My data are library call numbers, normalized to be comparable, resulting in
: (maximum) 21-character strings of the form RK 052180H359~999~999
: 
: Now, these are fine -- they work for sorting and ranges and the whole thing,
: but right now I can't use them because I've got two or three for each of my
: 6M documents and on a 32-bit machine I run out of heap.
: 
: Another option would be to turn them into longs (using roughly 56 bits of
: the 64 bit space) and use a trie type. Is there any sort of a win involved
: there?

I don't think Trie fields can be used for sorting (because they result in 
multiple terms per doc) but I could be wrong about that, smarter people 
than me may have done something cool with the TrieField that I'm not aware 
of.

As a general rule: if you have character data that fits a rigid enough set 
of constraints that you can encode any legal value into a single 
numeric value (either int, or long) such that they still sort properly, 
sorting on those encoded values is going to be more memory efficient than (and 
probably just as fast as) sorting on the string values.
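
A toy sketch of such an order-preserving encoding. The alphabet and the width of 8 are assumptions chosen so the result fits in a signed 64-bit long; a real 21-character call number would need a tighter, position-specific packing to reach the roughly 56 bits mentioned in the question.

```python
# Symbols listed in ascending ASCII order, so digit order equals sort order.
ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ~"
WIDTH = 8  # toy width: 38 ** 8 is far below 2 ** 63


def encode_key(key):
    """Map a short key to an integer that sorts the same way as the string."""
    if len(key) > WIDTH:
        raise ValueError("key longer than %d characters" % WIDTH)
    padded = key.ljust(WIDTH)  # pad on the right with the lowest symbol
    value = 0
    for ch in padded:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value


keys = ["RK 0521", "RK 052~", "QA 76", "RK", "Z 999", "A1"]
by_string = sorted(keys)
by_number = sorted(keys, key=encode_key)
```

Treating each character as one digit of a fixed-width base-38 number is what keeps the numeric order identical to the string order. Keys that differ only in trailing spaces would collapse to the same number.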


-Hoss



Re: Trie vs long string for sorting

2009-07-04 Thread Mark Miller
Trie has a custom parser that can load the FieldCache for sorting. It's
basically a built-in type now that supports FieldCache, sorting, stored
fields, etc.





-- 
-- 
- Mark

http://www.lucidimagination.com


Re: Trie Patches- Backportable?

2009-06-09 Thread Amit Nithian
I take it by the deafening silence that this is not possible? :-)

On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian anith...@gmail.com wrote:

 Hi,
 I am still using Solr 1.2 with the Lucene 2.2 that came with that version
 of Solr. I am interested in taking advantage of the trie filtering to
 alleviate some performance problems and was wondering how back-portable
 these patches are?

 I am also trying to understand how the Trie algorithm cuts down the number
 of term queries compared to a normal range query. I was at the recent Bay
 Area lucene/solr meetup where this was covered but missed some of the
 details.

 I know the ideal case is to upgrade to a newer Solr/Lucene but we are
 resource constrained and can't devote the time right now to test and upgrade
 our production systems to a newer Solr.

 Thanks!
 Amit



Re: Trie Patches- Backportable?

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 10:19 PM, Amit Nithian anith...@gmail.com wrote:

 I take it by the deafening silence that this is not possible? :-)


Anything is possible :)

However, it might be easier to upgrade to 1.4 instead.



 On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian anith...@gmail.com wrote:

  Hi,
  I am still using Solr 1.2 with the Lucene 2.2 that came with that version
  of Solr. I am interested in taking advantage of the trie filtering to
  alleviate some performance problems and was wondering how back-portable
  these patches are?
 


Trie is new functionality. It does have a few dependencies on new Lucene
APIs (TokenStream/TermAttribute etc.). On the Solr side I think it'd be
easier.



  I am also trying to understand how the Trie algorithm cuts down the
 number
  of term queries compared to a normal range query. I was at the recent Bay
  Area lucene/solr meetup where this was covered but missed some of the
  details.
 


See the javadocs. They link to the paper in which it is described in
more detail.

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-queries/org/apache/lucene/search/trie/package-summary.html
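
To give a rough feel for why fewer terms are needed, here is a simplified sketch of the idea in Python. It is not Lucene's code: the function names are hypothetical, it only handles non-negative integers, and the 4-bit step and 16-bit width are toy assumptions. Each value is indexed at several precisions, and a range is covered by a few coarse prefixes in the middle plus full-precision terms only at the edges.

```python
def index_terms(value, step=4, bits=16):
    """Terms stored for one value: (shift, value with the low bits dropped)."""
    return [(shift, value >> shift) for shift in range(0, bits, step)]


def split_range(low, high, step=4, bits=16):
    """Cover [low, high] with (shift, first_prefix, last_prefix) sub-ranges."""
    parts = []
    shift = 0
    while True:
        mask = ((1 << step) - 1) << shift
        diff = 1 << (shift + step)
        has_lower = (low & mask) != 0      # ragged edge at the bottom
        has_upper = (high & mask) != mask  # ragged edge at the top
        next_low = ((low + diff) if has_lower else low) & ~mask
        next_high = ((high - diff) if has_upper else high) & ~mask
        if shift + step >= bits or next_low > next_high:
            # Nothing left for a coarser level: emit the rest here.
            parts.append((shift, low >> shift, high >> shift))
            return parts
        if has_lower:
            parts.append((shift, low >> shift, (low | mask) >> shift))
        if has_upper:
            parts.append((shift, (high & ~mask) >> shift, high >> shift))
        low, high = next_low, next_high
        shift += step


def covered_values(parts):
    """Expand the sub-ranges back into the plain values they stand for."""
    values = []
    for shift, first, last in parts:
        values.extend(range(first << shift, (last + 1) << shift))
    return sorted(values)


def matches(value, parts):
    """A value matches if one of its indexed terms falls in a sub-range."""
    terms = set(index_terms(value))
    return any((shift, prefix) in terms
               for shift, first, last in parts
               for prefix in range(first, last + 1))


parts = split_range(3, 1000)
term_count = sum(last - first + 1 for _, first, last in parts)
```

In this toy setup the range 3 to 1000 is answered with a few dozen terms (term_count) instead of one term per distinct value, at the cost of indexing several terms per value.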
-- 
Regards,
Shalin Shekhar Mangar.