Hi,

> Although your comment seemed to imply that the new 4.1 postings format
> would store day-style dates more efficiently - could you summarize what
> effects we could see?

It has nothing to do with postings formats; numeric fields are unrelated to
those. What matters is how you index the values in the tries (e.g. using only
32 bits), and that is user-land code (e.g. a Solr field type). Lucene itself
does not know about dates at all, only numerics; how you or Solr encode dates
as numerics is an implementation detail of Solr. Stored fields are a little
different in 4.1, but that is not really a problem here (LZ4 compression would
reduce the overhead). The idea is simply to add a new field type to Solr that
uses day granularity and treats the values as 32-bit integers.
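
As a rough sketch of that encoding (plain Java only, not Solr or Lucene API;
the class and method names are made up for illustration, and UTC day
boundaries are an assumption), a timestamp can be reduced to a day number that
fits in 32 bits:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, for illustration only: not part of Solr or Lucene.
public class DayEncoding {

    // Days since 1970-01-01 (UTC) as a 32-bit int.
    // Math.toIntExact throws if the value does not fit in an int.
    static int toEpochDay(Instant instant) {
        long day = instant.atZone(ZoneOffset.UTC).toLocalDate().toEpochDay();
        return Math.toIntExact(day);
    }

    // Inverse mapping: recover the calendar day from the encoded value.
    static LocalDate fromEpochDay(int day) {
        return LocalDate.ofEpochDay(day);
    }

    // The alternative discussed in the thread: keep a 64-bit millisecond
    // value but zero out hours, minutes, seconds and milliseconds (UTC).
    static long truncateToDayMillis(Instant instant) {
        return instant.truncatedTo(ChronoUnit.DAYS).toEpochMilli();
    }

    // Number of distinct day values in [first, last], both inclusive.
    static long distinctDays(LocalDate first, LocalDate last) {
        return ChronoUnit.DAYS.between(first, last) + 1;
    }

    public static void main(String[] args) {
        Instant sent = Instant.parse("2012-12-19T14:47:00Z");
        int day = toEpochDay(sent);
        System.out.println("epoch day:        " + day);
        System.out.println("decoded:          " + fromEpochDay(day));
        System.out.println("truncated millis: " + truncateToDayMillis(sent));
        System.out.println("days 1990..2012:  "
                + distinctDays(LocalDate.of(1990, 1, 1), LocalDate.of(2012, 12, 31)));
    }
}
```

Both variants map every timestamp within a day to the same value; the
difference is only whether that value is carried as a 32-bit day count or as
a 64-bit millisecond count with the time of day zeroed.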

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Wednesday, December 19, 2012 2:47 PM
> To: dev@lucene.apache.org
> Subject: Re: Possible improvement: TrieDate without time of day
> 
> "Have you tried to set your dates' hours, minutes, seconds and milliseconds
> to 0 before indexing them?"
> 
> If only it were that easy!
> 
> And maybe that's the point - we need an attribute on "date/DateField" fields
> to express those semantics - throw away the time of day when indexing
> values for this field/type. Maybe an attribute such as indexTime="false".
> 
> Also, I am wondering if I use day dates and they are in a range like 1990 to
> 2012, that's a relatively small number of unique values, like 8,000.
> 
> And, also enable a source format that has only the day date so that the
> source text can be more compact.
> 
> And maybe support other date/day formats as well, including RFC format.
> SolrCell has some support for RFC date format, I think.
> 
> But, the real point of this thread was whether it matters or not if time of
> day is suppressed.
> 
> Although your comment seemed to imply that the new 4.1 postings format
> would store day-style dates more efficiently - could you summarize what
> effects we could see?
> 
> -- Jack Krupansky
> 
> -----Original Message-----
> From: Adrien Grand
> Sent: Wednesday, December 19, 2012 5:05 AM
> To: dev@lucene.apache.org
> Subject: Re: Possible improvement: TrieDate without time of day
> 
> Hi Jack,
> 
> On Sat, Dec 15, 2012 at 4:36 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
> > I have seen a few inquiries concerned with the overhead of storing
> > time of day for simple dates. The concerns are both storage and
> > performance. So, the question/proposal is whether a variant of
> > TrieDate with no time of day component, call it TrieDay or
> > TrieDateTimeless or TrieDateNoTime (or incompatibly rename TrieDate to
> > TrieDateTime and use TrieDate for the new format), could be stored
> > with, say, 40% more storage efficiency and maybe a comparable or at
> > least significant performance improvement for queries.
> 
> Storing only the day in a 32-bit integer could save space, but I'm not sure
> Solr should provide a type for all granularities of dates?
> Have you tried to set your dates' hours, minutes, seconds and milliseconds to
> 0 before indexing them? This should help postings lists share terms and
> improve storage efficiency (especially with the new
> Lucene41PostingsFormat).
> 
> --
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org
> 


