"Have you tried to set your dates' hours, minutes, seconds and milliseconds to 0 before indexing them?"

If only it were that easy!

And maybe that's the point: we need an attribute on "date"/DateField fields to express those semantics, that is, throw away the time of day when indexing values for this field/type. Maybe an attribute such as indexTime="false".
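
To make the intended semantics concrete, here is a minimal sketch of what "throw away the time of day" would mean for a single value. It is plain client-side Java, not Solr code: the class and method names are hypothetical, UTC is assumed as the day boundary, and indexTime="false" remains only a proposed attribute.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Solr or Lucene.
class DayTruncation {
    // Drops hours, minutes, seconds and milliseconds (UTC day boundary assumed).
    static Instant truncateToDay(Instant timestamp) {
        return timestamp.truncatedTo(ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        Instant raw = Instant.parse("2012-12-19T05:05:37.123Z");
        System.out.println(truncateToDay(raw)); // prints 2012-12-19T00:00:00Z
    }
}
```

Every timestamp that falls on the same UTC day maps to the same value, which is the behavior the attribute would have to provide at index time.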

Also, I am wondering: if I use day dates and they are in a range like 1990 to 2012, that's a relatively small number of unique values, roughly 8,000.
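
As a quick check of that estimate, the following sketch counts the distinct calendar days in the range, assuming it runs from 1990-01-01 through 2012-12-31 inclusive (the exact endpoints are my assumption; the class and method names are hypothetical).

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for the back-of-the-envelope estimate.
class UniqueDayCount {
    // Number of distinct days from first to last, both inclusive.
    static long daysInclusive(LocalDate first, LocalDate last) {
        return ChronoUnit.DAYS.between(first, last.plusDays(1));
    }

    public static void main(String[] args) {
        long days = daysInclusive(LocalDate.of(1990, 1, 1), LocalDate.of(2012, 12, 31));
        System.out.println(days); // prints 8401
    }
}
```

So a day-only field over that range has at most about 8,400 distinct values, however many documents are indexed.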

It would also be good to enable a source format that has only the day date, so that the source text can be more compact.

And maybe support other date/day formats as well, including RFC format. SolrCell has some support for RFC date format, I think.

But, the real point of this thread was whether it matters or not if time of day is suppressed.

That said, your comment seemed to imply that the new 4.1 postings format would store day-style dates more efficiently. Could you summarize what effects we could see?

-- Jack Krupansky

-----Original Message-----
From: Adrien Grand
Sent: Wednesday, December 19, 2012 5:05 AM
To: dev@lucene.apache.org
Subject: Re: Possible improvement: TrieDate without time of day

Hi Jack,

On Sat, Dec 15, 2012 at 4:36 PM, Jack Krupansky <j...@basetechnology.com> wrote:
I have seen a few inquiries concerned with the overhead of storing time of
day for simple dates. The concerns are both storage and performance. So, the
question/proposal is whether a variant of TrieDate with no time of day
component, call it TrieDay or TrieDateTimeless or TrieDateNoTime (or
incompatibly rename TrieDate to TrieDateTime and use TrieDate for the new
format), could be stored with, say, 40% more storage efficiency and maybe a
comparable or at least significant performance improvement for queries.

Storing only the day in a 32-bit integer could save space, but I'm
not sure Solr should provide a type for all granularities of dates.
Have you tried to set your dates' hours, minutes, seconds and
milliseconds to 0 before indexing them? This should help postings
lists share terms and improve storage efficiency (especially with the
new Lucene41PostingsFormat).
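
For illustration only, the "day in a 32-bit integer" idea could look like the sketch below: the value is the number of days since 1970-01-01, which fits comfortably in an int. The class and method names are hypothetical and nothing here is an existing Solr or Lucene type; an application would have to do this conversion itself before indexing into an integer field.

```java
import java.time.LocalDate;

// Hypothetical encoding, not an existing Solr or Lucene field type.
class DayAsInt {
    // Days since 1970-01-01; throws ArithmeticException if it does not fit in an int.
    static int encode(LocalDate day) {
        return Math.toIntExact(day.toEpochDay());
    }

    // Inverse of encode.
    static LocalDate decode(int epochDay) {
        return LocalDate.ofEpochDay(epochDay);
    }

    public static void main(String[] args) {
        int encoded = encode(LocalDate.of(2012, 12, 19));
        System.out.println(encoded + " -> " + decode(encoded));
    }
}
```

The trade-off is that the time of day is unrecoverable and range queries must be expressed in day numbers rather than in date syntax.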

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
