Re: Why isn't the DateField implementation of ISO 8601 broader?
Chris Hostetter wrote:

> : I would expect field:2001-03 to be a hit on a partial match such as
> : field:[2001-02-28T00:00:00Z TO 2001-03-13T00:00:00Z]. I suppose that my
> : expectation would be that field:2001-03 would be counted once per day for each
> : day in its range. It would follow that a user looking for documents relating
>
> ...meanwhile someone else might expect that the ambiguous date must be entirely contained within the range being queried on.

If implemented in DateField I guess this behaviour would need to be configurable.

> (your implication of counting once per day would have pretty weird results on faceting by the way)

I agree. It would be possible to have one document hit on a query but have hundreds of facet categories with a count of one under this scheme. I'm leaning towards the scenario I described where the document would be counted once in an "other" facet category if it is relevant through rounding.

> with unambiguous dates, you can have exactly what you want just by being a little more verbose when indexing/querying, (and someone else can have exactly what they want by being equally verbose using slightly different options/queries)
>
> in your case: i would suggest that you use two fields: date_low and date_high ... when you have an exact date (down to the smallest level of granularity you care about) you put the same value in both fields, when you have an ambiguous value (like 2001-03) you put the largest value possible in date_high and the lowest value possible in date_low (ie: date_low:2001-03-01T00:00:00Z & date_high:2001-03-31T23:59:59.999Z) then a query for anything *overlapping* the range from feb28 to march 13 would be...
>
>    +date_low:[* TO 2001-03-13T00:00:00Z] +date_high:[2001-02-28T00:00:00Z TO *]
>
> ...it works for ambiguous dates, and it works for exact dates.
>
> (someone else who only wants to see matches if the ranges *completely* overlap would just swap which end point they queried against which field)

We've had a really similar solution in place for range queries for a while. Our current problem is really faceting.

Thanks,
Tricia
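The "other" facet category described above can be sketched outside Solr. This is only an illustration under stated assumptions: each document is reduced to a hypothetical (low, high) pair of inclusive dates, facets are per day, and `facet_by_day` is an invented helper name, not a Solr feature.

```python
from collections import Counter
from datetime import date


def facet_by_day(docs, start, end):
    """Count documents per day, with one "other" bucket for imprecise dates.

    docs  -- iterable of (low, high) date pairs, both inclusive (assumed shape)
    start -- first day of the facet range, inclusive
    end   -- last day of the facet range, inclusive
    """
    counts = Counter()
    for low, high in docs:
        # Skip documents whose interval does not overlap the facet range.
        if high < start or low > end:
            continue
        if low == high:
            # Exact to the day: counted in that day's bucket.
            counts[low.isoformat()] += 1
        else:
            # Less precise than a day: counted exactly once, in "other".
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    docs = [
        (date(2001, 3, 1), date(2001, 3, 31)),  # indexed as 2001-03
        (date(2001, 3, 5), date(2001, 3, 5)),   # exact day
        (date(2001, 4, 2), date(2001, 4, 2)),   # outside the range
    ]
    print(facet_by_day(docs, date(2001, 2, 28), date(2001, 3, 13)))
```

With this scheme the month-level document contributes a single count instead of one per day in its range.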
Re: Why isn't the DateField implementation of ISO 8601 broader?
: I would expect field:2001-03 to be a hit on a partial match such as
: field:[2001-02-28T00:00:00Z TO 2001-03-13T00:00:00Z]. I suppose that my
: expectation would be that field:2001-03 would be counted once per day for each
: day in its range. It would follow that a user looking for documents relating

...meanwhile someone else might expect that the ambiguous date must be entirely contained within the range being queried on.

(your implication of counting once per day would have pretty weird results on faceting by the way)

with unambiguous dates, you can have exactly what you want just by being a little more verbose when indexing/querying, (and someone else can have exactly what they want by being equally verbose using slightly different options/queries)

in your case: i would suggest that you use two fields: date_low and date_high ... when you have an exact date (down to the smallest level of granularity you care about) you put the same value in both fields, when you have an ambiguous value (like 2001-03) you put the largest value possible in date_high and the lowest value possible in date_low (ie: date_low:2001-03-01T00:00:00Z & date_high:2001-03-31T23:59:59.999Z) then a query for anything *overlapping* the range from feb28 to march 13 would be...

   +date_low:[* TO 2001-03-13T00:00:00Z] +date_high:[2001-02-28T00:00:00Z TO *]

...it works for ambiguous dates, and it works for exact dates.

(someone else who only wants to see matches if the ranges *completely* overlap would just swap which end point they queried against which field)

-Hoss
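The date_low/date_high suggestion could be prepared at indexing time with something like the following. It is a minimal sketch, assuming partial dates arrive as 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings; `date_bounds` and `overlap_query` are hypothetical helper names, and only the field names and query shape come from the message above.

```python
from datetime import datetime, timedelta


def _fmt(dt):
    """Format as an explicit UTC timestamp, adding milliseconds only if non-zero."""
    text = dt.strftime("%Y-%m-%dT%H:%M:%S")
    if dt.microsecond:
        text += ".%03d" % (dt.microsecond // 1000)
    return text + "Z"


def date_bounds(partial):
    """Return (date_low, date_high) for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in partial.split("-")]
    year = parts[0]
    if len(parts) == 1:
        low = datetime(year, 1, 1)
        following = datetime(year + 1, 1, 1)
    elif len(parts) == 2:
        month = parts[1]
        low = datetime(year, month, 1)
        # First instant of the next month (December rolls into January).
        following = datetime(year + month // 12, month % 12 + 1, 1)
    elif len(parts) == 3:
        low = datetime(year, parts[1], parts[2])
        following = low + timedelta(days=1)
    else:
        raise ValueError("unsupported date: %r" % partial)
    # The highest value still inside the period, at millisecond granularity.
    high = following - timedelta(milliseconds=1)
    return _fmt(low), _fmt(high)


def overlap_query(start, end):
    """Build the 'anything overlapping [start, end]' query from the thread."""
    return "+date_low:[* TO %s] +date_high:[%s TO *]" % (end, start)


if __name__ == "__main__":
    print(date_bounds("2001-03"))
    print(overlap_query("2001-02-28T00:00:00Z", "2001-03-13T00:00:00Z"))
```

Note that for a full-day value this sketch sets date_high to the last millisecond of that day; if day is the smallest granularity you care about, you could instead put the same value in both fields, as suggested above.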
Re: Why isn't the DateField implementation of ISO 8601 broader?
On 6 Oct 09, at 5:31 PM, Chris Hostetter wrote:

> ...your expectations may be different than everyone else's. by requiring that the dates be explicit there is no ambiguity, you are in control of the behavior.

The power of some of the other formulas in ISO 8601 is that you don't introduce false levels of precision. The "October 2009" issue of a magazine is precisely tagged as "200910" or "2009-10". It doesn't have a day, hour or minute. Most books come with a copyright year: no month, no day ... In the library/book/periodical world these are a common set of expectations.

Walter
Re: Why isn't the DateField implementation of ISO 8601 broader?
Thanks for making me think about this a little bit deeper, Hoss. Comments in-line.

Chris Hostetter wrote:

> because those would be ambiguous. if you just indexed field:2001-03 would you expect it to match field:[2001-02-28T00:00:00Z TO 2001-03-13T00:00:00Z] ... what about date faceting, what should the counts be if you facet per day?

I would expect field:2001-03 to be a hit on a partial match such as field:[2001-02-28T00:00:00Z TO 2001-03-13T00:00:00Z]. I suppose that my expectation would be that field:2001-03 would be counted once per day for each day in its range. It would follow that a user looking for documents relating to 1919 might also be interested in 1910. But conversely a user looking for documents relating to 1919 might really only want documents specifically related to 1919. Maybe the implementation would be smart (or configurable) about precision, so that a document wouldn't be counted when the facets ask for more significant figures than the indexed/stored value has. Maybe there would be another facet category at each precision for "others" -- the documents that have less precision than the current date facet precision. I'm envisioning a hierarchical system that starts general with century, with click-throughs drilling down eventually to days.

> ...your expectations may be different than everyone else's. by requiring that the dates be explicit there is no ambiguity, you are in control of the behavior.

I can see your point, but surely there are others out there with non-explicit data regarding dates? Does my use case make sense to anyone else?

> you can always just index the first date of whatever block of time (month, year, century, etc..) and then facet normally.

Until a better solution presents itself we've gone the route of creating more fields for faceting on different blocks of time. So fields for century, decade, year, month, and day will let us facet on each of these time periods as needed. Documents with dates with less precision will not show up in date facets with more precision. I was hoping there was an elegant hack for faceting on a prefix of a defined number of characters (prefix=*, prefix=**, prefix=***, ...) without having to explicitly specify ..., prefix=188, prefix=189, prefix=190, prefix=191, ...

Regards,
Tricia
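The per-precision fields described above could be derived from one partial date string roughly as follows. This is a sketch under assumptions: the field names (century, decade, year, month, day), the digit-prefix encoding, and both helper functions are illustrative only and are not part of Solr.

```python
from collections import Counter

# (facet field name, number of leading digits that define it) -- assumed layout
LEVELS = [("century", 2), ("decade", 3), ("year", 4), ("month", 6), ("day", 8)]


def facet_fields(value):
    """Map a partial date such as '18', '1867' or '1918-11' to facet fields.

    Only the precisions the value actually has are emitted, so a year-only
    document never appears in the month or day facets.
    """
    digits = value.replace("-", "")
    if not digits.isdigit():
        raise ValueError("not a partial date: %r" % value)
    return {name: digits[:n] for name, n in LEVELS if len(digits) >= n}


def facet_counts(values, length):
    """Count values by their first `length` digits, skipping shorter ones."""
    counts = Counter()
    for value in values:
        digits = value.replace("-", "")
        if len(digits) >= length:
            counts[digits[:length]] += 1
    return counts


if __name__ == "__main__":
    print(facet_fields("1918-11"))
    print(facet_counts(["18", "1867", "1918-11", "1919"], 3))
```

Here `facet_counts(..., 3)` plays the role of the wished-for three-character prefix facet: it groups by decade without listing 188, 189, 190, ... explicitly.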
Re: Why isn't the DateField implementation of ISO 8601 broader?
: My question is why isn't the DateField implementation of ISO 8601 broader
: so that it could include and MM as acceptable date strings? What

because those would be ambiguous. if you just indexed field:2001-03 would you expect it to match field:[2001-02-28T00:00:00Z TO 2001-03-13T00:00:00Z] ... what about date faceting, what should the counts be if you facet per day?

...your expectations may be different than everyone else's. by requiring that the dates be explicit there is no ambiguity, you are in control of the behavior.

: would it take to do so? Are there any work-arounds for faceting by century,
: year, month without creating new fields in my schema? The last resort would

you can always just index the first date of whatever block of time (month, year, century, etc..) and then facet normally.

-Hoss
Re: Why isn't the DateField implementation of ISO 8601 broader?
> My question is why isn't the DateField implementation of ISO 8601 broader so
> that it could include and MM as acceptable date strings? What would
> it take to do so?

Nobody ever cared? But yes, you're right, the spurious precision is annoying. However, there is no "fuzzy search" for dates so the precision is always used. Let's say I want to limit it to "19th century America culture". 1790-1910 are a fairly contiguous sequence in US history, with a massive break at 1910 for WW1.

> Are there any work-arounds for faceting by century, year, month without
> creating new fields in my schema? The last resort would be to create these
> new fields but I'm hoping to leverage the power of the DateField and the trie
> to replace range stuff.

There are no workarounds as yet. You do not have to store the century/year etc. fields, only index them. Tries do not support faceting yet.

> Some interesting observations from tinkering with the DateFieldTest:
> * 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z

The date parser should blow up with these values!
Why isn't the DateField implementation of ISO 8601 broader?
Hi All,

I'm working with data that has multiple date precisions, most of which do not have a time associated with them: rather centuries (like 1800's), years (like 1867), and years/months (like 1918-11). I'm able to sort and search using a workaround where we store the date as a string CCYYMM where YYMM are optional. I was hoping to be able to tie this into the DateField type so that it becomes possible to facet on them without much work and duplication of data. Unfortunately it requires the "canonical representation of dateTime", which means the time part of the string is mandatory.

My question is why isn't the DateField implementation of ISO 8601 broader so that it could include and MM as acceptable date strings? What would it take to do so?

Are there any work-arounds for faceting by century, year, month without creating new fields in my schema? The last resort would be to create these new fields but I'm hoping to leverage the power of the DateField and the trie to replace range stuff.

Thanks,
Tricia

Some interesting observations from tinkering with the DateFieldTest:

* 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z
* 2008-03-00T00:00:00Z becomes 2008-02-29T00:00:00Z
* 2003-00-00T00:00:00Z becomes 2002-11-30T00:00:00Z
* 2000-00-00T00:00:00Z becomes 1999-11-30T00:00:00Z
* 1979-00-31T00:00:00Z becomes 1978-12-31T00:00:00Z
* 2005-04-00T00:00:00Z becomes 2005-03-31T00:00:00Z
* 1850-10-00T00:00:00Z becomes 1850-09-30T00:00:00Z

The rounding /YEAR, /MONTH, etc artificially imposes extra precision that the original data wouldn't have. In any case where months are zero weird rounding happens.
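The observations listed above are all consistent with a lenient calendar that rolls out-of-range fields over into their neighbours: day 0 becomes the last day of the previous month, month 0 becomes December of the previous year. The thread does not say how DateField parses its input, so the sketch below is only an assumption that happens to reproduce the listed results; `lenient_date` and `strict_parse` are hypothetical names.

```python
from datetime import date, datetime, timedelta


def lenient_date(year, month, day):
    """Resolve possibly out-of-range month/day values by rolling them over."""
    # Normalise the month first: month 0 becomes December of the previous year.
    y, m = divmod(year * 12 + (month - 1), 12)
    # Then offset from the 1st of that month: day 0 is one day before the 1st.
    return date(y, m + 1, 1) + timedelta(days=day - 1)


def strict_parse(text):
    """Parse an explicit UTC timestamp, rejecting month 0 or day 0."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(lenient_date(2003, 3, 0))
    print(lenient_date(2003, 0, 0))
    try:
        strict_parse("2003-03-00T00:00:00Z")
    except ValueError as exc:
        print("rejected:", exc)
```

A strict parse like `strict_parse` raises an error on these inputs rather than silently shifting them to a different date.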