Thanks for the historical context; it's nice to know.  You say it had to do
with XML, but XML's choice was in turn following ISO specs.  So even though
the user might not be using XML, the date format we've chosen is sound
regardless of the wire format.

JavaBin uses milliseconds since the 1970 epoch, and I suspected other
formats like Avro might too, so I checked: Avro is also milliseconds.  So no
formatting issues there.  I suspect other efficient protocols do likewise,
to avoid formatting concerns and because, of course, it's efficient to
parse.

Regarding back-compat, yeah I get that there will be some users that need
to tweak something.  But this is a major release we're talking about.  With
major releases especially, users take their time to upgrade and don't do so
blindly.  IMO it's not worth making work for ourselves over a change that
will be inevitable for a user anyway.  Considering your input, perhaps on
the parsing side (but not formatting) I could go with a system property (not
request param) that enables Solr to use a DateTimeFormatter instance
built via DateTimeFormatterBuilder.parseLenient().  It doesn't parse
65 seconds (which I found an oddball example of in a test), but it parses
non-padded time elements (e.g. "1" for January) and allows the "+" to be
optional for 5 digit years.  I just did some experimentation to
observe this.
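
To make the lenient-parsing idea concrete, here is a minimal sketch in Java.
The class name, the two parse methods, and the system property name
"example.date.parseLenient" are hypothetical (made up for illustration, not
taken from the patch); the java.time calls are standard JDK API.  It assumes
the lenient parser is built by switching a DateTimeFormatterBuilder to
lenient mode before appendInstant().

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

final class LenientInstantParsing {
  // Hypothetical property name, invented for this sketch.
  static final String LENIENT_PROP = "example.date.parseLenient";

  // Same building block as ISO_INSTANT (case-insensitive appendInstant),
  // but with the builder put into lenient mode first.
  static final DateTimeFormatter LENIENT = new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .parseLenient()
      .appendInstant()
      .toFormatter(Locale.ROOT);

  // Parse leniently or strictly depending on the flag.
  static Instant parse(String text, boolean lenient) {
    return lenient ? LENIENT.parse(text, Instant::from) : Instant.parse(text);
  }

  // Reads the (hypothetical) system property; strict unless it is "true".
  static Instant parse(String text) {
    return parse(text, Boolean.getBoolean(LENIENT_PROP));
  }
}
```

With the flag on, "2016-3-25T14:37:00Z" should parse to the same Instant as
the zero-padded form, whereas Instant.parse rejects it with a
DateTimeParseException.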

I said parsing but not formatting because there are far fewer call sites in
my patch for Instant.parse and because of the "Robustness Principle":
https://en.wikipedia.org/wiki/Robustness_principle
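
As a small illustration of the "strict on output" half, the sketch below
formats with DateTimeFormatter.ISO_INSTANT only.  The class and method names
are hypothetical; the formatter is the standard JDK constant.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

final class StrictInstantFormatting {
  // Always emit the canonical ISO_INSTANT form, whatever was accepted on input.
  static String format(Instant instant) {
    return DateTimeFormatter.ISO_INSTANT.format(instant);
  }
}
```

For example, format(Instant.ofEpochMilli(30)) gives
"1970-01-01T00:00:00.030Z" (milliseconds padded to 3 digits), and an instant
in the year 10000 comes out with a leading "+", e.g.
"+10000-01-01T00:00:00Z".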

Sorry, but I don't like the idea of supporting a list of formats.  Sounds
great for the extraction module or some sort of explicitly invoked utility
(e.g. in an URP), but not for everywhere.

~ David

On Fri, Mar 25, 2016 at 2:37 PM Chris Hostetter <[email protected]>
wrote:

>
> : I started a quick hack to cut over DateFormatUtil's formatting to this
> : one-liner:  DateTimeFormatter.ISO_INSTANT.format(d.toInstant());    and
>         ...
> : I'd love to just cut over to this but there are some slight differences we
> : would see and I want to get people's opinion if any of these differences
> : are a blocker:
>
> For some historical context: the current format was chosen back in the
> days when the *only* way to get data in or out of Solr was XML, and the
> format was picked to be consistent with the best standard available at the
> time for representing moments in time in XML.
>
> When we use javabin for communicating with clients, we use serialized
> versions of the Date objects that deserialize back to Date objects on the
> other side of the wire; and likewise if we added thrift, or avro, or
> whatever support for communicating with clients, i would be 100% in favor
> of following whatever standards those formats have for communicating
> moment in time info -- but even in all of those cases where there may be a
> protocol specific representation for moments in time over the wire, it
> still seems very important/useful for having a standard (default) format
> for dealing with dates represented as *strings* over the wire --
> particularly when dealing with query parsing.
>
>
> In general, i'm a *little* concerned about how tweaking the
> parsing/formatting of "dates as strings" will affect existing
> clients/users that are using generic/custom parsing/formatting code in
> various languages (however small those tweaks may be)...
>
> : * Milliseconds are 0 padded to 3 digits if the milliseconds is non-zero.
> : Thus 30 milliseconds will have ".030" added on.  Our current formatting
> : code does ".03".
>
> ...probably won't impact many existing users since they already
> have to be prepared to parse up to 3 decimals, and (i'm assuming) the new
> parser you're suggesting we use in solr will be forgiving if they send a
> string with fewer.  but if someone was obeying the existing spec
> religiously they may have errors if we return "*.900"
>
> : * Dates with years after '9999' (i.e. 10000 and beyond, >= 5 digit years)
> : *must* have a leading '+' -- it is formatted with a "+" and if such a year
> : is parsed it *must* have a "+" or there is an exception.  Currently a '+'
> : would yield an exception and there is no "+" emitted.
>
> ...this could also easily be problematic for some people in practice.
>
> : * Of course as mentioned, currently we don't support negative years
> : (resulting in invisible errors mostly); we'd get this for free.
>
> +1
>
>
> : dissent on my proposal.  If there's a real reason to keep something
> : consistent with current behavior, I'm sure we could complicate things
> : further but it'd be great to simply use ISO_INSTANT exactly.
>
> My personal preference would be to at least have some ability to enable a
> backcompat mode...
>
> 1) switch to the new syntax as you describe by default
> 2) support a request param option to force the old parsing code to be used
> (or perhaps just *emulated* by stripping off any leading + and trailing
> 0 when formatting, and adding them if needed when parsing) ... this would
> be a fallback option for clients whose existing code is brittle, and
> wouldn't have to be something we optimize heavily -- people would be
> encouraged to switch to the new format ASAP.
> 3) after backporting to 6x, remove support for the request param override
> in master (so it's not supported at all starting in 7x)
>
> ...i think as a baseline, that would be awesome -- but I wonder if there
> are bigger questions we should be asking, and a better overall "API"
> for how date formatting/parsing rules are chosen on a per-request basis?
>
> ie: If we're going to change the rules of date parsing/formatting,
> should we rethink *all* the rules, rather than just tweaking one detail of
> rules that were written solely with communicating over XML in mind?
>
>
> I mean ... i'm way out of the loop on the state of the art in java.time.*, but
> even w/o knowing what all is supported/recommended there, i have to wonder
> if this would be a good opportunity to add general support for a
> 'datetime.fmt' request param that can be multivalued to specify an ordered
> set of parsers to be used any time solr needs to parse a String specified
> by the client as a moment in time -- and using the first fmt specified any
> time a moment in time may need to be formatted to return to the user as a string
>
>         ?
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
