[
https://issues.apache.org/jira/browse/SOLR-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated SOLR-8904:
-------------------------------
Attachment: SOLR_8904.patch
This new patch adds parsing leniency in accordance with the [Robustness
Principle|https://en.wikipedia.org/wiki/Robustness_principle] -- "Be
conservative in what you send, be liberal in what you accept". In the dev list
I mentioned making this leniency settable via a System property but I am not
making it toggle-able here. Perhaps it could be added later or we can let it
be this way forever.
Patch summary:
* {{DateMathParser.parseMath}} now parses the date literal part using a new
method, {{parseNoMath()}} that uses a static instance of DTF created like this:
{{new DateTimeFormatterBuilder().
.parseCaseInsensitive().parseLenient().appendInstant().toFormatter(Locale.ROOT)}}.
** I have this method private because, in general, Solr should be parsing with
"date math" from user input. When it's from Solr (e.g. SolrJ parsing dates),
it can use {{Integer.parse}} which is strict. Solr is strict about formatting
output.
** parsing leniency is now explicitly tested. A leading "+" is always optional
no matter how many digits are in the year. And numbers need not have leading 0
pads. However it's an error to have values out of bounds (e.g. >= 60 secs).
* I changed {{ValueSourceAugmenter}} to call {{DateMathParser.parseMath}}
(instead of without).
* I changed all 3 spots in the analytics contrib module that parse dates to
parse with math (instead of without).
* I moved most testing in {{DateFieldTest}} into {{DateMathParserTest}}, doing
a bit of consolidating/refactoring along the way. The only test that remains
is testing createField. I made that test a bit more clear that TrieDateField
parses with date math.
New suggested CHANGES.txt:
{noformat}
* Formatted date-times from Solr have some differences. If the year is more
than 4 digits, there is a leading '+'. When there is a non-zero number of
milliseconds, it is padded with zeros to 3 digits. Negative year (BC) dates are
now possible. It is now an error to supply a portion of the date out
of its, range, like 67 seconds.
{noformat}
I think this is ready.
> Switch from SimpleDateFormat to Java 8 DateTimeFormatter.ISO_INSTANT
> --------------------------------------------------------------------
>
> Key: SOLR-8904
> URL: https://issues.apache.org/jira/browse/SOLR-8904
> Project: Solr
> Issue Type: Task
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 6.0
>
> Attachments: SOLR_8904.patch,
> SOLR_8904_switch_from_SimpleDateFormat_to_Instant_parse_and_format.patch
>
>
> I'd like to move Solr away from SimpleDateFormat to Java 8's
> java.time.formatter.DateTimeFormatter API, particularly using simply
> ISO_INSTANT without any custom rules. This especially involves our
> DateFormatUtil class in Solr core, but also involves DateUtil (I filed
> SOLR-8903 to deal with additional delete/move/deprecations for that one).
> In particular, there's {{new Date(Instant.parse(d).toEpochMilli())}} for
> parsing and {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for
> formatting. Simple & thread-safe!
> I want to simply cut over completely without having special custom rules.
> There are differences in how ISO_INSTANT does things:
> * Formatting: Milliseconds are 0 padded to 3 digits if the milliseconds is
> non-zero. Thus 30 milliseconds will have ".030" added on. Our current
> formatting code emits ".03".
> * Dates with years after '9999' (i.e. 10000 and beyond, >= 5 digit years):
> ISO_INSTANT strictly demands a leading '\+' -- it is formatted with a "\+"
> and if such a year is parsed it *must* have a "\+" or there is an exception.
> SimpleDateFormatter requires the opposite -- no '+' and and if you tried to
> give it one, it would throw an exception.
> * Currently we don't support negative years (resulting in invisible errors
> mostly!). ISO_INSTANT supports this!
> In addition, DateFormatUtil.parseDate currently allows the trailing 'Z' to be
> optional, but the only caller that could exploit this is the analytics
> module. I'd like to remove the optional-ness of 'Z' and inline this method
> away to {{new Date(Instant.parse(d).toEpochMilli())}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]