[ 
https://issues.apache.org/jira/browse/SOLR-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-8904:
-------------------------------
    Attachment: SOLR_8904.patch

This new patch adds parsing leniency in accordance with the [Robustness 
Principle|https://en.wikipedia.org/wiki/Robustness_principle] -- "Be 
conservative in what you send, be liberal in what you accept".  In the dev list 
I mentioned making this leniency settable via a System property but I am not 
making it toggle-able here.  Perhaps it could be added later or we can let it 
be this way forever.

Patch summary:
* {{DateMathParser.parseMath}} now parses the date literal part using a new 
method, {{parseNoMath()}} that uses a static instance of DTF created like this: 
{{new DateTimeFormatterBuilder().  
    
.parseCaseInsensitive().parseLenient().appendInstant().toFormatter(Locale.ROOT)}}.
  
** I have this method private because, in general, Solr should be parsing with 
"date math" from user input.  When it's from Solr (e.g. SolrJ parsing dates), 
it can use {{Integer.parse}} which is strict. Solr is strict about formatting 
output.
** parsing leniency is now explicitly tested.  A leading "+" is always optional 
no matter how many digits are in the year.  And numbers need not have leading 0 
pads. However it's an error to have values out of bounds (e.g. >= 60 secs).
* I changed {{ValueSourceAugmenter}} to call {{DateMathParser.parseMath}}  
(instead of without).
* I changed all 3 spots in the analytics contrib module that parse dates to 
parse with math (instead of without).
* I moved most testing in {{DateFieldTest}} into {{DateMathParserTest}}, doing 
a bit of consolidating/refactoring along the way.  The only test that remains 
is testing createField.  I made that test a bit more clear that TrieDateField 
parses with date math.

New suggested CHANGES.txt:
{noformat}
 * Formatted date-times from Solr have some differences.  If the year is more
than 4 digits, there is a leading '+'.  When there is a non-zero number of 
milliseconds, it is padded with zeros to 3 digits.  Negative year (BC) dates are
now possible.  It is now an error to supply a portion of the date out 
of its, range, like 67 seconds.
{noformat}

I think this is ready.

> Switch from SimpleDateFormat to Java 8 DateTimeFormatter.ISO_INSTANT
> --------------------------------------------------------------------
>
>                 Key: SOLR-8904
>                 URL: https://issues.apache.org/jira/browse/SOLR-8904
>             Project: Solr
>          Issue Type: Task
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 6.0
>
>         Attachments: SOLR_8904.patch, 
> SOLR_8904_switch_from_SimpleDateFormat_to_Instant_parse_and_format.patch
>
>
> I'd like to move Solr away from SimpleDateFormat to Java 8's 
> java.time.formatter.DateTimeFormatter API, particularly using simply 
> ISO_INSTANT without any custom rules.  This especially involves our 
> DateFormatUtil class in Solr core, but also involves DateUtil (I filed 
> SOLR-8903 to deal with additional delete/move/deprecations for that one).
> In particular, there's {{new Date(Instant.parse(d).toEpochMilli())}} for 
> parsing and {{DateTimeFormatter.ISO_INSTANT.format(d.toInstant())}} for 
> formatting.  Simple & thread-safe!
> I want to simply cut over completely without having special custom rules.  
> There are differences in how ISO_INSTANT does things:
> * Formatting: Milliseconds are 0 padded to 3 digits if the milliseconds is 
> non-zero.  Thus 30 milliseconds will have ".030" added on.  Our current 
> formatting code emits ".03".
> * Dates with years after '9999' (i.e. 10000 and beyond, >= 5 digit years):  
> ISO_INSTANT strictly demands a leading '\+' -- it is formatted with a "\+" 
> and if such a year is parsed it *must* have a "\+" or there is an exception.  
> SimpleDateFormatter requires the opposite -- no '+' and and if you tried to 
> give it one, it would throw an exception.  
> * Currently we don't support negative years (resulting in invisible errors 
> mostly!).  ISO_INSTANT supports this!
> In addition, DateFormatUtil.parseDate currently allows the trailing 'Z' to be 
> optional, but the only caller that could exploit this is the analytics 
> module.  I'd like to remove the optional-ness of 'Z' and inline this method 
> away to {{new Date(Instant.parse(d).toEpochMilli())}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to