GitHub user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212800845

--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,13 @@
   <updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
   <updateProcessor class="solr.ParseDateFieldUpdateProcessorFactory" name="parse-date">
     <arr name="format">
-      <str>yyyy-MM-dd'T'HH:mm[:ss[.SSS]][z</str>
-      <str>yyyy-MM-dd'T'HH:mm[:ss[,SSS]][z</str>
+      <str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
+      <str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
       <str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
       <str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
-      <str>yyyy-MM-dd</str>
+      <str>EEE MMM d [HH:mm:ss ][z ]yyyy</str>
+      <str>EEEE, dd-MMM-yy HH:mm:ss [z</str>
--- End diff --

These last two patterns are RFC-1036 and RFC-1123. Neither should have an optional timezone -- as seen in `ExtractDateUtils.PATTERN_RFC1036`, `ExtractDateUtils.PATTERN_RFC1123`, and `DateTimeFormatter.RFC_1123_DATE_TIME`, and I looked at the RFC-1036 spec as well.

Why did you make the time of day optional in ASCTIME? I don't see that in ExtractDateUtils. And can you add a test that we parse "Sun Nov  6 08:49:37 1994"? That is the example date from RFC-2616, the HTTP/1.1 spec, which lists RFC-1123, RFC-1036, and asctime(). It is the origin of why we see these particular patterns in ExtractDateUtils, which borrowed them from ApacheHttpClient. The double-space before the single-digit day is deliberate. It may be necessary to use the 'p' (pad modifier) as specified in DateTimeFormatter. I've seen conflicting information from internet searches on whether the "day" portion of asctime() is 2 digits or 1, so it'd be good to test that either works. "Leniency" will hopefully ensure one pattern works without needing to add more variations.

Good catch on noticing that the seconds are optional in RFC-1123!

Can you reverse the order of these last 3 patterns? Based on RFC-2616 (HTTP/1.1), this is the order defined by preference.
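To make the pad-modifier suggestion concrete, here is a minimal sketch of how `ppd` could handle the space-padded day in asctime() dates. The pattern string, the class name `AsctimePadSketch`, and the helper `parseAsctime` are assumptions for illustration only; this is not the pattern from the PR, and it uses the formatter's default (strict) parsing with `Locale.ENGLISH`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class AsctimePadSketch {
    // Hypothetical pattern (not the one in solrconfig.xml): "ppd" pads the
    // day-of-month on the left with spaces to a width of 2, so both " 6"
    // and "16" occupy exactly two characters.
    static final DateTimeFormatter ASCTIME =
        DateTimeFormatter.ofPattern("EEE MMM ppd HH:mm:ss yyyy", Locale.ENGLISH);

    // Parses an asctime()-style string with no zone into a LocalDateTime.
    static LocalDateTime parseAsctime(String text) {
        return LocalDateTime.parse(text, ASCTIME);
    }

    public static void main(String[] args) {
        // Example date from RFC-2616: note the two spaces before the 6.
        System.out.println(parseAsctime("Sun Nov  6 08:49:37 1994"));
        // A two-digit day fills the padded width with no extra space.
        System.out.println(parseAsctime("Wed Nov 16 08:49:37 1994"));
    }
}
```

This sketch only exercises the padded single-digit and two-digit forms. Whether lenient parsing lets one pattern also accept a single-space, single-digit day is not shown here and would still need the test requested above.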
BTW, if we really did want the RFC-1123 & RFC-1036 patterns to have an optional timezone, it would need to be specified differently than how you did it. You put the optional start bracket to the right of the space when it would need to be to the left of it.

Obviously, all tweaks we make to these patterns need to be repeated across:

* `solr/server/solr/configsets/_default/conf/solrconfig.xml`
* `solr/core/src/test-files/solr/collection1/conf/solrconfig-parsing-update-processor-chains.xml`
* `solr/contrib/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml`

Probably elsewhere; I can check before committing.
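On the bracket-placement point above, a small sketch of why the space has to sit inside the optional section. It uses a simplified, hypothetical date pattern rather than the RFC-1123 or RFC-1036 ones, and the class name `OptionalZoneSketch` and helper `parses` are made up for the illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class OptionalZoneSketch {
    // Space to the right of the seconds, bracket after it: the space is
    // a mandatory literal, only the zone itself is optional.
    static final String SPACE_OUTSIDE = "yyyy-MM-dd HH:mm:ss [z]";
    // Bracket to the left of the space: space and zone are optional together.
    static final String SPACE_INSIDE = "yyyy-MM-dd HH:mm:ss[ z]";

    // Returns true if the whole text parses with the given pattern.
    static boolean parses(String pattern, String text) {
        try {
            LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String noZone = "1994-11-06 08:49:37";
        // The mandatory trailing space makes the zone-less input fail.
        System.out.println("space outside, no zone: " + parses(SPACE_OUTSIDE, noZone));
        // With the space inside the brackets the zone-less input is accepted.
        System.out.println("space inside, no zone:  " + parses(SPACE_INSIDE, noZone));
        // Input that does carry a zone, for comparison.
        System.out.println("space inside, with zone: " + parses(SPACE_INSIDE, noZone + " GMT"));
    }
}
```

The same reasoning would apply to any pattern ending in ` [z`: input without a zone also has no trailing space, so the literal space never matches.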