GitHub user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/438#discussion_r212800845
  
    --- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
    @@ -1141,11 +1141,13 @@
       <updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
       <updateProcessor class="solr.ParseDateFieldUpdateProcessorFactory" name="parse-date">
         <arr name="format">
    -      <str>yyyy-MM-dd'T'HH:mm[:ss[.SSS]][z</str>
    -      <str>yyyy-MM-dd'T'HH:mm[:ss[,SSS]][z</str>
    +      <str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
    +      <str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
           <str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
           <str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
    -      <str>yyyy-MM-dd</str>
    +      <str>EEE MMM d [HH:mm:ss ][z ]yyyy</str>
    +      <str>EEEE, dd-MMM-yy HH:mm:ss [z</str>
    --- End diff --
    
    These last two patterns are RFC-1036 and RFC-1123.  Neither should have an
    optional timezone, as seen in `ExtractDateUtils.PATTERN_RFC1036`,
    `ExtractDateUtils.PATTERN_RFC1123`, and
    `DateTimeFormatter.RFC_1123_DATE_TIME`; I checked the RFC-1036 spec as well.
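
For reference, a minimal sketch of how the JDK's built-in `DateTimeFormatter.RFC_1123_DATE_TIME` treats the zone and the seconds. The class name `Rfc1123ZoneSketch` and the helper `parses` are made up for illustration and are not part of the PR; the sample strings are variations on the RFC-2616 example date.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class Rfc1123ZoneSketch {
    // Hypothetical helper: true when the built-in RFC-1123 formatter accepts the text.
    static boolean parses(String text) {
        try {
            DateTimeFormatter.RFC_1123_DATE_TIME.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Zone present: should be accepted.
        System.out.println(parses("Sun, 06 Nov 1994 08:49:37 GMT"));
        // Seconds omitted, zone present: should be accepted.
        System.out.println(parses("Sun, 06 Nov 1994 08:49 GMT"));
        // Zone omitted: should be rejected, since the zone is not optional.
        System.out.println(parses("Sun, 06 Nov 1994 08:49:37"));
    }
}
```

This only exercises the JDK formatter, not the Solr patterns in the diff, so it is a point of comparison rather than a test of the config.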
    
    Why did you make the time of day optional in ASCTIME?  I don't see that in
    ExtractDateUtils.  And can you add a test that we parse
    "Sun Nov  6 08:49:37 1994"?  That is the example date from RFC-2616 (the
    HTTP/1.1 spec), which lists RFC-1123, RFC-1036, and asctime(); it is the
    origin of these particular patterns in ExtractDateUtils, which were
    borrowed from Apache HttpClient.  The double space before the single-digit
    day is deliberate.  It may be necessary to use 'p' (the pad modifier) as
    specified in DateTimeFormatter.  I've seen conflicting information from
    internet searches on whether the "day" portion of asctime() is 2-digit or
    1-digit, so it'd be good to test that either works.  "Leniency" will
    hopefully ensure one pattern works without needing to add more variations.
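
A rough sketch of the kind of check I have in mind, using plain `DateTimeFormatter` in its default (strict) parsing mode rather than the update processor. The class name `AsctimeParseSketch`, both pattern constants, and the second sample date are assumptions for illustration; the padded pattern is a candidate, not necessarily what should go in the config.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class AsctimeParseSketch {
    // Candidate pattern (assumption): "pp" pads the day-of-month to width 2 with spaces.
    static final String PADDED = "EEE MMM ppd HH:mm:ss yyyy";
    // Same pattern without the pad modifier, for comparison.
    static final String UNPADDED = "EEE MMM d HH:mm:ss yyyy";

    // Parses text with the given pattern; throws DateTimeParseException on failure.
    static LocalDateTime parse(String pattern, String text) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
        return LocalDateTime.parse(text, formatter);
    }

    // Hypothetical helper: true when the pattern accepts the text.
    static boolean parses(String pattern, String text) {
        try {
            parse(pattern, text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String singleDigitDay = "Sun Nov  6 08:49:37 1994"; // RFC-2616 example, two spaces
        String twoDigitDay = "Wed Nov 16 08:49:37 1994";    // made-up two-digit-day variant

        System.out.println(parses(UNPADDED, singleDigitDay)); // strict "d" cannot skip the extra space
        System.out.println(parses(PADDED, singleDigitDay));
        System.out.println(parses(PADDED, twoDigitDay));
        System.out.println(parse(PADDED, singleDigitDay));
    }
}
```

Note this does not cover a single space before a single-digit day, nor lenient parsing; those would need their own test cases.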
    
    Good catch on noticing that the seconds are optional in RFC-1123!
    
    Can you reverse the order of these last 3 patterns?  Based on RFC-2616 
(HTTP/1.1), this is the order defined by preference.
    
    BTW if we really did want RFC-1123 & RFC-1036 patterns to have an optional 
timezone, then it would need to be specified differently than how you did it.  
You put the optional start bracket to the right of the space when it would need 
to be to the left of it.
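
To illustrate the bracket placement, here is a small sketch comparing the two positions of the optional section. The patterns are deliberately simplified stand-ins (no day-of-week, four-digit year) and the class name `OptionalZoneSketch` is made up; neither pattern is a proposal for the config.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class OptionalZoneSketch {
    // Bracket to the right of the space: the space itself is still mandatory.
    static final String SPACE_OUTSIDE = "dd-MMM-yyyy HH:mm:ss [z]";
    // Bracket to the left of the space: space and zone are optional together.
    static final String SPACE_INSIDE = "dd-MMM-yyyy HH:mm:ss[ z]";

    // Hypothetical helper: true when the pattern accepts the whole text.
    static boolean parses(String pattern, String text) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH).parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String withZone = "06-Nov-1994 08:49:37 GMT";
        String withoutZone = "06-Nov-1994 08:49:37";

        System.out.println(parses(SPACE_OUTSIDE, withZone));
        // Fails: the pattern still demands a trailing space after the seconds.
        System.out.println(parses(SPACE_OUTSIDE, withoutZone));
        System.out.println(parses(SPACE_INSIDE, withZone));
        System.out.println(parses(SPACE_INSIDE, withoutZone));
    }
}
```

With the space outside the brackets, the zone is only "optional" for input that ends in a dangling space, which is almost never what is wanted.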
    
    Obviously, any tweaks we make to these patterns need to be applied to each of:
    * `solr/server/solr/configsets/_default/conf/solrconfig.xml`
    * `solr/core/src/test-files/solr/collection1/conf/solrconfig-parsing-update-processor-chains.xml`
    * `solr/contrib/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml`

    Probably elsewhere too; I can check before committing.

