[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-09-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/438


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-25 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212807806
  
--- Diff: 
solr/core/src/test/org/apache/solr/update/processor/ParsingFieldUpdateProcessorsTest.java
 ---
@@ -992,6 +992,14 @@ public void testLenient() throws IOException {
 assertParsedDate("Friday Oct 7 13:14:15 2005", 
Date.from(inst20051007131415()), "parse-date-patterns-default-config");
   }
 
+  public void testRfc2616() throws Exception {
+assertParsedDate("Fri Oct 7 13:14:15 2005" , 
Date.from(inst20051007131415()), "parse-date-patterns-default-config");
+  }
+
+  public void testRfc2616Leniency() throws Exception {
--- End diff --

Should be named testAsctimeLeniency (see my previous comment).  Glad to see 
this test works.

'course there  are already some tests with similar names... so feel free to 
combine some.  For example the "testLenient" method is an Ansi C test of 
leniency; by all means combine tests into one test method as appropriate.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-25 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212807780
  
--- Diff: 
solr/core/src/test/org/apache/solr/update/processor/ParsingFieldUpdateProcessorsTest.java
 ---
@@ -992,6 +992,14 @@ public void testLenient() throws IOException {
 assertParsedDate("Friday Oct 7 13:14:15 2005", 
Date.from(inst20051007131415()), "parse-date-patterns-default-config");
   }
 
+  public void testRfc2616() throws Exception {
--- End diff --

Should be named testAsctime since that's the format of this input you are 
testing.  When I mentioned RFC2616, that is a spec that in turn refers to 
multiple formats (including asctime).

Also, like I pointed out, need to test with 2 spaces left of a single digit 
day.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-25 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212803611
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,13 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
-  -MM-dd'T'HH:mm[:ss[,SSS]][z
+  -MM-dd['T'[HH:mm[:ss[.SSS]][z
+  -MM-dd['T'[HH:mm[:ss[,SSS]][z
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
-  -MM-dd
+  EEE MMM d [HH:mm:ss ][z ]
+  , dd-MMM-yy HH:mm:ss [z
--- End diff --

>And can you add a test we parse "Sun Nov 6 08:49:37 1994"

BTW, I used "Fri Oct 7 13:14:15 2005" instead, since it uses the same date 
format, but there is a helper method to generate this instant in 
ParsingFieldUpdateProcessorsTest#inst20051007131415.
Hope it is OK.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-25 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212803494
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,13 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
-  -MM-dd'T'HH:mm[:ss[,SSS]][z
+  -MM-dd['T'[HH:mm[:ss[.SSS]][z
+  -MM-dd['T'[HH:mm[:ss[,SSS]][z
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
-  -MM-dd
+  EEE MMM d [HH:mm:ss ][z ]
+  , dd-MMM-yy HH:mm:ss [z
--- End diff --

Oh, now I recall why I added the optional time zone to asctime.
When the optional time zone is added, the pattern covers AKS and EDT date 
formats as well.
IMO, it might be beneficial to keep it in the default config.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-25 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212803184
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,13 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
-  -MM-dd'T'HH:mm[:ss[,SSS]][z
+  -MM-dd['T'[HH:mm[:ss[.SSS]][z
+  -MM-dd['T'[HH:mm[:ss[,SSS]][z
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
-  -MM-dd
+  EEE MMM d [HH:mm:ss ][z ]
+  , dd-MMM-yy HH:mm:ss [z
--- End diff --

I made time of day optional in asciitime so we can remove this 
configuration -MM-dd, which uses same without a time of day.
I will remove the optional timezone in RFC-1123 & RFC-1036, guess it is my 
bad.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-25 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212800845
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,13 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
-  -MM-dd'T'HH:mm[:ss[,SSS]][z
+  -MM-dd['T'[HH:mm[:ss[.SSS]][z
+  -MM-dd['T'[HH:mm[:ss[,SSS]][z
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
-  -MM-dd
+  EEE MMM d [HH:mm:ss ][z ]
+  , dd-MMM-yy HH:mm:ss [z
--- End diff --

These last two patterns are RFC-1036 and RFC-1123.  Neither should have an 
optional timezone -- as seen in 
`ExtractDateUtils.PATTERN_RFC1036`, `ExtractDateUtils.PATTERN_RFC1123`, 
`DateTimeFormatter.RFC_1123_DATE_TIME`, and I looked at RFC-1036 spec as well.

Why did you make the time of day optional in ASCTIME?  I don't see that in 
ExtractDateUtils.  And can you add a test we parse "Sun Nov  6 08:49:37 1994"  
(example date from RFC-2616 which is HTTP/1.1 spec which lists RFC-1123, 
RFC-1036, and asctime() -- the origin of why we see these particular patterns 
in ExtractDateUtils, borrowed from ApacheHttpClient).  The double-space before 
the single digit day is deliberate.  It may be necessary to use the 'p' (pad 
modifier) as specified in DateTimeFormatter.  I've seen conflicting information 
from internet searches on the "day" portion of asctime() as either 2-digit or 
1; so it'd be good to test that either work.  "Leniency" will hopefully ensure 
one pattern works without needing to add more variations.

Good catch on noticing the seconds is optional in RFC-1123!

Can you reverse the order of these last 3 patterns?  Based on RFC-2616 
(HTTP/1.1), this is the order defined by preference.

BTW if we really did want RFC-1123 & RFC-1036 patterns to have an optional 
timezone, then it would need to be specified differently than how you did it.  
You put the optional start bracket to the right of the space when it would need 
to be to the left of it.

Obviously, all tweaks we do to these patterns need to be redone between 
* `solr/server/solr/configsets/_default/conf/solrconfig.xml` 
* `solr/core/src/test- 
files/solr/collection1/conf/solrconfig-parsing-update-processor-chains.xml`
* 
`solr/contrib/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml`
Probably elsewhere; I can check before committing.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-24 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212788729
  
--- Diff: 
solr/contrib/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml
 ---
@@ -43,6 +43,25 @@
   
 
 
+  
+
+  -MM-dd['T'[HH:mm[:ss[.SSS]][z
+  -MM-dd['T'[HH:mm[:ss[,SSS]][z
+  -MM-dd HH:mm[:ss[.SSS]][z
+  -MM-dd HH:mm[:ss[,SSS]][z
+  EEE MMM d [HH:mm:ss ][z ]
+  , dd-MMM-yy HH:mm:ss [z
+  [EEE, ]dd MMM  HH:mm[:ss] [z
--- End diff --

> BTW I was looking at DateTimeFormatter.RFC_1123_DATE_TIME and noticed the 
leading day of week is optional.

I also had a little peek and noticed the second of the minute is also 
optional, here is the code snippet that initializes the default 
RFC_1123_DATE_TIME DateTimeFromatter:

`.optionalStart()
  .appendLiteral(':')
  .appendValue(SECOND_OF_MINUTE, 2)
  .optionalEnd()`


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-22 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212041807
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,11 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
--- End diff --

By "the old patterns" do you mean the _current_ patterns we see in the 
default config?  Yes.  You can tweak an existing pattern to parse more things, 
but don't remove the ability to parse something it would have parsed before!  
And yes add/merge the additional ones from SOLR-12591.  By "merge" I mean 
modify some existing patterns to match some of the ones in SOLR-12591.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-22 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212033718
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,11 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
--- End diff --

So should I keep the old patterns in addition to the ones from SOLR-12591?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-22 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r212031794
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1141,11 +1141,11 @@
   
   
 
-  -MM-dd'T'HH:mm[:ss[.SSS]][z
--- End diff --

You just changed the default date patterns to not parse patterns it parsed 
before.  We need to be cautious!  For example it used to parse a comma to 
separate the milliseconds.  And the 'T' was optional.

Whatever patterns exist here ought to be identical in documentation & test 
configs (which yes is annoying to keep in sync but shouldn't change often).


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r211821295
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1146,6 +1146,11 @@
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
   -MM-dd
+  EEE, dd MMM  HH:mm:ss z
+  , dd-MMM-yy HH:mm:ss z
+  EEE MMM d HH:mm:ss 
+  EEE MMM d hh:mm:ss z 
--- End diff --

Yeah, that's what I thought you were going to do.  You've separately tested 
that the ExtractionDateUtils tests also work on these patterns in the last 
issue, so we want them to be the same; otherwise due to bugs (like your 'hh') 
we might inadvertently screw that up.  If needed, modify/tweak one of those 
tests/configs.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r211821003
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1146,6 +1146,11 @@
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
   -MM-dd
+  EEE, dd MMM  HH:mm:ss z
+  , dd-MMM-yy HH:mm:ss z
+  EEE MMM d HH:mm:ss 
+  EEE MMM d hh:mm:ss z 
+  -MM-dd hh:mm:ss
--- End diff --

Close; it'd be this:  `-MM-dd[ hh:mm:ss]`  (notice the bracket starts 
immediately after the 'dd'; otherwise we'd be wrongly insisting on a trailing 
space.  

And yes, always use "HH" and not "hh" unless the pattern contains an AM/PM 
designator (and these don't).


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r211818326
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1146,6 +1146,11 @@
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
   -MM-dd
+  EEE, dd MMM  HH:mm:ss z
+  , dd-MMM-yy HH:mm:ss z
+  EEE MMM d HH:mm:ss 
+  EEE MMM d hh:mm:ss z 
--- End diff --

Should I just try and copy that part of the config to the default one?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread barrotsteindev
Github user barrotsteindev commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r211818226
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1146,6 +1146,11 @@
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
   -MM-dd
+  EEE, dd MMM  HH:mm:ss z
+  , dd-MMM-yy HH:mm:ss z
+  EEE MMM d HH:mm:ss 
+  EEE MMM d hh:mm:ss z 
+  -MM-dd hh:mm:ss
--- End diff --

Right, it could be joined with "-MM-dd" as "-MM-dd [hh:mm:ss]".
Or should I replace the "hh" bit with "HH"?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r211817633
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1146,6 +1146,11 @@
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
   -MM-dd
+  EEE, dd MMM  HH:mm:ss z
+  , dd-MMM-yy HH:mm:ss z
+  EEE MMM d HH:mm:ss 
+  EEE MMM d hh:mm:ss z 
+  -MM-dd hh:mm:ss
--- End diff --

This pattern could be removed if we add optional brackets to an existing 
pattern.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/438#discussion_r211817529
  
--- Diff: solr/server/solr/configsets/_default/conf/solrconfig.xml ---
@@ -1146,6 +1146,11 @@
   -MM-dd HH:mm[:ss[.SSS]][z
   -MM-dd HH:mm[:ss[,SSS]][z
   -MM-dd
+  EEE, dd MMM  HH:mm:ss z
+  , dd-MMM-yy HH:mm:ss z
+  EEE MMM d HH:mm:ss 
+  EEE MMM d hh:mm:ss z 
--- End diff --

The "hh" is erroneous.  I'm surprised this even works; perhaps "lenient" 
allows os to get away with it?  I was hoping to keep the added patterns more 
minimal.  You recently added "parse-date-patterns-from-extract-contrib" to 
solrconfig-parsing-update-processor-chains which is a better guide.




---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #438: SOLR-12593: Remove date parsing functionality...

2018-08-21 Thread barrotsteindev
GitHub user barrotsteindev opened a pull request:

https://github.com/apache/lucene-solr/pull/438

SOLR-12593: Remove date parsing functionality from extraction contrib



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/barrotsteindev/lucene-solr SOLR-12593

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/438.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #438


commit 2fb612a4944464851ff59a97f4d1fdba28776a87
Author: Bar Rotstein 
Date:   2018-08-21T19:21:10Z

SOLR-12593: add ExtractionDateUtil default date formats to parse date urp 
default config

commit 163a1d44ff958400c3895a0382c526ae9102e2e6
Author: Bar Rotstein 
Date:   2018-08-21T19:21:32Z

replace extraction date util in SolrContentHandler with URP and update test 
config




---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org