[GitHub] [spark] cloud-fan commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing

GitBox Wed, 23 Oct 2019 22:16:30 -0700

cloud-fan commented on a change in pull request #26190: [SPARK-29532][SQL] 
simplify interval string parsing
URL: https://github.com/apache/spark/pull/26190#discussion_r338384646


 ##########
 File path: sql/core/benchmarks/IntervalBenchmark-results.txt
 ##########
 @@ -1,25 +1,25 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
 cast strings to intervals:                Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-string w/ interval                                  420            435         
 18          2.4         419.8       1.0X
-string w/o interval                                 359            365         
 10          2.8         358.7       1.2X
-1 units w/ interval                                 752            759         
  8          1.3         752.0       0.6X
-1 units w/o interval                                762            766         
  4          1.3         762.0       0.6X
-2 units w/ interval                                 961            970         
  8          1.0         960.7       0.4X
-2 units w/o interval                                970            976         
  9          1.0         970.2       0.4X
-3 units w/ interval                                1130           1136         
  7          0.9        1130.4       0.4X
-3 units w/o interval                               1150           1158         
  9          0.9        1150.3       0.4X
-4 units w/ interval                                1333           1336         
  3          0.7        1333.5       0.3X
-4 units w/o interval                               1354           1359         
  4          0.7        1354.5       0.3X
-5 units w/ interval                                1523           1525         
  2          0.7        1523.3       0.3X
-5 units w/o interval                               1549           1551         
  3          0.6        1549.4       0.3X
-6 units w/ interval                                1661           1663         
  2          0.6        1660.8       0.3X
-6 units w/o interval                               1691           1704         
 13          0.6        1691.2       0.2X
-7 units w/ interval                                1811           1817         
  8          0.6        1810.6       0.2X
-7 units w/o interval                               1853           1854         
  1          0.5        1853.2       0.2X
-8 units w/ interval                                2029           2037         
  8          0.5        2028.7       0.2X
-8 units w/o interval                               2075           2075         
  1          0.5        2074.5       0.2X
-9 units w/ interval                                2170           2175         
  5          0.5        2170.0       0.2X
-9 units w/o interval                               2204           2212         
  8          0.5        2203.6       0.2X
+prepare string w/ interval                          403            419         
 18          2.5         403.1       1.0X
+prepare string w/o interval                         341            353         
 21          2.9         341.1       1.2X
+1 units w/ interval                                5154           5159         
  8          0.2        5153.5       0.1X
+1 units w/o interval                               4818           4833         
 20          0.2        4817.6       0.1X
+2 units w/ interval                                6191           6223         
 41          0.2        6190.6       0.1X
+2 units w/o interval                               6236           6264         
 25          0.2        6235.7       0.1X
+3 units w/ interval                                7397           7567         
170          0.1        7397.0       0.1X
+3 units w/o interval                               7280           7367         
 76          0.1        7279.6       0.1X
+4 units w/ interval                                8197           8228         
 27          0.1        8197.3       0.0X
+4 units w/o interval                               7977           7989         
 17          0.1        7977.3       0.1X
+5 units w/ interval                                9089           9192         
101          0.1        9088.8       0.0X
+5 units w/o interval                               8853           8858         
  5          0.1        8852.8       0.0X
+6 units w/ interval                                9696           9720         
 23          0.1        9695.6       0.0X
+6 units w/o interval                               9509           9518         
  9          0.1        9509.4       0.0X
+7 units w/ interval                               10738          10791         
 55          0.1       10737.8       0.0X
+7 units w/o interval                              10556          10719         
146          0.1       10555.6       0.0X
+8 units w/ interval                               11787          11992         
339          0.1       11786.7       0.0X
+8 units w/o interval                              11666          11720         
 56          0.1       11665.8       0.0X
+9 units w/ interval                               12878          12908         
 42          0.1       12877.7       0.0X
+9 units w/o interval                              12696          12738         
 36          0.1       12696.1       0.0X
 
 Review comment:
   My comment is hidden because the code changes: 
https://github.com/apache/spark/pull/26190#discussion_r337974040
   
   Let me copy it again.
   
   It's about 6 times slower, but perf doesn't matter too much here. It's 
better to keep a single parser, and make the behavior consistent whenever we 
parse an interval string. For example
   1. `select interval +1 day` works but `select interval '+1 day'` does not.
   2. `select interval 1 day 1 year` works (fields can be any order) but 
`select interval '1 day 1 year'` does not.
   
   In general, the handwritten parser is more efficient but is less powerful. I 
think it's fragile to maintain the handwritten parser and make sure it's 
consistent with the antlr one.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing

Reply via email to