cloud-fan commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r338384646
########## File path: sql/core/benchmarks/IntervalBenchmark-results.txt ########## @@ -1,25 +1,25 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15 -Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz cast strings to intervals: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -string w/ interval 420 435 18 2.4 419.8 1.0X -string w/o interval 359 365 10 2.8 358.7 1.2X -1 units w/ interval 752 759 8 1.3 752.0 0.6X -1 units w/o interval 762 766 4 1.3 762.0 0.6X -2 units w/ interval 961 970 8 1.0 960.7 0.4X -2 units w/o interval 970 976 9 1.0 970.2 0.4X -3 units w/ interval 1130 1136 7 0.9 1130.4 0.4X -3 units w/o interval 1150 1158 9 0.9 1150.3 0.4X -4 units w/ interval 1333 1336 3 0.7 1333.5 0.3X -4 units w/o interval 1354 1359 4 0.7 1354.5 0.3X -5 units w/ interval 1523 1525 2 0.7 1523.3 0.3X -5 units w/o interval 1549 1551 3 0.6 1549.4 0.3X -6 units w/ interval 1661 1663 2 0.6 1660.8 0.3X -6 units w/o interval 1691 1704 13 0.6 1691.2 0.2X -7 units w/ interval 1811 1817 8 0.6 1810.6 0.2X -7 units w/o interval 1853 1854 1 0.5 1853.2 0.2X -8 units w/ interval 2029 2037 8 0.5 2028.7 0.2X -8 units w/o interval 2075 2075 1 0.5 2074.5 0.2X -9 units w/ interval 2170 2175 5 0.5 2170.0 0.2X -9 units w/o interval 2204 2212 8 0.5 2203.6 0.2X +prepare string w/ interval 403 419 18 2.5 403.1 1.0X +prepare string w/o interval 341 353 21 2.9 341.1 1.2X +1 units w/ interval 5154 5159 8 0.2 5153.5 0.1X +1 units w/o interval 4818 4833 20 0.2 4817.6 0.1X +2 units w/ interval 6191 6223 41 0.2 6190.6 0.1X +2 units w/o interval 6236 6264 25 0.2 6235.7 0.1X +3 units w/ interval 7397 7567 170 0.1 7397.0 0.1X +3 units w/o interval 7280 7367 76 0.1 7279.6 0.1X +4 units w/ interval 8197 8228 27 0.1 8197.3 0.0X +4 units w/o interval 7977 7989 17 0.1 7977.3 0.1X +5 units w/ interval 9089 9192 101 0.1 9088.8 0.0X +5 units w/o interval 8853 8858 5 0.1 8852.8 0.0X +6 units w/ interval 9696 9720 23 0.1 9695.6 0.0X +6 units w/o interval 9509 9518 9 0.1 9509.4 0.0X +7 units w/ interval 10738 10791 55 0.1 10737.8 0.0X +7 units w/o interval 10556 10719 146 0.1 10555.6 0.0X +8 units w/ interval 11787 11992 339 0.1 11786.7 0.0X +8 units w/o interval 11666 11720 56 0.1 11665.8 0.0X +9 units w/ interval 12878 12908 42 0.1 12877.7 0.0X +9 units w/o interval 12696 12738 36 0.1 12696.1 0.0X Review comment: My comment is hidden because the code changes: https://github.com/apache/spark/pull/26190#discussion_r337974040 Let me copy it again. It's about 6 times slower, but perf doesn't matter too much here. It's better to keep a single parser, and make the behavior consistent whenever we parse an interval string. For example 1. `select interval +1 day` works but `select interval '+1 day'` does not. 2. `select interval 1 day 1 year` works (fields can be any order) but `select interval '1 day 1 year'` does not. In general, the handwritten parser is more efficient but is less powerful. I think it's fragile to maintain the handwritten parser and make sure it's consistent with the antlr one. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org