GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/23197
[SPARK-26165][Optimizer] Date and Timestamp column expressions are getting converted to string type

Date and Timestamp column expressions are converted to string type in less than/greater than filter queries, even though a valid date/timestamp string literal is used on the right side of the filter expression. For example:

    select to_date('2009-07-30 04:17:52') >= '2009-07-30 04:17:52'

Internally, the expression is cast as below:

    struct<(CAST(to_date('2009-07-30 04:17:52') AS STRING) >= 2009-07-30 04:17:52):boolean>

This can also reduce query performance, as every single row is cast to String type.

After the fix, the above expression is optimized as below, because the right-side literal value is a valid date string literal:

    struct<(to_date('2009-07-30 04:17:52') >= CAST(2009-07-30 04:17:52 AS DATE)):boolean>

## What changes were proposed in this pull request?

Date and Timestamp columns are converted to string in less than/greater than filter queries even when the literal is a date string that contains a time, like '2018-03-18 12:39:40', which is a valid date format string. I think we should avoid casting to String type if the date/timestamp string literal can be converted to a valid date or timestamp, and convert the filter expression column to string type only if the string literal in the filter expression cannot be converted to date/timestamp.

## How was this patch tested?

Using existing unit test cases and manual testing.
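To make the proposed rule concrete, here is a minimal sketch in plain Python. It is not Spark's implementation and does not use any Spark API; the function names (`parse_date_literal`, `date_ge_literal`, `date_ge_literal_as_string`) and the two accepted literal formats are assumptions made for illustration. It models `date_col >= 'literal'` in two ways: casting the column value to string for every row, and casting the literal to a date once when it parses.

```python
from datetime import date, datetime
from typing import Optional


def parse_date_literal(text: str) -> Optional[date]:
    """Return the date part of a 'yyyy-MM-dd[ HH:mm:ss]' literal, or None.

    The two formats here are an assumption for this sketch; Spark accepts
    more literal shapes than these.
    """
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def date_ge_literal_as_string(column_value: date, literal: str) -> bool:
    """Model of the behaviour described above: the column value is
    converted to a string and compared lexicographically, row by row."""
    return column_value.isoformat() >= literal


def date_ge_literal(column_value: date, literal: str) -> bool:
    """Model of the proposed rule (hypothetical helper)."""
    parsed = parse_date_literal(literal)
    if parsed is not None:
        # The literal is a valid date string: cast it once, compare as dates.
        return column_value >= parsed
    # The literal is not a valid date: fall back to string comparison.
    return date_ge_literal_as_string(column_value, literal)


if __name__ == "__main__":
    col = date(2009, 7, 30)
    lit = "2009-07-30 04:17:52"
    print("string comparison:", date_ge_literal_as_string(col, lit))
    print("date comparison:  ", date_ge_literal(col, lit))
```

In this model the two paths can disagree: the string `2009-07-30` is a strict prefix of the literal and therefore sorts before it, while the date comparison treats both sides as the same day. The sketch only models the comparison itself; how Spark handles a cast that fails is not covered here.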
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujith71955/spark master_filter_perf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23197.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23197

----
commit 9f1b2493d92c4340d02d4958da38d9ff8c1ab612
Author: s71955 <sujithchacko.2010@...>
Date:   2018-11-25T14:14:28Z

    [SPARK-26165][Optimizer] Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression

    ## What changes were proposed in this pull request?

    Date and Timestamp columns are converted to string in less than/greater than filter queries even when the literal is a date string that contains a time, like '2018-03-18 12:39:40', which can be converted to a date. Besides, it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. I think we should avoid casting to string if the date/timestamp string can be converted to a valid date or timestamp; we can convert the filter expression column to string type only if the string literal in the filter expression cannot be converted to date/timestamp.

    ## How was this patch tested?

    Using existing unit test cases and manual testing.

----