[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698227#comment-16698227 ]
Sujith edited comment on SPARK-26165 at 11/25/18 7:16 PM:
----------------------------------------------------------

I think we should avoid casting to string when the string literal in the filter condition can be parsed as a valid date/timestamp, as in the filter condition mentioned in the JIRA; otherwise we can fall back to the current logic of casting to string type. This approach also avoids the unnecessary overhead of casting the timestamp/date values of the left-hand filter column expression to string type, as mentioned in the JIRA.

I will raise a PR to handle this issue. Please let me know if you have any suggestions.

cc [~srowen] [~cloud_fan] [~vinodkc]


was (Author: s71955):
I think we should avoid casting to string when the string literal in the filter condition can be parsed as a valid date or timestamp, as in the filter condition mentioned in the JIRA; otherwise we can fall back to the current logic of casting to string type. This approach also avoids the unnecessary overhead of casting the timestamp/date values of the left-hand filter column expression to string.

I will raise a PR to handle this issue. Please let me know if you have any suggestions.
cc [~srowen] [~cloud_fan] [~vinodkc]


> Date and Timestamp column expression is getting converted to string in less
> than/greater than filter query even though valid date/timestamp string
> literal is used in the right side filter expression
> ------------------------------------------------------------------------
>
>                 Key: SPARK-26165
>                 URL: https://issues.apache.org/jira/browse/SPARK-26165
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Sujith
>            Priority: Major
>         Attachments: timestamp_filter_perf.PNG
>
>
> Date and Timestamp column is getting converted to string in less than/greater
> than filter query even though date strings that contains a time, like
> '2018-03-18 12:39:40' to date. Besides it's not possible to cast a string
> like '2018-03-18 12:39:40' to a timestamp.
>
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false);
> +------------------------------------------------------------------------
> |== Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>    +- 'UnresolvedRelation `orders`
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>    +- SubqueryAlias orders
>       +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation
> +------------------------------------------------------------------------
> ------------------------------------------------------------------------
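
To make the proposal in the comment concrete, here is a minimal sketch in plain Scala of the decision it describes: compare as timestamps when the literal parses as one, otherwise fall back to string comparison. This is not Spark's TypeCoercion code. The names Comparison, CompareAsTimestamp, CompareAsString, chooseComparison and greaterThan are hypothetical, and java.sql.Timestamp.valueOf is only a stand-in for Spark's own string-to-timestamp parsing, which may accept other formats.

```scala
import java.sql.Timestamp
import scala.util.Try

// Hypothetical model of the two ways a filter `tsColumn > 'literal'`
// could be evaluated. These types are illustrative, not Spark classes.
sealed trait Comparison
case class CompareAsTimestamp(literal: Timestamp) extends Comparison
case class CompareAsString(literal: String) extends Comparison

object LiteralCoercionSketch {
  // Timestamp.valueOf expects "yyyy-[m]m-[d]d hh:mm:ss[.f...]" and throws
  // IllegalArgumentException for anything else, so Try(...).toOption
  // yields None for literals that are not valid timestamps.
  def parseTimestamp(s: String): Option[Timestamp] =
    Try(Timestamp.valueOf(s.trim)).toOption

  // The proposed rule: convert the literal once if it is a valid timestamp,
  // otherwise keep the current behaviour of comparing as strings.
  def chooseComparison(literal: String): Comparison =
    parseTimestamp(literal) match {
      case Some(ts) => CompareAsTimestamp(ts)
      case None     => CompareAsString(literal)
    }

  // Evaluates `column > literal` for one row under the chosen comparison.
  def greaterThan(column: Timestamp, literal: String): Boolean =
    chooseComparison(literal) match {
      // No per-row cast of the column value is needed on this path.
      case CompareAsTimestamp(ts) => column.after(ts)
      // Fallback path: the column value is rendered as a string per row.
      // Timestamp.toString formatting is an assumption here and differs
      // from how Spark renders timestamps as strings.
      case CompareAsString(s) => column.toString.compareTo(s) > 0
    }
}
```

With the literal from the issue, '2017-02-26 13:45:12', chooseComparison takes the timestamp path, so the per-row cast of order_creation_date to string seen in the plans above would not be needed under this rule.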