[ 
https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698227#comment-16698227
 ] 

Sujith edited comment on SPARK-26165 at 11/25/18 7:11 PM:
----------------------------------------------------------

I think we should avoid casting to string in cases where the date/timestamp 
string can be converted to a valid date or timestamp, per the condition 
mentioned in the JIRA; otherwise we can fall back to the current logic of 
casting to string type.

This approach also avoids the unnecessary overhead of casting the left filter 
column expression's timestamp/date values to string.

I will raise a PR to handle this issue. Please let me know if you have any suggestions.
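
As a rough sketch of the proposed condition (plain Scala, using JDK parsing as 
a stand-in for Catalyst's internal date/timestamp conversion; the helper names 
below are illustrative and not Spark APIs):

import java.sql.{Date, Timestamp}
import scala.util.Try

// Illustrative check: does the string literal parse as a valid timestamp/date?
// If yes, cast the literal to the column's type; if not, fall back to the
// current behavior of casting the column to string.
def isConvertibleToTimestamp(s: String): Boolean =
  Try(Timestamp.valueOf(s)).isSuccess

def isConvertibleToDate(s: String): Boolean =
  Try(Date.valueOf(s)).isSuccess

isConvertibleToTimestamp("2017-02-26 13:45:12") // true  -> cast the literal
isConvertibleToDate("2018-03-18")               // true  -> cast the literal
isConvertibleToTimestamp("not a timestamp")     // false -> fall back to string cast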

 cc [~srowen]  [~cloud_fan] [~vinodkc] 


was (Author: s71955):
I think we should avoid casting to string in cases where the date/timestamp 
string can be converted to a valid date or timestamp, per the condition 
mentioned in the JIRA; otherwise we can cast to string type as per the current 
logic.

cc [~srowen]  [~cloud_fan] [~vinodkc] 

I will raise a PR to handle this issue. Please let me know if you have any suggestions.

 

> Date and Timestamp column expression is getting converted to string in less 
> than/greater than filter query even though valid date/timestamp string 
> literal is used in the right side filter expression
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26165
>                 URL: https://issues.apache.org/jira/browse/SPARK-26165
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Sujith
>            Priority: Major
>         Attachments: timestamp_filter_perf.PNG
>
>
> Date and Timestamp column expressions are getting converted to string in less 
> than/greater than filter queries even though a valid date/timestamp string 
> literal is used on the right side of the filter expression. For example, a 
> date string that contains a time, like '2018-03-18 12:39:40', can be cast to 
> a timestamp, so casting the column to string is unnecessary.
>  
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE 
> order_creation_date > '2017-02-26 13:45:12'""").show(false);
> +--------------------------------------------------------------------------------
> |== Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>    +- 'UnresolvedRelation `orders`
> 
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>    +- SubqueryAlias orders
>       +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> 
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> 
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation
> +--------------------------------------------------------------------------------
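> 
> As a sketch of a manual workaround (not the proposed fix itself), an explicit 
> cast on the literal keeps the comparison in the timestamp domain:
> 
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE 
> order_creation_date > cast('2017-02-26 13:45:12' as timestamp)""").show(false);
> 
> With this form the optimized filter should compare order_creation_date#60 
> against a timestamp literal directly, with no cast(order_creation_date#60 as 
> string) on the column side.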



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
