Laurens Versluis created SPARK-41266: ----------------------------------------
Summary: Spark does not parse timestamp strings when using the IN operator Key: SPARK-41266 URL: https://issues.apache.org/jira/browse/SPARK-41266 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Environment: Windows 10, Spark 3.2.1 with Java 11 Reporter: Laurens Versluis Likely affects more versions, tested only with 3.2.1. Summary: Spark will convert a timestamp string to a timestamp when using the equal operator (=), yet won't do this when using the IN operator. Details: While debugging an issue why we got no results on a query, we found out that when using the equal symbol `=` in the WHERE clause combined with a TimeStampType column that Spark will convert the string to a timestamp and filter. However, when using the IN operator (our query), it will not do so, and perform a cast to string. We expected the behavior to be similar, or at least that Spark realizes the IN clause operates on a TimeStampType column and thus attempts to convert to timestamp first before falling back to string comparison. *Minimal reproducible example:* Suppose we have a one-line dataset with the follow contents and schema: {noformat} +----------------------------+ |starttime | +----------------------------+ |2019-08-11 19:33:05 | +----------------------------+ root |-- starttime: timestamp (nullable = true){noformat} Then if we fire the following queries, we will not get results for the IN-clause one using a timestamp string with timezone information: {code:java} // Works - Spark casts the argument to a string and the internal representation of the time seems to match it... singleCol.filter("starttime IN ('2019-08-11 19:33:05')").show(); // Works singleCol.filter("starttime = '2019-08-11 19:33:05'").show(); // Works singleCol.filter("starttime = '2019-08-11T19:33:05Z'").show(); // Doesn't work singleCol.filter("starttime IN ('2019-08-11T19:33:05Z')").show(); //Works singleCol.filter("starttime IN (to_timestamp('2019-08-11T19:33:05Z'))").show(); {code} We can see from the output that a cast to string is taking place: {noformat} [...] isnotnull(starttime#59),(cast(starttime#59 as string) = 2019-08-11 19:33:05){noformat} Since the = operator does work, it would be consistent if operators such as the IN operator would have similar, consistent behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org