[ https://issues.apache.org/jira/browse/SPARK-34259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275666#comment-17275666 ]
Apache Spark commented on SPARK-34259:
--------------------------------------
User 'd80tb7' has created a pull request for this issue:
https://github.com/apache/spark/pull/31399
> Reading a partitioned dataset with a partition value of NOW causes the value
> to be parsed as a timestamp.
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-34259
> URL: https://issues.apache.org/jira/browse/SPARK-34259
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.1
> Reporter: Chris Martin
> Priority: Minor
>
> *Problem*
> Reading a partitioned dataset where one of the partition column values matches a
> special timestamp value (NOW, TODAY, etc.) causes the value to be interpreted as a
> timestamp rather than a string.
> *Example Code (Scala)*
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions._
>
> object TestBug {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local[*]").getOrCreate()
>     // Write a one-row dataset partitioned by a column whose value is the string "NOW".
>     val df = spark.range(1, 2).withColumn("partition", lit("NOW"))
>     df.write.mode("overwrite").partitionBy("partition").parquet("bug")
>
>     // Reading it back infers the partition column's type from the directory name.
>     spark.read.parquet("bug").show(truncate = false)
>     spark.stop()
>   }
> }
> {code}
> The above program prints out:
> {noformat}
> +---+--------------------------+
> |id |partition                 |
> +---+--------------------------+
> |1  |2021-01-27 08:53:23.650039|
> +---+--------------------------+
> {noformat}
>
> *Analysis*
> This happens because PartitioningUtils.inferPartitionColumnValue tries to
> cast the partition value to a timestamp in order to determine whether
> timestamp is a valid interpretation. Since NOW and the other special values
> are strings that can legitimately be cast to timestamps, the code ends up
> interpreting the value as a timestamp.
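The fallback described above can be modelled in plain Scala. This is a simplified sketch, not Spark's code: `InferenceSketch`, `castToTimestamp` and `inferPartitionValue` are hypothetical names, the stand-in cast assumes the five special strings `epoch`, `now`, `today`, `yesterday` and `tomorrow` (matched case-insensitively), decimal and date inference are left out, and time zones are ignored. Because the stand-in cast accepts `NOW`, the timestamp step succeeds before the string fallback is ever reached.

```scala
import java.time.LocalDateTime
import scala.util.Try

// Simplified, hypothetical model of partition-value type inference.
// It is not Spark's PartitioningUtils code; names are illustrative only.
object InferenceSketch {
  sealed trait Inferred
  final case class IntValue(v: Int) extends Inferred
  final case class LongValue(v: Long) extends Inferred
  final case class DoubleValue(v: Double) extends Inferred
  final case class TimestampValue(v: LocalDateTime) extends Inferred
  final case class StringValue(v: String) extends Inferred

  // Stand-in for a string-to-timestamp cast that, like the cast described in
  // the issue, accepts special strings as well as ordinary literals.
  // Time zones are ignored; `now` is passed in to keep results deterministic.
  def castToTimestamp(raw: String, now: LocalDateTime): Option[LocalDateTime] = {
    val s = raw.trim
    val midnight = now.toLocalDate.atStartOfDay
    s.toLowerCase match {
      case "now"       => Some(now)
      case "today"     => Some(midnight)
      case "yesterday" => Some(midnight.minusDays(1))
      case "tomorrow"  => Some(midnight.plusDays(1))
      case "epoch"     => Some(LocalDateTime.of(1970, 1, 1, 0, 0))
      // Ordinary literal such as "2021-01-27 08:53:23".
      case _           => Try(LocalDateTime.parse(s.replace(' ', 'T'))).toOption
    }
  }

  // Assumed, simplified fallback order: integral, fractional, timestamp, string.
  def inferPartitionValue(raw: String, now: LocalDateTime): Inferred =
    Try(IntValue(raw.toInt): Inferred)
      .orElse(Try(LongValue(raw.toLong)))
      .orElse(Try(DoubleValue(raw.toDouble)))
      .toOption
      .orElse(castToTimestamp(raw, now).map(TimestampValue(_)))
      .getOrElse(StringValue(raw))
}
```

In this model `inferPartitionValue("NOW", now)` yields a `TimestampValue`, which matches the behaviour reported above, while a value such as `hello` falls through to `StringValue`.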
> I think we should change PartitioningUtils.inferPartitionColumnValue so that
> it ignores the special values when it attempts to interpret a value as a
> timestamp. This looks difficult to do if we continue to use cast, but one
> alternative is to add an option to DateTimeUtils.stringToDate that tells it
> to ignore special values, and use that to do the conversion instead.
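A minimal sketch of the proposed option, again in plain Scala and under the same simplifying assumptions (five special strings, no time zones). The object name, `stringToTimestamp` and its `allowSpecialValues` parameter are hypothetical; they illustrate the shape of the proposal, not Spark's `DateTimeUtils` API or the change made in the linked pull request. With the flag off, `NOW` is no longer a timestamp candidate, so inference keeps it as a string.

```scala
import java.time.LocalDateTime
import scala.util.Try

// Hypothetical sketch: a string-to-timestamp conversion that can be told to
// ignore special values. Not Spark's DateTimeUtils signature.
object SpecialValueOptionSketch {
  private val specialValues = Set("epoch", "now", "today", "yesterday", "tomorrow")

  def stringToTimestamp(
      raw: String,
      now: LocalDateTime,
      allowSpecialValues: Boolean): Option[LocalDateTime] = {
    val s = raw.trim
    val lower = s.toLowerCase
    if (specialValues.contains(lower)) {
      // Special strings resolve only when the caller opts in.
      if (!allowSpecialValues) None
      else {
        val midnight = now.toLocalDate.atStartOfDay
        lower match {
          case "now"       => Some(now)
          case "today"     => Some(midnight)
          case "yesterday" => Some(midnight.minusDays(1))
          case "tomorrow"  => Some(midnight.plusDays(1))
          case _           => Some(LocalDateTime.of(1970, 1, 1, 0, 0)) // "epoch"
        }
      }
    } else {
      // Ordinary literal such as "2021-01-27 08:53:23".
      Try(LocalDateTime.parse(s.replace(' ', 'T'))).toOption
    }
  }

  // Partition inference passes allowSpecialValues = false, so a value that is
  // only a special string is returned unchanged as a string (Left).
  def inferPartitionValue(raw: String, now: LocalDateTime): Either[String, LocalDateTime] =
    stringToTimestamp(raw, now, allowSpecialValues = false).toRight(raw)
}
```

An explicit cast could still pass `allowSpecialValues = true`, so in this sketch only partition inference changes behaviour.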
--
This message was sent by Atlassian Jira
(v8.3.4#803005)