[jira] [Commented] (SPARK-34259) Reading a partitioned dataset with a partition value of NOW causes the value to be parsed as a timestamp.

Apache Spark (Jira) Sat, 30 Jan 2021 09:07:07 -0800


    [ https://issues.apache.org/jira/browse/SPARK-34259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275666#comment-17275666 ]


Apache Spark commented on SPARK-34259:
--------------------------------------

User 'd80tb7' has created a pull request for this issue:
https://github.com/apache/spark/pull/31399

> Reading a partitioned dataset with a partition value of NOW causes the value 
> to be parsed as a timestamp.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34259
>                 URL: https://issues.apache.org/jira/browse/SPARK-34259
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Chris Martin
>            Priority: Minor
>
> *Problem*
> Reading a partitioned dataset where a partition column value matches a 
> special timestamp literal (NOW, TODAY, etc.) causes the value to be 
> interpreted as a timestamp rather than a string.
> *Example Code (Scala)*
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions._
> object TestBug {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local[*]").getOrCreate()
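>     // Write a single row into a partition directory named partition=NOW.
>     val df = spark.range(1, 2).withColumn("partition", lit("NOW"))
>     df.write.mode("overwrite").partitionBy("partition").parquet("bug")
>
>     // Read it back: partition type inference turns "NOW" into a timestamp.
>     spark.read.parquet("bug").show(truncate = false)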
>   }
> }
> {code}
> The above program prints out:
> {noformat}
> +---+--------------------------+
> |id |partition                 |
> +---+--------------------------+
> |1  |2021-01-27 08:53:23.650039|
> +---+--------------------------+
> {noformat}
>  
> *Analysis*
> This happens because in PartitioningUtils.inferPartitionColumnValue we try 
> to cast the partition value to a timestamp in order to determine whether 
> timestamp is a valid interpretation. Because NOW etc. are literals that are 
> valid to cast to timestamps, the code ends up interpreting the value as a 
> timestamp.
> I think what we want to do here is change 
> PartitioningUtils.inferPartitionColumnValue so that it ignores the special 
> values when it attempts to interpret the value as a timestamp. This looks 
> difficult to do if we continue to use cast, but another option is to add an 
> option to DateTimeUtils.stringToDate that tells it to ignore special values, 
> and to use that for the conversion instead.
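
The idea in the analysis above (skip the special values before attempting
the timestamp interpretation) can be sketched in plain Scala without Spark.
This is only an illustration under stated assumptions, not the change made in
the pull request: the object name, the Inferred types, and
inferTimestampOrString are hypothetical, java.sql.Timestamp.valueOf stands in
for Spark's own string-to-timestamp conversion, and the list of special
values is assumed to match what Spark 3.0 recognizes.

```scala
import java.sql.Timestamp
import java.util.Locale
import scala.util.Try

object PartitionInferenceSketch {
  // Assumption: the special literals accepted by Spark 3.0 string-to-timestamp casts.
  private val specialValues = Set("epoch", "now", "today", "tomorrow", "yesterday")

  // Hypothetical result types for this sketch; Spark itself returns typed literals.
  sealed trait Inferred
  final case class AsTimestamp(value: Timestamp) extends Inferred
  final case class AsString(value: String) extends Inferred

  // Hypothetical stand-in for the timestamp branch of partition value inference.
  def inferTimestampOrString(raw: String): Inferred = {
    val trimmed = raw.trim
    if (specialValues.contains(trimmed.toLowerCase(Locale.ROOT))) {
      // A special literal is kept as a plain string instead of being resolved.
      AsString(raw)
    } else {
      // Timestamp.valueOf throws IllegalArgumentException for anything that is
      // not in "yyyy-[m]m-[d]d hh:mm:ss[.f...]" form; fall back to a string then.
      Try[Inferred](AsTimestamp(Timestamp.valueOf(trimmed))).getOrElse(AsString(raw))
    }
  }
}
```

With this sketch, a partition value of "NOW" stays a string, while a value
such as "2021-01-27 08:53:23" is still inferred as a timestamp.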



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
