d80tb7 opened a new pull request #31399:
URL: https://github.com/apache/spark/pull/31399


   ### What changes were proposed in this pull request?
   - Added `isSpecialTimestamp` and `isSpecialDate` functions to 
`DateTimeUtils` so that we can determine whether a string matches one of the 
shorthand date(time) notations (`now`, `today`, etc.).
   - Updated `PartitioningUtils.inferPartitionColumnValue` so that date and 
timestamp values are not inferred when they match one of the shorthand notations.
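
   The guard described in the two points above can be sketched roughly as 
follows. This is a minimal standalone illustration, not the code in this PR: 
`SpecialValueSketch`, `isSpecialValue`, `parseLenient` and `inferType` are 
hypothetical names, the set of shorthands beyond `now` and `today` is an 
assumption, and the function returns a type label instead of a Spark `DataType`.

   ```scala
   import java.time.LocalDate
   import java.util.Locale
   import scala.util.Try

   object SpecialValueSketch {
     // Assumed set of shorthands; the description only names `now` and `today`.
     private val specialValues =
       Set("epoch", "now", "today", "tomorrow", "yesterday")

     // Hypothetical stand-in for the isSpecialDate / isSpecialTimestamp checks.
     def isSpecialValue(raw: String): Boolean =
       specialValues.contains(raw.trim.toLowerCase(Locale.ROOT))

     // A lenient parser that resolves shorthands to real dates, mimicking the
     // behaviour that caused "NOW" to be read back as a date/timestamp.
     def parseLenient(raw: String): Option[LocalDate] =
       raw.trim.toLowerCase(Locale.ROOT) match {
         case "now" | "today" => Some(LocalDate.now())
         case "epoch"         => Some(LocalDate.ofEpochDay(0))
         case "tomorrow"      => Some(LocalDate.now().plusDays(1))
         case "yesterday"     => Some(LocalDate.now().minusDays(1))
         case other           => Try(LocalDate.parse(other)).toOption
       }

     // Simplified inference: only treat the value as a date when it parses
     // AND is not one of the shorthand notations.
     def inferType(raw: String): String =
       if (!isSpecialValue(raw) && parseLenient(raw).isDefined) "date"
       else "string"
   }
   ```

   With this sketch, `SpecialValueSketch.inferType("NOW")` yields `"string"`, 
while `SpecialValueSketch.inferType("2021-01-27")` still yields `"date"`.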
   
   ### Why are the changes needed?
   Previously, reading a partitioned dataset from file where one of the 
partition values matched a special date/timestamp (`NOW`, `TODAY`, etc.) caused 
the value to be interpreted as a date/timestamp rather than a string. This 
occurred even if the column as a whole was determined to be of type 
string. This change ensures that strings matching these shorthands are 
preserved as strings.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes: in any file-partitioned dataset where a partition value is a string 
matching a special date/timestamp, that value is now interpreted as a string 
rather than as a date/timestamp.
   
   As an example, consider the following program:
   
   ```
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions._
   
   object TestBug {
   
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[*]").getOrCreate()
   
       val df = spark.range(1, 2).withColumn("partition", lit("NOW"))
       df.write.mode("overwrite").partitionBy("partition").parquet("bug")
       
       spark.read.parquet("bug").show(truncate = false)
     }
   
   }
   ```
   
   Prior to this change, the output would be of the form:
   
   ```
   +---+--------------------------+
   |id |partition                 |
   +---+--------------------------+
   |1  |2021-01-27 08:53:23.650039|
   +---+--------------------------+
   ```
   
   Following this change, the output is of the form:
   
   ```
   +---+---------+
   |id |partition|
   +---+---------+
   |1  |NOW      |
   +---+---------+
   ```
   
   ### How was this patch tested?
   * Extended the `column type inference` checks in `ParquetPartitionDiscoverySuite` 
to confirm that special dates/timestamps were being handled correctly.
   * Ensured all existing tests pass.

