[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented

2020-12-04 Thread Frank Oosterhuis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243858#comment-17243858
 ] 

Frank Oosterhuis commented on SPARK-33632:
--

[~qwe1398775315] is right, and the spec is actually somewhat clear about this.

"Year: The count of letters determines the minimum field width below which 
padding is used. If the count of letters is two, then a reduced two digit form 
is used. For printing, this outputs the rightmost two digits. *For parsing, 
this will parse using the base value of 2000, resulting in a year within the 
range 2000 to 2099 inclusive.* If the count of letters is less than four (but 
not two), then the sign is only output for negative years. Otherwise, the sign 
is output if the pad width is exceeded when ‘G’ is not present. 7 or more 
letters will fail."

 The table in the documentation could be a bit clearer about this.
 !screenshot-1.png! 
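
To make that concrete, here is a minimal sketch of the difference (the object name is mine, and I have only checked this against 3.0.1 with the new, non-legacy parser): a two-letter "yy" parses a two-digit year against base 2000, while a single "y" takes the digits literally as the year.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

object YearPatternDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    import spark.implicits._

    Seq("10/31/20").toDF("raw")
      .select(
        to_date($"raw", "M/d/yy").as("yy_base_2000"), // 2020-10-31: "yy" parses against base 2000
        to_date($"raw", "M/d/y").as("y_literal")      // 0020-10-31: single "y" takes "20" as the literal year
      )
      .show()

    spark.stop()
  }
}
{code}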

> to_date doesn't behave as documented
> 
>
> Key: SPARK-33632
> URL: https://issues.apache.org/jira/browse/SPARK-33632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Frank Oosterhuis
>Priority: Major
> Attachments: image-2020-12-04-11-45-10-379.png, screenshot-1.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The 
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html]
>  suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
> object ToDate {
>   val toDate = udf((date: String) => {
> val split = date.split("/")
> val month = "%02d".format(split(0).toInt)
> val day = "%02d".format(split(1).toInt)
> val year = split(2).toInt + 2000
> Date.valueOf(s"${year}-${month}-${day}")
>   })
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder().master("local[2]").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> import spark.implicits._
> Seq("1/1/20", "10/31/20")
>   .toDF("raw")
>   .withColumn("to_date", to_date($"raw", "m/d/y"))
>   .withColumn("udf", toDate($"raw"))
>   .show
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243681#comment-17243681
 ] 

Liu Neng commented on SPARK-33632:
--

You should use the pattern M/d/yy (uppercase 'M' is month-of-year; lowercase 'm' is minute-of-hour). The year parsing mode is determined by the count of the letter 'y': two letters select the reduced two-digit form with base 2000.

Below is the relevant source code from DateTimeFormatterBuilder:

!image-2020-12-04-11-45-10-379.png!
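
For anyone who can't open the attachment, here is a rough plain java.time sketch (no Spark involved, names are mine) of what a two-letter "yy" expands to inside DateTimeFormatterBuilder: a reduced-value field with base 2000. The real code uses YEAR_OF_ERA and a base date rather than ChronoField.YEAR, but the effect for these inputs is the same.

{code:scala}
import java.time.LocalDate
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField

object ReducedYearDemo {
  def main(args: Array[String]): Unit = {
    // "yy" behaves like an explicit reduced-value year field:
    // 2-digit width, base value 2000, so "20" resolves to 2020.
    val twoDigitYear = new DateTimeFormatterBuilder()
      .appendPattern("M/d/")
      .appendValueReduced(ChronoField.YEAR, 2, 2, 2000)
      .toFormatter

    println(LocalDate.parse("10/31/20", twoDigitYear)) // 2020-10-31
  }
}
{code}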







[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented

2020-12-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242893#comment-17242893
 ] 

Hyukjin Kwon commented on SPARK-33632:
--

cc [~XuanYuan] and [~maxgekk] FYI



