[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented
[ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243858#comment-17243858 ]

Frank Oosterhuis commented on SPARK-33632:
------------------------------------------

[~qwe1398775315] is right, and the spec is actually somewhat clear about this:

"Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. *For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive.* If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. 7 or more letters will fail."

The table could be a bit clearer.

!screenshot-1.png!

> to_date doesn't behave as documented
> ------------------------------------
>
>                 Key: SPARK-33632
>                 URL: https://issues.apache.org/jira/browse/SPARK-33632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Frank Oosterhuis
>            Priority: Major
>         Attachments: image-2020-12-04-11-45-10-379.png, screenshot-1.png
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html] suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
>
> object ToDate {
>   val toDate = udf((date: String) => {
>     val split = date.split("/")
>     val month = "%02d".format(split(0).toInt)
>     val day = "%02d".format(split(1).toInt)
>     val year = split(2).toInt + 2000
>     Date.valueOf(s"${year}-${month}-${day}")
>   })
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local[2]").getOrCreate()
>     spark.sparkContext.setLogLevel("ERROR")
>     import spark.implicits._
>
>     Seq("1/1/20", "10/31/20")
>       .toDF("raw")
>       .withColumn("to_date", to_date($"raw", "m/d/y"))
>       .withColumn("udf", toDate($"raw"))
>       .show
>   }
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
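The quoted spec behavior can be reproduced directly with java.time, which Spark 3.x datetime parsing builds on. A minimal sketch (the object and value names below are illustrative, not from the issue): "yy" parses a two-digit year with a base of 2000, while a single "y" takes the digits as the literal year.

{code:scala}
// Sketch of the "yy" vs "y" year parsing behavior described in the spec.
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object YearPatternDemo {
  def main(args: Array[String]): Unit = {
    // "yy": reduced two-digit form, parsed with base 2000 -> 2020-10-31
    val twoDigitYear = DateTimeFormatter.ofPattern("M/d/yy")
    println(LocalDate.parse("10/31/20", twoDigitYear))

    // "y": the digits are taken as the literal year -> 0020-10-31
    val literalYear = DateTimeFormatter.ofPattern("M/d/y")
    println(LocalDate.parse("10/31/20", literalYear))
  }
}
{code}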
[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented
[ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243681#comment-17243681 ]

Liu Neng commented on SPARK-33632:
----------------------------------

You should use the pattern M/d/yy (uppercase "M" is month; lowercase "m" is minute); the parse mode is determined by the count of the letter 'y'. Below is the relevant source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!
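A minimal Spark sketch of this suggested fix (assuming a local session; the FixedToDate object name is mine, not from the issue):

{code:scala}
// Sketch of the suggested fix, assuming a local Spark session.
// Uppercase "M" is month (lowercase "m" is minute); "yy" parses with base 2000.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

object FixedToDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    import spark.implicits._

    Seq("1/1/20", "10/31/20")
      .toDF("raw")
      .withColumn("to_date", to_date($"raw", "M/d/yy"))
      .show()
    // Expected: 2020-01-01 and 2020-10-31

    spark.stop()
  }
}
{code}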
[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented
[ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242893#comment-17242893 ]

Hyukjin Kwon commented on SPARK-33632:
--------------------------------------

cc [~XuanYuan] and [~maxgekk] FYI