[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100859963

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    --- End diff --

    uh, got it. Thanks!

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16870
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100771130

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    --- End diff --

    @gatorsmile the `ts1` var is defined at the beginning of the test.
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100723941

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
           Row(date1.getTime / 1000L), Row(date2.getTime / 1000L)))
         checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    --- End diff --

    the same as above~
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100723919

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    --- End diff --

    yes, it is ts1, the timestamp of `x1` is `ts1`
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100713312

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
           Row(date1.getTime / 1000L), Row(date2.getTime / 1000L)))
         checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    --- End diff --

    The same issue here
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100713251

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    --- End diff --

    `ts1`?
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100712695

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +98,7 @@ object DateTimeUtils {
       def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
         val sdf = new SimpleDateFormat(formatString, Locale.US)
         sdf.setTimeZone(timeZone)
    +    sdf.setLenient(false)
    --- End diff --

    Please add one line comment:

    > // Enable strict parsing, inputs must precisely match this object's format.
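The effect of the `sdf.setLenient(false)` change discussed above can be sketched outside Spark with plain `java.text.SimpleDateFormat`. This is a minimal illustration, not the patch itself; the class name, the `parseOrNull` helper, and the UTC time zone are hypothetical:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class StrictParseDemo {
    // Roughly mirrors DateTimeUtils.newDateFormat after the patch:
    // US locale, explicit time zone, and strict (non-lenient) parsing,
    // so inputs must precisely match this object's format.
    static SimpleDateFormat newDateFormat(String pattern, TimeZone tz) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(tz);
        sdf.setLenient(false);
        return sdf;
    }

    // Illustrative helper: parse, returning null instead of throwing,
    // which is the user-visible behavior the PR aims for.
    static Date parseOrNull(String s, String pattern) {
        try {
            return newDateFormat(pattern, TimeZone.getTimeZone("UTC")).parse(s);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String fmt = "yyyy-MM-dd HH:mm:ss";
        // A valid timestamp still parses.
        System.out.println(parseOrNull("2015-07-24 10:00:00", fmt) != null);
        // Hour 25 and month 25 are rejected instead of rolling over.
        System.out.println(parseOrNull("2015-07-24 25:02:02", fmt)); // null
        System.out.println(parseOrNull("2015-25-07 02:02:02", fmt)); // null
    }
}
```

With the default lenient mode, the last two inputs would have parsed successfully by rolling the out-of-range fields forward.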
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100687862

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -95,9 +95,10 @@ object DateTimeUtils {
         sdf
       }

    -  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    +  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: Boolean): DateFormat = {
         val sdf = new SimpleDateFormat(formatString, Locale.US)
         sdf.setTimeZone(timeZone)
    +    sdf.setLenient(isLenient)
    --- End diff --

    ok thanks~
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100687520

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -95,9 +95,10 @@ object DateTimeUtils {
         sdf
       }

    -  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    +  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: Boolean): DateFormat = {
         val sdf = new SimpleDateFormat(formatString, Locale.US)
         sdf.setTimeZone(timeZone)
    +    sdf.setLenient(isLenient)
    --- End diff --

    yes, it will be good if we don't need to introduce a new parameter
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100683849

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -95,9 +95,10 @@ object DateTimeUtils {
         sdf
       }

    -  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    +  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: Boolean): DateFormat = {
         val sdf = new SimpleDateFormat(formatString, Locale.US)
         sdf.setTimeZone(timeZone)
    +    sdf.setLenient(isLenient)
    --- End diff --

    we can test it with lenient false. this is a util func; if the test is ok, should we always set it to false?
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100680992

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -95,9 +95,10 @@ object DateTimeUtils {
         sdf
       }

    -  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    +  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: Boolean): DateFormat = {
         val sdf = new SimpleDateFormat(formatString, Locale.US)
         sdf.setTimeZone(timeZone)
    +    sdf.setLenient(isLenient)
    --- End diff --

    what if we always set `lenient` to false?
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100656533

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -465,15 +465,15 @@ case class DateFormatClass(left: Expression, right: Expression, timeZoneId: Opti
         copy(timeZoneId = Option(timeZoneId))

       override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
    -    val df = DateTimeUtils.newDateFormat(format.toString, timeZone)
    +    val df = DateTimeUtils.newDateFormat(format.toString, timeZone, isLenient = true)
         UTF8String.fromString(df.format(new java.util.Date(timestamp.asInstanceOf[Long] / 1000)))
       }

       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
         val tz = ctx.addReferenceMinorObj(timeZone)
         defineCodeGen(ctx, ev, (timestamp, format) => {
    -      s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), $tz)
    +      s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), $tz, false)
    --- End diff --

    oh, sorry let me fix it, thanks!
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100539828

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -465,15 +465,15 @@ case class DateFormatClass(left: Expression, right: Expression, timeZoneId: Opti
         copy(timeZoneId = Option(timeZoneId))

       override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
    -    val df = DateTimeUtils.newDateFormat(format.toString, timeZone)
    +    val df = DateTimeUtils.newDateFormat(format.toString, timeZone, isLenient = true)
         UTF8String.fromString(df.format(new java.util.Date(timestamp.asInstanceOf[Long] / 1000)))
       }

       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
         val tz = ctx.addReferenceMinorObj(timeZone)
         defineCodeGen(ctx, ev, (timestamp, format) => {
    -      s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), $tz)
    +      s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), $tz, false)
    --- End diff --

    Why is this one not lenient?
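For context on the lenient/non-lenient mismatch between `nullSafeEval` and the generated code discussed above: `SimpleDateFormat` leniency only affects `parse()`, never `format()`, and the diff shows `DateFormatClass` only ever calls `format()`. A minimal sketch of that property, with an illustrative class name and UTC time zone (not code from the patch):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class FormatLeniencyDemo {
    // Formats a Date with the given leniency; the flag is set but,
    // since we only format (never parse), it cannot change the output.
    static String format(Date d, boolean lenient) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        sdf.setLenient(lenient);
        return sdf.format(d);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01 00:00:00 UTC
        // Identical output regardless of the leniency flag.
        System.out.println(format(epoch, true));
        System.out.println(format(epoch, false));
    }
}
```

So the inconsistency flagged in the review is a code-hygiene issue rather than a behavior bug for this particular expression.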
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100539384

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -477,6 +475,24 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.selectExpr("unix_timestamp(x)"), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.select(unix_timestamp(col("x"), "-dd-MM HH:mm:ss")), Seq(
    +      Row(ts2.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.selectExpr(s"unix_timestamp(x, '-MM-dd mm:HH:ss')"), Seq(
    +      Row(ts3.getTime / 1000L), Row(ts4.getTime / 1000L), Row(null), Row(null)))
    --- End diff --

    you are right~ thanks a lot!
Github user windpiger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100539442

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -95,9 +95,12 @@ object DateTimeUtils {
         sdf
       }

    -  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    +  def newDateFormat(formatString: String,
    +      timeZone: TimeZone,
    +      isLenient: Boolean = true): DateFormat = {
    --- End diff --

    ok let me modify it
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100523109

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -95,9 +95,12 @@ object DateTimeUtils {
         sdf
       }

    -  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat = {
    +  def newDateFormat(formatString: String,
    +      timeZone: TimeZone,
    +      isLenient: Boolean = true): DateFormat = {
    --- End diff --

    Let's not make this a default parameter.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100523512

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -477,6 +475,24 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.selectExpr("unix_timestamp(x)"), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.select(unix_timestamp(col("x"), "-dd-MM HH:mm:ss")), Seq(
    +      Row(ts2.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.selectExpr(s"unix_timestamp(x, '-MM-dd mm:HH:ss')"), Seq(
    +      Row(ts3.getTime / 1000L), Row(ts4.getTime / 1000L), Row(null), Row(null)))
    --- End diff --

    Shouldn't the order be `Row(ts4.getTime / 1000L), Row(null), Row(ts3.getTime / 1000L), Row(null)`? It does not matter for testing since we sort results, but it makes it less confusing.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16870#discussion_r100524361

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -500,6 +516,20 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
           Row(date1.getTime / 1000L), Row(date2.getTime / 1000L)))
         checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq(
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
    +
    +
    +    val x1 = "2015-07-24 10:00:00"
    +    val x2 = "2015-25-07 02:02:02"
    +    val x3 = "2015-07-24 25:02:02"
    +    val x4 = "2015-24-07 26:02:02"
    +    val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
    +    val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
    +
    +    val df1 = Seq(x1, x2, x3, x4).toDF("x")
    +    checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq(
    +      Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
    +    checkAnswer(df1.selectExpr(s"to_unix_timestamp(x, '-MM-dd mm:HH:ss')"), Seq(
    +      Row(ts3.getTime / 1000L), Row(ts4.getTime / 1000L), Row(null), Row(null)))
    --- End diff --

    Shouldn't the order be `Row(ts4.getTime / 1000L), Row(null), Row(ts3.getTime / 1000L), Row(null)`? It does not matter for testing since we sort results, but it makes it less confusing.
GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/16870

    [SPARK-19496][SQL]to_date udf to return null when input date is invalid

    ## What changes were proposed in this pull request?

    Currently the udf `to_date` returns different values for an invalid date input:

    ```
    SELECT to_date('2015-07-22', '-dd-MM')  -> returns `2016-10-07`
    SELECT to_date('2014-31-12')            -> returns null
    ```

    As discussed in JIRA [SPARK-19496](https://issues.apache.org/jira/browse/SPARK-19496), we should return null in both situations when the input date is invalid.

    ## How was this patch tested?

    unit test added

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark to_date

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16870.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16870

----
commit 24c99fa4c0ddcdbd852b8d5354aaff43fa198477
Author: windpiger
Date:   2017-02-09T09:58:17Z

    [SPARK-19496][SQL]to_date udf to return null when input date is invalid
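The surprising `2016-10-07` result in the description comes from `java.text.SimpleDateFormat`'s lenient mode, which silently rolls out-of-range fields forward instead of failing. A minimal sketch of the before/after behavior, independent of Spark: the class name is hypothetical, and since the format string in the description appears truncated, the full pattern `yyyy-dd-MM` is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class ToDateLeniencyDemo {
    // Parses `input` with `pattern` and the given leniency, then re-formats
    // the result as yyyy-MM-dd; returns null when parsing fails.
    static String toDate(String input, String pattern, boolean lenient) {
        SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.US);
        parser.setTimeZone(TimeZone.getTimeZone("UTC"));
        parser.setLenient(lenient);
        SimpleDateFormat printer = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        printer.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return printer.format(parser.parse(input));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Lenient: month 22 rolls 10 months past the end of 2015,
        // yielding the confusing 2016-10-07.
        System.out.println(toDate("2015-07-22", "yyyy-dd-MM", true));
        // Strict: the same input is rejected, so the udf can return null.
        System.out.println(toDate("2015-07-22", "yyyy-dd-MM", false));
    }
}
```

Disabling leniency in `DateTimeUtils.newDateFormat` makes both invalid inputs in the description take the second path, which is the null-returning behavior this PR proposes.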