[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100859963
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
 
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
--- End diff --

uh, got it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16870





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100771130
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
 
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
--- End diff --

@gatorsmile the `ts1` variable is defined at the beginning of the test.





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100723941
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(date1.getTime / 1000L), Row(date2.getTime / 1000L)))
 checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
+
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
--- End diff --

Same as above~





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100723919
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
 
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
--- End diff --

Yes, it is `ts1`; the timestamp for `x1` is `ts1`.





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100713312
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(date1.getTime / 1000L), Row(date2.getTime / 1000L)))
 checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
+
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
--- End diff --

The same issue here





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100713251
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
 
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
--- End diff --

`ts1`? 





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100712695
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -98,6 +98,7 @@ object DateTimeUtils {
   def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
 val sdf = new SimpleDateFormat(formatString, Locale.US)
 sdf.setTimeZone(timeZone)
+sdf.setLenient(false)
--- End diff --

Please add a one-line comment:
> // Enable strict parsing; inputs must precisely match this object's format.
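For context, here is a minimal standalone Java sketch (not taken from the PR) of what `setLenient` changes: with lenient parsing, `SimpleDateFormat` silently rolls out-of-range fields forward; with strict parsing it throws `ParseException`, which is what lets Spark return null for invalid inputs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class LenientDemo {
    // Returns true if the input parses under the given leniency setting.
    static boolean parses(String input, boolean lenient) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        sdf.setLenient(lenient);
        try {
            sdf.parse(input);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hour 25 is out of range: lenient parsing rolls it into the next day,
        // strict parsing rejects it.
        System.out.println(parses("2015-07-24 25:02:02", true));   // true
        System.out.println(parses("2015-07-24 25:02:02", false));  // false
        System.out.println(parses("2015-07-24 10:00:00", false));  // true
    }
}
```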






[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100687862
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -95,9 +95,10 @@ object DateTimeUtils {
 sdf
   }
 
-  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
+  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: 
Boolean): DateFormat = {
 val sdf = new SimpleDateFormat(formatString, Locale.US)
 sdf.setTimeZone(timeZone)
+sdf.setLenient(isLenient)
--- End diff --

ok thanks~





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100687520
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -95,9 +95,10 @@ object DateTimeUtils {
 sdf
   }
 
-  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
+  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: 
Boolean): DateFormat = {
 val sdf = new SimpleDateFormat(formatString, Locale.US)
 sdf.setTimeZone(timeZone)
+sdf.setLenient(isLenient)
--- End diff --

Yes, it would be good if we don't need to introduce a new parameter.





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-11 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100683849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -95,9 +95,10 @@ object DateTimeUtils {
 sdf
   }
 
-  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
+  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: 
Boolean): DateFormat = {
 val sdf = new SimpleDateFormat(formatString, Locale.US)
 sdf.setTimeZone(timeZone)
+sdf.setLenient(isLenient)
--- End diff --

We can test it with lenient set to false. Since this is a util function, if the tests pass, should we always set it to false?





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100680992
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -95,9 +95,10 @@ object DateTimeUtils {
 sdf
   }
 
-  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
+  def newDateFormat(formatString: String, timeZone: TimeZone, isLenient: 
Boolean): DateFormat = {
 val sdf = new SimpleDateFormat(formatString, Locale.US)
 sdf.setTimeZone(timeZone)
+sdf.setLenient(isLenient)
--- End diff --

what if we always set `lenient` to false?





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100656533
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -465,15 +465,15 @@ case class DateFormatClass(left: Expression, right: 
Expression, timeZoneId: Opti
 copy(timeZoneId = Option(timeZoneId))
 
   override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
-val df = DateTimeUtils.newDateFormat(format.toString, timeZone)
+val df = DateTimeUtils.newDateFormat(format.toString, timeZone, 
isLenient = true)
 UTF8String.fromString(df.format(new 
java.util.Date(timestamp.asInstanceOf[Long] / 1000)))
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
 val tz = ctx.addReferenceMinorObj(timeZone)
 defineCodeGen(ctx, ev, (timestamp, format) => {
-  s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), $tz)
+  s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), 
$tz, false)
--- End diff --

oh, sorry let me fix it, thanks!





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100539828
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -465,15 +465,15 @@ case class DateFormatClass(left: Expression, right: 
Expression, timeZoneId: Opti
 copy(timeZoneId = Option(timeZoneId))
 
   override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
-val df = DateTimeUtils.newDateFormat(format.toString, timeZone)
+val df = DateTimeUtils.newDateFormat(format.toString, timeZone, 
isLenient = true)
 UTF8String.fromString(df.format(new 
java.util.Date(timestamp.asInstanceOf[Long] / 1000)))
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
 val tz = ctx.addReferenceMinorObj(timeZone)
 defineCodeGen(ctx, ev, (timestamp, format) => {
-  s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), $tz)
+  s"""UTF8String.fromString($dtu.newDateFormat($format.toString(), 
$tz, false)
--- End diff --

Why is this one not lenient?





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100539384
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -477,6 +475,24 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
 
+
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.selectExpr("unix_timestamp(x)"), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.select(unix_timestamp(col("x"), "yyyy-dd-MM HH:mm:ss")), Seq(
+  Row(ts2.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.selectExpr(s"unix_timestamp(x, 'yyyy-MM-dd mm:HH:ss')"), Seq(
+  Row(ts3.getTime / 1000L), Row(ts4.getTime / 1000L), Row(null), 
Row(null)))
--- End diff --

you are right~ thanks a lot!
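The `mm:HH:ss` pattern in the last assertion deliberately swaps the minute and hour fields, which is why `x1` maps to `ts4` (`00:10:00`) and `x3` to `ts3` (`02:25:02`). A hedged standalone Java sketch of that field swap (assuming UTC rather than the suite's actual session time zone):

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class SwapDemo {
    // Parse with the swapped pattern, then re-render in the usual order.
    static String reparse(String input) throws Exception {
        // mm:HH:ss reads the first time field as minutes, the second as hours.
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd mm:HH:ss", Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        in.setLenient(false); // strict, matching this PR's behavior
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(in.parse(input));
    }

    public static void main(String[] args) throws Exception {
        // x1 = "2015-07-24 10:00:00": minute=10, hour=00 -> ts4
        System.out.println(reparse("2015-07-24 10:00:00")); // 2015-07-24 00:10:00
        // x3 = "2015-07-24 25:02:02": minute=25, hour=02 -> ts3
        System.out.println(reparse("2015-07-24 25:02:02")); // 2015-07-24 02:25:02
    }
}
```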





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100539442
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -95,9 +95,12 @@ object DateTimeUtils {
 sdf
   }
 
-  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
+  def newDateFormat(formatString: String,
+  timeZone: TimeZone,
+  isLenient: Boolean = true): DateFormat = {
--- End diff --

ok let me modify it





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100523109
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -95,9 +95,12 @@ object DateTimeUtils {
 sdf
   }
 
-  def newDateFormat(formatString: String, timeZone: TimeZone): DateFormat 
= {
+  def newDateFormat(formatString: String,
+  timeZone: TimeZone,
+  isLenient: Boolean = true): DateFormat = {
--- End diff --

Let's not make this a default parameter. 





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100523512
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -477,6 +475,24 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
 
+
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.select(unix_timestamp(col("x"))), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.selectExpr("unix_timestamp(x)"), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.select(unix_timestamp(col("x"), "yyyy-dd-MM HH:mm:ss")), Seq(
+  Row(ts2.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.selectExpr(s"unix_timestamp(x, 'yyyy-MM-dd mm:HH:ss')"), Seq(
+  Row(ts3.getTime / 1000L), Row(ts4.getTime / 1000L), Row(null), 
Row(null)))
--- End diff --

Shouldn't the order be `Row(ts4.getTime / 1000L), Row(null), 
Row(ts3.getTime / 1000L), Row(null)`? It does not matter for testing since we 
sort results, but it makes it less confusing.





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16870#discussion_r100524361
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -500,6 +516,20 @@ class DateFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(date1.getTime / 1000L), Row(date2.getTime / 1000L)))
 checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq(
   Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
+
+
+val x1 = "2015-07-24 10:00:00"
+val x2 = "2015-25-07 02:02:02"
+val x3 = "2015-07-24 25:02:02"
+val x4 = "2015-24-07 26:02:02"
+val ts3 = Timestamp.valueOf("2015-07-24 02:25:02")
+val ts4 = Timestamp.valueOf("2015-07-24 00:10:00")
+
+val df1 = Seq(x1, x2, x3, x4).toDF("x")
+checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq(
+  Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null)))
+checkAnswer(df1.selectExpr(s"to_unix_timestamp(x, 'yyyy-MM-dd mm:HH:ss')"), Seq(
+  Row(ts3.getTime / 1000L), Row(ts4.getTime / 1000L), Row(null), 
Row(null)))
--- End diff --

Shouldn't the order be `Row(ts4.getTime / 1000L), Row(null), 
Row(ts3.getTime / 1000L), Row(null)`? It does not matter for testing since we 
sort results, but it makes it less confusing.





[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...

2017-02-09 Thread windpiger
GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/16870

[SPARK-19496][SQL]to_date udf to return null when input date is invalid

## What changes were proposed in this pull request?

Currently the udf `to_date` returns different values when given an invalid date input.

```
SELECT to_date('2015-07-22', 'yyyy-dd-MM') -> returns `2016-10-07`
SELECT to_date('2014-31-12')               -> returns null
```
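The surprising `2016-10-07` result comes from `SimpleDateFormat`'s default lenient mode, which rolls out-of-range fields forward instead of failing. A minimal Java sketch of the rollover (an illustration under lenient settings, not code from Spark itself):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class RolloverDemo {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-dd-MM", Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        in.setLenient(true); // the pre-fix default behavior

        // "2015-07-22" is read as year=2015, day=07, month=22;
        // month 22 rolls 10 months past December into the next year.
        Date d = in.parse("2015-07-22");

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(out.format(d)); // 2016-10-07
    }
}
```

With `in.setLenient(false)` the same `parse` call would throw `ParseException`, which is what allows `to_date` to return null consistently.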

As discussed in JIRA [SPARK-19496](https://issues.apache.org/jira/browse/SPARK-19496), we should return null in both situations when the input date is invalid.

## How was this patch tested?
Unit tests were added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark to_date

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16870


commit 24c99fa4c0ddcdbd852b8d5354aaff43fa198477
Author: windpiger 
Date:   2017-02-09T09:58:17Z

[SPARK-19496][SQL]to_date udf to return null when input date is invalid



