[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22943


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r231382309
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
 ---
@@ -140,16 +140,10 @@ class DateTimeUtilsSuite extends SparkFunSuite {
 c = Calendar.getInstance()
 c.set(2015, 2, 18, 0, 0, 0)
 c.set(Calendar.MILLISECOND, 0)
-assert(stringToDate(UTF8String.fromString("2015-03-18")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18 ")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18 123142")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18T123123")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18T")).get ===
-  millisToDays(c.getTimeInMillis))
+Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ", 
"2015-03-18 123142",
--- End diff --

ah i see


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r231381218
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
 ---
@@ -140,16 +140,10 @@ class DateTimeUtilsSuite extends SparkFunSuite {
 c = Calendar.getInstance()
 c.set(2015, 2, 18, 0, 0, 0)
 c.set(Calendar.MILLISECOND, 0)
-assert(stringToDate(UTF8String.fromString("2015-03-18")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18 ")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18 123142")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18T123123")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18T")).get ===
-  millisToDays(c.getTimeInMillis))
+Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ", 
"2015-03-18 123142",
--- End diff --

New test cases (with space padding) are added; e.g. ` 2015-03-18` and ` 
2015-03-18 `.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r231380552
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
 ---
@@ -140,16 +140,10 @@ class DateTimeUtilsSuite extends SparkFunSuite {
 c = Calendar.getInstance()
 c.set(2015, 2, 18, 0, 0, 0)
 c.set(Calendar.MILLISECOND, 0)
-assert(stringToDate(UTF8String.fromString("2015-03-18")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18 ")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18 123142")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18T123123")).get ===
-  millisToDays(c.getTimeInMillis))
-assert(stringToDate(UTF8String.fromString("2015-03-18T")).get ===
-  millisToDays(c.getTimeInMillis))
+Seq("2015-03-18", "2015-03-18 ", " 2015-03-18", " 2015-03-18 ", 
"2015-03-18 123142",
--- End diff --

the test result doesn't change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r231207627
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -441,7 +441,7 @@ object DateTimeUtils {
   }
 
   /**
-   * Parses a given UTF8 date string to a corresponding [[Int]] value.
+   * Parses a trimmed UTF8 date string to a corresponding [[Int]] value.
--- End diff --

`Parses a trimmed UTF8` -> `Trim and parse a given UTF8`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r231207128
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -274,7 +274,7 @@ object DateTimeUtils {
   }
 
   /**
-   * Parses a given UTF8 date string to the corresponding a corresponding 
[[Long]] value.
+   * Parses a trimmed UTF8 date string to the corresponding a 
corresponding [[Long]] value.
--- End diff --

`Parses a trimmed UTF8` -> `Trim and parse a given UTF8`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r230998508
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from 
match {
 case StringType =>
-  buildCast[UTF8String](_, utfs => 
DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+  buildCast[UTF8String](_, s => 
DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
--- End diff --

Ur, I'd like not to rename it. One line function document will suffice.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-05 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r230970713
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from 
match {
 case StringType =>
-  buildCast[UTF8String](_, utfs => 
DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+  buildCast[UTF8String](_, s => 
DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
--- End diff --

How about change `stringToDate` to `trimStringToDate` and update 
`trimStringToDate` to:

![image](https://user-images.githubusercontent.com/5399861/48036353-ec49a100-e1a2-11e8-80e6-b52b4a007493.png)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r230911199
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from 
match {
 case StringType =>
-  buildCast[UTF8String](_, utfs => 
DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+  buildCast[UTF8String](_, s => 
DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
--- End diff --

cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22943#discussion_r230911108
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from 
match {
 case StringType =>
-  buildCast[UTF8String](_, utfs => 
DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+  buildCast[UTF8String](_, s => 
DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
--- End diff --

What about changing `stringToDate` and `stringToTimestamp` instead?
Those functions are used only in `Cast` and they already handle 
`null`cases, too.
I didn't look at the detail of this PR. The change looks a little less 
robust when `s` is `null`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...

2018-11-04 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22943

[SPARK-25098][SQL] Trim the string when cast stringToTimestamp and 
stringToDate

## What changes were proposed in this pull request?

**Hive** and **Oracle** trim the string when cast `stringToTimestamp` and 
`stringToDate`. this PR support this feature:

![image](https://user-images.githubusercontent.com/5399861/47979721-793b1e80-e0ff-11e8-97c8-24b10950ee9e.png)

![image](https://user-images.githubusercontent.com/5399861/47979725-7dffd280-e0ff-11e8-87d4-5767a00ed46e.png)


## How was this patch tested?

unit tests

Closes https://github.com/apache/spark/pull/22089


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-25098

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22943.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22943


commit d297817b7457fef40eb78b803542aed213afb7fc
Author: Yuming Wang 
Date:   2018-11-05T05:31:22Z

trim() the string when cast stringToTimestamp and stringToDate




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org