[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2017-12-03 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/16735
  
I will try to complete it this month


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

2018-01-03 Thread sergey-rubtsov
Github user sergey-rubtsov closed the pull request at:

https://github.com/apache/spark/pull/16735


---




[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...

2018-01-03 Thread sergey-rubtsov
GitHub user sergey-rubtsov opened a pull request:

https://github.com/apache/spark/pull/20140

[SPARK-19228][SQL] Introduce tryParseDate method to process csv date,…

… add a type-widening rule in findTightestCommonType between DateType and 
TimestampType, add java.time.format.DateTimeFormatter to more accurately infer 
the type of time, add an end-to-end test case and unit test

## What changes were proposed in this pull request?

By design, 'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes).
But when a date is parsed, the "dateFormat" option is ignored, the default 
date format ("yyyy-MM-dd") is used, and the date is parsed as a timestamp.

This patch fixes that bug.

For other details, please read the ticket
https://issues.apache.org/jira/browse/SPARK-19228
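The mismatch can be illustrated with plain java.time, independent of Spark (a minimal sketch; the sample value and the "dd/MM/yyyy" pattern are hypothetical, chosen only for illustration):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateVsTimestamp {
    public static void main(String[] args) {
        // A CSV field that matches a custom dateFormat (hypothetical sample value).
        String field = "31/01/2015";
        DateTimeFormatter dateFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy");

        // Parsed with the user's dateFormat: a compact, day-precision value.
        LocalDate date = LocalDate.parse(field, dateFormat);
        System.out.println(date);       // 2015-01-31

        // What the buggy inference effectively produces instead:
        // the same day widened to a midnight timestamp.
        LocalDateTime timestamp = date.atStartOfDay();
        System.out.println(timestamp);  // 2015-01-31T00:00
    }
}
```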

## How was this patch tested?

Add an end-to-end test case and unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20140


commit d2ed68673082995bee65a3c58be8b71642a60d57
Author: sergei.rubtcov 
Date:   2018-01-03T10:50:26Z

[SPARK-19228][SQL] Introduce tryParseDate method to process csv date, add a 
type-widening rule in findTightestCommonType between DateType and 
TimestampType, add java.time.format.DateTimeFormatter to more accurately infer 
the type of time, add an end-to-end test case and unit test




---




[GitHub] spark issue #20140: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2018-01-15 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/20140
  
@HyukjinKwon, @gatorsmile could you please help find someone to review this?


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-18 Thread sergey-rubtsov
GitHub user sergey-rubtsov opened a pull request:

https://github.com/apache/spark/pull/21363

[SPARK-19228][SQL] Migrate to Java 8 time from FastDateFormat to meet ISO 8601

## What changes were proposed in this pull request?

Add support for inferring DateType and a custom "dateFormat" option.
Fix an old bug when parsing a string into SQL's timestamp value with 
microsecond accuracy.
Add a type-widening rule in findTightestCommonType between DateType and 
TimestampType.

## How was this patch tested?

Fix some tests to accord with the internationally agreed way to represent 
dates.
Add an end-to-end test case and unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21363


commit 65179a2bdd1623fb7f4077cdc316de5a7436c49d
Author: sergei.rubtcov 
Date:   2018-05-18T12:13:05Z

[SPARK-19228][SQL] Migrate to Java 8 time from FastDateFormat to meet 
ISO 8601 and parse dates in csv correctly. Add support for inferring DateType 
and a custom "dateFormat" option.




---




[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

2018-05-18 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/21363
  
Previous pull request https://github.com/apache/spark/pull/20140 is closed


---




[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

2018-05-18 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/21363
  
@HyukjinKwon, please take a look


---




[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...

2018-05-18 Thread sergey-rubtsov
Github user sergey-rubtsov closed the pull request at:

https://github.com/apache/spark/pull/20140


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-18 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189255920
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala
 ---
@@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
 assert(parser.makeConverter("_1", BooleanType, options = 
options).apply("true") == true)
 
 val timestampsOptions =
-  new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
+  new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm"), "GMT")
--- End diff --

According to the official documentation, this test must not pass, because 
"hh" means clock-hour-of-am-pm (1-12):

https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
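The difference can be checked directly against java.time (a standalone sketch using the same sample value as the test; "HH" is hour-of-day 0-23, while "hh" needs an am/pm marker to resolve):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class HourPatternDemo {
    public static void main(String[] args) {
        String value = "31/01/2015 00:00";

        // "HH" is hour-of-day (0-23): midnight parses fine.
        LocalDateTime ok = LocalDateTime.parse(value,
                DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm"));
        System.out.println(ok);  // 2015-01-31T00:00

        // "hh" is clock-hour-of-am-pm (1-12): without an am/pm field,
        // java.time cannot resolve a time of day, so parsing fails.
        try {
            LocalDateTime.parse(value,
                    DateTimeFormatter.ofPattern("dd/MM/yyyy hh:mm"));
            System.out.println("parsed");
        } catch (DateTimeParseException e) {
            System.out.println("DateTimeParseException");
        }
    }
}
```
SimpleDateFormat/FastDateFormat parse "hh" leniently, which is why the old test used to pass.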


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-18 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189255997
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala
 ---
@@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
 assert(parser.makeConverter("_1", BooleanType, options = 
options).apply("true") == true)
 
 val timestampsOptions =
-  new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
--- End diff --

According to the official documentation, this test must not pass, because 
"hh" means clock-hour-of-am-pm (1-12):

https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html


---




[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

2018-05-20 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/21363
  
cc @viirya @mgaido91 @dongjoon-hyun


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189582909
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
 ---
@@ -59,13 +59,21 @@ class CSVInferSchemaSuite extends SparkFunSuite {
 assert(CSVInferSchema.inferField(IntegerType, textValueOne, options) 
== expectedTypeOne)
   }
 
-  test("Timestamp field types are inferred correctly via custom data format") {
-var options = new CSVOptions(Map("timestampFormat" -> "yyyy-mm"), "GMT")
+  test("Timestamp field types are inferred correctly via custom date format") {
+var options = new CSVOptions(Map("timestampFormat" -> "yyyy-MM"), "GMT")
--- End diff --

"yyyy-mm" means years and minutes; that is not a date format, it is a time 
format. "yyyy-MM" means years and months, but I do not insist on this change
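The distinction is easy to demonstrate with a standalone java.time sketch (the sample value "2015-09" is hypothetical):

```java
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

public class MonthVsMinute {
    public static void main(String[] args) {
        // "yyyy-MM": the second field is a month.
        TemporalAccessor months = DateTimeFormatter.ofPattern("yyyy-MM").parse("2015-09");
        System.out.println(months.get(ChronoField.MONTH_OF_YEAR));    // 9

        // "yyyy-mm": the very same text is read as a minute, not a month.
        TemporalAccessor minutes = DateTimeFormatter.ofPattern("yyyy-mm").parse("2015-09");
        System.out.println(minutes.get(ChronoField.MINUTE_OF_HOUR));  // 9
        System.out.println(minutes.isSupported(ChronoField.MONTH_OF_YEAR)); // false
    }
}
```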


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189583023
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -90,6 +90,7 @@ private[csv] object CSVInferSchema {
   // DecimalTypes have different precisions and scales, so we try 
to find the common type.
   findTightestCommonType(typeSoFar, tryParseDecimal(field, 
options)).getOrElse(StringType)
 case DoubleType => tryParseDouble(field, options)
+case DateType => tryParseDate(field, options)
--- End diff --

I can do it, but where exactly should it be documented?


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189583203
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala
 ---
@@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
 assert(parser.makeConverter("_1", BooleanType, options = 
options).apply("true") == true)
 
 val timestampsOptions =
-  new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
+  new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm"), "GMT")
 val customTimestamp = "31/01/2015 00:00"
-val expectedTime = 
timestampsOptions.timestampFormat.parse(customTimestamp).getTime
+
+val expectedTime = LocalDateTime.parse(customTimestamp, 
timestampsOptions.timestampFormatter)
+  .atZone(options.timeZone.toZoneId)
+  .toInstant.toEpochMilli
 val castedTimestamp =
-  parser.makeConverter("_1", TimestampType, nullable = true, options = 
timestampsOptions)
+  parser.makeConverter("_1", TimestampType, nullable = true, 
timestampsOptions)
 .apply(customTimestamp)
 assert(castedTimestamp == expectedTime * 1000L)
 
-val customDate = "31/01/2015"
 val dateOptions = new CSVOptions(Map("dateFormat" -> "dd/MM/yyyy"), "GMT")
-val expectedDate = dateOptions.dateFormat.parse(customDate).getTime
+val customDate = "31/01/2015"
+
+val expectedDate = LocalDate.parse(customDate, 
dateOptions.dateFormatter)
+  .atStartOfDay(options.timeZone.toZoneId)
+  .toInstant.toEpochMilli
 val castedDate =
-  parser.makeConverter("_1", DateType, nullable = true, options = 
dateOptions)
--- End diff --

ok


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189588135
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
   private def tryParseDouble(field: String, options: CSVOptions): DataType 
= {
 if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, 
options)) {
   DoubleType
+} else {
+  tryParseDate(field, options)
--- End diff --

For example, suppose that by mistake we have identical "timestampFormat" and 
"dateFormat" options, say "yyyy-MM-dd".
'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes).
So when the two formats can overlap, we need to try to parse the field as a 
date first: both types are suitable, but the more compact one should be used 
by default, and that yields the correct type inference
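The proposed ordering can be sketched in plain Java (the `inferField` helper and its patterns are hypothetical stand-ins; Spark's actual `tryParseDate`/`tryParseTimestamp` live in CSVInferSchema.scala and take a CSVOptions):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class InferDateOrTimestamp {
    // Hypothetical stand-ins for the user's dateFormat/timestampFormat options.
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Try the narrower 4-byte DateType first; fall back to the wider
    // 8-byte TimestampType, then to String.
    static String inferField(String field) {
        try {
            LocalDate.parse(field, DATE);
            return "DateType";
        } catch (DateTimeParseException ignored) { }
        try {
            LocalDateTime.parse(field, TS);
            return "TimestampType";
        } catch (DateTimeParseException ignored) { }
        return "StringType";
    }

    public static void main(String[] args) {
        System.out.println(inferField("2015-01-31"));           // DateType
        System.out.println(inferField("2015-01-31 23:59:59"));  // TimestampType
        System.out.println(inferField("not a date"));           // StringType
    }
}
```
If the timestamp path were tried first, a bare date would also match it and the column would be widened to a timestamp unnecessarily.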


---




[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/21363
  
@HyukjinKwon please clarify, how to add a configuration to control this 
behaviour? Do you mean to keep backward compatibility? 
But it seems that right now we don't handle Spark's "DateType" at all. 
I can change the behaviour in the following way: handle "DateType" if the 
"dateFormat" option was set by the customer; if it is not set, ignore it 
entirely. Is that okay?


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189596275
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -143,6 +145,12 @@ object DateTimeUtils {
 millisLocal - getOffsetFromLocalMillis(millisLocal, timeZone)
   }
 
+  def dateTimeToMicroseconds(localDateTime: LocalDateTime, timeZone: 
TimeZone): Long = {
+val microOfSecond = localDateTime.getLong(ChronoField.MICRO_OF_SECOND)
+val epochSecond = localDateTime.atZone(timeZone.toZoneId).toInstant.getEpochSecond
+epochSecond * 1000000L + microOfSecond
--- End diff --

thanks
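For reference, what `dateTimeToMicroseconds` computes can be sketched as standalone Java (the `toMicros` helper name is hypothetical; the key point is the scale factor of 1,000,000 microseconds per second):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoField;

public class MicrosDemo {
    // Seconds scale to microseconds with a factor of 1_000_000;
    // the sub-second part is already in microseconds.
    static long toMicros(LocalDateTime ldt, ZoneOffset zone) {
        long micro = ldt.getLong(ChronoField.MICRO_OF_SECOND);
        long epochSecond = ldt.toInstant(zone).getEpochSecond();
        return epochSecond * 1_000_000L + micro;
    }

    public static void main(String[] args) {
        // 1 second and 500,000 ns (= 500 us) after the epoch.
        LocalDateTime ldt = LocalDateTime.of(1970, 1, 1, 0, 0, 1, 500_000);
        System.out.println(toMicros(ldt, ZoneOffset.UTC)); // 1000500
    }
}
```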


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189597989
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
   private def tryParseDouble(field: String, options: CSVOptions): DataType 
= {
 if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, 
options)) {
   DoubleType
+} else {
+  tryParseDate(field, options)
--- End diff --

At the moment, DateType is ignored here entirely; I'm not sure that this was 
intended when the type was created


---




[GitHub] spark issue #20140: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2018-04-19 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/20140
  
@HyukjinKwon changed as you suggested


---




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2017-02-01 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/16735
  
@HyukjinKwon yes, sure, I will check it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...

2018-02-06 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20140#discussion_r166261313
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -150,6 +151,16 @@ class CSVOptions(
 
  val isCommentSet = this.comment != '\u0000'
 
+  def dateFormatter: DateTimeFormatter = {
+DateTimeFormatter.ofPattern(dateFormat.getPattern)
+  
.withLocale(Locale.US).withZone(timeZone.toZoneId).withResolverStyle(ResolverStyle.SMART)
+  }
+
+  def timestampFormatter: DateTimeFormatter = {
+DateTimeFormatter.ofPattern(timestampFormat.getPattern)
--- End diff --

DateTimeFormatter is part of the standard time library from Java 8. 
FastDateFormat can't properly parse dates and timestamps. 

I can create some test cases to prove it, but I need much time for that.

Also, FastDateFormat does not meet ISO 8601: 
https://en.wikipedia.org/wiki/ISO_8601
The current implementation of CSVInferSchema contains other bugs. For example, 
the test test("Timestamp field types are inferred correctly via custom date 
format") in class CSVInferSchemaSuite must not pass, because the 
timestampFormat "yyyy-mm" is a wrong format for year and month. It should be 
"yyyy-MM".
It would be better to refactor the date types and replace the deprecated 
types with new ones across the whole project.


---




[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...

2018-02-06 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20140#discussion_r166262196
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -150,6 +151,16 @@ class CSVOptions(
 
  val isCommentSet = this.comment != '\u0000'
 
+  def dateFormatter: DateTimeFormatter = {
--- End diff --

DateTimeFormatter has one disadvantage: unlike FastDateFormat, it does not 
implement Serializable. That is why I couldn't make it a val here.
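The Serializable point is easy to verify against the JDK itself (a minimal check; FastDateFormat, from Apache Commons Lang, declares `implements Serializable`, which is omitted here to stay dependency-free):

```java
import java.io.Serializable;
import java.time.format.DateTimeFormatter;

public class FormatterSerializable {
    public static void main(String[] args) {
        // Static type Object so the instanceof check compiles even though
        // DateTimeFormatter is final and never implements Serializable.
        Object fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        System.out.println(fmt instanceof Serializable); // false
    }
}
```
A `def` rebuilds the formatter on each access, so nothing non-serializable is captured in the serialized CSVOptions.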


---




[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...

2018-02-06 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20140#discussion_r166264802
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -90,7 +90,10 @@ private[csv] object CSVInferSchema {
   // DecimalTypes have different precisions and scales, so we try 
to find the common type.
   findTightestCommonType(typeSoFar, tryParseDecimal(field, 
options)).getOrElse(StringType)
 case DoubleType => tryParseDouble(field, options)
-case TimestampType => tryParseTimestamp(field, options)
+case DateType => tryParseDate(field, options)
+case TimestampType =>
+  findTightestCommonType(typeSoFar, tryParseTimestamp(field, 
options)).getOrElse(
--- End diff --

Sorry, your question is not really clear to me.
We have to try to parse the field as DateType first, because a date can 
always be parsed both as a date and as a timestamp (start of day). 
The current implementation of Spark ignores dates and always parses them 
as timestamps 


---




[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

2017-03-13 Thread sergey-rubtsov
Github user sergey-rubtsov closed the pull request at:

https://github.com/apache/spark/pull/16735


---




[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

2017-03-13 Thread sergey-rubtsov
GitHub user sergey-rubtsov reopened a pull request:

https://github.com/apache/spark/pull/16735

[SPARK-19228][SQL] Introduce tryParseDate method to process csv date …

…column with custom format as date

## What changes were proposed in this pull request?

This patch fixes bugs:

1) All dates are parsed as timestamps.
2) The "dateFormat" option is ignored when reading csv files with date data. 
Instead, the default date format ("yyyy-MM-dd") is used.

For other details, please, read the ticket
https://issues.apache.org/jira/browse/SPARK-19228

## How was this patch tested?

Tested with unit tests only. Added a new test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16735


commit 287005809cec5388dcb75a3d99bc0f0461b9bb69
Author: sergei.rubtcov 
Date:   2017-03-10T13:03:27Z

[SPARK-19228][SQL] Introduce tryParseDate method to process csv date, add a 
type-widening rule in findTightestCommonType between DateType and 
TimestampType, add an end-to-end test case

commit 3e250f56ee5f3a5c1ce5542d56670973233e62b7
Author: sergei.rubtcov 
Date:   2017-03-13T14:13:17Z

Merge branch 'master' of https://github.com/apache/spark




---




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2017-03-13 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/16735
  
I couldn't run the tests in CSVSuite locally on my Windows OS; I apologize 
for possible test failures


---




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2017-03-22 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/16735
  
Hi @HyukjinKwon, @gatorsmile 
Could you take a look, please?


---




[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

2017-01-29 Thread sergey-rubtsov
GitHub user sergey-rubtsov opened a pull request:

https://github.com/apache/spark/pull/16735

[SPARK-19228][SQL] Introduce tryParseDate method to process csv date …

…column with custom format as date

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16735


commit 8f78e1129efe8277c190edd5016c1e06b5aeef65
Author: Sergey Rubtsov 
Date:   2017-01-28T20:21:55Z

[SPARK-19228][SQL] Introduce tryParseDate method to process csv date column 
with custom format as date




---




[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

2017-01-30 Thread sergey-rubtsov
Github user sergey-rubtsov commented on a diff in the pull request:

https://github.com/apache/spark/pull/16735#discussion_r98523879
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
 }
   }
 
+  private def tryParseDate(field: String, options: CSVOptions): DataType = 
{
+// This case infers a custom `dateFormat` is set.
+if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
+  DateType
+} else {
+  tryParseTimestamp(field, options)
+}
+  }
+
   private def tryParseTimestamp(field: String, options: CSVOptions): 
DataType = {
-// This case infers a custom `dataFormat` is set.
+// This case infers a custom `timestampFormat` is set.
 if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
   TimestampType
 } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) 
{
-  // We keep this for backwords competibility.
+  // We keep this for backwards compatibility.
   TimestampType
 } else {
   tryParseBoolean(field, options)
--- End diff --

okay, I will test it again, thanks


---
