[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21182 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-05-04 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 jenkins, retest this, please

[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-04 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21108 jenkins, retest this, please

[GitHub] spark issue #21254: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] ...

2018-05-07 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21254 > Do we have any behavior change after the previous PR: #20937? The PR brought the `encoding` (and `charset`) option but we didn't change behavior when `encoding` is not specif

[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-05 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21108 @gatorsmile Kindly ask you to look at it again.

[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

2018-05-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21228#discussion_r186259641 --- Diff: R/pkg/R/functions.R --- @@ -963,6 +964,7 @@ setMethod("kurtosis", #' last(df$c, TRUE) #' } #' @note last since 1.4.0

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-05-05 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 @HyukjinKwon @gengliangwang @gatorsmile Please, have a look at it again.

[GitHub] spark pull request #21182: [SPARK-24068] Propagating DataFrameReader's optio...

2018-05-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21182#discussion_r186257363 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -136,4 +138,6 @@ private[sql] class JSONOptions

[GitHub] spark pull request #21273: [WIP][SPARK-17916][SQL] Fix empty string being pa...

2018-05-08 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21273 [WIP][SPARK-17916][SQL] Fix empty string being parsed as null when nullValue is set. ## What changes were proposed in this pull request? I propose to bump the version of the uniVocity parser up
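
A minimal Python sketch of the conversion rule the PR title describes, assuming a tokenizer that reports an unquoted empty field as None and a quoted empty field ("") as ''. The name `convert_string_token` is hypothetical; the sketch only shows that once nullValue is set to something other than the empty string, an empty string should stay an empty string instead of becoming null.

```python
def convert_string_token(token, null_value):
    """Map a CSV token to a string value or None (SQL null).

    Assumption: token is None for an unquoted empty field and '' for a
    quoted empty field ("").
    """
    if token is None or token == null_value:
        return None
    return token


# With nullValue set to "-", only "-" (and a missing value) become null.
print([convert_string_token(t, "-") for t in ["-", "", None, "abc"]])
# [None, '', None, 'abc']
```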

[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21182 @HyukjinKwon @viirya @gengliangwang May I ask you to look at the PR again.

[GitHub] spark pull request #21108: [SPARK-24027][SQL] Support MapType with StringTyp...

2018-05-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21108#discussion_r187888913 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -326,4 +326,61 @@ class JsonFunctionsSuite extends QueryTest
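
PR #21108 lets from_json accept a map with string keys as its root type. The plain-Python model below only illustrates those semantics: a root JSON object is read as a string-keyed map whose values share one type. `parse_json_map` is a hypothetical name, and returning None for malformed input or a wrongly typed value is an assumption standing in for from_json's null result.

```python
import json


def _matches(value, value_type):
    """Check one map value against the expected Python type."""
    if value is None:
        return True  # null values are allowed in this sketch
    if isinstance(value, bool) and value_type is not bool:
        return False  # bool is a subclass of int in Python
    return isinstance(value, value_type)


def parse_json_map(text, value_type):
    """Parse a root JSON object into a dict with str keys, or return None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    # JSON object keys are always strings, so string keys fit naturally.
    if not isinstance(obj, dict):
        return None
    if not all(_matches(v, value_type) for v in obj.values()):
        return None
    return obj


print(parse_json_map('{"a": 1, "b": null}', int))  # {'a': 1, 'b': None}
print(parse_json_map('{"a": "x"}', int))  # None
```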

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-05-14 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 jenkins, retest this, please

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-05-15 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 @gengliangwang @gatorsmile May I ask you to look at this PR one more time.

[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-09 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21182 @HyukjinKwon Sure, I will prepare a PR

[GitHub] spark pull request #21182: [SPARK-24068] Propagating DataFrameReader's optio...

2018-04-27 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21182 [SPARK-24068] Propagating DataFrameReader's options to Text datasource on schema inferring ## What changes were proposed in this pull request? While reading CSV or JSON files

[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21182 jenkins, retest this, please

[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-04-27 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21173 What about timeout for this query: https://github.com/maropu/spark/blob/f134548bd6b7b9f2bc2c508698404a61eb9ea43e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala

[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-05-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21182 Here is the backport to 2.3: https://github.com/apache/spark/pull/21292

[GitHub] spark pull request #21292: [SPARK-24068][BACKPORT-2.3] Propagating DataFrame...

2018-05-10 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21292 [SPARK-24068][BACKPORT-2.3] Propagating DataFrameReader's options to Text datasource on schema inferring ## What changes were proposed in this pull request? While reading CSV or JSON

[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...

2018-05-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21108 @marmbrus May I ask you to look at the PR.

[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-05-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21273 @HyukjinKwon @maropu Please, have a look at the PR.

[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21296 [SPARK-24244][SQL] CSV column pruning ## What changes were proposed in this pull request? The uniVocity parser allows specifying only the required column names or indexes for [parsing](https
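
The PR relies on uniVocity being able to parse only selected columns. As a rough Python illustration of the idea, assuming the input has a header row, the hypothetical helper `parse_required_columns` keeps only the values of the requested columns, resolving their positions once up front:

```python
import csv
import io


def parse_required_columns(text, required_names):
    """Yield rows holding only the required columns, in the requested order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Resolve required column names to their positions in the file once.
    indexes = [header.index(name) for name in required_names]
    for row in reader:
        yield [row[i] for i in indexes]


data = "a,b,c\n1,2,3\n4,5,6\n"
print(list(parse_required_columns(data, ["c", "a"])))  # [['3', '1'], ['6', '4']]
```

Python's csv module still tokenizes every field, so this sketch saves only the construction of output rows; the gain the PR targets comes from the parser itself skipping work for unselected columns.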

[GitHub] spark issue #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21296 @cloud-fan @hvanhovell Could you look at the PR, please.

[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21296#discussion_r187426203 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -267,7 +267,7 @@ class CSVSuite extends QueryTest

[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read

2018-05-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21247#discussion_r187780271 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -138,3 +121,40 @@ private[sql] class JSONOptions

[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-05-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21273 @gengliangwang @gatorsmile I added a benchmark for parsing of quoted values. Parsing time dropped by **28%** (look at the commit https://github.com/apache/spark/pull/21273/commits

[GitHub] spark pull request #21296: [SPARK-24244][SQL] Passing only required columns ...

2018-05-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21296#discussion_r187810542 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1322,4 +1322,31 @@ class CSVSuite extends

[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21296#discussion_r187608568 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed

[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21296#discussion_r187610192 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -73,11 +64,24 @@ class UnivocityParser

[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

2018-05-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21296#discussion_r187604963 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -267,7 +267,7 @@ class CSVSuite extends QueryTest

[GitHub] spark pull request #21299: [SPARK-24250][SQL] support accessing SQLConf insi...

2018-05-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21299#discussion_r187707287 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -99,12 +99,7 @@ object

[GitHub] spark pull request #21299: [SPARK-24250][SQL] support accessing SQLConf insi...

2018-05-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21299#discussion_r187697424 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -99,12 +99,7 @@ object

[GitHub] spark pull request #21292: [SPARK-24068][BACKPORT-2.3] Propagating DataFrame...

2018-05-10 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/21292

[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-05-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21273 >> CSV parser now parses quoted values ~30% faster > Could we add a micro-benchmark suite for this? @gatorsmile In this PR or in a sep

[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

2018-05-06 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21228#discussion_r186281087 --- Diff: python/pyspark/sql/functions.py --- @@ -152,13 +152,19 @@ def _(): _collect_list_doc = """ Aggregate function:
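
PR #21228 adds a note about non-deterministic functions; `first`, `last` and `collect_list` depend on the order of rows, which may change after a shuffle. A toy Python simulation of that effect, treating a shuffle as an arbitrary reordering of partitions (a simplification; the helper names only mirror the Spark function names):

```python
from itertools import permutations

# Three partitions of the same data; a shuffle may deliver them in any order.
partitions = [[1, 2], [3], [4, 5]]


def last(parts):
    """'last' over the rows of the partitions, concatenated in the given order."""
    return [row for part in parts for row in part][-1]


def collect_list(parts):
    """'collect_list' over the partitions, concatenated in the given order."""
    return [row for part in parts for row in part]


orders = list(permutations(partitions))
print(sorted({last(order) for order in orders}))  # [2, 3, 5]
print(len({tuple(collect_list(order)) for order in orders}))  # 6
```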

[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read

2018-05-06 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21247#discussion_r186283555 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -137,3 +121,40 @@ private[sql] class JSONOptions

[GitHub] spark pull request #21192: [SPARK-24118][SQL] Flexible format for the lineSe...

2018-05-06 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21192#discussion_r186283736 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -120,8 +120,26 @@ private[sql] class JSONOptions

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-05-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 > For example, there are so many options that can be potentially added (other univocity parser options). You are right, so many things can be added but in this particular case

[GitHub] spark pull request #21182: [SPARK-24068] Propagating DataFrameReader's optio...

2018-05-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21182#discussion_r186264859 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -136,4 +136,6 @@ private[sql] class JSONOptions

[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read

2018-05-05 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21247 [SPARK-24190] Separating JSONOptions for read ## What changes were proposed in this pull request? Currently, restrictions in JSONOptions for `encoding` and `lineSep` are the same for read
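
PR #21247 separates the read-side restrictions on `encoding` and `lineSep` from the write side. A minimal Python sketch of such a split follows. The specific read-side rules are assumptions modeled on the PR's intent: with multiLine off, a line separator is required for non-UTF-8 input, and UTF-16/UTF-32 without an explicit byte order are rejected. The function names are hypothetical.

```python
import codecs

# Assumed for this sketch: encodings rejected for line-by-line reading
# (their byte order is normally signalled by a BOM).
READ_BLOCKLIST = {"utf-16", "utf-32"}


def check_read_encoding(encoding, multi_line, line_sep=None):
    """Validate read options; return the normalized encoding name."""
    name = codecs.lookup(encoding).name  # e.g. "UTF8" -> "utf-8"
    if not multi_line and name in READ_BLOCKLIST:
        raise ValueError(f"The {encoding} encoding is not allowed when multiLine is disabled")
    if not multi_line and name != "utf-8" and not line_sep:
        raise ValueError(f"The lineSep option must be specified for the {encoding} encoding")
    return name


def check_write_encoding(encoding):
    """On write, any encoding known to Python's codec registry is accepted."""
    return codecs.lookup(encoding).name


print(check_read_encoding("UTF-16LE", multi_line=False, line_sep="\n"))  # utf-16-le
```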

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188558354 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -153,6 +153,12 @@ class CSVOptions( val

[GitHub] spark issue #21377: [SPARK-24325] Tests for Hadoop's LinesReader

2018-05-20 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21377 jenkins, retest this, please

[GitHub] spark pull request #21377: [SPARK-24325] Tests for Hadoop's LineReader

2018-05-20 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21377 [SPARK-24325] Tests for Hadoop's LineReader ## What changes were proposed in this pull request? The tests cover basic functionality of [Hadoop LinesReader](https://github.com/apache/spark

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188966754 --- Diff: python/pyspark/sql/tests.py --- @@ -3040,6 +3040,24 @@ def test_csv_sampling_ratio(self): .csv(rdd, samplingRatio=0.5).schema

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188961210 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -206,24 +280,33 @@ object

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188636535 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1279,4 +1279,62 @@ class CSVSuite extends

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188639434 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -118,16 +122,62 @@ object CSVDataSource

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188651192 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188654003 --- Diff: python/pyspark/sql/readwriter.py --- @@ -373,6 +373,12 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188635693 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -236,38 +236,44 @@ private[csv] object

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188650707 --- Diff: python/pyspark/sql/readwriter.py --- @@ -373,6 +373,12 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188635172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -118,6 +120,61 @@ object CSVDataSource
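
PR #20894 checks that the column names in CSV headers match the field names of the schema. As a rough Python illustration only, the sketch below compares a header row against schema field names position by position, optionally ignoring case. `check_header_column_names` and its messages are hypothetical and do not reproduce the code in CSVDataSource.

```python
import csv
import io


def check_header_column_names(header, field_names, case_sensitive=False):
    """Raise ValueError if the CSV header does not match the schema field names.

    Hypothetical sketch: names are compared position by position.
    """
    if len(header) != len(field_names):
        raise ValueError(
            f"Number of columns in the CSV header ({len(header)}) does not "
            f"match the number of schema fields ({len(field_names)})"
        )
    mismatches = []
    for i, (in_header, in_schema) in enumerate(zip(header, field_names)):
        same = (in_header == in_schema if case_sensitive
                else in_header.lower() == in_schema.lower())
        if not same:
            mismatches.append(f"#{i}: header '{in_header}' vs schema '{in_schema}'")
    if mismatches:
        raise ValueError("CSV header does not conform to the schema: " + "; ".join(mismatches))


header = next(csv.reader(io.StringIO("id,Name\n1,alice\n")))
check_header_column_names(header, ["id", "name"])  # passes: case-insensitive match
```

Comparing by position fits CSV: values are bound to schema fields by position, so a header whose names appear in a different order than the schema most likely signals misaligned data.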

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188613114 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -202,28 +263,33 @@ object

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188615023 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -202,28 +263,33 @@ object

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188611653 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -34,6 +34,7 @@ import org.apache.spark.rdd

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188612006 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -118,16 +122,62 @@ object CSVDataSource

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188618430 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -202,28 +263,33 @@ object

[GitHub] spark pull request #21192: [SPARK-24118][SQL] Flexible format for the lineSe...

2018-05-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21192#discussion_r189047893 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -120,8 +120,26 @@ private[sql] class JSONOptions

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188551737 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -153,6 +153,12 @@ class CSVOptions( val

[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21380#discussion_r189879531 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -300,14 +302,11 @@ private[csv] object

[GitHub] spark pull request #21394: [SPARK-24329][SQL] Test for skipping multi-space ...

2018-05-22 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21394 [SPARK-24329][SQL] Test for skipping multi-space lines ## What changes were proposed in this pull request? The PR is a continuation of https://github.com/apache/spark/pull/21380. It checks
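
PR #21394 adds a test for skipping lines that consist only of spaces. A small Python model of that expectation, assuming empty or whitespace-only lines are dropped and, when a comment character is configured, so are lines that start with it. `parse_csv_lines` and its `comment` parameter are hypothetical.

```python
import csv


def parse_csv_lines(lines, comment=None):
    """Parse CSV lines, skipping blank/whitespace-only lines and comment lines."""
    def keep(line):
        if not line.strip():
            return False  # empty or whitespace-only line
        if comment is not None and line.startswith(comment):
            return False  # comment line
        return True

    return list(csv.reader(line for line in lines if keep(line)))


print(parse_csv_lines(["a,b", "   ", "", "\t", "#note", "c,d"], comment="#"))
# [['a', 'b'], ['c', 'd']]
```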

[GitHub] spark issue #21296: [SPARK-24244][SQL] Passing only required columns to the ...

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21296 I added the word `parser` to the feature name because, as @HyukjinKwon wrote above, we already do pruning in type conversion. This PR enables column pruning by CSV parser only

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 @gatorsmile @cloud-fan @HyukjinKwon @gengliangwang May I ask you to look at the PR again.

[GitHub] spark issue #21296: [SPARK-24244][SQL] Passing only required columns to the ...

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21296 @cloud-fan @HyukjinKwon Could you look at the PR, please.

[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20929#discussion_r190399084 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2408,4 +2408,24 @@ class JsonSuite extends

[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20929#discussion_r190401748 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2408,4 +2408,24 @@ class JsonSuite extends

[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21410 jenkins, retest this, please

[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...

2018-05-23 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21415 [SPARK-24244][SPARK-24368][SQL] Passing only required columns to the CSV parser ## What changes were proposed in this pull request? The uniVocity parser allows specifying only the required column

[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21415 The difference between this PR and #21296 is that the `columnPruning` is passed to CSVOptions as a parameter. It should fix flaky `UnivocityParserSuite

[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20929#discussion_r190400420 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2408,4 +2408,24 @@ class JsonSuite extends

[GitHub] spark pull request #20929: [SPARK-23772][SQL] Provide an option to ignore co...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20929#discussion_r190397868 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -379,6 +379,8 @@ class DataFrameReader private[sql](sparkSession
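
PR #20929 (SPARK-23772) adds a JSON reader option for schema inference; in Spark releases this option is named `dropFieldIfAllNull` and ignores columns whose values are all null or empty arrays. The Python sketch below models only the top-level case; `infer_top_level_fields` is a hypothetical name and nested structs are not handled.

```python
import json


def infer_top_level_fields(lines, drop_field_if_all_null=False):
    """Return the sorted top-level field names found in JSON Lines input."""
    seen = set()
    informative = set()  # fields with at least one non-null, non-empty-array value
    for line in lines:
        record = json.loads(line)
        for name, value in record.items():
            seen.add(name)
            if value is not None and value != []:
                informative.add(name)
    return sorted(informative if drop_field_if_all_null else seen)


lines = ['{"a": 1, "b": null, "c": []}', '{"a": 2, "b": null, "c": [3]}']
print(infer_top_level_fields(lines))  # ['a', 'b', 'c']
print(infer_top_level_fields(lines, drop_field_if_all_null=True))  # ['a', 'c']
```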

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r190147538 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -118,6 +121,64 @@ object CSVDataSource

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r190146438 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -248,28 +248,32 @@ private[csv] object

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r190148356 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -118,6 +121,64 @@ object CSVDataSource

[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 jenkins, retest this, please

[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-23 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21415 jenkins, retest this, please

[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...

2018-05-23 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21410 [SPARK-24366][SQL] Improving of error messages for type converting ## What changes were proposed in this pull request? Currently, users are getting the following error messages on type

[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21415 jenkins, retest this, please

[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 jenkins, retest this, please

[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21380#discussion_r190023523 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -300,14 +302,11 @@ private[csv] object

[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21410 @gatorsmile Could you look at the PR, please. The changes should help us in troubleshooting customers' issues

[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21415 jenkins, retest this, please

[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21410 > Is there a way to identify where in the schema the issue is occurring? We can catch the exceptions on each level of schema tree traversal, and show sub-trees in each catch. For exam
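
The approach described in the reply can be sketched in Python: conversion failures are caught at each level of the schema traversal, and the error reports where in the schema it occurred, simplified here to a field path instead of sub-trees. Everything is hypothetical: the `ConversionError` class, the type names `'int'`/`'str'`, dicts standing in for struct types, and the message wording. It is not the code in CatalystTypeConverters.

```python
class ConversionError(TypeError):
    """Conversion failure that records the path to the offending field."""

    def __init__(self, value, target, path=()):
        self.value, self.target, self.path = value, target, tuple(path)
        where = ".".join(self.path) or "<root>"
        super().__init__(
            f"The value ({value!r}) of the type ({type(value).__name__}) "
            f"cannot be converted to {target} at {where}"
        )


def convert(value, schema):
    """Convert value by schema: 'int', 'str', or a dict of field -> schema."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            raise ConversionError(value, "struct")
        result = {}
        for name, field_schema in schema.items():
            try:
                result[name] = convert(value.get(name), field_schema)
            except ConversionError as e:
                # Re-raise with this level's field name prepended to the path.
                raise ConversionError(e.value, e.target, (name,) + e.path) from None
        return result
    expected = {"int": int, "str": str}[schema]
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, expected):
        raise ConversionError(value, schema)
    return value


schema = {"id": "int", "address": {"city": "str"}}
try:
    convert({"id": 1, "address": {"city": 42}}, schema)
except ConversionError as err:
    print(err)
# The value (42) of the type (int) cannot be converted to str at address.city
```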

[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21415#discussion_r190694499 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -29,17 +29,20 @@ import

[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 @HyukjinKwon @gengliangwang @maropu Please, look at the PR.

[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21415#discussion_r190724870 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1383,4 +1385,31 @@ class CSVSuite extends

[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21415#discussion_r190725111 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala --- @@ -74,7 +74,49 @@ object CSVBenchmarks

[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...

2018-05-24 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21415#discussion_r190724995 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -29,17 +29,20 @@ import

[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...

2018-05-25 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21410#discussion_r190823171 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala --- @@ -309,6 +322,9 @@ object CatalystTypeConverters

[GitHub] spark pull request #21296: [SPARK-24244][SQL] Passing only required columns ...

2018-05-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21296#discussion_r189277890 --- Diff: docs/sql-programming-guide.md --- @@ -1814,6 +1814,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r189387496 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r189100297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -153,6 +153,12 @@ class CSVOptions( val

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r189103522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r189106956 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -234,38 +234,42 @@ private[csv] object

[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-22 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 jenkins, retest this, please

[GitHub] spark pull request #21414: [SPARK-24368][SQL] Removing columnPruning from CS...

2018-05-23 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21414 [SPARK-24368][SQL] Removing columnPruning from CSVOptions ## What changes were proposed in this pull request? In the PR, I removed the private `columnPruning` value from `CSVOptions

[GitHub] spark pull request #21414: [SPARK-24368][SQL] Removing columnPruning from CS...

2018-05-23 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/21414

[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

2018-05-21 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21380 [SPARK-24329][SQL] Remove comments filtering before parsing of CSV files ## What changes were proposed in this pull request? Filtering of comments and whitespace has been performed
