Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21182
jenkins, retest this, please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21108
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21254
> Do we have any behavior change after the previous PR: #20937?
The PR brought the `encoding` (and `charset`) option but we didn't change
behavior when `encoding` is not specif
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21108
@gatorsmile Could you kindly look at it again?
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186259641
--- Diff: R/pkg/R/functions.R ---
@@ -963,6 +964,7 @@ setMethod("kurtosis",
#' last(df$c, TRUE)
#' }
#' @note last since 1.4.0
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
@HyukjinKwon @gengliangwang @gatorsmile Please, have a look at it again.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21182#discussion_r186257363
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -136,4 +138,6 @@ private[sql] class JSONOptions
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21273
[WIP][SPARK-17916][SQL] Fix empty string being parsed as null when
nullValue is set.
## What changes were proposed in this pull request?
I propose to bump the version of the uniVocity parser up
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21182
@HyukjinKwon @viirya @gengliangwang May I ask you to look at the PR again.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21108#discussion_r187888913
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -326,4 +326,61 @@ class JsonFunctionsSuite extends QueryTest
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
@gengliangwang @gatorsmile May I ask you to look at this PR one more time.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21182
@HyukjinKwon Sure, I will prepare a PR
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21182
[SPARK-24068] Propagating DataFrameReader's options to Text datasource on
schema inferring
## What changes were proposed in this pull request?
While reading CSV or JSON files
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21182
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21173
What about timeout for this query:
https://github.com/maropu/spark/blob/f134548bd6b7b9f2bc2c508698404a61eb9ea43e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21182
Here is the backport to 2.3: https://github.com/apache/spark/pull/21292
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21292
[SPARK-24068][BACKPORT-2.3] Propagating DataFrameReader's options to Text
datasource on schema inferring
## What changes were proposed in this pull request?
While reading CSV or JSON
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21108
@marmbrus May I ask you to look at the PR.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21273
@HyukjinKwon @maropu Please, have a look at the PR.
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21296
[SPARK-24244][SQL] CSV column pruning
## What changes were proposed in this pull request?
The uniVocity parser allows specifying only the required column names or indexes
for [parsing](https
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21296
@cloud-fan @hvanhovell Could you look at the PR, please.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21296#discussion_r187426203
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -267,7 +267,7 @@ class CSVSuite extends QueryTest
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21247#discussion_r187780271
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -138,3 +121,40 @@ private[sql] class JSONOptions
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21273
@gengliangwang @gatorsmile I added a benchmark for parsing of quoted
values. Parsing time dropped by **28%** (look at the commit
https://github.com/apache/spark/pull/21273/commits
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21296#discussion_r187810542
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1322,4 +1322,31 @@ class CSVSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21296#discussion_r187608568
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala
---
@@ -0,0 +1,92 @@
+/*
+ * Licensed
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21296#discussion_r187610192
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -73,11 +64,24 @@ class UnivocityParser
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21296#discussion_r187604963
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -267,7 +267,7 @@ class CSVSuite extends QueryTest
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21299#discussion_r187707287
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
---
@@ -99,12 +99,7 @@ object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21299#discussion_r187697424
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
---
@@ -99,12 +99,7 @@ object
Github user MaxGekk closed the pull request at:
https://github.com/apache/spark/pull/21292
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21273
>> CSV parser now parses quoted values ~30% faster
> Could we add a micro-benchmark suite for this?
@gatorsmile In this PR or in a sep
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186281087
--- Diff: python/pyspark/sql/functions.py ---
@@ -152,13 +152,19 @@ def _():
_collect_list_doc = """
Aggregate function:
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21247#discussion_r186283555
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -137,3 +121,40 @@ private[sql] class JSONOptions
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21192#discussion_r186283736
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -120,8 +120,26 @@ private[sql] class JSONOptions
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
> For example, there are so many options that can be potentially added
(other univocity parser options).
You are right, so many things can be added but in this particular case
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21182#discussion_r186264859
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -136,4 +136,6 @@ private[sql] class JSONOptions
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21247
[SPARK-24190] Separating JSONOptions for read
## What changes were proposed in this pull request?
Currently, restrictions in JSONOptions for `encoding` and `lineSep` are the
same for read
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188558354
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -153,6 +153,12 @@ class CSVOptions(
val
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21377
jenkins, retest this, please
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21377
[SPARK-24325] Tests for Hadoop's LineReader
## What changes were proposed in this pull request?
The tests cover basic functionality of [Hadoop
LinesReader](https://github.com/apache/spark
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188966754
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,24 @@ def test_csv_sampling_ratio(self):
.csv(rdd, samplingRatio=0.5).schema
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188961210
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -206,24 +280,33 @@ object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188636535
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1279,4 +1279,62 @@ class CSVSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188639434
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -118,16 +122,62 @@ object CSVDataSource
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188651192
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188654003
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -373,6 +373,12 @@ def csv(self, path, schema=None, sep=None,
encoding=None, quote=None, escape=Non
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188635693
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -236,38 +236,44 @@ private[csv] object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188650707
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -373,6 +373,12 @@ def csv(self, path, schema=None, sep=None,
encoding=None, quote=None, escape=Non
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188635172
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -118,6 +120,61 @@ object CSVDataSource
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188613114
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -202,28 +263,33 @@ object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188615023
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -202,28 +263,33 @@ object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188611653
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -34,6 +34,7 @@ import org.apache.spark.rdd
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188612006
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -118,16 +122,62 @@ object CSVDataSource
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188618430
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -202,28 +263,33 @@ object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21192#discussion_r189047893
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
---
@@ -120,8 +120,26 @@ private[sql] class JSONOptions
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188551737
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -153,6 +153,12 @@ class CSVOptions(
val
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21380#discussion_r189879531
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -300,14 +302,11 @@ private[csv] object
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21394
[SPARK-24329][SQL] Test for skipping multi-space lines
## What changes were proposed in this pull request?
This PR is a continuation of https://github.com/apache/spark/pull/21380 . It
checks
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21296
I added the word `parser` to the feature name because as @HyukjinKwon wrote
above we do pruning in type conversion already. This PR enables column pruning
by CSV parser only
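
The idea of pruning columns in the parser itself, rather than after all fields have been read, can be pictured with a small standalone sketch. This is not Spark or uniVocity code: it uses Python's standard `csv` module, and the function name `read_required_columns` and the sample data are hypothetical, chosen only for illustration.

```python
import csv
import io


def read_required_columns(text, required):
    """Yield dicts holding only the `required` columns of a CSV document.

    Hypothetical helper for illustration; not part of Spark or uniVocity.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Resolve the required column names to positions once, up front.
    indexes = [header.index(name) for name in required]
    for row in reader:
        # The csv module still tokenizes the whole row; only the selected
        # positions are copied out, so the other fields are never stored.
        yield {header[i]: row[i] for i in indexes}


sample = "id,name,price\n1,apple,0.5\n2,pear,0.7\n"
rows = list(read_required_columns(sample, ["id", "price"]))
```

In this sketch the selection is resolved once from the header, so each data row costs only a few index lookups for the requested columns.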
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
@gatorsmile @cloud-fan @HyukjinKwon @gengliangwang May I ask you to look at
the PR again.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21296
@cloud-fan @HyukjinKwon Could you look at the PR, please.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20929#discussion_r190399084
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2408,4 +2408,24 @@ class JsonSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20929#discussion_r190401748
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2408,4 +2408,24 @@ class JsonSuite extends
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21410
jenkins, retest this, please
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21415
[SPARK-24244][SPARK-24368][SQL] Passing only required columns to the CSV
parser
## What changes were proposed in this pull request?
The uniVocity parser allows specifying only the required column
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21415
The difference between this PR and #21296 is that the `columnPruning` is
passed to CSVOptions as a parameter. It should fix flaky `UnivocityParserSuite
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20929#discussion_r190400420
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2408,4 +2408,24 @@ class JsonSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20929#discussion_r190397868
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -379,6 +379,8 @@ class DataFrameReader private[sql](sparkSession
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r190147538
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -118,6 +121,64 @@ object CSVDataSource
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r190146438
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -248,28 +248,32 @@ private[csv] object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r190148356
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
---
@@ -118,6 +121,64 @@ object CSVDataSource
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21394
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21415
jenkins, retest this, please
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21410
[SPARK-24366][SQL] Improving of error messages for type converting
## What changes were proposed in this pull request?
Currently, users are getting the following error messages on type
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21415
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21415
jenkins, retest this, please
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21394
jenkins, retest this, please
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21380#discussion_r190023523
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -300,14 +302,11 @@ private[csv] object
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21410
@gatorsmile Could you look at the PR, please? The changes should help us
troubleshoot customers' issues
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21415
jenkins, retest this, please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21410
> Is there a way to identify where in the schema the issue is occurring?
We can catch the exceptions on each level of schema tree traversal, and
show sub-trees in each catch. For exam
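
One way to picture the per-level catching described in the comment above is the standalone sketch below. It is not the code from the PR: the schema representation (nested dicts for structs, one-element lists for arrays, callables such as `int` for leaves) and the names `ConversionError`, `describe`, and `convert` are assumptions made purely for illustration.

```python
class ConversionError(Exception):
    """Hypothetical error type that carries the sub-tree being converted."""


def describe(schema):
    """Render a schema sub-tree as a short string."""
    if isinstance(schema, dict):
        fields = ", ".join(f"{k}: {describe(v)}" for k, v in schema.items())
        return f"struct<{fields}>"
    if isinstance(schema, list):
        return f"array<{describe(schema[0])}>"
    return schema.__name__


def convert(value, schema):
    """Convert `value` by walking the schema tree recursively."""
    try:
        if isinstance(schema, dict):
            return {name: convert(value[name], sub) for name, sub in schema.items()}
        if isinstance(schema, list):
            return [convert(item, schema[0]) for item in value]
        return schema(value)  # leaf: a callable such as int or float
    except Exception as exc:
        # Every level re-raises with its own sub-tree, so the chain of
        # causes leads from the root of the schema down to the failing leaf.
        raise ConversionError(
            f"failed to convert {value!r} to {describe(schema)}"
        ) from exc


schema = {"id": int, "tags": [str], "pos": {"x": float, "y": float}}
levels = []
try:
    convert({"id": "7", "tags": ["a"], "pos": {"x": "1.5", "y": "oops"}}, schema)
except ConversionError as err:
    cause = err
    while isinstance(cause, ConversionError):
        levels.append(str(cause))
        cause = cause.__cause__
```

After the failing call, `levels` holds one message per schema level, outermost first, which is the kind of information that would help locate the offending field in a nested schema.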
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21415#discussion_r190694499
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -29,17 +29,20 @@ import
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21394
@HyukjinKwon @gengliangwang @maropu Please, look at the PR.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21415#discussion_r190724870
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1383,4 +1385,31 @@ class CSVSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21415#discussion_r190725111
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala
---
@@ -74,7 +74,49 @@ object CSVBenchmarks
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21415#discussion_r190724995
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -29,17 +29,20 @@ import
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21410#discussion_r190823171
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
---
@@ -309,6 +322,9 @@ object CatalystTypeConverters
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21296#discussion_r189277890
--- Diff: docs/sql-programming-guide.md ---
@@ -1814,6 +1814,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r189387496
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r189100297
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -153,6 +153,12 @@ class CSVOptions(
val
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r189103522
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r189106956
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -234,38 +234,42 @@ private[csv] object
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21394
jenkins, retest this, please
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21414
[SPARK-24368][SQL] Removing columnPruning from CSVOptions
## What changes were proposed in this pull request?
In the PR, I removed the private `columnPruning` value from `CSVOptions
Github user MaxGekk closed the pull request at:
https://github.com/apache/spark/pull/21414
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21380
[SPARK-24329][SQL] Remove comments filtering before parsing of CSV files
## What changes were proposed in this pull request?
Filtering of comments and whitespace has been performed