Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22979
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22939#discussion_r232208151
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column",
schema = &q
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22939#discussion_r232207412
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column",
schema = &q
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r232202773
--- Diff: R/pkg/R/SQLContext.R ---
@@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL,
samplingRatio = 1.0,
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22880
Let me take a look this weekend.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22985
Merged to master.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22939
Hey @felixcheung, it should be ready for another look.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22979
Also let me leave a cc for @srowen.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
Actually let me leave a cc for @srowen. I remember we talked about it
before.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
Merged to master.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r232115966
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,55 @@ getDefaultSqlSource <- function() {
l[["spark.sql.sources
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21363
Looks difficult because the behaviours themselves are different. One
possibility is a fallback and the other possibility is configuration
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
Hey all ~ could this get in by any chance?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
I have finished most of the TODOs, except for waiting on the R API of Arrow 0.12.0 and
adjusting the changes accordingly
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r231992763
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,55 @@ getDefaultSqlSource <- function() {
l[["spark.sql.sources
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22979
Looks good. Will take a closer look.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22977
Actually, similar changes were being made at
https://github.com/apache/spark/pull/22967 FYI. cc @dbtsai
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22973
Yea .. actually that's documented in
https://spark.apache.org/contributing.html . Strictly, it should be `PYTHON`
> The PR title should be of the form [SPARK-][COMPONENT] Ti
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
adding @falaki and @mengxr as well.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r231847272
--- Diff: R/pkg/R/SQLContext.R ---
@@ -215,14 +278,16 @@ createDataFrame <- function(data, schema = NULL,
samplingRatio =
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
@felixcheung! The performance improvement was **955%**! I described the
benchmark I ran in the PR description
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r231815154
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
l[["spark.sql.sources
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r231814143
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
l[["spark.sql.sources
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22973
Oh, let's name it `[PYTHON]` BTW.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20944
Let's revert this, then, if it was only meant to fix the test. We can bring
this back later when it's needed, though. This caused a failure in a specific case
in Livy when restarting Hive
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22967#discussion_r231783302
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR`
commands, or if initiali
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22967#discussion_r231783339
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR`
commands, or if initiali
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22967#discussion_r231783212
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR`
commands, or if initiali
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22967#discussion_r231781880
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR`
commands, or if initiali
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22967#discussion_r231781676
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR`
commands, or if initiali
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22967#discussion_r231780839
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR`
commands, or if initiali
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22960
Merged to master.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22958
Merged to master.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22958
@MaxGekk, BTW, can you call `verifyColumnNameOfCorruptRecord` here and in the
datasource as well, for JSON and CSV? Of course, in a separate PR
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22958
For CSV, it looks like we are already doing so:
https://github.com/apache/spark/blob/76813cfa1e2607ea3b669a79e59b568e96395b2e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22932#discussion_r231777190
--- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
@@ -93,7 +93,7 @@ Partition Values [ds=2017
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231776739
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -349,7 +353,7 @@ def csv(self, path, schema=None, sep=None,
encoding=None, quote=None, escape=Non
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231776568
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None,
primitivesAsString=None, prefersDecimal=None
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231776396
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None,
primitivesAsString=None, prefersDecimal=None
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231775987
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None,
encoding=None, quote=None, escape=Non
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
OMG, what does `ноя 2018` mean BTW? haha
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22932#discussion_r231775020
--- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
@@ -93,7 +93,7 @@ Partition Values [ds=2017
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21679
I think we should close this for now then.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22948#discussion_r231762531
--- Diff: dev/appveyor-install-dependencies.ps1 ---
@@ -115,7 +115,7 @@ $env:Path += ";$env:HADOOP_HOME\bin"
Po
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22963
Merged to master.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22963
Thanks, @srowen and @dongjoon-hyun.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20944
To me, it sounds like we made a fix because it was difficult to figure out
exactly what's going on internally. It's okay if it's difficult to reproduce,
as long as it can be reproduced in production; however
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20944
I understood the reproducer steps in JIRA, but how and why does it matter? Did it
cause an actual problem in your production environment
Github user HyukjinKwon commented on the issue:
https://github.com/apache/incubator-livy/pull/121
Thanks guys!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20944
ping @wangyum, I'm going to revert this if there's no response today.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22938
@attilapiros, mind showing some rough, small test code for it, please? I just want
to see if this is something we should fix
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22938
Yea, looks fine in general. Will take a look within this week or on the weekend.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22939#discussion_r231476884
--- Diff: R/pkg/R/functions.R ---
@@ -205,11 +205,18 @@ NULL
#' also supported for the schema.
#' \item \code{from_csv
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22963
cc @srowen, @holdenk and @rekhajoshm
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22963#discussion_r231463118
--- Diff: dev/lint-python ---
@@ -87,27 +91,46 @@ else
rm "$PYCODESTYLE_REPORT_PATH"
fi
-# stop the build if there
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22963#discussion_r231462182
--- Diff: dev/lint-python ---
@@ -26,9 +26,13 @@
PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
PYDOCSTYLE_R
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22963#discussion_r231461879
--- Diff: dev/lint-python ---
@@ -26,9 +26,13 @@
PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
PYDOCSTYLE_R
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22963#discussion_r231460594
--- Diff: dev/lint-python ---
@@ -26,9 +26,13 @@
PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
PYDOCSTYLE_R
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22963#discussion_r231460041
--- Diff: dev/lint-python ---
@@ -87,27 +91,46 @@ else
rm "$PYCODESTYLE_REPORT_PATH"
fi
-# stop the build if there
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22963
[SPARK-25962][BUILD][PYTHON]Specify minimum versions for both pydocstyle
and flake8 in 'lint-python' script
## What changes were proposed in this pull request?
This PR explicitly
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
For the encryption case, I will try to handle that as well (maybe as a
follow-up) so that we support it even when encryption is enabled
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
Thanks, @felixcheung. I will address those comments during cleaning up.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r231414561
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
l[["spark.sql.sources
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22960#discussion_r231413853
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
Thank you guys!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20944
Please describe the manual tests and how they relate to an actual use case.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22939#discussion_r231404180
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column",
schema = &q
Github user HyukjinKwon commented on the issue:
https://github.com/apache/incubator-livy/pull/121
@vanzin, it looks like
https://github.com/apache/spark/commit/a75571b46f813005a6d4b076ec39081ffab11844#diff-f697551d2f00bfb336406b6fe6b516fe
causes the test failure. At the very least, I can see
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20944
Sorry, why was this change required? I don't see that
https://github.com/apache/spark/pull/20944#issuecomment-379525776 has been addressed.
Can you elaborate, please? Why do we make `org.apache.derby
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15899
+1 for the decision and closing it.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/incubator-livy/pull/121
@vanzin, weird.
```
$ ./bin/spark-shell
scala> sql("CREATE TABLE tblA(a int)")
scala> spark.stop()
```
```
$ rm -fr metastore_db
```
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
Looks good. I or someone else should take a closer look before getting this
in.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22956
Merged to master.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22956#discussion_r231370599
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala ---
@@ -92,8 +93,14 @@ case class CsvToStructs
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22590
I wonder how important it is. I know `spark-csv` at Databricks supported
different quote modes, and that support was dropped when we ported it into Spark; the root
cause was replacing the library
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22590
They should be documented in the API doc, like `DataFrameReader.scala`. For
the site, we should avoid doc duplication; how to document options is a general
issue
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
So far, the regression tests pass and the newly added test for the R
optimization is verified locally. Let me fix the CRAN test and some nits
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22956
Looks good. I or someone else should take a closer look before getting this
in.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22960#discussion_r231344120
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/incubator-livy/pull/121#discussion_r23134
--- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
@@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/incubator-livy/pull/121#discussion_r231342303
--- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
@@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
Adding @yanghaogn as well fyi
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22954
Let me leave a cc @felixcheung, @BryanCutler FYI.
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22954
[DO-NOT-MERGE][POC] Enables Arrow optimization from R DataFrame to Spark
DataFrame
## What changes were proposed in this pull request?
This PR is not meant for merging but targets
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22939#discussion_r231029629
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column",
schema = &q
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22940
Hm .. okay. let me close this for now.
Github user HyukjinKwon closed the pull request at:
https://github.com/apache/spark/pull/22940
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22940
Hmmm yea but like .. some classes similar to this case have been
renamed from time to time, for instance `json InferSchema` -> `json
JSONInferSchema` when CSV datasource was ad
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22305
Let me try to take a look this weekend. Sorry it's been delayed.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
Ah, yeap. sure.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
This should be ready for a look as is. I already tested Spark 2.4.0.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/incubator-livy/pull/121
Yup, let me rebase and get rid of the [WIP] tag.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22953
Looks good to me.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22919
Merged to master and branch-2.4.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22940
hmhm .. it's trivial and yea it is a logical change. I happened to look at
some code around here lately, and the name `SQLUtils` actually annoyed me a
few times :(. I will leave
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22948
cc @felixcheung
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22948
[SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.5.1)
## What changes were proposed in this pull request?
R 3.5.1 was released on 2018-07-02. This PR aims to change R