Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22893
Please fix the PR title as described at
https://spark.apache.org/contributing.html, and please read that guide.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22844#discussion_r229243855
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22888
I would close this, @351zyf.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22888
You're introducing a flag to convert. I think it's virtually the same:
enabling the flag vs. calling a function to convert
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22844#discussion_r229214337
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22844#discussion_r229213742
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22844#discussion_r229212923
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22888
Then, you can convert the type into double or float in a Spark DataFrame.
This is super easy to work around in a Pandas DataFrame or Spark's
DataFrame. I don't think we should add this flag
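A minimal sketch of that manual conversion on the Pandas side (the `price` column and its `Decimal` values are hypothetical, purely for illustration):

```python
import decimal

import pandas as pd

# Hypothetical Pandas DataFrame holding decimal.Decimal values, which
# land in an object-typed column rather than a float one.
pdf = pd.DataFrame({"price": [decimal.Decimal("1.10"), decimal.Decimal("2.25")]})

# One-line manual conversion to double before handing it to Spark.
pdf["price"] = pdf["price"].astype("float64")
```

The equivalent conversion on the Spark side is `Column.cast("double")` via `withColumn`.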
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22885
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22888
I think you can just manually convert from Pandas DataFrame, no?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/incubator-livy/pull/121
As of RC3, all the unit tests passed
(https://travis-ci.org/HyukjinKwon/incubator-livy/builds/441687251).
I am running tests against RC 5 -
https://travis-ci.org
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16429
This is fixed as of Spark 1.6.4, 2.0.3, 2.1.1 and 2.2.0.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22878
Just quickly and roughly tested. The merge script looks like it only
recognises the main author of each commit in a PR. Let's just push a commit here
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22878
I wonder if that can be handled by the merge script, though. I think it's okay just
to pick up some commits there and rebase them onto here, even if they are empty
commits. That's easier for committers
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22666
Thanks, @cloud-fan. The change looks good to me from my side. Let me take
another look at this and leave a sign-off (which means a sign-off for
@MaxGekk's code changes
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22877
Thanks, @kiszk and @dongjoon-hyun
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22877
Merged to master
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21654
Thanks, @holdenk for addressing my concern. I will try to join as well.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22666#discussion_r228949792
--- Diff: sql/core/src/test/resources/sql-tests/inputs/csv-functions.sql ---
@@ -7,3 +7,11 @@ select from_csv('1', 'a InvalidType');
select
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
Yes, that was what I was thinking in the worst case. For clarification,
@wangyum made an attempt and all tests passed, at least -
https://github.com/apache/spark/pull/20659. Given that attempt, I think
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22877
[MINOR][SQL] Avoid hardcoded configuration keys in SQLConf's `doc`
## What changes were proposed in this pull request?
This PR proposes to avoid hardcoded configuration keys
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22872
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22530
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22530
retest this please
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22078#discussion_r228882841
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -626,6 +626,14 @@ object SQLConf {
.stringConf
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22078#discussion_r228881996
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
---
@@ -70,7 +76,6 @@ case class
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22078#discussion_r228881824
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
---
@@ -261,4 +272,69 @@ case
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
It should be usable if the change is cherry-picked properly. This PR
basically just replaces one line:
https://github.com/apache/zeppelin/blob/v0.8.0/spark/scala-2.11/src/main/scala
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22275
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
The error message:
```
[ERROR]
/home/cloud-user/ajay/code/csf-cc-zeppelin-k8szep/spark/scala-2.11/src/main/scala/org/apache/zeppelin/spark/SparkScala211Interpreter.scala:37
Github user HyukjinKwon commented on the issue:
https://github.com/apache/zeppelin/pull/3206
Does that happen only with these code changes? The change here does not
touch the signature of `class SparkScala211Interpreter(`, and the error message
looks pretty unrelated. The whole change
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22870
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22871
Thanks, @dongjoon-hyun and @gatorsmile
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22530
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22326
late LGTM
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22847#discussion_r228789484
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
.intConf
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22666#discussion_r228787126
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala
---
@@ -19,14 +19,39 @@ package
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22871
cc @BryanCutler and @gatorsmile.
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22871
[SPARK-25179][PYTHON][DOCS] Document BinaryType support in Arrow conversion
## What changes were proposed in this pull request?
This PR aims to document the binary type in "Apache
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
@dongjoon-hyun and @wangyum, please correct my comment if I am wrong at any
point - I believe you guys took a look at this part more than I did
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
> Does this upgrade Hive for execution or also for metastore? Spark
supports virtually all Hive metastore versions out there, and a lot of
deployments do run different versions of Spark agai
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22868#discussion_r228776349
--- Diff: docs/sql-migration-guide-hive-compatibility.md ---
@@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22865#discussion_r228776300
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
val
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Oops, mind fixing the PR title too?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
@cloud-fan, thanks for doing this backport!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Merged to branch-2.4.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22865#discussion_r228731568
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
val
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22865#discussion_r228731385
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
val
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Yup, strictly speaking, I think we should change it. Looks like there are two
occurrences of `isinstance(..., str)` at `udf` and `pandas_udf`.
Another problem in PySpark is inconsistent type comparison, like
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22858#discussion_r228731178
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
>>> df.select(schema_of_json('{"
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
I meant to use
https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L583
instead of
https://github.com/apache
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22858#discussion_r228713086
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
>>> df.select(schema_of_json('{"
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Wenchen, this is because
```python
if sys.version >= '3':
    basestring = str
```
is missing. Python 3 does not have `basestring`.
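Standalone, the Python-2/3 compatibility alias behaves like this (the `is_string_like` helper below is purely illustrative, not part of PySpark):

```python
import sys

# On Python 3, alias the removed Python-2 builtin so isinstance checks
# written against `basestring` keep working on both major versions.
if sys.version >= '3':
    basestring = str

def is_string_like(value):
    # illustrative helper: true for any string under Python 2 or 3
    return isinstance(value, basestring)
```

Without the alias, `isinstance(x, basestring)` raises `NameError` on Python 3, which is what the backport above was missing.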
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
Adding @gatorsmile and @cloud-fan as well, since this might be a potentially
breaking change for the 3.0 release (it affects RDD operations only with namedtuple
in certain cases, though
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
And you can also run the profiler to show the performance effect. See
https://github.com/apache/spark/pull/19246#discussion_r139874732 to run the
profiler
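For reference, PySpark's Python-side profiler is built on the standard `cProfile` and `pstats` modules, so the shape of the report it produces can be sketched standalone (the `work` function here is just a hypothetical stand-in for the code under measurement):

```python
import cProfile
import io
import pstats

def work():
    # stand-in for the user code whose cost we want to measure
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Render the top entries of the profile; PySpark's show_profiles()
# output is produced from pstats in a similar way.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

In PySpark itself, the equivalent is enabling `spark.python.profile` and calling `sc.show_profiles()` after running the job.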
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
You can just replace it with CloudPickler, remove the changes to the tests, and push
that commit here to show no case is broken
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22666
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22666
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20503
ok to test
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Oh, you mean fixing the conflicts is not that hard. Thanks for doing this,
@cloud-fan. I planned to do it today
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
Yea, so to avoid breaking it, we could change the default pickler to
CloudPickler or document this workaround. @superbobry, can you check whether the
case is preserved if we use CloudPickler
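The reason a plain pickler struggles here can be shown with a locally defined namedtuple: the standard `pickle` module serializes classes by reference (a module-level name lookup), so a class created inside a function cannot be pickled, whereas CloudPickle serializes such classes by value. A minimal sketch of the failing side:

```python
import pickle
from collections import namedtuple

def make_point():
    # Class created inside a function: not reachable as a module-level
    # attribute, so pickle's by-reference class lookup fails.
    Point = namedtuple("Point", ["x", "y"])
    return Point(1, 2)

point = make_point()
try:
    pickle.dumps(point)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    pickled_ok = False

print(pickled_ok)  # → False
```

CloudPickle sidesteps this by embedding the class definition itself in the payload, which is why switching the default pickler is one way to preserve such cases.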
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Yea, but I meant it is a bit complicated; still, I'm okay with that way, @cloud-fan.
Thanks for doing that. I planned to do it today (now
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
> Hive 2.3 works with Hadoop 2.x (Hive 3.x works with Hadoop 3.x).
This is essentially what we need for Hadoop 3 support
[release-2.3.2|https://github.com/apache/hive/blob/rel/rele
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Sure!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22850
Yea, I was aware of it. I think there are some more old comments in this
file, if I remember correctly. Can you double-check and fix them while we
are here?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22850
ok to test
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22775#discussion_r228520891
--- Diff: python/pyspark/sql/functions.py ---
@@ -2365,30 +2365,32 @@ def to_json(col, options={}):
@ignore_unicode_prefix
@since(2.4
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22775#discussion_r228504453
--- Diff: python/pyspark/sql/functions.py ---
@@ -2365,30 +2365,32 @@ def to_json(col, options={}):
@ignore_unicode_prefix
@since(2.4
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22771
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Yup, yup .. I should sync the tests
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22814
Merged to master
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
Yup, it supports Hadoop 3, plus the other fixes @wangyum mentioned.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22814
LGTM too
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228381742
--- Diff: docs/sql-data-sources-avro.md ---
@@ -177,6 +180,19 @@ Data source options of Avro can be set using the
`.option` method on `DataFrameR
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228380951
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -31,10 +32,32 @@ package object avro {
* @since 2.4.0
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228380639
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala
---
@@ -61,6 +59,24 @@ class AvroFunctionsSuite extends
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22827
LGTM too
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22841
Looks good to me.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22841#discussion_r228376996
--- Diff: python/pyspark/sql/window.py ---
@@ -239,34 +212,27 @@ def rangeBetween(self, start, end):
and "5" means the five
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22815#discussion_r228376272
--- Diff: R/pkg/R/SQLContext.R ---
@@ -434,6 +388,7 @@ read.orc <- function(path, ...) {
#' Loads a Parquet file, returning the res
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22841#discussion_r228376015
--- Diff: python/pyspark/sql/window.py ---
@@ -239,34 +212,27 @@ def rangeBetween(self, start, end):
and "5" means the five
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16812
This can be easily worked around, no?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22841
Yup, I also agree with this revert.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Maybe I am being too careful about it, but I am kind of nervous about this
column case. I don't intend to disallow it entirely, only for Spark 2.4. We
might have to find a way to use column
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Actually, that use case can be more easily accomplished by simply inferring
the schema via the JSON datasource. Yea, I indeed suggested that as a workaround for this
issue before. Let's say, `spark.read.json
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22621
That's my point. Why do we have to document that unexpected results were
fixed?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22747
Yup, that's a similar argument to the one I had in
https://github.com/apache/spark/pull/22773#issuecomment-432923361. I think we
should clarify what to document
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228113346
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -10,6 +10,9 @@ displayTitle: Spark SQL Upgrading Guide
## Upgrading From Spark SQL 2.4 to 3.0
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228115771
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala
---
@@ -61,6 +59,24 @@ class AvroFunctionsSuite extends
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
retest this please
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228065259
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
---
@@ -21,16 +21,31 @@ import org.apache.avro.Schema
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22621
Let's say this can be a behaviour change too, since the metrics are now changed.
Should we update the migration guide for safety?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22690
cc @cloud-fan and @gatorsmile.
Should we update the migration guide as well?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22503
@justinuang, this might affect existing users' applications. Although this
matches the behaviour of non-multiline mode, can we explicitly mention it in the
migration guide?
cc @cloud-fan
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22747
This also looks like an external change to existing application users. Shall we
update the migration guide?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
Yup, I will encourage updating the migration guide in that way.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22728
(From https://github.com/apache/spark/pull/22773#issuecomment-432917994)
@gatorsmile and @cloud-fan, let's say this will break `DESCRIBE FUNCTION
EXTENDED`. Should we update the migration guide
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22815
BTW, should we update the migration guide too?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
Sure, so for clarification, we will document everything that affects
external users' applications, right?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
@cloud-fan, looks like we are going to start another RC. Would you mind if I ask
you to take a quick look before the new RC?