[GitHub] spark issue #22893: One part of Spark MLlib Kmean Logic Performance problem

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22893
  
Please fix the PR title as described in 
https://spark.apache.org/contributing.html and read it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229243855
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33 @@

+========================================================================================
+Benchmark for performance of JSON parsing
+========================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON schema inferring:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               48088 / 48180          2.1         480.9       1.0X
+UTF-8 is set                              71881 / 71992          1.4         718.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON per-line parsing:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               12107 / 12246          8.3         121.1       1.0X
+UTF-8 is set                              12375 / 12475          8.1         123.8       1.0X
--- End diff --

Ah, I see. This is also because of the count optimization. The ratio looks weird, but 
it's actually a performance improvement for both cases, so it shouldn't be a big deal.
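For reference, the derived columns in the benchmark table above follow directly from the best times. A minimal sketch of the arithmetic, assuming a row count of 100 million (which is consistent with the Rate and Per Row columns shown; this is an illustration, not Spark's actual benchmark harness code):

```python
# Sketch: how the benchmark's derived columns relate to the raw best times.
# Assumes 100 million rows, consistent with the Rate(M/s) column above.
ROWS = 100_000_000

def derived(best_ms, baseline_ms):
    """Return (per_row_ns, rate_m_per_s, relative) for one benchmark row."""
    per_row_ns = best_ms * 1e6 / ROWS          # ms -> ns, divided by row count
    rate_m_per_s = ROWS / (best_ms / 1000) / 1e6
    relative = baseline_ms / best_ms           # baseline is the "No encoding" run
    return round(per_row_ns, 1), round(rate_m_per_s, 1), round(relative, 1)

# Schema-inferring rows from the table above:
print(derived(48088, 48088))  # (480.9, 2.1, 1.0) -> "No encoding"
print(derived(71881, 48088))  # (718.8, 1.4, 0.7) -> "UTF-8 is set"
```

This also shows why a 0.7X ratio can coexist with an absolute improvement: the Relative column only compares the two runs within one table, not against previous results.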


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22888
  
I would close this, @351zyf.


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22888
  
You're introducing a flag to convert. I think enabling the flag is virtually the same 
as calling a function to convert.


---




[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229214337
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33 @@

+========================================================================================
+Benchmark for performance of JSON parsing
+========================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON schema inferring:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               48088 / 48180          2.1         480.9       1.0X
+UTF-8 is set                              71881 / 71992          1.4         718.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON per-line parsing:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               12107 / 12246          8.3         121.1       1.0X
+UTF-8 is set                              12375 / 12475          8.1         123.8       1.0X
--- End diff --

Let me take a quick look within a few days. This is the basic per-line case, which 
affects many users.


---




[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229213742
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33 @@

+========================================================================================
+Benchmark for performance of JSON parsing
+========================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON schema inferring:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               48088 / 48180          2.1         480.9       1.0X
+UTF-8 is set                              71881 / 71992          1.4         718.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON per-line parsing:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               12107 / 12246          8.3         121.1       1.0X
+UTF-8 is set                              12375 / 12475          8.1         123.8       1.0X
--- End diff --

IIRC, this benchmark was added rather so that we can make sure setting an encoding 
does not affect the performance of the no-encoding case (right @MaxGekk?). We should 
fix this. @cloud-fan


---




[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229212923
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
@@ -0,0 +1,33 @@

+========================================================================================
+Benchmark for performance of JSON parsing
+========================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON schema inferring:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               48088 / 48180          2.1         480.9       1.0X
+UTF-8 is set                              71881 / 71992          1.4         718.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON per-line parsing:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------
+No encoding                               12107 / 12246          8.3         121.1       1.0X
+UTF-8 is set                              12375 / 12475          8.1         123.8       1.0X
--- End diff --

Wait ... this is almost 50% slower. This should have been around 8000-ish.


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22888
  
Then, you can convert the type into doubles or floats in the Spark DataFrame. 
This is super easy to work around on either the Pandas DataFrame or Spark's 
DataFrame. I don't think we should add this flag.

BTW, the same feature would have to be added for when Arrow optimization is enabled 
as well.
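As a hypothetical illustration of the suggested workaround (the column and values below are made up; the point is only that the conversion the proposed flag would perform is a one-liner on the user side), converting a decimal column to floats manually looks like:

```python
from decimal import Decimal

# Hypothetical column of decimals, as it might come back from a DataFrame.
decimal_column = [Decimal("1.10"), Decimal("2.25"), Decimal("3.50")]

# Manual conversion to floats - the workaround being suggested instead of a flag.
float_column = [float(v) for v in decimal_column]
print(float_column)  # [1.1, 2.25, 3.5]
```

The same one-line cast can be done on the Spark side (e.g. casting the column to double before collecting), which is why a dedicated conversion flag adds little over calling a function.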


---




[GitHub] spark issue #22885: [BUILD][MINOR] release script should not interrupt by sv...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22885
  
retest this please


---




[GitHub] spark issue #22888: SPARK-25881

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22888
  
I think you can just manually convert from Pandas DataFrame, no?


---




[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/incubator-livy/pull/121
  
As of RC3, all the unit tests passed 
(https://travis-ci.org/HyukjinKwon/incubator-livy/builds/441687251).

I am running tests against RC 5 - 
https://travis-ci.org/HyukjinKwon/incubator-livy/builds/448148672


---


[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
This is fixed as of Spark 1.6.4, 2.0.3, 2.1.1 and 2.2.0.
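For context, the underlying issue concerns making namedtuple instances serializable in PySpark. A minimal stdlib sketch of the round-trip that has to work (plain pickle of a module-level namedtuple here; PySpark's patch to `collections.namedtuple` exists so that even dynamically defined namedtuples behave this way):

```python
import pickle
from collections import namedtuple

# A module-level namedtuple pickles out of the box, because pickle can find
# the class by its qualified name. PySpark's "hijack" makes namedtuples
# defined in non-importable scopes serialize the same way.
Point = namedtuple("Point", ["x", "y"])

p = Point(1, 2)
restored = pickle.loads(pickle.dumps(p))
print(restored)  # Point(x=1, y=2)
```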


---




[GitHub] spark issue #22878: [SPARK-25789][SQL] Support for Dataset of Avro

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22878
  
Just quickly and roughly tested. The merge script looks to only recognise the main 
author of each commit in a PR. Let's just push a commit here.


---




[GitHub] spark issue #22878: [SPARK-25789][SQL] Support for Dataset of Avro

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22878
  
I wonder if that can be handled by the merge script, though. I think it's okay just 
to pick up some commits there and rebase them onto this branch, even if they are empty 
commits. That makes it easier for the committer to set the primary author when 
it's merged.


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22666
  
Thanks, @cloud-fan. The change looks good to me. Let me take 
another look at this and leave a sign-off (which means a sign-off for 
@MaxGekk's code changes).


---




[GitHub] spark issue #22877: [MINOR][SQL] Avoid hardcoded configuration keys in SQLCo...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22877
  
Thanks, @kiszk and @dongjoon-hyun 


---




[GitHub] spark issue #22877: [MINOR][SQL] Avoid hardcoded configuration keys in SQLCo...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22877
  
Merged to master


---




[GitHub] spark issue #21654: [SPARK-24671][PySpark] DataFrame length using a dunder/m...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21654
  
Thanks, @holdenk for addressing my concern. I will try to join as well.


---




[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22666#discussion_r228949792
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/csv-functions.sql ---
@@ -7,3 +7,11 @@ select from_csv('1', 'a InvalidType');
 select from_csv('1', 'a INT', named_struct('mode', 'PERMISSIVE'));
 select from_csv('1', 'a INT', map('mode', 1));
 select from_csv();
+-- infer schema of json literal
+select from_csv('1,abc', schema_of_csv('1,abc'));
+select schema_of_csv('1|abc', map('delimiter', '|'));
+select schema_of_csv(null);
+CREATE TEMPORARY VIEW csvTable(csvField, a) AS SELECT * FROM VALUES 
('1,abc', 'a');
+SELECT schema_of_csv(csvField) FROM csvTable;
+-- Clean up
+DROP VIEW IF EXISTS csvTable;
--- End diff --

I see, but isn't it still better to explicitly clean the tables up?
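The cleanup idiom being discussed is shown below against stdlib sqlite3 rather than Spark SQL, purely as an illustration of why `DROP VIEW IF EXISTS` is a safe, idempotent way to tear down a test view (the table and view names mirror the test above; sqlite is an assumption for the sake of a runnable example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(csvField TEXT, a TEXT)")
conn.execute("INSERT INTO t VALUES ('1,abc', 'a')")
conn.execute("CREATE VIEW csvTable AS SELECT * FROM t")

rows = conn.execute("SELECT csvField FROM csvTable").fetchall()
print(rows)  # [('1,abc',)]

# Explicit cleanup, mirroring the suggestion in the review comment.
conn.execute("DROP VIEW IF EXISTS csvTable")
# Idempotent: dropping again is a no-op rather than an error.
conn.execute("DROP VIEW IF EXISTS csvTable")
```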


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
Yes, that was what I was thinking of as the worst case. For clarification, 
@wangyum made an attempt and at least all the tests passed - 
https://github.com/apache/spark/pull/20659. Given that attempt, upgrading without 
breaking compatibility looks pretty feasible.


---




[GitHub] spark pull request #22877: [MINOR][SQL] Avoid hardcoded configuration keys i...

2018-10-29 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22877

[MINOR][SQL] Avoid hardcoded configuration keys in SQLConf's `doc`

## What changes were proposed in this pull request?

This PR proposes to avoid hardcoded configuration keys in SQLConf's `doc`.

## How was this patch tested?

Manually verified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark minor-conf-name

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22877.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22877


commit 599557012cc9f4d4e5f36744180121cc8845413c
Author: hyukjinkwon 
Date:   2018-10-29T12:42:54Z

Avoid hardcoded configuration keys in SQLConf's `doc`




---




[GitHub] spark issue #22872: [SPARK-25864][SQL][TEST] Make main args accessible for B...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22872
  
retest this please


---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22530
  
retest this please




---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22530
  

retest this please





---




[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22078#discussion_r228882841
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -626,6 +626,14 @@ object SQLConf {
 .stringConf
 .createWithDefault("parquet")
 
+  val DATA_SOURCE_TABLE_INHERIT_PERMS = 
buildConf("spark.sql.datasource.table.inherit.perms")
+.internal()
+.doc("Set this to true if the table directories should be inheriting 
the permission " +
+  "of the warehouse or database directory " +
--- End diff --

The Hive configuration doc itself doesn't look quite clear to me either, actually.


---




[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22078#discussion_r228881996
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
 ---
@@ -70,7 +76,6 @@ case class InsertIntoHadoopFsRelationCommand(
 val hadoopConf = 
sparkSession.sessionState.newHadoopConfWithOptions(options)
 val fs = outputPath.getFileSystem(hadoopConf)
 val qualifiedOutputPath = outputPath.makeQualified(fs.getUri, 
fs.getWorkingDirectory)
-
--- End diff --

I would remove unrelated newline changes.


---




[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22078#discussion_r228881824
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
 ---
@@ -261,4 +272,69 @@ case class InsertIntoHadoopFsRelationCommand(
   }
 }.toMap
   }
+
+  private def isExtendedAclEnabled(hadoopConf: Configuration): Boolean =
+hadoopConf.getBoolean(DFS_NAMENODE_ACLS_ENABLED_KEY, 
DFS_NAMENODE_ACLS_ENABLED_DEFAULT)
+
+  private def getFullFileStatus(
+  conf: SQLConf,
+  hadoopConf: Configuration,
+  pathExists: Boolean,
+  fs: FileSystem,
+  file: Path): (Option[FsPermission], Option[AclStatus]) = {
+var permission: Option[FsPermission] = None
+var aclStatus: Option[AclStatus] = None
+if (conf.isDataSouceTableInheritPerms && pathExists) {
+  permission = Some(fs.getFileStatus(file).getPermission)
+  if (isExtendedAclEnabled(hadoopConf)) aclStatus = 
Some(fs.getAclStatus(file))
+}
+(permission, aclStatus)
--- End diff --

I would remove `var` by, for instance:

```scala
val shouldFetchPermission = conf.isDataSouceTableInheritPerms && pathExists
val shouldFetchAclStatus = shouldFetchPermission && 
isExtendedAclEnabled(hadoopConf)

val permission = if (shouldFetchPermission) {
  Some(fs.getFileStatus(file).getPermission)
} else {
  None
}
val aclStatus = if (shouldFetchAclStatus) {
  Some(fs.getAclStatus(file))
} else {
  None
}

(permission, aclStatus)
```


---




[GitHub] zeppelin issue #3206: [WIP][ZEPPELIN-3810] Support Spark 2.4

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
It should be usable if the changes are cherry-picked properly. This PR 
basically just replaces one line:


https://github.com/apache/zeppelin/blob/v0.8.0/spark/scala-2.11/src/main/scala/org/apache/zeppelin/spark/SparkScala211Interpreter.scala#L84

with a call to a private function.


---


[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22275
  
retest this please


---




[GitHub] zeppelin issue #3206: [WIP][ZEPPELIN-3810] Support Spark 2.4

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
The error message:

```
[ERROR] /home/cloud-user/ajay/code/csf-cc-zeppelin-k8szep/spark/scala-2.11/src/main/scala/org/apache/zeppelin/spark/SparkScala211Interpreter.scala:37: error: class SparkScala211Interpreter needs to be abstract, since method interpret in class BaseSparkScalaInterpreter of type (code: String, context: org.apache.zeppelin.interpreter.InterpreterContext)org.apache.zeppelin.interpreter.InterpreterResult is not defined
```

complains that the `interpret` method declared in `BaseSparkScalaInterpreter` is not 
implemented.
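The failure is the standard "concrete class missing an abstract member" error. A rough Python analogue of what the Scala compiler is rejecting (class and method names are borrowed from the error message above; the code itself is illustrative only, not the Zeppelin implementation):

```python
from abc import ABC, abstractmethod

class BaseSparkScalaInterpreter(ABC):       # stands in for the Scala base class
    @abstractmethod
    def interpret(self, code: str, context: object) -> object: ...

class SparkScala211Interpreter(BaseSparkScalaInterpreter):
    pass                                    # `interpret` not implemented

try:
    SparkScala211Interpreter()              # Python rejects at instantiation;
except TypeError as e:                      # Scala rejects at compile time
    print(e)
```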


---


[GitHub] zeppelin issue #3206: [WIP][ZEPPELIN-3810] Support Spark 2.4

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Does that happen only with these code changes? The change here does not 
touch the signature of `class SparkScala211Interpreter(`, and the error message 
looks pretty unrelated. The whole change proposed here does not alter any 
signature.


---


[GitHub] spark issue #22870: [SPARK-25862][SQL] Remove rangeBetween APIs introduced i...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22870
  
retest this please


---




[GitHub] spark issue #22871: [SPARK-25179][PYTHON][DOCS] Document BinaryType support ...

2018-10-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22871
  
Thanks, @dongjoon-hyun and @gatorsmile 


---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22530
  
retest this please


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22326
  
late LGTM


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228789484
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
 .intConf
 .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = 
buildConf("spark.sql.codegen.methodSplitThreshold")
+.internal()
+.doc("The threshold of source code length without comment of a single 
Java function by " +
+  "codegen to be split. When the generated Java function source code 
exceeds this threshold" +
+  ", it will be split into multiple small functions. We can't know how 
many bytecode will " +
--- End diff --

Not a big deal but I would avoid abbreviation in documentation. `can't` -> 
`cannot`


---




[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22666#discussion_r228787126
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala
 ---
@@ -19,14 +19,39 @@ package org.apache.spark.sql.catalyst.expressions
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
-import org.apache.spark.sql.types.{MapType, StringType, StructType}
+import org.apache.spark.sql.types.{DataType, MapType, StringType, 
StructType}
+import org.apache.spark.unsafe.types.UTF8String
 
 object ExprUtils {
 
-  def evalSchemaExpr(exp: Expression): StructType = exp match {
-case Literal(s, StringType) => StructType.fromDDL(s.toString)
+  def evalSchemaExpr(exp: Expression): StructType = {
+// Use `DataType.fromDDL` since the type string can be struct<...>.
+val dataType = exp match {
+  case Literal(s, StringType) =>
+DataType.fromDDL(s.toString)
+  case e @ SchemaOfCsv(_: Literal, _) =>
+val ddlSchema = e.eval(EmptyRow).asInstanceOf[UTF8String]
+DataType.fromDDL(ddlSchema.toString)
+  case e => throw new AnalysisException(
+"Schema should be specified in DDL format as a string literal or 
output of " +
+  s"the schema_of_csv function instead of ${e.sql}")
+}
+
+if (!dataType.isInstanceOf[StructType]) {
+  throw new AnalysisException(
+s"Schema should be struct type but got ${dataType.sql}.")
+}
+dataType.asInstanceOf[StructType]
+  }
+
+  def evalTypeExpr(exp: Expression): DataType = exp match {
+case Literal(s, StringType) => DataType.fromDDL(s.toString)
--- End diff --

Yup, I initially thought that we should allow constant-foldable 
expressions as well, but I just decided to follow the initial intent - literal-only 
support. I also wasn't sure when we would need constant folding to 
construct a JSON example, because I suspect such examples are usually copied and 
pasted from, for instance, a file.
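For readers unfamiliar with the distinction being discussed: a literal is already a value, while a constant-foldable expression only reduces to one at analysis time. A toy sketch of constant folding on a tiny expression tree (this has nothing to do with Spark's actual Catalyst implementation; it only illustrates the concept):

```python
# Toy expression tree: literals plus a concat node that can be constant-folded.
class Lit:
    def __init__(self, value):
        self.value = value

class Concat:
    def __init__(self, *children):
        self.children = children

def fold(expr):
    """Reduce an expression to a Lit if all of its inputs are constant."""
    if isinstance(expr, Lit):
        return expr
    folded = [fold(c) for c in expr.children]
    if all(isinstance(c, Lit) for c in folded):
        return Lit("".join(c.value for c in folded))
    raise ValueError("not constant-foldable")

# Allowing only literals rejects this; allowing constant folding accepts it,
# since it reduces to the literal "1,abc" before execution.
result = fold(Concat(Lit("1"), Lit(",abc")))
print(result.value)  # 1,abc
```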


---




[GitHub] spark issue #22871: [SPARK-25179][PYTHON][DOCS] Document BinaryType support ...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22871
  
cc @BryanCutler and @gatorsmile.


---




[GitHub] spark pull request #22871: [SPARK-25179][PYTHON][DOCS] Document BinaryType s...

2018-10-28 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22871

[SPARK-25179][PYTHON][DOCS] Document BinaryType support in Arrow conversion

## What changes were proposed in this pull request?

This PR aims to document binary type support in "Apache Arrow in Spark".

## How was this patch tested?

Manually built the documentation and checked.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25179

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22871


commit 34109bc63dd12f733cf2052b048093799f9b0102
Author: hyukjinkwon 
Date:   2018-10-29T02:53:18Z

Document BinaryType support in Arrow conversion




---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
@dongjoon-hyun and @wangyum, please correct my comment if I am wrong at any 
point - I believe you guys took a closer look at this part than I did.


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
> Does this upgrade Hive for execution or also for metastore? Spark 
supports virtually all Hive metastore versions out there, and a lot of 
deployments do run different versions of Spark against the same old Hive 
metastore, and it'd be bad to break connectivity to old Hive metastores.

> The execution part is a different story and we can upgrade them easily.

The upgrade basically targets upgrading Hive for execution (let me know if 
I am mistaken). For metastore compatibility, I believe we are able to provide 
metastore jars and support other Hive versions by explicitly configuring the 
JARs via the isolated classloader. I believe we have basic tests for different Hive 
versions.

I would cautiously like to raise an option - drop the builtin metastore 
support at 3.0 by default if the upgrade makes keeping the builtin metastore 
support hard enough.


---




[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22868#discussion_r228776349
  
--- Diff: docs/sql-migration-guide-hive-compatibility.md ---
@@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features, 
such as:
 * Explain
 * Partitioned tables including dynamic partition insertion
 * View
+  * If column aliases are not specified in view definition queries, both 
Spark and Hive will
+generate alias names, but in different ways. In order for Spark to be 
able to read views created
+by Hive, users should explicitly specify column aliases in view 
definition queries.
--- End diff --

+1


---




[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228776300
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

SGTM


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Oops, mind fixing PR title too?


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
@cloud-fan, thanks for doing this backport!


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Merged to branch-2.4.


---




[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228731568
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

Btw, a vectorized reader is in progress on the Parquet side. Maybe ideally 3.0 is a good target to replace it.


---




[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228731385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

I didn't document this intentionally because the vectorized reader can be disabled under other conditions even if `spark.sql.parquet.enableVectorizedReader` is enabled (like for complex types).

Shall we clearly document this?


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Yup, I think strictly speaking we should change it. Looks like there are two occurrences of `isinstance(..., str)`, at `udf` and `pandas_udf`.

Another problem in PySpark is inconsistent type comparison, like `type(...) == t` vs `isinstance(..., t)`. For instance, `type(...) == dict` vs `isinstance(..., dict)` - the former does not allow `OrderedDict` but the latter does.

Another problem is that some types, like `bool` in Python, inherit from `int`. In this case, `isinstance(...)` might produce unexpected results, for instance,

```python
>>> isinstance(True, int)
True
```

I was nervous about the cases above and haven't made those changes so far.
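To make both pitfalls concrete, here is a small self-contained illustration (plain Python, independent of Spark):

```python
from collections import OrderedDict

od = OrderedDict(a=1)

# Strict comparison rejects subclasses, while isinstance accepts them.
print(type(od) == dict)       # False: OrderedDict is only a subclass of dict
print(isinstance(od, dict))   # True

# bool subclasses int, so isinstance is surprisingly permissive here.
print(type(True) == int)      # False
print(isinstance(True, int))  # True
```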


---




[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22858#discussion_r228731178
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
 >>> df.select(schema_of_json('{"a": 0}').alias("json")).collect()
 [Row(json=u'struct')]
 """
-if isinstance(json, basestring):
+if isinstance(json, str):
--- End diff --

Yea we should. They are put only when needed because there are so many cases like that (for instance, `imap` in Python 2 and `map` in Python 3).

Looks like that's added in another PR, in the master branch only.


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
I meant to use


https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L583

Instead of 


https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L561



---




[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22858#discussion_r228713086
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
 >>> df.select(schema_of_json('{"a": 0}').alias("json")).collect()
 [Row(json=u'struct')]
 """
-if isinstance(json, basestring):
+if isinstance(json, str):
--- End diff --

The problem here is that we will not support unicode in Python 2 ..


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Wenchen, this is because 

```python
if sys.version >= '3':
basestring = str
```

Is missing. Python 3 does not have `basestring`.
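For reference, the full compatibility pattern this shim enables can be sketched like so (illustrative only; `normalize` is a made-up helper, not Spark code):

```python
import sys

if sys.version_info[0] >= 3:
    basestring = str  # Python 3 removed basestring; alias it to str

def normalize(value):
    # On Python 2 this accepts both str and unicode via basestring;
    # on Python 3 it simply accepts str.
    if isinstance(value, basestring):
        return value.strip()
    raise TypeError("expected a string, got %s" % type(value).__name__)

print(normalize('  {"a": 0}  '))  # prints {"a": 0}
```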


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Adding @gatorsmile and @cloud-fan as well since this might be a potentially breaking change for the 3.0 release (it affects RDD operations only with namedtuple in certain cases, though)


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
And you can also run the profiler to show the performance effect. See https://github.com/apache/spark/pull/19246#discussion_r139874732 for how to run the profiler.


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
You can just replace it with CloudPickler, remove the changes in tests, and push that commit here to show that no case is broken.


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22666
  
retest this please


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22666
  
retest this please


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20503
  
ok to test


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
Oh you mean the conflict fixing is not that hard. Thanks for doing this 
@cloud-fan. I planned to do this today .. :-).


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Yea, so to avoid breaking it, we could change the default pickler to CloudPickler or document this workaround. @superbobry, can you check if the case can be preserved if we use CloudPickler instead?
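The case being discussed — a namedtuple that the default pickler cannot handle — can be reproduced with plain Python (cloudpickle serializes such classes by value; this sketch only demonstrates the failure mode):

```python
import collections
import pickle

def make_local_tuple():
    # A namedtuple class created inside a function cannot be found by
    # pickle's by-name class lookup, which is the kind of case Spark's
    # namedtuple patch (or CloudPickler) has to work around.
    Local = collections.namedtuple("Local", "a")
    return Local(1)

try:
    pickle.dumps(make_local_tuple())
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

print(picklable)  # False: the default pickler fails on the local namedtuple
```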


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
Yea, I meant it's a bit complicated, but I'm okay with that approach, @cloud-fan. Thanks for doing that. I planned to do it today (now) :).


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
> Hive 2.3 works with Hadoop 2.x (Hive 3.x works with Hadoop 3.x).

This is essentially what we need for Hadoop 3 support 
[release-2.3.2|https://github.com/apache/hive/blob/rel/release-2.3.2/shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java#L152-L154].
 
See https://issues.apache.org/jira/browse/HIVE-16081.


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
Sure!


---




[GitHub] spark issue #22850: [MINOR][DOC] Fix comment error of HiveUtils

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22850
  
Yea, I was aware of it. I think there are some more old comments in this 
file if I remember this correctly. Can you double check and fix them while we 
are here?


---




[GitHub] spark issue #22850: [MINOR][DOC] Fix comment error of HiveUtils

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22850
  
ok to test


---




[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22775#discussion_r228520891
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2365,30 +2365,32 @@ def to_json(col, options={}):
 
 @ignore_unicode_prefix
 @since(2.4)
-def schema_of_json(col, options={}):
+def schema_of_json(json, options={}):
 """
-Parses a column containing a JSON string and infers its schema in DDL 
format.
+Parses a JSON string and infers its schema in DDL format.
 
-:param col: string column in json format
+:param json: a JSON string or a string literal containing a JSON 
string.
 :param options: options to control parsing. accepts the same options 
as the JSON datasource
 
 .. versionchanged:: 3.0
It accepts `options` parameter to control schema inferring.
 
->>> from pyspark.sql.types import *
->>> data = [(1, '{"a": 1}')]
->>> df = spark.createDataFrame(data, ("key", "value"))
->>> df.select(schema_of_json(df.value).alias("json")).collect()
-[Row(json=u'struct')]
+>>> df = spark.range(1)
 >>> df.select(schema_of_json(lit('{"a": 0}')).alias("json")).collect()
 [Row(json=u'struct')]
->>> schema = schema_of_json(lit('{a: 1}'), 
{'allowUnquotedFieldNames':'true'})
+>>> schema = schema_of_json('{a: 1}', 
{'allowUnquotedFieldNames':'true'})
 >>> df.select(schema.alias("json")).collect()
 [Row(json=u'struct')]
 """
+if isinstance(json, basestring):
--- End diff --

Actually, Wenchen, I think that's going to make it a bit complicated after a few more thoughts .. I think it's okay to go ahead. It's kind of a mistake that we had to do in 2.4.


---




[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22775#discussion_r228504453
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2365,30 +2365,32 @@ def to_json(col, options={}):
 
 @ignore_unicode_prefix
 @since(2.4)
-def schema_of_json(col, options={}):
+def schema_of_json(json, options={}):
 """
-Parses a column containing a JSON string and infers its schema in DDL 
format.
+Parses a JSON string and infers its schema in DDL format.
 
-:param col: string column in json format
+:param json: a JSON string or a string literal containing a JSON 
string.
 :param options: options to control parsing. accepts the same options 
as the JSON datasource
 
 .. versionchanged:: 3.0
It accepts `options` parameter to control schema inferring.
 
->>> from pyspark.sql.types import *
->>> data = [(1, '{"a": 1}')]
->>> df = spark.createDataFrame(data, ("key", "value"))
->>> df.select(schema_of_json(df.value).alias("json")).collect()
-[Row(json=u'struct')]
+>>> df = spark.range(1)
 >>> df.select(schema_of_json(lit('{"a": 0}')).alias("json")).collect()
 [Row(json=u'struct')]
->>> schema = schema_of_json(lit('{a: 1}'), 
{'allowUnquotedFieldNames':'true'})
+>>> schema = schema_of_json('{a: 1}', 
{'allowUnquotedFieldNames':'true'})
 >>> df.select(schema.alias("json")).collect()
 [Row(json=u'struct')]
 """
+if isinstance(json, basestring):
--- End diff --

+1


---




[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22771
  
retest this please


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
Yup, yup .. I should sync the tests


---




[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22814
  
Merged to master


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
Yup, it supports Hadoop 3, and the other fixes @wangyum mentioned.


---




[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22814
  
LGTM too


---




[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22814#discussion_r228381742
  
--- Diff: docs/sql-data-sources-avro.md ---
@@ -177,6 +180,19 @@ Data source options of Avro can be set using the 
`.option` method on `DataFrameR
   Currently supported codecs are uncompressed, 
snappy, deflate, bzip2 and 
xz. If the option is not set, the configuration 
spark.sql.avro.compression.codec config is taken into account.
 write
   
+  
+mode
+FAILFAST
+The mode option allows to specify parse mode for 
function from_avro.
+  Currently supported modes are:
+  
+FAILFAST: Throws an exception on processing 
corrupted record.
+PERMISSIVE: Corrupt records are processed as null 
result. To implement this, the
--- End diff --

not a big deal but .. `To implement this` sounds a little bit awkward .. I think you can just say ... `Corrupt records are processed as null; therefore, the given schema is forced to be fully nullable.` (feel free to adjust the wording as you see fit).
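The two modes described in the diff can be sketched in plain Python (illustrative only, not the actual Avro reader; `json.loads` stands in for any record decoder):

```python
import json

def parse_with_mode(decode, raw, mode="FAILFAST"):
    # FAILFAST re-raises on a corrupt record; PERMISSIVE returns None
    # instead, which is why the result schema must be fully nullable.
    try:
        return decode(raw)
    except ValueError:
        if mode == "PERMISSIVE":
            return None
        raise

print(parse_with_mode(json.loads, '{"a": 1}'))               # {'a': 1}
print(parse_with_mode(json.loads, '{broken', "PERMISSIVE"))  # None
```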


---




[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22814#discussion_r228380951
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -31,10 +32,32 @@ package object avro {
* @since 2.4.0
*/
   @Experimental
-  def from_avro(data: Column, jsonFormatSchema: String): Column = {
-new Column(AvroDataToCatalyst(data.expr, jsonFormatSchema))
+  def from_avro(
+  data: Column,
+  jsonFormatSchema: String): Column = {
+new Column(AvroDataToCatalyst(data.expr, jsonFormatSchema, Map.empty))
+  }
+
+  /**
+   * Converts a binary column of avro format into its corresponding 
catalyst value. The specified
+   * schema must match the read data, otherwise the behavior is undefined: 
it may fail or return
+   * arbitrary result.
+   *
+   * @param data the binary column.
+   * @param jsonFormatSchema the avro schema in JSON string format.
+   * @param options options to control how the Avro record is parsed.
+   *
+   * @since 3.0.0
+   */
+  @Experimental
+  def from_avro(
+  data: Column,
+  jsonFormatSchema: String,
+  options: Map[String, String]): Column = {
--- End diff --

One thing I am less sure about, though: what do you think about just exposing a single Java-specific one? This overload will only be usable from the Scala side.


---




[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22814#discussion_r228380639
  
--- Diff: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala 
---
@@ -61,6 +59,24 @@ class AvroFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df)
   }
 
+  test("handle invalid input in from_avro") {
+val count = 10
+val df = spark.range(count).select(struct('id, 
'id.as("id2")).as("struct"))
+val avroStructDF = df.select(to_avro('struct).as("avro"))
+val avroTypeStruct = s"""
+  |{
+  |  "type": "record",
+  |  "name": "struct",
+  |  "fields": [
+  |{"name": "col1", "type": "long"},
+  |{"name": "col2", "type": "double"}
+  |  ]
+  |}
+""".stripMargin
+val expected = (0 until count).map(_ => Row(Row(null, null)))
+checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), 
expected)
--- End diff --

Thanks. Makes sense to me


---




[GitHub] spark issue #22827: [SPARK-25832][SQL][BRANCH-2.4] Revert newly added map re...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22827
  
LGTM too


---




[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22841
  
Looks good to me.


---




[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22841#discussion_r228376996
  
--- Diff: python/pyspark/sql/window.py ---
@@ -239,34 +212,27 @@ def rangeBetween(self, start, end):
 and "5" means the five off after the current row.
 
 We recommend users use ``Window.unboundedPreceding``, 
``Window.unboundedFollowing``,
-``Window.currentRow``, 
``pyspark.sql.functions.unboundedPreceding``,
-``pyspark.sql.functions.unboundedFollowing`` and 
``pyspark.sql.functions.currentRow``
-to specify special boundary values, rather than using integral 
values directly.
+and ``Window.currentRow`` to specify special boundary values, 
rather than using integral
--- End diff --

Oops, right. I misread.


---




[GitHub] spark pull request #22815: [SPARK-25821][SQL] Remove SQLContext methods depr...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22815#discussion_r228376272
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -434,6 +388,7 @@ read.orc <- function(path, ...) {
 #' Loads a Parquet file, returning the result as a SparkDataFrame.
 #'
 #' @param path path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

Yeah, actually I opened a PR .. https://github.com/srowen/spark/pull/2


---




[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22841#discussion_r228376015
  
--- Diff: python/pyspark/sql/window.py ---
@@ -239,34 +212,27 @@ def rangeBetween(self, start, end):
 and "5" means the five off after the current row.
 
 We recommend users use ``Window.unboundedPreceding``, 
``Window.unboundedFollowing``,
-``Window.currentRow``, 
``pyspark.sql.functions.unboundedPreceding``,
-``pyspark.sql.functions.unboundedFollowing`` and 
``pyspark.sql.functions.currentRow``
-to specify special boundary values, rather than using integral 
values directly.
+and ``Window.currentRow`` to specify special boundary values, 
rather than using integral
--- End diff --

Reynold, looks like we should deprecate `rangeBetween` here on the Python side as well.

```python
def rangeBetween(...):
    """
    ...

    .. note:: Deprecated in Spark 2.4. See SPARK-25842 for more information.
    """
    warnings.warn(
        "Deprecated in Spark 2.4. See SPARK-25842 for more information.",
        DeprecationWarning)
```


---




[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16812
  
This can be easily worked around, no?


---




[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22841
  
Yup, I also agree with this revert.


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
Maybe I am being too careful about it, but I am kind of nervous about this column case. I don't intend to disallow it entirely, only for Spark 2.4. We might have to find a way to use a column with `from_json`, but I want to defer that to 3.0.


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
Actually, that use case can be more easily accomplished by simply inferring the schema via the JSON datasource. Yea, I indeed suggested that as a workaround for this issue before. Let's say, `spark.read.json(df.select("json").as[String]).schema`.

I know the use case of `schema_of_json` is not super clear, and actually that's also partly why I want to allow only what we need for this expression now.

@rxin, WDYT? This PR tries to allow only what we need for now. Let's say, disallow:

```scala
schema_of_json(column)
```

and only allow 

```scala
schema_of_json(literal)
```

because the main usecase is:

```scala
from_json(schema_of_json(literal))
```

and

```scala
from_json(schema_of_json(column))
```

is already not being supported. My judgement was that `schema_of_json(column)` doesn't have to be exposed for now, and I actually want to have a talk with @MaxGekk about this when he comes back from his vacation.
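The restriction being proposed — accept a string literal, reject a column — can be sketched as a plain-Python argument check (names and the return value are illustrative; this is not the actual PySpark implementation):

```python
class Column:
    """Stand-in for pyspark.sql.Column, for illustration only."""

def schema_of_json(json_str, options=None):
    # Only a JSON string literal is accepted for now; passing a Column
    # is rejected, matching the behavior discussed in this thread.
    if not isinstance(json_str, str):
        raise TypeError("schema_of_json expects a JSON string literal, "
                        "got %s" % type(json_str).__name__)
    return "struct"  # placeholder for the inferred DDL schema

print(schema_of_json('{"a": 0}'))  # prints struct
```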


---




[GitHub] spark issue #22621: [SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22621
  
That's my point. Why do we have to document a fix for unexpected results?


---




[GitHub] spark issue #22747: [SPARK-25760][SQL] Set AddJarCommand return empty

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22747
  
Yup, that's a similar argument to the one I had in https://github.com/apache/spark/pull/22773#issuecomment-432923361. I think we should clarify what to document there.


---




[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22814#discussion_r228113346
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -10,6 +10,9 @@ displayTitle: Spark SQL Upgrading Guide
 ## Upgrading From Spark SQL 2.4 to 3.0
 
   - In PySpark, when creating a `SparkSession` with 
`SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, 
the builder was trying to update the `SparkConf` of the existing `SparkContext` 
with configurations specified to the builder, but the `SparkContext` is shared 
by all `SparkSession`s, so we should not update them. Since 3.0, the builder 
comes to not update the configurations. This is the same behavior as Java/Scala 
API in 2.3 and above. If you want to update them, you need to update them prior 
to creating a `SparkSession`.
+  - In Avro data source, the function `from_avro` supports following parse 
modes:
--- End diff --

nit: let's add a space to make it more readable (I mean a space between each item).


---




[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22814#discussion_r228115771
  
--- Diff: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala 
---
@@ -61,6 +59,24 @@ class AvroFunctionsSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df)
   }
 
+  test("handle invalid input in from_avro") {
+val count = 10
+val df = spark.range(count).select(struct('id, 
'id.as("id2")).as("struct"))
+val avroStructDF = df.select(to_avro('struct).as("avro"))
+val avroTypeStruct = s"""
+  |{
+  |  "type": "record",
+  |  "name": "struct",
+  |  "fields": [
+  |{"name": "col1", "type": "long"},
+  |{"name": "col2", "type": "double"}
+  |  ]
+  |}
+""".stripMargin
+val expected = (0 until count).map(_ => Row(Row(null, null)))
+checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), 
expected)
--- End diff --

BTW, when would there be malformed records - is this usually only when the schema is different? The main purpose of parse modes in CSV and JSON is to cover the limitations of semi-structured input data. I was wondering how useful it is in Avro if most cases arise only when the input schema is different.

Also, one good thing about PERMISSIVE mode is that we allow filling invalid records into `columnNameOfCorruptRecord`. Here it looks like it doesn't quite make sense to add this functionality to Avro.
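For contrast, the `columnNameOfCorruptRecord` behavior mentioned above can be sketched for a JSON-like source (illustrative plain Python, not Spark internals; the column names are assumptions):

```python
import json

def parse_permissive(raw, fields, corrupt_col="_corrupt_record"):
    # PERMISSIVE parsing for semi-structured input: a malformed record
    # yields nulls for the data columns and keeps the raw text in the
    # corrupt-record column.
    try:
        rec = json.loads(raw)
        row = {f: rec.get(f) for f in fields}
        row[corrupt_col] = None
    except ValueError:
        row = {f: None for f in fields}
        row[corrupt_col] = raw
    return row

print(parse_permissive('{"a": 1}', ["a"]))  # {'a': 1, '_corrupt_record': None}
print(parse_permissive('{broken', ["a"]))   # {'a': None, '_corrupt_record': '{broken'}
```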



---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
retest this please


---




[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22814#discussion_r228065259
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -21,16 +21,31 @@ import org.apache.avro.Schema
 import org.apache.avro.generic.GenericDatumReader
 import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
 
-import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.SparkException
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, SpecificInternalRow, UnaryExpression}
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
-import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+import org.apache.spark.sql.catalyst.util.{FailFastMode, ParseMode, PermissiveMode}
+import org.apache.spark.sql.types._
 
-case class AvroDataToCatalyst(child: Expression, jsonFormatSchema: String)
+case class AvroDataToCatalyst(
+    child: Expression,
+    jsonFormatSchema: String,
+    options: Map[String, String])
   extends UnaryExpression with ExpectsInputTypes {
 
   override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
 
-  override lazy val dataType: DataType = SchemaConverters.toSqlType(avroSchema).dataType
+  override lazy val dataType: DataType = {
+    val dt = SchemaConverters.toSqlType(avroSchema).dataType
+    parseMode match {
+      // With PermissiveMode, the output Catalyst row might contain columns of null values for
+      // corrupt records, even if some of the columns are not nullable in the user-provided schema.
+      // Therefore we force the schema to be all nullable here.
+      case PermissiveMode => dt.asNullable
--- End diff --

This looks like it's going to be an external behaviour change for users from 2.4.0. 
Please update the migration guide as well.
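The `dt.asNullable` forcing in the diff above can be illustrated with a simplified sketch. These are stand-ins for Catalyst's classes (not Spark's real `DataType` hierarchy): the point is that PERMISSIVE mode emits all-null rows for corrupt records, so every field - including nested ones - must be made nullable.

```scala
// Simplified stand-ins for Catalyst's DataType tree, for illustration only.
sealed trait DataType { def asNullable: DataType }

case object LongType extends DataType { def asNullable: DataType = this }

case class StructField(name: String, dataType: DataType, nullable: Boolean)

case class StructType(fields: Seq[StructField]) extends DataType {
  // Recursively mark every field, including nested structs, as nullable.
  def asNullable: StructType =
    StructType(fields.map(f => f.copy(dataType = f.dataType.asNullable, nullable = true)))
}
```

With this sketch, a schema declared with non-nullable fields comes back fully nullable after `asNullable`, which is why the change is user-visible and worth a migration-guide note.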


---




[GitHub] spark issue #22621: [SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not ...

2018-10-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22621
  
Let's say this can be a behaviour change too, since the metrics are now changed. 
Should we update the migration guide for safety?


---




[GitHub] spark issue #22690: [SPARK-19287][CORE][STREAMING] JavaPairRDD flatMapValues...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22690
  
cc @cloud-fan and @gatorsmile 

Should we update the migration guide as well?


---




[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22503
  
@justinuang, this might affect existing users' applications. Although this 
matches the behaviour of non-multiline mode, can we explicitly mention it in the 
migration guide?

cc @cloud-fan and @gatorsmile 


---




[GitHub] spark issue #22747: [SPARK-25760][SQL] Set AddJarCommand return empty

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22747
  
This also looks like an external change for existing application users. Shall we 
update the migration guide?


---




[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22773
  
Yup, I will encourage updating the migration guide in that way.


---




[GitHub] spark issue #22728: [SPARK-25736][SQL][TEST] add tests to verify the behavio...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22728
  
(From https://github.com/apache/spark/pull/22773#issuecomment-432917994) 
@gatorsmile and @cloud-fan, let's say this will break `DESCRIBE FUNCTION 
EXTENDED`. Should we update the migration guide as well?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22815
  
BTW, should we update the migration guide too?


---




[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22773
  
Sure, so for clarification, we will document everything that affects external 
users' applications, right?


---




[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22775
  
@cloud-fan, it looks like we are going to start another RC. Would you mind taking 
a quick look before the new RC?


---



