Github user mswit-databricks commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r199415439
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -71,9 +80,22 @@ private
Github user mswit-databricks commented on a diff in the pull request:
https://github.com/apache/spark/pull/21390#discussion_r189857237
--- Diff:
common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
---
@@ -157,10 +172,10 @@ private static void
Github user mswit-databricks commented on the issue:
https://github.com/apache/spark/pull/21070
@rdblue Do you see any risk of additional overhead coming from the extra
stats? For example, if the data contains very long strings, performing
comparison on them to generate stats
Github user mswit-databricks commented on a diff in the pull request:
https://github.com/apache/spark/pull/20694#discussion_r172142857
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
---
@@ -680,4 +681,31 @@ class
GitHub user mswit-databricks opened a pull request:
https://github.com/apache/spark/pull/20694
[SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data
from JSON
## What changes were proposed in this pull request?
The from_json() function accepts
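The snippet above is cut off, but for context: Spark's `from_json()` takes a JSON string column and a schema, and yields `null` when the input cannot be parsed as that schema (the corrupt-Parquet issue in SPARK-23173 stems from how such failed parses were represented on write). The following is a minimal stdlib-only Python sketch of that null-on-failure projection behavior, not Spark's actual implementation; the function name `from_json_like` and its signature are invented for illustration.

```python
import json

def from_json_like(s, schema_keys):
    """Parse a JSON object string and project it onto the given keys.

    Mimics from_json's observable semantics: malformed input or a
    non-object value yields None (Spark: null), and keys missing from
    the input are filled with None.
    """
    try:
        obj = json.loads(s)
    except (json.JSONDecodeError, TypeError):
        return None  # unparseable input -> null row, as in from_json
    if not isinstance(obj, dict):
        return None  # schema expects a struct; anything else -> null
    return {k: obj.get(k) for k in schema_keys}

print(from_json_like('{"a": 1}', ["a", "b"]))  # {'a': 1, 'b': None}
print(from_json_like('not json', ["a"]))       # None
```

In real Spark the distinction SPARK-23173 deals with is between a row that is entirely `null` and a row whose individual fields are `null`; the sketch only illustrates the parse-failure path.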