[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mswit-databricks
GitHub user mswit-databricks commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199415439 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private

[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-05-22 Thread mswit-databricks
GitHub user mswit-databricks commented on a diff in the pull request: https://github.com/apache/spark/pull/21390#discussion_r189857237 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java --- @@ -157,10 +172,10 @@ private static void

[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.

2018-05-01 Thread mswit-databricks
GitHub user mswit-databricks commented on the issue: https://github.com/apache/spark/pull/21070 @rdblue Do you see any risk of additional overhead coming from the extra stats? For example, if the data contains very long strings, performing comparison on them to generate stats

[GitHub] spark pull request #20694: [SPARK-23173][SQL] Avoid creating corrupt parquet...

2018-03-05 Thread mswit-databricks
GitHub user mswit-databricks commented on a diff in the pull request: https://github.com/apache/spark/pull/20694#discussion_r172142857 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -680,4 +681,31 @@ class

[GitHub] spark pull request #20694: [SPARK-23173][SQL] Avoid creating corrupt parquet...

2018-02-28 Thread mswit-databricks
GitHub user mswit-databricks opened a pull request: https://github.com/apache/spark/pull/20694 [SPARK-23173][SQL] Avoid creating corrupt parquet files when loading data from JSON ## What changes were proposed in this pull request? The from_json() function accepts