[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181540952 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-04-13 Thread megaserg
Github user megaserg commented on the issue: https://github.com/apache/spark/pull/20704 Thank you @dongjoon-hyun! This was also affecting our Spark job performance! We're using `mapreduce.fileoutputcommitter.algorithm.version=2` in our Spark job config, as recommended e.g.

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r181538566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -304,45 +304,14 @@ case class LoadDataCommand( }

[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...

2018-04-13 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20992 Good point. I would add benchmark results. Let me leave ToDo in the description. --- - To unsubscribe, e-mail:

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-13 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181538066 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -176,12 +176,13 @@ class HadoopTableReader( val

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89360/ Test FAILed. ---

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #89360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89360/testReport)** for PR 20894 at commit

[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...

2018-04-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20992 What are the performance improvements? Without additional data this seems like just an invasive change without any real benefits ... ---

[GitHub] spark pull request #21065: [SPARK-23979][SQL] MultiAlias should not be a Cod...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21065 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21065: [SPARK-23979][SQL] MultiAlias should not be a CodegenFal...

2018-04-13 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21065 LGTM, merging to master!

[GitHub] spark pull request #20988: [SPARK-23877][SQL]: Use filter predicates to prun...

2018-04-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20988#discussion_r181535823 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -129,35 +151,41 @@ case class

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21031 According to my understanding, these activities are to improve compatibility with other DBs (like Presto) in https://issues.apache.org/jira/browse/SPARK-23899 and

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89363/ Test PASSed. ---

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21060 Merged build finished. Test PASSed.

[GitHub] spark pull request #20988: [SPARK-23877][SQL]: Use filter predicates to prun...

2018-04-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20988#discussion_r181535484 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -129,35 +151,41 @@ case class

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21060 **[Test build #89363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89363/testReport)** for PR 21060 at commit

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181535144 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark issue #20988: [SPARK-23877][SQL]: Use filter predicates to prune parti...

2018-04-13 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20988 Can we add a test? We can use `HiveCatalogMetrics.METRIC_PARTITIONS_FETCHED.getCount()` to check whether this patch really reduces the number of partitions being fetched.

[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21071 Can one of the admins verify this patch?

[GitHub] spark pull request #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-13 Thread devaraj-kavali
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/21071 [SPARK-21962][CORE] Distributed Tracing in Spark ## What changes were proposed in this pull request? This PR integrates with HTrace, it sends traces for the application and tasks

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-04-13 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r181531212 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcDataSourceV2.scala --- @@ -0,0 +1,194 @@ +/* + *

[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21031 If there is already size, why do we need to create a new implementation? Why can't we just rewrite cardinality to size? Also I wouldn't add any programming API for this, since there is

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21048

[GitHub] spark pull request #21056: [SPARK-23849][SQL] Tests for samplingRatio of jso...

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21056#discussion_r181530121 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2128,38 +2128,60 @@ class JsonSuite extends

[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181529978 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest

[GitHub] spark issue #21048: [SPARK-23966][SS] Refactoring all checkpoint file writin...

2018-04-13 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21048 I am merging this to master. Once again, thank you for your reviews.

[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181529901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-04-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r181529318 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcDataSourceV2.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181528729 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20888 Merged build finished. Test PASSed.

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20888 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89358/ Test PASSed. ---

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20888 **[Test build #89358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89358/testReport)** for PR 20888 at commit

[GitHub] spark issue #17359: [SPARK-20028][SQL] Add aggreagate expression nGrams

2018-04-13 Thread sijunhe
Github user sijunhe commented on the issue: https://github.com/apache/spark/pull/17359 Would love to see this feature in Spark SQL.

[GitHub] spark issue #21048: [SPARK-23966][SS] Refactoring all checkpoint file writin...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21048 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89357/ Test PASSed. ---

[GitHub] spark issue #21048: [SPARK-23966][SS] Refactoring all checkpoint file writin...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21048 Merged build finished. Test PASSed.

[GitHub] spark issue #21048: [SPARK-23966][SS] Refactoring all checkpoint file writin...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21048 **[Test build #89357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89357/testReport)** for PR 21048 at commit

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Merged build finished. Test PASSed.

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89355/ Test PASSed. ---

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89355/testReport)** for PR 21068 at commit

[GitHub] spark issue #20998: [SPARK-23888][CORE] speculative task should not run on a...

2018-04-13 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20998 @squito I completely agree that the comment is inaccurate. Note that this is for a specific taskset, so impact is limited to that taskset (w.r.t using executors for spec exec) ---

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-04-13 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r181525550 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,44 @@ case class

[GitHub] spark issue #21048: [SPARK-23966][SS] Refactoring all checkpoint file writin...

2018-04-13 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21048 @steveloughran Thanks for your comments :)

[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r181524647 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +288,80 @@ case class

[GitHub] spark pull request #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with ...

2018-04-13 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r181524292 --- Diff: hadoop-cloud/pom.xml --- @@ -38,7 +38,32 @@ hadoop-cloud + + target/scala-${scala.binary.version}/classes

[GitHub] spark pull request #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with ...

2018-04-13 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r181524069 --- Diff: assembly/pom.xml --- @@ -254,6 +254,14 @@ spark-hadoop-cloud_${scala.binary.version} ${project.version}

[GitHub] spark pull request #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with ...

2018-04-13 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r181524354 --- Diff: hadoop-cloud/pom.xml --- @@ -38,7 +38,32 @@ hadoop-cloud + --- End diff -- Is this still needed after you

[GitHub] spark issue #21065: [SPARK-23979][SQL] MultiAlias should not be a CodegenFal...

2018-04-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21065 cc @cloud-fan @hvanhovell

[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Merged build finished. Test PASSed.

[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89354/ Test PASSed. ---

[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21069 **[Test build #89354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89354/testReport)** for PR 21069 at commit

[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

2018-04-13 Thread yanji84
Github user yanji84 commented on the issue: https://github.com/apache/spark/pull/21033 Is there anything else we need to do to merge this change?

[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89361/ Test PASSed. ---

[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21044 Merged build finished. Test PASSed.

[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21044 **[Test build #89361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89361/testReport)** for PR 21044 at commit

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-13 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181513236 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/FailureWithinTimeIntervalTracker.scala --- @@ -0,0 +1,80 @@ +/* + *

[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-13 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181515465 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala --- @@ -0,0 +1,155 @@ +/* + *

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21060 Merged build finished. Test PASSed.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2328/

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21060 **[Test build #89363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89363/testReport)** for PR 21060 at commit

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21070 Upstream benchmarks for buffer management changes are here: https://github.com/apache/parquet-mr/pull/390#issuecomment-338505426 That doesn't show the GC benefit for smaller buffer

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21060 retest this please

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21070 Could you share the performance numbers?

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Merged build finished. Test PASSed.

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Merged build finished. Test FAILed.

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89362/ Test FAILed. ---

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21070 **[Test build #89362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89362/testReport)** for PR 21070 at commit

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2327/

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21070 **[Test build #89362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89362/testReport)** for PR 21070 at commit

[GitHub] spark pull request #21043: [SPARK-23963] [SQL] Properly handle large number ...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21043

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-13 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21070 SPARK-23972: Update Parquet to 1.10.0. ## What changes were proposed in this pull request? This updates Parquet to 1.10.0 and updates the vectorized path for buffer management changes.

[GitHub] spark issue #21043: [SPARK-23963] [SQL] Properly handle large number of colu...

2018-04-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21043 Thanks! Merged to master.

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-04-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r181509305 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -368,8 +368,7 @@ case class FileSourceScanExec(

[GitHub] spark pull request #20933: [SPARK-23817][SQL]Migrate ORC file format read pa...

2018-04-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20933#discussion_r181507712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1185,6 +1185,13 @@ object SQLConf { .stringConf

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181507520 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181506863 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181506582 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark issue #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, and numF...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21044 **[Test build #89361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89361/testReport)** for PR 21044 at commit

[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21053 Merged build finished. Test FAILed.

[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89356/ Test FAILed. ---

[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21053 **[Test build #89356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89356/testReport)** for PR 21053 at commit

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181502862 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111 @@ +/* + *

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Merged build finished. Test PASSed.

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89359/ Test PASSed. ---

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20997 **[Test build #89359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89359/testReport)** for PR 20997 at commit

[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Merged build finished. Test PASSed.

[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21011 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89351/ Test PASSed. ---

[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21011 **[Test build #89351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89351/testReport)** for PR 21011 at commit

[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming][WIP] Update query st...

2018-04-13 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21063 I guess we might not even need to make an API change, just document that these flags only mean anything for microbatch execution. In any case that's a separate discussion. ---

[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming][WIP] Update query st...

2018-04-13 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21063 I'm not sure isDataAvailable makes sense in the context of continuous processing; it seems fundamentally tied to the microbatch execution model. I think the best option is to just leave it and

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #89360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89360/testReport)** for PR 20894 at commit

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20997 **[Test build #89359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89359/testReport)** for PR 20997 at commit

[GitHub] spark issue #21045: [WIP][SPARK-23931][SQL] Adds zip function to sparksql

2018-04-13 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21045 @DylanGuedes the first suggestion I can give you is: do not use spark-shell for testing, but write UT and run them with a debugger. Then, you can breakpoint to check the generated code (or you can

[GitHub] spark issue #21045: [WIP][SPARK-23931][SQL] Adds zip function to sparksql

2018-04-13 Thread DylanGuedes
Github user DylanGuedes commented on the issue: https://github.com/apache/spark/pull/21045 OK, so it works fine in spark-shell, but in pyspark I got this error: ```shell File "/home/dguedes/Workspace/spark/python/pyspark/sql/functions.py", line 2155, in pyspark.sql.functions.zip

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20888 **[Test build #89358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89358/testReport)** for PR 20888 at commit

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89350/ Test FAILed. ---

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Merged build finished. Test FAILed.

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89350/testReport)** for PR 21068 at commit

[GitHub] spark pull request #21045: [WIP][SPARK-23931][SQL] Adds zip function to spar...

2018-04-13 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r181489957 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -87,6 +87,62 @@ case class MapKeys(child:
