[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20928#discussion_r179038732
--- Diff: docs/mllib-feature-extraction.md ---
@@ -105,7 +105,7 @@ p(w_i | w_j ) = \frac{\exp(u_{w_i}^{\top}v_{w_j})}{\sum_{l=1}^{V} \exp(u_l^{\top
 \]`
 where $V$ is the vocabulary size.
-The skip-gram model with softmax is expensive because the cost of computing $\log p(w_i | w_j)$
+The skip-gram model with softmax is expensive because of the cost of computing $\log p(w_i | w_j)$
--- End diff --
seems a mistake.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20961: [SPARK-23823][SQL] Keep origin in transformExpression
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20961 LGTM, too
[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20928#discussion_r179040483
--- Diff: sql/README.md ---
@@ -6,7 +6,7 @@ This module provides support for executing relational queries expressed in eithe
 Spark SQL is broken up into four subprojects:
  - Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  - Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
- - Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs.
+ - Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
--- End diff --
There seems to be an extra space after `allow`.
[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20928#discussion_r179038446
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -92,7 +92,7 @@ above) and "drop". Further strategies may be supported in future.
-In the following example, we load ratings data from the
+In the following example, we load rating data from the
--- End diff --
@dsakuma, "ratings" seems fine (see also the link http://grouplens.org/datasets/movielens/).
[GitHub] spark pull request #20928: [MINOR][DOC] Fix some typos and grammar issues
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20928#discussion_r179038867
--- Diff: docs/mllib-naive-bayes.md ---
@@ -19,7 +19,7 @@ These models are typically used for [document classification](http://nlp.stanfor
 Within that context, each observation is a document and each feature represents a term whose value is the frequency of the term (in multinomial naive Bayes) or a zero or one indicating whether the term was found in the document (in Bernoulli naive Bayes).
-Feature values must be nonnegative. The model type is selected with an optional parameter
+Feature values must be non-negative. The model type is selected with an optional parameter
--- End diff --
It seems the previous one is also correct.
[GitHub] spark pull request #18576: [SPARK-21351][SQL] Update nullability based on ch...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18576
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18576 thanks, merging to master!
[GitHub] spark pull request #20969: [SPARK-23826] [TEST] TestHiveSparkSession should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20969
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20969 thanks, merging to master!
[GitHub] spark pull request #20913: [SPARK-23799] FilterEstimation.evaluateInSet prod...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20913#discussion_r179037665
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends Logging {
       // return the filter selectivity. Without advanced statistics such as histograms,
       // we have to assume uniform distribution.
-      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
+      if (ndv.toDouble != 0) {
--- End diff --
What is the concrete case where `ndv.toDouble == 0`? Also, is this the only place where we need this check? For example, we don't have it here: https://github.com/apache/spark/blob/5cfd5fabcdbd77a806b98a6dd59b02772d2f6dee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala#L166
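To make the question above concrete, here is a minimal sketch of what the unguarded expression from the diff evaluates to when `ndv` is zero, next to one possible guarded variant. The object and method names are hypothetical, and the `0.0` fallback is an assumption for illustration only; the diff shown above does not reveal what the PR actually returns in that branch.

```scala
object SelectivitySketch {
  // Same arithmetic as the removed line in the diff, without the Some(...) wrapper.
  // With ndv == 0 this is a floating-point division by zero:
  //   0.0 / 0.0 is NaN, and math.min(NaN, 1.0) is NaN;
  //   x / 0.0 for x > 0 is Infinity, and math.min(Infinity, 1.0) is 1.0.
  def unguarded(newNdv: BigInt, ndv: BigInt): Double =
    math.min(newNdv.toDouble / ndv.toDouble, 1.0)

  // Guarded variant. ASSUMPTION: returning 0.0 when there are no distinct
  // values is a choice made for this sketch, not taken from the PR.
  def guarded(newNdv: BigInt, ndv: BigInt): Double =
    if (ndv == BigInt(0)) 0.0
    else math.min(newNdv.toDouble / ndv.toDouble, 1.0)
}
```

The point of the sketch is only that a zero `ndv` does not throw for `Double` division; it silently yields `NaN` or a clamped `1.0`, which is why an explicit check may be wanted wherever such a ratio is computed.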
[GitHub] spark pull request #20931: [SPARK-23815][Core]Spark writer dynamic partition...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20931#discussion_r179036894
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -186,7 +186,9 @@ class HadoopMapReduceCommitProtocol(
       logDebug(s"Clean up default partition directories for overwriting: $partitionPaths")
       for (part <- partitionPaths) {
         val finalPartPath = new Path(path, part)
-        fs.delete(finalPartPath, true)
+        if (!fs.delete(finalPartPath, true) && !fs.exists(finalPartPath.getParent)) {
+          fs.mkdirs(finalPartPath.getParent)
--- End diff --
Do you have any official HDFS documentation to support this change?
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88872/ Test PASSed.
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20969 Merged build finished. Test PASSed.
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20969
**[Test build #88872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88872/testReport)** for PR 20969 at commit [`f7e0b03`](https://github.com/apache/spark/commit/f7e0b034026691872c905ab4d5d09c381c56b7b0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20931 ok to test
[GitHub] spark issue #20913: [SPARK-23799] FilterEstimation.evaluateInSet produces de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20913 **[Test build #88877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88877/testReport)** for PR 20913 at commit [`67597fd`](https://github.com/apache/spark/commit/67597fdcb703c7fa3fa189a456944693727d5754).
[GitHub] spark issue #20913: [SPARK-23799] FilterEstimation.evaluateInSet produces de...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20913 retest this please
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18576 ping
[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20965 Merged build finished. Test PASSed.
[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20965 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1946/ Test PASSed.
[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20965 **[Test build #88876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88876/testReport)** for PR 20965 at commit [`696ba17`](https://github.com/apache/spark/commit/696ba171e2f42ceb6028eec56f8422715ca40a99).
[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20965 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1945/ Test PASSed.
[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20965 **[Test build #88875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88875/testReport)** for PR 20965 at commit [`9623765`](https://github.com/apache/spark/commit/962376552a9cfbd4a110ceb9294caeffb3032ecc).
[GitHub] spark issue #20965: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20965 Merged build finished. Test PASSed.
[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20973 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88873/ Test PASSed.
[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20973 Merged build finished. Test PASSed.
[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20973
**[Test build #88873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88873/testReport)** for PR 20973 at commit [`d563c8f`](https://github.com/apache/spark/commit/d563c8fab0cb718b511ac78bc38e712a65148d17).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88871/ Test FAILed.
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20953 Merged build finished. Test FAILed.
[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Merged build finished. Test PASSed.
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20953
**[Test build #88871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88871/testReport)** for PR 20953 at commit [`a06ad5e`](https://github.com/apache/spark/commit/a06ad5e0451c3ff8bf7104512f32161bf66ed696).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20886 **[Test build #88874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88874/testReport)** for PR 20886 at commit [`2b2973a`](https://github.com/apache/spark/commit/2b2973a9db7a8fa228bfc939604feca4cc2c6a59).
[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1944/ Test PASSed.
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Merged build finished. Test FAILed.
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88870/ Test FAILed.
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20971
**[Test build #88870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88870/testReport)** for PR 20971 at commit [`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r179030611
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,7 +385,9 @@ case class LoadDataCommand(
       val hadoopConf = sparkSession.sessionState.newHadoopConf()
       val srcPath = new Path(hdfsUri)
       val fs = srcPath.getFileSystem(hadoopConf)
-      if (!fs.exists(srcPath)) {
+      // Check if the path exists or there are matched paths if it's a path with wildcard.
+      // For HDFS path, we support wildcard in directory name and file name.
+      if (null == fs.globStatus(srcPath) || fs.globStatus(srcPath).isEmpty) {
--- End diff --
I will update the PR so that we can use the fs.globStatus() API for both local and HDFS file paths. Thanks.
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r179030399
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,7 +385,9 @@ case class LoadDataCommand(
       val hadoopConf = sparkSession.sessionState.newHadoopConf()
       val srcPath = new Path(hdfsUri)
       val fs = srcPath.getFileSystem(hadoopConf)
-      if (!fs.exists(srcPath)) {
+      // Check if the path exists or there are matched paths if it's a path with wildcard.
+      // For HDFS path, we support wildcard in directory name and file name.
+      if (null == fs.globStatus(srcPath) || fs.globStatus(srcPath).isEmpty) {
--- End diff --
@wzhfy @HyukjinKwon @dongjoon-hyun I verified the scenario by updating the code to use the fs.globStatus() API for both local and HDFS paths. For local paths it is working fine.
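The condition in the diff above calls `fs.globStatus(srcPath)` twice, once for the null check and once for the emptiness check. A minimal sketch of the same "null or empty means no match" test over a single result is shown below. The helper name `hasMatches` is hypothetical, and the array parameter merely stands in for the value returned by one `globStatus` call; no Hadoop classes are used here so that the snippet stays self-contained.

```scala
object GlobCheckSketch {
  // Hypothetical helper: `matches` stands in for the result of ONE
  // fs.globStatus(srcPath) call, which (as the diff's condition implies)
  // may be null or an empty array when nothing matches.
  def hasMatches[T](matches: Array[T]): Boolean =
    Option(matches).exists(_.nonEmpty) // Option(null) is None, so null counts as "no match"
}
```

Under that assumption, the check in the diff could read along the lines of `if (!GlobCheckSketch.hasMatches(fs.globStatus(srcPath)))`, which evaluates the glob only once.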
[GitHub] spark issue #20974: [SPARK-23862][SQL] Spark ExpressionEncoder should suppor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20974 Can one of the admins verify this patch?
[GitHub] spark pull request #20974: [SPARK-23862][SQL] Spark ExpressionEncoder should...
GitHub user fangshil opened a pull request: https://github.com/apache/spark/pull/20974

[SPARK-23862][SQL] Spark ExpressionEncoder should support java enum type in scala

## What changes were proposed in this pull request?

In SPARK-21255, Spark upstream added support for creating encoders for Java enum types, but the support was only added to the Java API (for enums used within Java Beans). Since a Java enum can come from a third-party Java library, we have a use case that requires:

1. using Java enum types as fields of a Scala case class
2. using a Java enum as the type T in Dataset[T]

Spark ExpressionEncoder already supports ser/de of many Java types in ScalaReflection, so we propose to add support for Java enums as well, as a follow-up to SPARK-21255.

## How was this patch tested?

Tested the patch in our production cluster. Added a unit test. Two constraints apply:

1. It is not possible to define a Java enum in Scala directly, since an enum class defined in Scala will lack methods such as valueOf, which are added by the Java compiler.
2. It is not possible to define a test enum Java class and use it in a Scala test, because compiling a single Scala test (-DwildcardSuites=org.apache.spark.sql.DatasetSuite) won't compile the test Java class first.

As a result, I use the Spark SQL public Java enum API (SaveMode.java) in the test. Please advise if there is a better way to test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fangshil/spark SPARK-23862

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20974.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20974

commit 90effb21375a2ec0e93426efcaae092ad3f59e26
Author: Fangshi Li
Date: 2018-04-04T04:52:36Z
SPARK-23862: Spark ExpressionEncoder should support java enum type in scala
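As a rough illustration of the use case described above, the sketch below places a JDK-provided Java enum in a Scala case class and round-trips it through its name. This is not the encoder implementation from the PR: `EnumRoundTripSketch`, `encode`, `decode` and `Timeout` are hypothetical names, `java.util.concurrent.TimeUnit` is chosen only because it is a Java enum available without extra dependencies, and representing the enum by its name is an assumption made for this example.

```scala
import java.util.concurrent.TimeUnit

object EnumRoundTripSketch {
  // Serialize any Java enum constant as its declared name.
  def encode[E <: java.lang.Enum[E]](value: E): String = value.name()

  // Restore the constant via the JDK's generic Enum.valueOf(Class, String).
  def decode[E <: java.lang.Enum[E]](cls: Class[E], name: String): E =
    java.lang.Enum.valueOf(cls, name)

  // A Scala case class with a Java enum field (use case 1 in the PR description).
  final case class Timeout(amount: Long, unit: TimeUnit)

  // Round-trip only the enum field of a Timeout through its string form.
  def roundTrip(t: Timeout): Timeout =
    t.copy(unit = decode(classOf[TimeUnit], encode(t.unit)))
}
```

A usage example would be `EnumRoundTripSketch.roundTrip(EnumRoundTripSketch.Timeout(5L, TimeUnit.SECONDS))`, which should return an equal `Timeout` value.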
[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20973 Merged build finished. Test PASSed.
[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20973 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1943/ Test PASSed.
[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20973 **[Test build #88873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88873/testReport)** for PR 20973 at commit [`d563c8f`](https://github.com/apache/spark/commit/d563c8fab0cb718b511ac78bc38e712a65148d17).
[GitHub] spark pull request #20810: [SPARK-20114][ML] spark.ml parity for sequential ...
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20810
[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20810 Following @jkbradley's suggestion, I created a new PR that uses only a static method.
[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20973

[SPARK-20114][ML] spark.ml parity for sequential pattern mining - PrefixSpan

## What changes were proposed in this pull request?

PrefixSpan API for spark.ml. New implementation instead of #20810

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark prefixSpan2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20973.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20973

commit d563c8fab0cb718b511ac78bc38e712a65148d17
Author: WeichenXu
Date: 2018-04-04T04:42:05Z
init pr
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88868/ Test FAILed.
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test FAILed.
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786
**[Test build #88868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88868/testReport)** for PR 20786 at commit [`48c17d4`](https://github.com/apache/spark/commit/48c17d4dff6a4e82b86d70f3845e6d524b4807e5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `sealed trait ClassificationNode extends Node `
  * `sealed trait RegressionNode extends Node `
  * `sealed trait LeafNode extends Node `
  * `sealed trait InternalNode extends Node `
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1942/ Test PASSed.
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20969 Merged build finished. Test PASSed.
[GitHub] spark issue #20969: [SPARK-23826] [TEST] TestHiveSparkSession should set def...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20969 **[Test build #88872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88872/testReport)** for PR 20969 at commit [`f7e0b03`](https://github.com/apache/spark/commit/f7e0b034026691872c905ab4d5d09c381c56b7b0).
[GitHub] spark pull request #20969: [SPARK-23826] [TEST] TestHiveSparkSession should ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20969#discussion_r179020152
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -159,9 +159,10 @@ private[hive] class TestHiveSparkSession(
     private val loadTestTables: Boolean)
   extends SparkSession(sc) with Logging { self =>
-  // TODO(SPARK-23826): TestHiveSparkSession should set default session the same way as
-  // TestSparkSession, but doing this the same way breaks many tests in the package. We need
-  // to investigate and find a different strategy.
+  // The base spark session does this in getOrCreate(), here we emulate that behavior for tests.
+  if (SparkSession.getDefaultSession.isEmpty) {
+    SparkSession.setDefaultSession(this)
+  }
--- End diff --
This is not needed after we merge https://github.com/apache/spark/pull/20927
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Merged build finished. Test PASSed.
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1941/ Test PASSed.
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20971 **[Test build #88870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88870/testReport)** for PR 20971 at commit [`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20953 **[Test build #88871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88871/testReport)** for PR 20953 at commit [`a06ad5e`](https://github.com/apache/spark/commit/a06ad5e0451c3ff8bf7104512f32161bf66ed696). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20953 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20797 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1940/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20971 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20797 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20797 **[Test build #88869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88869/testReport)** for PR 20797 at commit [`c568944`](https://github.com/apache/spark/commit/c568944a98ce35c79809283a68ec95454029d0ea). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88867/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20971 **[Test build #88867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88867/testReport)** for PR 20971 at commit [`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88866/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20953 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20953: [SPARK-23822][SQL] Improve error message for Parquet sch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20953 **[Test build #88866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88866/testReport)** for PR 20953 at commit [`a06ad5e`](https://github.com/apache/spark/commit/a06ad5e0451c3ff8bf7104512f32161bf66ed696). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20972: Fixes misspelling in configuration.md
Github user bradurani closed the pull request at: https://github.com/apache/spark/pull/20972 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19968: [SPARK-22769][CORE] When driver stopping, there i...
Github user KaiXinXiaoLei closed the pull request at: https://github.com/apache/spark/pull/19968 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88860/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19968: [SPARK-22769][CORE] When driver stopping, there is error...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19968 I'm no longer working on this problem, so I'm closing it now.
[GitHub] spark issue #20886: [SPARK-19724][SQL]create a managed table with an existed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20886 **[Test build #88860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88860/testReport)** for PR 20886 at commit [`7a3311c`](https://github.com/apache/spark/commit/7a3311c2cbd3d9f7399abb38bd877bbd23ca836e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20928: [MINOR][DOC] Fix some typos and grammar issues
Github user dsakuma commented on the issue: https://github.com/apache/spark/pull/20928 @HyukjinKwon I've fixed the title format :D --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20640#discussion_r179013270

    --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
    @@ -108,6 +108,28 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
         verifyTaskLaunched(driver, "o2")
       }

    +  test("mesos declines offers from blacklisted slave") {
    +    setBackend()
    +
    +    // launches a task on a valid offer on slave s1
    +    val minMem = backend.executorMemory(sc) + 1024
    +    val minCpu = 4
    +    val offer1 = Resources(minMem, minCpu)
    +    offerResources(List(offer1))
    +    verifyTaskLaunched(driver, "o1")
    +
    +    // for any reason executor(aka mesos task) failed on s1
    +    val status = createTaskStatus("0", "s1", TaskState.TASK_FAILED)
    +    backend.statusUpdate(driver, status)
    +    when(taskScheduler.nodeBlacklist()).thenReturn(Set("hosts1"))
    --- End diff --

    Just to reiterate my point above: in many cases, having an executor fail will *not* lead to `taskScheduler.nodeBlacklist()` changing as you're doing here.
[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20640#discussion_r179012299

    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -648,14 +645,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
               totalGpusAcquired -= gpus
               gpusByTaskId -= taskId
             }
    -        // If it was a failure, mark the slave as failed for blacklisting purposes
             if (TaskState.isFailed(state)) {
    -          slave.taskFailures += 1
    -
    -          if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
    -            logInfo(s"Blacklisting Mesos slave $slaveId due to too many failures; " +
    -                "is Spark installed on it?")
    -          }
    +          logError(s"Task $taskId failed on Mesos slave $slaveId.")
    --- End diff --

    minor: I think it would be nice to say "Mesos task $taskId...". Maybe it's obvious for those spending more time with mesos, but I find I'm easily confused by the difference between a mesos task and a spark task.
[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20640#discussion_r179012891

    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -571,7 +568,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
           cpus + totalCoresAcquired <= maxCores &&
           mem <= offerMem &&
           numExecutors < executorLimit &&
    -      slaves.get(slaveId).map(_.taskFailures).getOrElse(0) < MAX_SLAVE_FAILURES &&
    +      !scheduler.nodeBlacklist().contains(offerHostname) &&
    --- End diff --

    I just want to make really sure everybody understands the big change in behavior here: `nodeBlacklist()` currently *only* gets updated based on failures in *spark* tasks. If a mesos task fails to even start (that is, if a spark executor fails to launch on a node), `nodeBlacklist` does not get updated. So you could have a node that is misconfigured somehow, and after this change you might end up repeatedly trying to launch executors on it, with the executor failing to start each time. That holds even if you have blacklisting on.

    This is SPARK-16630 for the non-mesos case. That is being actively worked on now; however, the work there will probably have to be yarn-specific, so there will still be followup work to get the same thing for mesos after that is in.
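To make the behavior change in the comment above concrete, here is a small sketch contrasting the two offer checks. Everything in it is a hypothetical model for illustration: `Offer`, `OfferFilter`, and the threshold value are assumptions, not the Mesos backend's real types or constants.

```scala
// Hypothetical model of a resource offer; not the Mesos protobuf type.
final case class Offer(slaveId: String, hostname: String)

object OfferFilter {
  // Assumed threshold, for illustration only.
  val MaxSlaveFailures = 2

  // Old check: a per-slave counter of failed Mesos tasks (executor launches).
  def acceptOld(offer: Offer, taskFailures: Map[String, Int]): Boolean =
    taskFailures.getOrElse(offer.slaveId, 0) < MaxSlaveFailures

  // New check: only the scheduler's node blacklist is consulted, and per the
  // review comment that set is updated from Spark task failures only.
  def acceptNew(offer: Offer, nodeBlacklist: Set[String]): Boolean =
    !nodeBlacklist.contains(offer.hostname)
}
```

Under these assumptions, a host whose executors failed to launch twice is rejected by `acceptOld` but still accepted by `acceptNew` while the node blacklist is empty, which is the repeated-launch scenario the comment warns about.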
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88859/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20933 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20933 **[Test build #88859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88859/testReport)** for PR 20933 at commit [`ffbf2f8`](https://github.com/apache/spark/commit/ffbf2f88c224fcafce003121695ab91774db0776). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user xubo245 commented on the issue: https://github.com/apache/spark/pull/20249 It belongs to the TODO work, @tgravescs
[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20886#discussion_r179008019

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -298,15 +299,32 @@ class SessionCatalog(
             makeQualifiedPath(tableDefinition.storage.locationUri.get)
           tableDefinition.copy(
             storage = tableDefinition.storage.copy(locationUri = Some(qualifiedTableLocation)),
    -        identifier = TableIdentifier(table, Some(db)))
    +        identifier = tableIdentifier)
         } else {
    -      tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
    +      tableDefinition.copy(identifier = tableIdentifier)
         }
         requireDbExists(db)
    +    if (!ignoreIfExists) {
    +      validateTableLocation(newTableDefinition)
    +    }
         externalCatalog.createTable(newTableDefinition, ignoreIfExists)
       }

    +  def validateTableLocation(table: CatalogTable): Unit = {
    +    // SPARK-19724: the default location of a managed table should be non-existent or empty.
    +    if (table.tableType == CatalogTableType.MANAGED && !conf.allowNonemptyManagedTableLocation) {
    +      val tableLocation =
    +        new Path(table.storage.locationUri.getOrElse(defaultTablePath(table.identifier)))
    +      val fs = tableLocation.getFileSystem(hadoopConf)
    +
    +      if (fs.exists(tableLocation) && fs.listStatus(tableLocation).nonEmpty) {
    +        throw new AnalysisException(s"Can not create the managed table('${table.identifier}')" +
    --- End diff --

    `Can not` -> `Not allowed to`
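The rule in the diff above (a managed table's location must be non-existent or empty) can be sketched on a local filesystem as follows. This is an illustration only: it uses `java.nio.file` instead of Hadoop's `FileSystem`, and `LocationCheck`, `validateManagedLocation`, and the exception type are hypothetical names chosen for the sketch. The message wording follows the reviewer's suggested `Not allowed to` phrasing.

```scala
import java.nio.file.{Files, Path}

object LocationCheck {
  // True when the path does not exist, or is a directory with no entries.
  // A plain file at the location counts as occupied.
  def isMissingOrEmpty(location: Path): Boolean = {
    if (!Files.exists(location)) {
      true
    } else if (Files.isDirectory(location)) {
      val entries = Files.list(location)
      try {
        !entries.findAny().isPresent
      } finally {
        entries.close()
      }
    } else {
      false
    }
  }

  // Rejects an occupied location unless the caller opts in via allowNonEmpty,
  // which plays the role of the config flag in the diff.
  def validateManagedLocation(tableName: String, location: Path, allowNonEmpty: Boolean): Unit = {
    if (!allowNonEmpty && !isMissingOrEmpty(location)) {
      throw new IllegalStateException(
        s"Not allowed to create the managed table '$tableName': location $location is not empty")
    }
  }
}
```

As in the diff, the check is skipped entirely when the opt-in flag is set, so existing deployments that rely on pre-populated locations can keep working.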
[GitHub] spark pull request #20945: [SPARK-23790][Mesos] fix metastore connection iss...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20945#discussion_r179007924

    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
    @@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
           options ++= Seq("--class", desc.command.mainClass)
         }

    +    desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
    +      options ++= Seq("--proxy-user", v)
    --- End diff --

    > Yes because the assumption was client mode was safe. There is no warning about this

    Could probably use something in the documentation - warnings printed to logs are easily ignored.

    Still, there are legitimate uses for client mode + proxy user, but I don't think this is one of them.
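As a small illustration of what the two added lines in the diff above do, the sketch below builds the extra command-line options from an optional config entry. A plain `Map` stands in for `desc.conf`, and `SubmitOptions.proxyUserOptions` is a hypothetical helper; only the `spark.mesos.proxyUser` key and the `--proxy-user` flag come from the diff.

```scala
object SubmitOptions {
  // Returns Seq("--proxy-user", <value>) when the key is present,
  // and an empty Seq when it is absent.
  def proxyUserOptions(conf: Map[String, String]): Seq[String] =
    conf.get("spark.mesos.proxyUser").toSeq.flatMap(v => Seq("--proxy-user", v))
}
```

Because the option is only emitted when the key is set, submissions without a proxy user produce the same command line as before.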
[GitHub] spark pull request #20886: [SPARK-19724][SQL]create a managed table with an ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20886#discussion_r179007898

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1152,6 +1152,13 @@ object SQLConf {
           .booleanConf
           .createWithDefault(false)

    +  val ALLOW_NONEMPTY_MANAGED_TABLE_LOCATION =
    +    buildConf("spark.sql.allowNonemptyManagedTableLocation")
    --- End diff --

    `spark.sql.allowCreateManagedTableUsingNonemptyLocation`

    Also this should be an internal conf
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20786 **[Test build #88868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88868/testReport)** for PR 20786 at commit [`48c17d4`](https://github.com/apache/spark/commit/48c17d4dff6a4e82b86d70f3845e6d524b4807e5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1939/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20786 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 @jkbradley Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20928: Fix small typo in configuration doc
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20928 @dsakuma, mind if I ask you to change the PR title to `[MINOR][DOC] ...`, just to be consistent with other PRs? It's not a small typo anymore :). Thanks for your effort.
[GitHub] spark issue #20972: Fixes misspelling in configuration.md
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20972 We can close this just for clarification. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20972: Fixes misspelling in configuration.md
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20972 Please do a quick search before opening a PR. There are two duplicate PRs: https://github.com/apache/spark/pull/20948 and https://github.com/apache/spark/pull/20928
[GitHub] spark issue #20972: Fixes misspelling in configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20972 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20923

    Hi @steveloughran, I think you missed this comment. You need to create a deps file under dev/deps and change the related script.

    > Also I think we need to create a related spark-deps-hadoop-3.x under dev/deps and make dependency check work for Hadoop 3.
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20971 **[Test build #88867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88867/testReport)** for PR 20971 at commit [`36fa1bd`](https://github.com/apache/spark/commit/36fa1bdc847f0b5ffb61284a35f3183751255705). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20971: [SPARK-23809][SQL][backport] Active SparkSession should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1938/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20972: Fixes misspelling in configuration.md
GitHub user bradurani opened a pull request:

    https://github.com/apache/spark/pull/20972

    Fixes misspelling in configuration.md

    ## What changes were proposed in this pull request?

    Fixes a misspelling in configuration.md. Changes `spark-defalut.conf` to `spark-default.conf`

    ## How was this patch tested?

    Viewed the new markdown in Github

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bradurani/spark bu/fix_docs_misspelling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20972.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20972

commit e346e677cd2b783b4fa39e7bf6a59eee0a40eb1a
Author: Brad Urani
Date:   2018-04-04T00:44:23Z

    Fixes misspelling in configuration.md

    spark-defalut.conf -> spark-default.conf