[GitHub] spark pull request: [SPARK-XXXX][SQL] Can't add subquery to an ope...
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/11658

    [SPARK-][SQL] Can't add subquery to an operator with same-name outputs while generate SQL string

    ## What changes were proposed in this pull request?

    This PR tries to solve a fundamental issue in the `SQLBuilder`. When we want to turn a logical plan into a SQL string and put it after the FROM clause, we need to wrap it in a subquery. However, a logical plan is allowed to have same-name outputs with different qualifiers (e.g. the `Join` operator), and this kind of plan can't be put under a subquery, because we would erase the qualifiers and assign a new one to all outputs, making it impossible to distinguish same-name outputs. To solve this problem, this PR renames all attributes with globally unique names (using exprId), so that we no longer need qualifiers to resolve ambiguity.

    ## How was this patch tested?

    Existing tests and new tests in LogicalPlanToSQLSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark gensql

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11658.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11658

commit f4b1ae8b30c2a572d4f44cd68b66c224aeee553b
Author: Wenchen Fan
Date: 2016-03-11T06:29:45Z

    tmp

commit 198b406a643d908483408ec6ccea26ffdc464aa9
Author: Wenchen Fan
Date: 2016-03-11T14:35:50Z

    assign globally unique names to all attributes to avoid ambiguity

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
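For readers following the SQLBuilder discussion in PR 11658, here is a rough Java sketch of the renaming idea described in the PR text: each output attribute gets an alias derived from its unique id, so the wrapping subquery no longer depends on qualifiers. The `Attribute` class, the `name_exprId` alias scheme, and the table names are all hypothetical illustrations, not Spark's actual classes or the naming the patch uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UniqueAttributeNames {
    /** Hypothetical stand-in for a plan output attribute (not Spark's class). */
    static final class Attribute {
        final String name;
        final String qualifier;
        final long exprId;

        Attribute(String name, String qualifier, long exprId) {
            this.name = name;
            this.qualifier = qualifier;
            this.exprId = exprId;
        }
    }

    /** Assumed alias scheme: original name plus the unique expression id. */
    static String uniqueName(Attribute a) {
        return a.name + "_" + a.exprId;
    }

    /** Builds a select list that aliases every attribute to its unique name. */
    static String selectList(List<Attribute> output) {
        List<String> cols = new ArrayList<>();
        for (Attribute a : output) {
            cols.add(a.qualifier + "." + a.name + " AS " + uniqueName(a));
        }
        return String.join(", ", cols);
    }

    public static void main(String[] args) {
        // A join whose two sides both expose a column called "id".
        List<Attribute> joinOutput = Arrays.asList(
                new Attribute("id", "t1", 1L),
                new Attribute("id", "t2", 2L));
        // Inside the subquery the qualifiers still exist; outside, only the
        // unique aliases are needed, so the two "id" columns stay distinct.
        System.out.println("SELECT id_1, id_2 FROM (SELECT " + selectList(joinOutput)
                + " FROM t1 JOIN t2 ON t1.id = t2.id) sub");
    }
}
```

The point of the sketch is only that once aliases are unique, the outer query can refer to `id_1` and `id_2` without any qualifier.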
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11656
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195382124 I'm going to merge this to bring MiMA back.
[GitHub] spark pull request: [SPARK-13436][SPARKR] Added parameter drop to ...
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11318#discussion_r55832602

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1217,29 +1217,38 @@ setMethod("[[", signature(x = "DataFrame", i = "numericOrcharacter"),
     #' @rdname subset
     #' @name [
    -setMethod("[", signature(x = "DataFrame", i = "missing"),
    -          function(x, i, j, ...) {
    -            if (is.numeric(j)) {
    -              cols <- columns(x)
    -              j <- cols[j]
    -            }
    -            if (length(j) > 1) {
    -              j <- as.list(j)
    +setMethod("[", signature(x = "DataFrame"),
    +          function(x, i, j, ..., drop=T) {
    --- End diff --

    coding style: drop = T

    Many other places have such coding style issue too
[GitHub] spark pull request: [SPARK-13436][SPARKR] Added parameter drop to ...
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11318#discussion_r55831536

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1271,12 +1280,8 @@ setMethod("[", signature(x = "DataFrame", i = "Column"),
     #' subset(df, select = c(1,2))
     #' }
     setMethod("subset", signature(x = "DataFrame"),
    -          function(x, subset, select, ...) {
    -            if (missing(subset)) {
    -              x[, select, ...]
    -            } else {
    -              x[subset, select, ...]
    -            }
    +          function(x, subset, select, drop=T, ...) {
    --- End diff --

    inconsistent with drop = FALSE in the signature of subset() in the R base package?
[GitHub] spark pull request: [SPARK-13577] [yarn] Allow Spark jar to be mul...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11500
[GitHub] spark pull request: [SPARK-13577] [yarn] Allow Spark jar to be mul...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/11500#issuecomment-195371687 +1
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/11512#issuecomment-195370827

    Yes, it's only if they explicitly call System.exit, or if someone puts a System.exit back into Spark, but that shouldn't happen except on failure.

    OK @jerryshao, can you update the PR for master so that it only changes the default behavior of the shutdown hook to assume failure? Then open a PR against 1.6 with this signal handler.
[GitHub] spark pull request: [SPARK-13436][SPARKR] Added parameter drop to ...
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11318#discussion_r55830076

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1217,29 +1217,38 @@ setMethod("[[", signature(x = "DataFrame", i = "numericOrcharacter"),
     #' @rdname subset
     #' @name [
    -setMethod("[", signature(x = "DataFrame", i = "missing"),
    -          function(x, i, j, ...) {
    -            if (is.numeric(j)) {
    -              cols <- columns(x)
    -              j <- cols[j]
    -            }
    -            if (length(j) > 1) {
    -              j <- as.list(j)
    +setMethod("[", signature(x = "DataFrame"),
    +          function(x, i, j, ..., drop=T) {
    +            # Perform filtering first if needed
    --- End diff --

    Maybe for Spark 2.0 we can sacrifice some backwards compatibility and have the R user convenience as long term benefit?
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195361432 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195361433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52923/ Test PASSed.
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11656#issuecomment-195361013

    **[Test build #52923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52923/consoleFull)** for PR 11656 at commit [`9dae7f8`](https://github.com/apache/spark/commit/9dae7f882181aec159812ba77a9225345b223b87).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13823] [CORE] [STREAMING] [SQL] Always ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11657#issuecomment-195347967 **[Test build #52925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52925/consoleFull)** for PR 11657 at commit [`8e2865e`](https://github.com/apache/spark/commit/8e2865e41a0022665186eaca816f67501436357c).
[GitHub] spark pull request: [SPARK-13739] [SQL] [WIP] Push Predicate Throu...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11635#discussion_r55822929

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -932,6 +933,33 @@ object PushPredicateThroughGenerate extends Rule[LogicalPlan] with PredicateHelp
     }

     /**
    + * Push [[Filter]] operators through [[Window]] operators. Parts of the predicate that can be pushed
    + * beneath must satisfy three conditions:
    + * 1. involving one and only one column that is part of window partitioning key.
    --- End diff --

    Only one? We could push them all through right?
[GitHub] spark pull request: [SPARK-13823] [CORE] [STREAMING] [SQL] Always ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11657#issuecomment-195346081 **[Test build #52924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52924/consoleFull)** for PR 11657 at commit [`1deecd8`](https://github.com/apache/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c).
[GitHub] spark pull request: [SPARK-13739] [SQL] [WIP] Push Predicate Throu...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11635#discussion_r55822686

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -268,7 +268,14 @@ package object dsl {
             Aggregate(groupingExprs, aliasedExprs, logicalPlan)
           }

    -      def subquery(alias: Symbol): LogicalPlan = SubqueryAlias(alias.name, logicalPlan)
    +      def window(
    --- End diff --

    +1 for this. Where is the frame spec?
[GitHub] spark pull request: [SPARK-13739] [SQL] [WIP] Push Predicate Throu...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11635#discussion_r55822634

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -932,6 +933,33 @@ object PushPredicateThroughGenerate extends Rule[LogicalPlan] with PredicateHelp
     }

     /**
    + * Push [[Filter]] operators through [[Window]] operators. Parts of the predicate that can be pushed
    + * beneath must satisfy three conditions:
    + * 1. involving one and only one column that is part of window partitioning key.
    + * 2. Window partitioning key should be just a sequence of [[AttributeReference]].
    --- End diff --

    Why should the partitioning keys be attributes? It is the most common case, but still.
[GitHub] spark pull request: [SPARK-13823] [CORE] [STREAMING] [SQL] Always ...
GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/11657

    [SPARK-13823] [CORE] [STREAMING] [SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)

    ## What changes were proposed in this pull request?

    - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8
    - Same for `InputStreamReader` and `OutputStreamWriter` constructors
    - Standardizes on UTF-8 everywhere
    - Standardizes specifying the encoding with `StandardCharsets.UTF_8`, not the Guava constant or "UTF-8" (which means handling `UnsupportedEncodingException`)
    - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c )

    ## How was this patch tested?

    Jenkins tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-13823

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11657.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11657

commit 38b0b843fbde0f71086da1004bf853f1349f7402
Author: Sean Owen
Date: 2016-03-09T16:47:54Z

    Always specify character encoding, and use standard UTF-8 symbol, when converting strings to/from bytes

commit 1deecd8d9ca986d8adb1a42d315890ce5349d29c
Author: Sean Owen
Date: 2016-03-09T16:57:15Z

    Fix remaining valid Coverity warnings
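As a small illustration of the conversions PR 11657 describes, the following Java sketch passes `StandardCharsets.UTF_8` explicitly instead of relying on the platform default encoding. The class and method names (`ExplicitCharset`, `encode`, `decode`, `readAll`) are made up for this example and are not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    /** String -> bytes with an explicit charset (no platform default). */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** bytes -> String with an explicit charset. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads a whole stream as UTF-8 text via the charset-taking constructor. */
    static String readAll(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // contains one non-ASCII character
        byte[] bytes = encode(text);
        System.out.println(bytes.length);
        System.out.println(decode(bytes).equals(text));
        System.out.println(readAll(new ByteArrayInputStream(bytes)).equals(text));
    }
}
```

Using the `Charset` constant rather than the string name "UTF-8" also avoids the checked `UnsupportedEncodingException` that the string-based overloads declare, which matches the motivation given in the PR description.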
[GitHub] spark pull request: [SPARK-13810] [CORE] Add Port Configuration Su...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11644#issuecomment-195342004 LGTM. (By the way Bjorn fields a lot of support issues for Spark and this is apparently a frequent source of questions/confusion. Great if a simple message change can preempt a number of questions.)
[GitHub] spark pull request: [STREAMING][MINOR] Fix a duplicate "be" in com...
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/11650
[GitHub] spark pull request: [STREAMING][MINOR] Fix a duplicate "be" in com...
Github user lw-lin commented on the pull request: https://github.com/apache/spark/pull/11650#issuecomment-195335583 Sure, I'll close this for now. Thanks for your time!
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195334485 Verified that MiMA check had been triggered and passed successfully.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Remove MiMa's de...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11178#issuecomment-195323566 @JoshRosen MiMA check is re-enabled in PR #11656.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/11636#issuecomment-195323234

    @nongli , here is another example.

    Spark code with two columns

    sqlContext.conf.setConfString(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, "true")
    sqlContext.conf.setConfString(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    val values = 10
    withTempPath { dir =>
      withTempTable("t1", "tempTable") {
        sqlContext.range(values).registerTempTable("t1")
        sqlContext.sql("select id % 2 as p, cast(id as INT) as id from t1")
          .write.partitionBy("p").parquet(dir.getCanonicalPath)
        sqlContext.read.parquet(dir.getCanonicalPath).registerTempTable("tempTable")
        sqlContext.sql("select sum(p), sum(id) from tempTable").collect
      }
    }

    Code snippet generated by this PR

    ...
    /* 073 */   private void rdd_processBatches() throws java.io.IOException {
    /* 074 */     while (true) {
    /* 075 */       int numRows = rdd_batch.numRows();
    /* 076 */       if (rdd_batchIdx == 0) rdd_metricValue.add(numRows);
    /* 077 */
    /* 078 */       while (!shouldStop() && rdd_batchIdx < numRows) {
    /* 079 */         org.apache.spark.sql.execution.vectorized.ColumnVector rdd_col0 = rdd_batch.column(0);org.apache.spark.sql.execution.vectorized.ColumnVector rdd_col1 = rdd_batch.column(1);
    /* 080 */         /*** CONSUME: TungstenAggregate(key=[], functions=[(sum(cast(p#4 as bigint)),mode=Partial,isDistinct=false),(sum(cast(id#3 as bigint)),mode=Partial,isDistinct=false)], output=[sum#13L,sum#14L]) */
    /* 081 */         /* input[0, int] */
    /* 082 */         boolean rdd_isNull = rdd_col0.getIsNull(rdd_batchIdx);
    /* 083 */         int rdd_value = rdd_isNull ? -1 : (rdd_col0.getInt(rdd_batchIdx));
    /* 084 */         /* input[1, int] */
    /* 085 */         boolean rdd_isNull1 = rdd_col1.getIsNull(rdd_batchIdx);
    /* 086 */         int rdd_value1 = rdd_isNull1 ? -1 : (rdd_col1.getInt(rdd_batchIdx));
    /* 087 */
    /* 088 */         // do aggregate
    ...
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195319368 **[Test build #52923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52923/consoleFull)** for PR 11656 at commit [`9dae7f8`](https://github.com/apache/spark/commit/9dae7f882181aec159812ba77a9225345b223b87).
[GitHub] spark pull request: [SPARK-13436][SPARKR] Added parameter drop to ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/11318#issuecomment-195316072 https://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.data.frame.html here for reference
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/11301#issuecomment-195311336 @kiszk O.K, let's focus on `Expression`s in this PR.
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11301#discussion_r55813906

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -60,20 +61,20 @@ object CurrentOrigin {
       def reset(): Unit = value.set(Origin())

    -  def setPosition(line: Int, start: Int): Unit = {
    +  def setPosition(callSite: String, line: Int, start: Int): Unit = {
         value.set(
    -      value.get.copy(line = Some(line), startPosition = Some(start)))
    +      value.get.copy(callSite = Some(callSite), line = Some(line), startPosition = Some(start)))
       }

       def withOrigin[A](o: Origin)(f: => A): A = {
    +    val current = get
         set(o)
    -    val ret = try f finally { reset() }
    -    reset()
    +    val ret = try f finally { set(current) }
         ret
       }
     }

    -abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
    +abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Serializable {
    --- End diff --

    Sorry, I said we don't need to make `TreeNode` serializable, but it is needed: otherwise `origin` is not serialized, and origin information such as the call site does not appear in the comments of the generated code when WholeStageCodegen is disabled.
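The `withOrigin` change in the diff above (restore the previous value instead of calling `reset()`) can be sketched in Java with a `ThreadLocal`. This is only an illustration of the save-and-restore pattern: the class name `CurrentOriginSketch`, the use of a plain `String` for the origin, and the `"none"` default are assumptions made for this example, not Spark's code.

```java
import java.util.function.Supplier;

public class CurrentOriginSketch {
    // Hypothetical default standing in for an empty Origin().
    private static final ThreadLocal<String> VALUE = ThreadLocal.withInitial(() -> "none");

    static String get() {
        return VALUE.get();
    }

    /** Runs body with the given origin, then restores whatever was set before. */
    static <A> A withOrigin(String origin, Supplier<A> body) {
        String previous = VALUE.get(); // save the enclosing origin
        VALUE.set(origin);
        try {
            return body.get();
        } finally {
            VALUE.set(previous);       // restore it, rather than resetting to the default
        }
    }

    public static void main(String[] args) {
        String seenAfterInner = withOrigin("outer", () -> {
            withOrigin("inner", () -> get());
            return get(); // still the outer origin after the nested call returns
        });
        System.out.println(seenAfterInner);
        System.out.println(get());
    }
}
```

With a plain reset in the `finally` block, the nested call would wipe out the outer origin; restoring the saved value keeps nested scopes consistent.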
[GitHub] spark pull request: [STREAMING][MINOR] Fix a duplicate "be" in com...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11650#issuecomment-195308817 @lw-lin it's not worth bothering with a PR for something this trivial. Consider running a bigger check for misspelling etc if you're going to. This still costs like an hour of attention across everyone just to process it.
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11652#issuecomment-195308327 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11652#issuecomment-195308328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52922/ Test PASSed.
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11652#issuecomment-195308217

    **[Test build #52922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52922/consoleFull)** for PR 11652 at commit [`22d85d5`](https://github.com/apache/spark/commit/22d85d59fb8c6fd4e4b7424d52b24f38ad1a26e4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195307315 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52918/ Test PASSed.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195307309 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195306728
**[Test build #52918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52918/consoleFull)** for PR 11647 at commit [`cbab017`](https://github.com/apache/spark/commit/cbab01763cf753779ae9a8557d88e9ec80e7be59).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195304513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52920/ Test PASSed.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195304509 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195304080
**[Test build #52921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52921/consoleFull)** for PR 11656 at commit [`b46e876`](https://github.com/apache/spark/commit/b46e876153212a26c791110eecca88a31806b89f).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195304086 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195304088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52921/ Test FAILed.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195304148
**[Test build #52920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52920/consoleFull)** for PR 11655 at commit [`638ef77`](https://github.com/apache/spark/commit/638ef77dd1535e9f89ee81b0c34e2173178492a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/11601#discussion_r55810884
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
@@ -0,0 +1,290 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions.{col, udf}
+import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
+
+/**
+ * Params for [[Imputer]] and [[ImputerModel]].
+ */
+private[feature] trait ImputerParams extends Params with HasInputCol with HasOutputCol {
+
+  /**
+   * The imputation strategy.
+   * If "mean", then replace missing values using the mean value of the feature.
+   * If "median", then replace missing values using the median value of the feature.
+   * If "most", then replace missing using the most frequent value of the feature.
+   * Default: mean
+   *
+   * @group param
+   */
+  val strategy: Param[String] = new Param(this, "strategy", "strategy for imputation. " +
+    "If mean, then replace missing values using the mean value of the feature." +
+    "If median, then replace missing values using the median value of the feature." +
+    "If most, then replace missing using the most frequent value of the feature.",
+    ParamValidators.inArray[String](Imputer.supportedStrategyNames.toArray))
+
+  /** @group getParam */
+  def getStrategy: String = $(strategy)
+
+  /**
+   * The placeholder for the missing values. All occurrences of missingValue will be imputed.
+   * Default: Double.NaN
+   *
+   * @group param
+   */
+  val missingValue: DoubleParam = new DoubleParam(this, "missingValue",
+    "The placeholder for the missing values. All occurrences of missingValue will be imputed")
+
+  /** @group getParam */
+  def getMissingValue: Double = $(missingValue)
+
+  /** Validates and transforms the input schema. */
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+    validateParams()
+    val inputType = schema($(inputCol)).dataType
+    require(inputType.isInstanceOf[VectorUDT] || inputType.isInstanceOf[DoubleType],
+      s"Input column ${$(inputCol)} must of type Vector or Double")
+    require(!schema.fieldNames.contains($(outputCol)),
+      s"Output column ${$(outputCol)} already exists.")
+    val outputFields = schema.fields :+ StructField($(outputCol), new VectorUDT, false)
+    StructType(outputFields)
+  }
+
+}
+
+/**
+ * :: Experimental ::
+ * Imputation estimator for completing missing values, either using the mean, the median or
+ * the most frequent value of the column in which the missing values are located. This class
+ * also allows for different missing values.
+ */
+@Experimental
+class Imputer @Since("2.0.0")(override val uid: String)
+  extends Estimator[ImputerModel] with ImputerParams with DefaultParamsWritable {
+
+  @Since("2.0.0")
+  def this() = this(Identifiable.randomUID("imputer"))
+
+  /** @group setParam */
+  def setInputCol(value: String): this.type = set(inputCol, value)
+
+  /** @group setParam */
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  /**
+   * Imputation strategy. Available options are "mean", "median" and "most".
+   * @group setParam
+   */
+  def setStrategy(value: String): this.type = set(strategy, value)
+
+  /** @group setParam */
+  def setMissingValue(value: Double): this.type
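The three strategies named in the quoted doc comment ("mean", "median", "most") and the `missingValue` placeholder can be illustrated on a plain `Seq[Double]`. This is only a sketch under stated assumptions: `ImputeSketch`, `isMissing`, `surrogate` and `impute` are hypothetical names, nothing here uses Spark, and the exact median and the arbitrary tie-breaking for "most" are choices made for the illustration, not what the pull request does.

```scala
object ImputeSketch {
  // True when x matches the placeholder; NaN needs isNaN because NaN != NaN.
  def isMissing(x: Double, missingValue: Double): Boolean =
    if (missingValue.isNaN) x.isNaN else x == missingValue

  // Computes the replacement value from the non-missing entries only.
  def surrogate(values: Seq[Double], strategy: String,
      missingValue: Double = Double.NaN): Double = {
    val present = values.filterNot(isMissing(_, missingValue))
    require(present.nonEmpty, "no non-missing values to impute from")
    strategy match {
      case "mean" => present.sum / present.size
      case "median" =>
        // Exact median (assumption): average the two middle values for even sizes.
        val sorted = present.sorted
        val mid = sorted.size / 2
        if (sorted.size % 2 == 1) sorted(mid) else (sorted(mid - 1) + sorted(mid)) / 2.0
      case "most" =>
        // Most frequent value; ties are broken arbitrarily in this sketch.
        present.groupBy(identity).maxBy(_._2.size)._1
      case other =>
        throw new IllegalArgumentException(s"unsupported strategy: $other")
    }
  }

  // Replaces every occurrence of the placeholder with the surrogate value.
  def impute(values: Seq[Double], strategy: String,
      missingValue: Double = Double.NaN): Seq[Double] = {
    val fill = surrogate(values, strategy, missingValue)
    values.map(x => if (isMissing(x, missingValue)) fill else x)
  }
}
```

For example, `ImputeSketch.impute(Seq(1.0, Double.NaN, 3.0), "mean")` fills the gap with the mean of the two present values.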
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11652#issuecomment-195303762 **[Test build #52922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52922/consoleFull)** for PR 11652 at commit [`22d85d5`](https://github.com/apache/spark/commit/22d85d59fb8c6fd4e4b7424d52b24f38ad1a26e4).
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11656#issuecomment-195303760 **[Test build #52921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52921/consoleFull)** for PR 11656 at commit [`b46e876`](https://github.com/apache/spark/commit/b46e876153212a26c791110eecca88a31806b89f).
[GitHub] spark pull request: [SPARK-13267] [Web UI] document the ?param arg...
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/11152#discussion_r55810810
--- Diff: docs/monitoring.md ---
@@ -273,8 +309,8 @@ for a running application, at `http://localhost:4040/api/v1`.
 Download the event logs for all attempts of the given application as a zip file
-/applications/[app-id]/[attempt-id]/logs
-Download the event logs for the specified attempt of the given application as a zip file
+/applications/[app-id]/logs
+Download the event logs for the application as a zip file
--- End diff --

OK, now I'm confused, because the docs at the bottom say appId is `appid/attemptid` in YARN, which is true for every other element, which is why I adjusted this one. I'll try to clarify appId vs attemptId further, pull the details to the top, and then add a special mention here, as it clearly managed to confuse me.
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/11652#issuecomment-195303544 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-13817][BUILD][SQL] Re-enable MiMA and r...
GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/11656

[SPARK-13817][BUILD][SQL] Re-enable MiMA and removes object DataFrame

## What changes were proposed in this pull request?

PR #11443 temporarily disabled the MiMA check; this PR re-enables it. One extra change is that `object DataFrame` is also removed. The only purpose of introducing `object DataFrame` was to use it as an internal factory for creating `Dataset[Row]`. By removing this object, both `DataFrame` and `DataFrame$` are entirely removed from the API, so that we can simply put a `MissingClassProblem` filter in `MimaExcludes.scala` for most DataFrame API changes.

## How was this patch tested?

Tested by the MiMA check triggered by Jenkins.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liancheng/spark re-enable-mima

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11656.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #11656

commit b46e876153212a26c791110eecca88a31806b89f
Author: Cheng Lian
Date: 2016-03-11T08:26:32Z
Re-enable MiMA and removes object DataFrame
[GitHub] spark pull request: [SPARK-13267] [Web UI] document the ?param arg...
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/11152#discussion_r55810591
--- Diff: docs/monitoring.md ---
@@ -229,10 +229,28 @@ for a running application, at `http://localhost:4040/api/v1`.
 A list of all applications
+
+ ?status=[completed|running] list only applications in the chosen state
+
+
+
+ ?minDate=[date] earliest date/time to list. Examples:
+ ?minDate=2015-02-10
--- End diff --

Actually no, it MUST be `<br>` without a closing tag, except in the specific case of XHTML, which nobody uses.
1. [Mozilla](https://developer.mozilla.org/en/docs/Web/HTML/Element/br)
1. [MSFT](https://msdn.microsoft.com/en-us/library/ms535208(v=vs.85).aspx)

This is the same as the paragraph tag, `<p>`.
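The `?status=` and `?minDate=` arguments documented in the quoted diff combine in the usual query-string way. The following is a minimal sketch, assuming the listing lives at `/applications` under the `api/v1` base shown in the diff context; `MonitoringUrlSketch` and `applicationsUrl` are hypothetical helper names, and no URL-encoding is done because the example values contain only safe characters.

```scala
object MonitoringUrlSketch {
  // Builds the application-listing URL, appending only the parameters that were supplied.
  def applicationsUrl(
      base: String,
      status: Option[String] = None,
      minDate: Option[String] = None): String = {
    val params = Seq(status.map("status=" + _), minDate.map("minDate=" + _)).flatten
    // mkString adds the leading "?" and joins the pairs with "&".
    val query = if (params.isEmpty) "" else params.mkString("?", "&", "")
    s"$base/applications$query"
  }
}
```

Calling `MonitoringUrlSketch.applicationsUrl("http://localhost:4040/api/v1", Some("completed"), Some("2015-02-10"))` yields a URL that asks for completed applications no older than the given date.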
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r55810108 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/11647#issuecomment-195302187

I was measuring the time to finish the code snippet shown in the PR description. But I think I should only measure the time spent on `val actual = Optimize.execute(plan)`. Let me update it:

With this patch:
84362 microseconds
84075 microseconds
86192 microseconds

Without this patch:
492139 microseconds
503386 microseconds
463162 microseconds
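A measurement restricted to a single call, as described above, might look like the sketch below. `TimingSketch`, `timeMicros` and `mean` are hypothetical names; the comment does not say how the timings were actually taken, so `System.nanoTime` is an assumption. The two sequences are the six figures reported in the message.

```scala
object TimingSketch {
  // Runs `body` once and returns its result with the elapsed wall-clock time in microseconds.
  def timeMicros[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMicros = (System.nanoTime() - start) / 1000L
    (result, elapsedMicros)
  }

  def mean(xs: Seq[Long]): Double = xs.sum.toDouble / xs.size

  // Timings quoted in the comment, in microseconds.
  val withPatch: Seq[Long] = Seq(84362L, 84075L, 86192L)
  val withoutPatch: Seq[Long] = Seq(492139L, 503386L, 463162L)

  // Ratio of the mean unpatched time to the mean patched time.
  val speedup: Double = mean(withoutPatch) / mean(withPatch)
}
```

With only three runs per configuration and no warm-up information, the ratio is a rough indication rather than a benchmark result; wrapping just the call of interest, e.g. `timeMicros(Optimize.execute(plan))`, keeps setup cost out of the number.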
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195301155 **[Test build #52920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52920/consoleFull)** for PR 11655 at commit [`638ef77`](https://github.com/apache/spark/commit/638ef77dd1535e9f89ee81b0c34e2173178492a5).
[GitHub] spark pull request: [STREAMING][MINOR] Fix a duplicate "be" in com...
Github user lw-lin commented on the pull request: https://github.com/apache/spark/pull/11650#issuecomment-195299373 @rxin @zsxwing Would you take a look when you have time? Thanks!
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195295824 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52919/ Test FAILed.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195295815 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195295571
**[Test build #52919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52919/consoleFull)** for PR 11655 at commit [`a6122de`](https://github.com/apache/spark/commit/a6122dea21f0075ca0bd403538dc2cc30e4327ff).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-195286682 **[Test build #52919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52919/consoleFull)** for PR 11655 at commit [`a6122de`](https://github.com/apache/spark/commit/a6122dea21f0075ca0bd403538dc2cc30e4327ff).
[GitHub] spark pull request: [SPARK-13568] [ML] Create feature transformer ...
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11601#discussion_r55807303

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
    @@ -0,0 +1,290 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.feature
    +
    +import org.apache.hadoop.fs.Path
    +
    +import org.apache.spark.annotation.{Experimental, Since}
    +import org.apache.spark.ml.{Estimator, Model}
    +import org.apache.spark.ml.param._
    +import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.mllib.linalg._
    +import org.apache.spark.sql.{DataFrame, Row}
    +import org.apache.spark.sql.functions.{col, udf}
    +import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
    +
    +/**
    + * Params for [[Imputer]] and [[ImputerModel]].
    + */
    +private[feature] trait ImputerParams extends Params with HasInputCol with HasOutputCol {
    +
    +  /**
    +    * The imputation strategy.
    +    * If "mean", then replace missing values using the mean value of the feature.
    +    * If "median", then replace missing values using the median value of the feature.
    +    * If "most", then replace missing using the most frequent value of the feature.
    +    * Default: mean
    +    *
    +    * @group param
    +    */
    +  val strategy: Param[String] = new Param(this, "strategy", "strategy for imputation. " +
    +    "If mean, then replace missing values using the mean value of the feature." +
    +    "If median, then replace missing values using the median value of the feature." +
    +    "If most, then replace missing using the most frequent value of the feature.",
    +    ParamValidators.inArray[String](Imputer.supportedStrategyNames.toArray))
    +
    +  /** @group getParam */
    +  def getStrategy: String = $(strategy)
    +
    +  /**
    +    * The placeholder for the missing values. All occurrences of missingValue will be imputed.
    +    * Default: Double.NaN
    +    *
    +    * @group param
    +    */
    +  val missingValue: DoubleParam = new DoubleParam(this, "missingValue",
    +    "The placeholder for the missing values. All occurrences of missingValue will be imputed")
    +
    +  /** @group getParam */
    +  def getMissingValue: Double = $(missingValue)
    +
    +  /** Validates and transforms the input schema. */
    +  protected def validateAndTransformSchema(schema: StructType): StructType = {
    +    validateParams()
    +    val inputType = schema($(inputCol)).dataType
    +    require(inputType.isInstanceOf[VectorUDT] || inputType.isInstanceOf[DoubleType],
    +      s"Input column ${$(inputCol)} must of type Vector or Double")
    +    require(!schema.fieldNames.contains($(outputCol)),
    +      s"Output column ${$(outputCol)} already exists.")
    +    val outputFields = schema.fields :+ StructField($(outputCol), new VectorUDT, false)
    +    StructType(outputFields)
    +  }
    +
    +}
    +
    +/**
    + * :: Experimental ::
    + * Imputation estimator for completing missing values, either using the mean, the median or
    + * the most frequent value of the column in which the missing values are located. This class
    + * also allows for different missing values.
    + */
    +@Experimental
    +class Imputer @Since("2.0.0")(override val uid: String)
    +  extends Estimator[ImputerModel] with ImputerParams with DefaultParamsWritable {
    +
    +  @Since("2.0.0")
    +  def this() = this(Identifiable.randomUID("imputer"))
    +
    +  /** @group setParam */
    +  def setInputCol(value: String): this.type = set(inputCol, value)
    +
    +  /** @group setParam */
    +  def setOutputCol(value: String): this.type = set(outputCol, value)
    +
    +  /**
    +   * Imputation strategy. Available options are "mean", "median" and "most".
    +   * @group setParam
    +   */
    +  def setStrategy(value: String): this.type = set(strategy, value)
    +
    +  /** @group setParam */
    +  def setMissingValue(value: Double): this.type
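The three strategies named in the `strategy` param doc of the quoted diff (mean, median, most frequent) can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the PR's implementation: `ImputeSketch`, `surrogate` and `impute` are hypothetical names, and the median convention (averaging the two middle values) and the tie-breaking rule for "most" (smallest value wins) are assumptions made for the example.

```scala
object ImputeSketch {
  // Hypothetical helper: true when x equals the missing-value placeholder.
  // NaN needs its own branch because NaN == NaN is false.
  private def isMissing(x: Double, missingValue: Double): Boolean =
    if (missingValue.isNaN) x.isNaN else x == missingValue

  // Compute the replacement value for one column under the given strategy.
  def surrogate(values: Seq[Double], strategy: String,
                missingValue: Double = Double.NaN): Double = {
    val present = values.filterNot(isMissing(_, missingValue))
    require(present.nonEmpty, "column has no non-missing values")
    strategy match {
      case "mean" => present.sum / present.size
      case "median" =>
        val sorted = present.sortWith(_ < _)
        val mid = sorted.size / 2
        // Assumption: even-sized columns average the two middle values.
        if (sorted.size % 2 == 1) sorted(mid) else (sorted(mid - 1) + sorted(mid)) / 2.0
      case "most" =>
        // Assumption: ties go to the smallest value.
        present.groupBy(identity).toSeq
          .map { case (value, occurrences) => (value, occurrences.size) }
          .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
          .head._1
      case other =>
        throw new IllegalArgumentException(s"unsupported strategy: $other")
    }
  }

  // Replace every occurrence of the placeholder with the surrogate value.
  def impute(values: Seq[Double], strategy: String,
             missingValue: Double = Double.NaN): Seq[Double] = {
    val fill = surrogate(values, strategy, missingValue)
    values.map(x => if (isMissing(x, missingValue)) fill else x)
  }
}
```

For instance, `ImputeSketch.impute(Seq(1.0, Double.NaN, 3.0), "mean")` fills the gap with the mean of the two present values.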
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/11655

    [SPARK-13816][Graphx] Add parameter checks for algorithms in Graphx

    JIRA: https://issues.apache.org/jira/browse/SPARK-13816

    ## What changes were proposed in this pull request?

    Add parameter checks for algorithms in Graphx: Pregel, LabelPropagation, PageRank, SVDPlusPlus

    ## How was this patch tested?

    manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark graphx_param_check

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11655.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11655

commit a6122dea21f0075ca0bd403538dc2cc30e4327ff
Author: Zheng RuiFeng
Date: 2016-03-11T09:17:43Z

    create param-check for graphx-algos
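The description above does not show what the checks look like. As a rough illustration only, argument validation of this kind is commonly written with Scala's `require`; the function name `checkPageRankParams`, the bounds and the messages below are assumptions for the sketch and are not copied from the PR.

```scala
object ParamCheckSketch {
  // Hypothetical stand-in for the validation at the top of an iterative
  // graph algorithm. The conditions and messages are assumptions.
  def checkPageRankParams(numIter: Int, resetProb: Double): Unit = {
    require(numIter > 0,
      s"Number of iterations must be greater than 0, but got $numIter")
    require(resetProb >= 0.0 && resetProb <= 1.0,
      s"Random reset probability must belong to [0, 1], but got $resetProb")
  }
}
```

`require` throws `IllegalArgumentException` when its condition is false, so an invalid argument fails fast at the call site instead of surfacing later inside the computation.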
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10355#discussion_r55806405

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -151,7 +151,7 @@ abstract class NumericType extends AtomicType {
     }

    -private[sql] object NumericType extends AbstractDataType {
    +private[spark] object NumericType extends AbstractDataType {
    --- End diff --

    I don't think it's required - see https://github.com/apache/spark/pull/9777/files#diff-a364fbaaec2088b49d5d1ceaea4a1c41R73 where OneHotEncoder uses `isInstanceOf[NumericType]` without making this object `private[spark]`. Here `NumericType` refers to the abstract class [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala#L144).
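One way to read the point above: in Scala a class and its companion object are separate entities, and a type test such as `isInstanceOf[NumericType]` names the class, so the visibility of the object does not affect it. The sketch below uses hypothetical names (`TypesSketch`, `NumericLike`, `IntLike`, `TextLike`, `Caller`) rather than Spark's own types.

```scala
object TypesSketch {
  // The class is public: it names a *type*.
  abstract class NumericLike
  // The companion object is a separate *term*; here its visibility is
  // restricted to code inside TypesSketch.
  private[TypesSketch] object NumericLike
  final class IntLike extends NumericLike
  final class TextLike
}

object Caller {
  import TypesSketch._
  // A type test only needs the class to be visible, so this compiles even
  // though the companion object cannot be referenced from here.
  def isNumeric(dataType: Any): Boolean = dataType.isInstanceOf[NumericLike]
}
```

Widening the object's visibility would only be needed if code outside the restricted scope had to reference the object itself as a value.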
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195265628 @srowen I think so, I have reviewed all python examples.
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195264239 I think that's fine. Is this all we can find in Python examples?
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195258130 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52917/ Test PASSed.
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195258129 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13788][MLLIB] Fix side effects in Chole...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11617#issuecomment-195258085 Returning a value by modifying an arg in place is exceptional, but I don't think it's something that can never happen. This is a case where it should, and it's done and documented in this special-purpose implementation intentionally for performance. I don't know about you, but around BLAS/lapack and related methods is exactly where I expect the function args to be modified, since that's the only way lapack works. Ideally, the modifying in-place semantics is reusable for a bunch of purposes. Since it's just the result of the lapack call, I suspect it could be. However that doesn't mean it's suitable for your use case. But if your solution is to clone for all calls, then a solution for this particular other usage is to pass a clone of your argument, yeah.
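The trade-off discussed above (a routine that overwrites its argument for performance, with callers cloning when they need the original) can be sketched in plain Scala. `solveInPlace` is a hypothetical stand-in that simply halves each element; it is not the Cholesky code under review and makes no LAPACK call.

```scala
object InPlaceSketch {
  // Hypothetical stand-in for a LAPACK-style routine: it overwrites its
  // argument with the result and returns the same array, avoiding a copy.
  def solveInPlace(b: Array[Double]): Array[Double] = {
    var i = 0
    while (i < b.length) {
      b(i) = b(i) / 2.0
      i += 1
    }
    b
  }
}
```

A caller that still needs its input afterwards passes a copy, for example `InPlaceSketch.solveInPlace(rhs.clone())`, while a caller that no longer needs the input passes it directly and pays for no allocation.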
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195258018 **[Test build #52917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52917/consoleFull)** for PR 11651 at commit [`a78c25b`](https://github.com/apache/spark/commit/a78c25ba27f6477fd0393e0d5da052ecd4a3620b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195257957 **[Test build #52918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52918/consoleFull)** for PR 11647 at commit [`cbab017`](https://github.com/apache/spark/commit/cbab01763cf753779ae9a8557d88e9ec80e7be59).
[GitHub] spark pull request: [SPARK-13811] [SQL] No Push-Down of Constraint...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11649#issuecomment-195257436 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13811] [SQL] No Push-Down of Constraint...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11649#issuecomment-195257438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52907/ Test PASSed.
[GitHub] spark pull request: [SPARK-13811] [SQL] No Push-Down of Constraint...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11649#issuecomment-195256795 **[Test build #52907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52907/consoleFull)** for PR 11649 at commit [`9cba8a3`](https://github.com/apache/spark/commit/9cba8a339a36dba83765954c26905b6d61926d67). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13776][WebUI]Add spark.ui.threads to se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11615#issuecomment-195255151 Merged build finished. Test FAILed.
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195255004 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13776][WebUI]Add spark.ui.threads to se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11615#issuecomment-195255153 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52908/ Test FAILed.
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195255010 **[Test build #52917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52917/consoleFull)** for PR 11651 at commit [`a78c25b`](https://github.com/apache/spark/commit/a78c25ba27f6477fd0393e0d5da052ecd4a3620b).
[GitHub] spark pull request: [SPARK-13776][WebUI]Add spark.ui.threads to se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11615#issuecomment-195254957 **[Test build #52908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52908/consoleFull)** for PR 11615 at commit [`8422f6e`](https://github.com/apache/spark/commit/8422f6e1e5c8617dd8bffaa6c47429aadfc7bc10). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195255007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52916/ Test PASSed.
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195254941 @gatorsmile instead of having a special rule for join, we can probably infer all possible filters based on constraints (something along the lines of https://github.com/sameeragarwal/spark/commit/ce4c9441f32ab39eaebd7fe2081cf1c789a1ef4a). This should now subsume the predicate transitivity optimization, right?
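To make the idea above concrete: given a constraint set containing `a = b` and `a > 5`, substituting through the equality yields `b > 5`, which is the kind of filter predicate transitivity would produce. The sketch below is a toy model in plain Scala; the `Expr` hierarchy and `inferAdditional` are hypothetical names for illustration and are neither Catalyst's API nor the code in the linked commit.

```scala
object ConstraintSketch {
  // Toy expression tree; all names here are assumptions for the example.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Replace every occurrence of attribute `from` with `to` inside `e`.
  private def substitute(e: Expr, from: Attr, to: Attr): Expr = e match {
    case a: Attr if a == from => to
    case EqualTo(l, r) => EqualTo(substitute(l, from, to), substitute(r, from, to))
    case GreaterThan(l, r) => GreaterThan(substitute(l, from, to), substitute(r, from, to))
    case other => other
  }

  // One pass of inference: for each attribute equality a = b, rewrite the
  // other constraints with a -> b and b -> a, and keep only the new ones.
  def inferAdditional(constraints: Set[Expr]): Set[Expr] = {
    val inferred = constraints.collect {
      case equality @ EqualTo(a: Attr, b: Attr) =>
        (constraints - equality).flatMap { c =>
          Set(substitute(c, a, b), substitute(c, b, a))
        }
    }.flatten
    inferred -- constraints
  }
}
```

A single pass is enough for this example; chains of equalities would need the pass repeated until no new constraints appear.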
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195254910 **[Test build #52916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52916/consoleFull)** for PR 11654 at commit [`f416d02`](https://github.com/apache/spark/commit/f416d02a61918cb151a405b77569e8fcced19628). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195254726 Cool - sorry about that, Jenkins had passed for PR #11392 but before PR #11443 was merged :(
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195254744 Jenkins, test this please
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/11654
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195254363 @MLnick Thanks! Closing this in favor of #11653.
[GitHub] spark pull request: [HOT-FIX][SQL][ML] Fix compile error from use ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11653
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195252458 @liancheng I'm merging #11653 now, so this PR can be closed.
[GitHub] spark pull request: [Spark-13814][PySpark] Delete unnecessary impo...
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/11651#issuecomment-195252149 It seems something is wrong in JavaMaxAbsScalerExample...
[GitHub] spark pull request: [HOT-FIX][SQL][ML] Fix compile error from use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11653#issuecomment-195250903 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52914/ Test PASSed.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195251569 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195251580 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52909/ Test FAILed.
[GitHub] spark pull request: [HOT-FIX][SQL][ML] Fix compile error from use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11653#issuecomment-195250892 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13658][SQL] BooleanSimplification rule ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11647#issuecomment-195250651 **[Test build #52909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52909/consoleFull)** for PR 11647 at commit [`c75e602`](https://github.com/apache/spark/commit/c75e602e58793690bc83f87a8ec25441cd22ed7b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12719][SQL] [WIP] SQL generation suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11596#issuecomment-195250446 **[Test build #52915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52915/consoleFull)** for PR 11596 at commit [`bea871f`](https://github.com/apache/spark/commit/bea871f6b43de7ffcb6c58089521401ca39a7145). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12719][SQL] [WIP] SQL generation suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11596#issuecomment-195250469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52915/ Test FAILed.
[GitHub] spark pull request: [SPARK-12719][SQL] [WIP] SQL generation suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11596#issuecomment-195250466 Merged build finished. Test FAILed.
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195250052 **[Test build #52916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52916/consoleFull)** for PR 11654 at commit [`f416d02`](https://github.com/apache/spark/commit/f416d02a61918cb151a405b77569e8fcced19628).
[GitHub] spark pull request: [HOT-FIX][SQL][ML] Fix compile error from use ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11653#issuecomment-195250215 **[Test build #52914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52914/consoleFull)** for PR 11653 at commit [`d3f8100`](https://github.com/apache/spark/commit/d3f81008620bfbaa033d7c37661510c452cd50d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11654#issuecomment-195249462 @liancheng I've opened a PR already for this #11653
[GitHub] spark pull request: [HOTFIX][SQL] Fixes compilation failure caused...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/11654 [HOTFIX][SQL] Fixes compilation failure caused by conflicts between PR #11392 and PR #11443 ## What changes were proposed in this pull request? PR #11392 was merged after PR #11443, which migrated Java `DataFrame` to `Dataset`, thus causing a compilation failure. ## How was this patch tested? Jenkins compilation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark hotfix-pr-11392-compilation-failure Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11654.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11654 commit f416d02a61918cb151a405b77569e8fcced19628 Author: Cheng Lian Date: 2016-03-11T08:11:24Z Fixes compilation failure
[GitHub] spark pull request: [SPARK-12719][SQL] [WIP] SQL generation suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11596#issuecomment-195248289 **[Test build #52915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52915/consoleFull)** for PR 11596 at commit [`bea871f`](https://github.com/apache/spark/commit/bea871f6b43de7ffcb6c58089521401ca39a7145).
[GitHub] spark pull request: [SPARK-13810] [CORE] Add Port Configuration Su...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11644#issuecomment-195248388 I like the principle, but passing around an error message as an arg to `SparkEnv` makes this tiny aspect leak too much into the API. If it's just to be able to report which config to change, I think I wouldn't do that. Generically advise changing a port config. This would be much simpler then, commensurate with the scope of the change.
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11652#issuecomment-195246885 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52913/ Test FAILed.
[GitHub] spark pull request: [SPARK-13812][SPARKR] Fix SparkR lint-r test e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11652#issuecomment-195246878 **[Test build #52913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52913/consoleFull)** for PR 11652 at commit [`22d85d5`](https://github.com/apache/spark/commit/22d85d59fb8c6fd4e4b7424d52b24f38ad1a26e4). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.