[GitHub] spark issue #16668: [SPARK-18788][SPARKR] Add API for getNumPartitions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16668 **[Test build #71761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71761/testReport)** for PR 16668 at commit [`34f9aa5`](https://github.com/apache/spark/commit/34f9aa520770974be7d1417a11ffdd1e1118ddf2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/16668 [SPARK-18788][SPARKR] Add API for getNumPartitions ## What changes were proposed in this pull request? With doc noting that this converts the DataFrame into an RDD. ## How was this patch tested? unit tests, manual tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rgetnumpartitions Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16668.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16668 commit 34f9aa520770974be7d1417a11ffdd1e1118ddf2 Author: Felix Cheung Date: 2017-01-21T07:53:30Z getNumPartitions
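The partition count this API reports comes from the underlying RDD. As a rough, hypothetical Python sketch (not the SparkR implementation), Spark's even slicing of a local collection into `numSlices` partitions can be modeled like this, with `getNumPartitions` simply reporting the number of slices:

```python
def slice_bounds(n, num_slices):
    """Split n elements into num_slices contiguous partitions.

    Sketch of Spark's even slicing: partition i covers the half-open
    range [i*n // num_slices, (i+1)*n // num_slices).
    """
    return [(i * n // num_slices, (i + 1) * n // num_slices)
            for i in range(num_slices)]

# getNumPartitions would report the number of slices:
bounds = slice_bounds(10, 3)
print(len(bounds))   # 3
print(bounds)        # [(0, 3), (3, 6), (6, 10)]
```

Names here (`slice_bounds`) are illustrative only; the actual R API delegates to the JVM-side RDD.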
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71759/ Test FAILed.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Merged build finished. Test FAILed.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71759/testReport)** for PR 16659 at commit [`9d50048`](https://github.com/apache/spark/commit/9d50048a47b5052a85faa16535535eb86c146aa3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...
Github user admackin commented on the issue: https://github.com/apache/spark/pull/16652 Yes, the version in MLUtils had labels of zero in the test cases, so it was causing test cases to fail after my patch. It didn't look like there was a way to fix this, so I thought it better to make a patch that didn't affect potentially dozens of other packages. Any other thoughts on how to achieve this? I could add a 'minLabel' param to the MLUtils methods, but that seems overly specific for this one package.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16659 I reran the `DatasetBenchmark`; there is no performance regression.
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16663 **[Test build #71760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71760/testReport)** for PR 16663 at commit [`73845cb`](https://github.com/apache/spark/commit/73845cb93be7692fe6954232583166d66d0bf8d2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16663 Merged build finished. Test PASSed.
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71760/ Test PASSed.
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97192829 --- Diff: R/pkg/R/DataFrame.R --- @@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) { SparkDataFrame Methods ## +#' storageName +#' +#' Return a SparkDataFrame's name. +#' +#' @param x The SparkDataFrame whose name is returned. +#' @family SparkDataFrame functions +#' @rdname storageName +#' @examples +#'\dontrun{ +#' sparkR.session() +#' path <- "path/to/file.json" +#' df <- read.json(path) +#' storageName(df) +#'} +#' @aliases storageName,SparkDataFrame-method +#' @export +#' @note storageName since 2.2.0 +setMethod("storageName", + signature(x = "SparkDataFrame"), + function(x) { +callJMethod(x@sdf, "name") + }) + +#' storageName +#' +#' Set a SparkDataFrame's name. This will be displayed on the Storage tab in the UI if cached. +#' +#' @param x The SparkDataFrame whose name is to be set. +#' @param name The SparkDataFrame name to be set. +#' @family SparkDataFrame functions +#' @return the SparkDataFrame renamed. +#' @rdname storageName +#' @examples +#'\dontrun{ +#' sparkR.session() +#' path <- "path/to/file.json" +#' df <- read.json(path) +#' storageName(df) <- "foo" --- End diff -- since it won't be useful unless it is cached, I suggest adding `cache(df)` to the example here
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97192814 --- Diff: R/pkg/R/DataFrame.R --- @@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) { SparkDataFrame Methods ## +#' storageName +#' +#' Return a SparkDataFrame's name. +#' +#' @param x The SparkDataFrame whose name is returned. +#' @family SparkDataFrame functions +#' @rdname storageName +#' @examples +#'\dontrun{ +#' sparkR.session() +#' path <- "path/to/file.json" +#' df <- read.json(path) +#' storageName(df) +#'} +#' @aliases storageName,SparkDataFrame-method +#' @export +#' @note storageName since 2.2.0 +setMethod("storageName", + signature(x = "SparkDataFrame"), + function(x) { +callJMethod(x@sdf, "name") + }) + +#' storageName +#' +#' Set a SparkDataFrame's name. This will be displayed on the Storage tab in the UI if cached. +#' +#' @param x The SparkDataFrame whose name is to be set. +#' @param name The SparkDataFrame name to be set. --- End diff -- change this `name` to `value`
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97192807 --- Diff: R/pkg/R/DataFrame.R --- @@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) { SparkDataFrame Methods ## +#' storageName +#' +#' Return a SparkDataFrame's name. +#' +#' @param x The SparkDataFrame whose name is returned. +#' @family SparkDataFrame functions +#' @rdname storageName +#' @examples +#'\dontrun{ +#' sparkR.session() +#' path <- "path/to/file.json" +#' df <- read.json(path) +#' storageName(df) --- End diff -- since they have the same `rdname`, instead of two blocks of examples you should merge them into one. In that case it would make more sense to check `storageName` after setting it - checking it without setting it first doesn't seem to make a lot of sense?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97192776 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -107,7 +107,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont footer = readFooter(configuration, file, range(split.getStart(), split.getEnd())); MessageType fileSchema = footer.getFileMetaData().getSchema(); FilterCompat.Filter filter = getFilter(configuration); - blocks = filterRowGroups(filter, footer.getBlocks(), fileSchema); + try { +blocks = filterRowGroups(filter, footer.getBlocks(), fileSchema); + } catch (IllegalArgumentException e) { +// In the case where a particular parquet files does not contain --- End diff -- Can we add a TODO? I think the newer Parquet can handle this issue. Once we upgrade the Parquet version, we won't need this.
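The pattern under discussion - attempt predicate-based row-group filtering, and fall back to reading every row group when the filter references a column missing from the file's schema - can be sketched as follows. This is a hedged, stand-alone Python analogue with hypothetical names, not the actual Parquet API:

```python
def filter_row_groups(blocks, schema, column):
    """Stand-in for Parquet's filterRowGroups: rejects filters on unknown columns."""
    if column not in schema:
        raise ValueError(f"column {column!r} not found in file schema")
    # Pretend the predicate prunes everything but the first row group.
    return blocks[:1]

def read_blocks(blocks, schema, column):
    try:
        return filter_row_groups(blocks, schema, column)
    except ValueError:
        # Column missing from this file: skip filtering entirely and
        # read all row groups, mirroring the try/catch in the diff above.
        return blocks

print(read_blocks([1, 2, 3], {"a", "b"}, "a"))  # [1]
print(read_blocks([1, 2, 3], {"a", "b"}, "c"))  # [1, 2, 3]
```

The design trade-off is correctness over pushdown: a file with a partially evolved schema is read unfiltered rather than failing the whole task.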
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97192728 --- Diff: R/pkg/R/generics.R --- @@ -624,6 +624,14 @@ setGeneric("saveAsTable", function(df, tableName, source = NULL, mode = "error", standardGeneric("saveAsTable") }) +#' @rdname storageName +#' @export +setGeneric("storageName", function(x) { standardGeneric("storageName") }) + +#' @rdname storageName +#' @export +setGeneric("storageName<-", function(x, name) { standardGeneric("storageName<-") }) --- End diff -- ditto
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97192721 --- Diff: R/pkg/R/DataFrame.R --- @@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) { SparkDataFrame Methods ## +#' storageName +#' +#' Return a SparkDataFrame's name. +#' +#' @param x The SparkDataFrame whose name is returned. +#' @family SparkDataFrame functions +#' @rdname storageName +#' @examples +#'\dontrun{ +#' sparkR.session() +#' path <- "path/to/file.json" +#' df <- read.json(path) +#' storageName(df) +#'} +#' @aliases storageName,SparkDataFrame-method +#' @export +#' @note name since 2.2.0 +setMethod("storageName", + signature(x = "SparkDataFrame"), + function(x) { +callJMethod(x@sdf, "name") + }) + +#' storageName +#' +#' Set a SparkDataFrame's name. This will be displayed on the Storage tab in the UI if cached. +#' +#' @param x The SparkDataFrame whose name is to be set. +#' @param name The SparkDataFrame name to be set. +#' @family SparkDataFrame functions +#' @return the SparkDataFrame renamed. +#' @rdname storageName +#' @examples +#'\dontrun{ +#' sparkR.session() +#' path <- "path/to/file.json" +#' df <- read.json(path) +#' storageName(df) <- "foo" +#'} +#' @aliases name<-,SparkDataFrame-method +#' @export +#' @note name<- since 2.2.0 +setMethod("storageName<-", + signature(x = "SparkDataFrame", name = "character"), + function(x, name) { +callJMethod(x@sdf, "setName", name) +x --- End diff -- for the setter (`something<-`) you have to name the last parameter `value` (change this from the `name` you have here)
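The convention the reviewer is pointing at: R replacement functions (names ending in `<-`) receive the assigned value as their final argument, which must be named `value` for `storageName(df) <- "foo"` to dispatch. A rough Python analogue of the same getter/setter pairing, using a property (illustrative toy code only, not SparkR):

```python
class CachedFrame:
    """Toy stand-in for a SparkDataFrame with a settable storage name."""

    def __init__(self):
        self._name = None

    @property
    def storage_name(self):
        return self._name

    @storage_name.setter
    def storage_name(self, value):
        # Like R's `storageName<-`, the incoming value lands in a
        # conventionally named parameter ('value' here too).
        self._name = value

df = CachedFrame()
df.storage_name = "foo"
print(df.storage_name)  # foo
```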
[GitHub] spark pull request #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary retur...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r97192703 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -225,10 +225,12 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula" #' @param object a fitted k-means model. #' @return \code{summary} returns summary information of the fitted model, which is a list. -#' The list includes the model's \code{k} (number of cluster centers), +#' The list includes the model's \code{k} (the configured number of cluster centers), #' \code{coefficients} (model cluster centers), -#' \code{size} (number of data points in each cluster), and \code{cluster} -#' (cluster centers of the transformed data). +#' \code{size} (number of data points in each cluster), \code{cluster} +#' (cluster centers of the transformed data), and \code{clusterSize} +#' (the actual number of cluster centers. When using initMode = "random", --- End diff -- let's add `is.loaded` here
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/1 ah - does bisecting kmeans have the same behavior?
[GitHub] spark issue #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper in Spar...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16566 couple of last comments. @yanboliang do you have any comment?
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r97192612 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/BisectingKMeansWrapper.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.attribute.AttributeGroup +import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel} +import org.apache.spark.ml.feature.RFormula +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class BisectingKMeansWrapper private ( +val pipeline: PipelineModel, +val features: Array[String], +val size: Array[Long], +val isLoaded: Boolean = false) extends MLWritable { + private val bisectingKmeansModel: BisectingKMeansModel = +pipeline.stages.last.asInstanceOf[BisectingKMeansModel] + + lazy val coefficients: Array[Double] = bisectingKmeansModel.clusterCenters.flatMap(_.toArray) + + lazy val k: Int = bisectingKmeansModel.getK + + lazy val cluster: DataFrame = bisectingKmeansModel.summary.cluster --- End diff -- ah this is checked on the R side. could you add a comment here
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r97192608 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' @note LDAModel since 2.1.0 setClass("LDAModel", representation(jobj = "jobj")) +#' Bisecting K-Means Clustering Model +#' +#' Fits a bisecting k-means clustering model against a Spark DataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#'Note that the response variable of formula is empty in spark.bisectingKmeans. +#' @param k the desired number of leaf clusters. Must be > 1. +#' The actual number could be smaller if there are no divisible leaf clusters. +#' @param maxIter maximum iteration number. +#' @param seed the random seed. +#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0) +#'or the minimum proportion of points (if less than 1.0) of a divisible cluster. +#'Note that it is an expert parameter. The default value should be good enough +#'for most cases. +#' @param ... additional argument(s) passed to the method. +#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model. 
+#' @rdname spark.bisectingKmeans +#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method +#' @name spark.bisectingKmeans +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' df <- createDataFrame(iris) +#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4) +#' summary(model) +#' +#' # fitted values on training data +#' fitted <- predict(model, df) +#' head(select(fitted, "Sepal_Length", "prediction")) +#' +#' # save fitted model to input path +#' path <- "path/to/model" +#' write.ml(model, path) +#' +#' # can also read back the saved model and print +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.bisectingKmeans since 2.2.0 +#' @seealso \link{predict}, \link{read.ml}, \link{write.ml} +setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, k = 4, maxIter = 20, seed = NULL, minDivisibleClusterSize = 1.0) { +formula <- paste0(deparse(formula), collapse = "") +if (!is.null(seed)) { + seed <- as.character(as.integer(seed)) +} +jobj <- callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit", +data@sdf, formula, as.integer(k), as.integer(maxIter), +seed, as.numeric(minDivisibleClusterSize)) +new("BisectingKMeansModel", jobj = jobj) + }) + +# Get the summary of a bisecting k-means model + +#' @param object a fitted bisecting k-means model. +#' @return \code{summary} returns summary information of the fitted model, which is a list. +#' The list includes the model's \code{k} (number of cluster centers), +#' \code{coefficients} (model cluster centers), +#' \code{size} (number of data points in each cluster), and \code{cluster} +#' (cluster centers of the transformed data). --- End diff -- also clarify `cluster` is NULL if is.loaded = T
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r97192589 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/BisectingKMeansWrapper.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.attribute.AttributeGroup +import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel} +import org.apache.spark.ml.feature.RFormula +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class BisectingKMeansWrapper private ( +val pipeline: PipelineModel, +val features: Array[String], +val size: Array[Long], +val isLoaded: Boolean = false) extends MLWritable { + private val bisectingKmeansModel: BisectingKMeansModel = +pipeline.stages.last.asInstanceOf[BisectingKMeansModel] + + lazy val coefficients: Array[Double] = bisectingKmeansModel.clusterCenters.flatMap(_.toArray) + + lazy val k: Int = bisectingKmeansModel.getK + + lazy val cluster: DataFrame = bisectingKmeansModel.summary.cluster + + def fitted(method: String): DataFrame = { +if (method == "centers") { + bisectingKmeansModel.summary.predictions.drop(bisectingKmeansModel.getFeaturesCol) +} else if (method == "classes") { + bisectingKmeansModel.summary.cluster +} else { + throw new UnsupportedOperationException( +s"Method (centers or classes) required but $method found.") +} + } + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset).drop(bisectingKmeansModel.getFeaturesCol) + } + + override def write: MLWriter = new BisectingKMeansWrapper.BisectingKMeansWrapperWriter(this) +} + +private[r] object BisectingKMeansWrapper extends MLReadable[BisectingKMeansWrapper] { + + def fit( + data: DataFrame, + formula: String, + k: Int, + maxIter: Int, + seed: String, + minDivisibleClusterSize: Double + ): BisectingKMeansWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setFeaturesCol("features") +RWrapperUtils.checkDataColumns(rFormula, data) +val rFormulaModel = 
rFormula.fit(data) + +// get feature names from output schema +val schema = rFormulaModel.transform(data).schema +val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol)) + .attributes.get +val features = featureAttrs.map(_.name.get) + +val bisectingKmeans = new BisectingKMeans() + .setK(k) + .setMaxIter(maxIter) + .setMinDivisibleClusterSize(minDivisibleClusterSize) + .setFeaturesCol(rFormula.getFeaturesCol) + +if (seed != null && seed.length > 0) bisectingKmeans.setSeed(seed.toInt) + +val pipeline = new Pipeline() + .setStages(Array(rFormulaModel, bisectingKmeans)) + .fit(data) + +val bisectingKmeansModel: BisectingKMeansModel = + pipeline.stages(1).asInstanceOf[BisectingKMeansModel] --- End diff -- let's be consistent here with L38 - either (1) or last
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r97192573

--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/BisectingKMeansWrapper.scala ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel}
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class BisectingKMeansWrapper private (
+    val pipeline: PipelineModel,
+    val features: Array[String],
+    val size: Array[Long],
+    val isLoaded: Boolean = false) extends MLWritable {
+  private val bisectingKmeansModel: BisectingKMeansModel =
+    pipeline.stages.last.asInstanceOf[BisectingKMeansModel]
+
+  lazy val coefficients: Array[Double] = bisectingKmeansModel.clusterCenters.flatMap(_.toArray)
+
+  lazy val k: Int = bisectingKmeansModel.getK
+
+  lazy val cluster: DataFrame = bisectingKmeansModel.summary.cluster
--- End diff --

does this have valid values when the model is loaded?
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r97192502

--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))

+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'        operators are supported, including '~', '.', ':', '+', and '-'.
+#'        Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'        The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'        or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#'        Note that it is an expert parameter. The default value should be good enough
+#'        for most cases.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula, k = 4, maxIter = 20, seed = NULL, minDivisibleClusterSize = 1.0) {
+            formula <- paste0(deparse(formula), collapse = "")
+            if (!is.null(seed)) {
+              seed <- as.character(as.integer(seed))
+            }
+            jobj <- callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+                                data@sdf, formula, as.integer(k), as.integer(maxIter),
+                                seed, as.numeric(minDivisibleClusterSize))
+            new("BisectingKMeansModel", jobj = jobj)
+          })
+
+# Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, which is a list.
+#'         The list includes the model's \code{k} (number of cluster centers),
+#'         \code{coefficients} (model cluster centers),
+#'         \code{size} (number of data points in each cluster), and \code{cluster}
+#'         (cluster centers of the transformed data).
--- End diff --

let's add `is.loaded` here
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r97192351

--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@
+# Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, which is a list.
+#'         The list includes the model's \code{k} (number of cluster centers),
+#'         \code{coefficients} (model cluster centers),
+#'         \code{size} (number of data points in each cluster), and \code{cluster}
+#'         (cluster centers of the transformed data).
+#' @rdname spark.bisectingKmeans
+#' @export
+#' @note summary(BisectingKMeansModel) since 2.2.0
+setMethod("summary", signature(object = "BisectingKMeansModel"),
+          function(object) {
+            jobj <- object@jobj
+            is.loaded <- callJMethod(jobj, "isLoaded")
+            features <- callJMethod(jobj, "features")
+            coefficients <- callJMethod(jobj, "coefficients")
+            k <- callJMethod(jobj, "k")
+            size <- callJMethod(jobj, "size")
+            coefficients <- t(matrix(coefficients, ncol = k))
+            colnames(coefficients) <- unlist(features)
+            rownames(coefficients) <- 1:k
+            cluster <- if (is.loaded) {
+              NULL
+            } else {
+              dataFrame(callJMethod(jobj, "cluster"))
+            }
+            list(k = k, coefficients = coefficients, size = size,
+                 cluster = cluster, is.loaded = is.loaded)
+          })
+
+# Predicted values b
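The JVM side flattens the cluster centers row by row (`clusterCenters.flatMap(_.toArray)`), and the R `summary` above rebuilds the k-by-nfeatures matrix with `t(matrix(coefficients, ncol = k))`, relying on R's column-major fill. A quick Python sketch of the same round trip (illustrative only, not SparkR code):

```python
def flatten_centers(centers):
    # JVM side: clusterCenters.flatMap(_.toArray) -- centers emitted in order
    return [x for center in centers for x in center]

def rebuild_centers(flat, k):
    # R side: t(matrix(coefficients, ncol = k)) -- each column of the
    # column-major matrix is one center, so transposing restores the rows
    n = len(flat) // k
    return [flat[i * n:(i + 1) * n] for i in range(k)]

centers = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
assert rebuild_centers(flatten_centers(centers), k=3) == centers
```

The round trip only works because both sides agree on the ordering; filling the R matrix with `byrow = TRUE` instead would scramble the centers.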
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16663

**[Test build #71760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71760/testReport)** for PR 16663 at commit [`73845cb`](https://github.com/apache/spark/commit/73845cb93be7692fe6954232583166d66d0bf8d2).
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16660

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71755/
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16660

Merged build finished. Test PASSed.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16660

**[Test build #71755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71755/testReport)** for PR 16660 at commit [`7ea9aa6`](https://github.com/apache/spark/commit/7ea9aa636f430a30b8d83ed2dda954fd06347d79).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16663

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71758/
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16663

**[Test build #71758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71758/testReport)** for PR 16663 at commit [`17d3226`](https://github.com/apache/spark/commit/17d32262252f6beac7abd2afd5fb266d092ed7c2).

* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16663

Merged build finished. Test FAILed.
[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16516

looks good to me
[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16655
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16659

LGTM
[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16659#discussion_r97191894

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
@@ -67,28 +67,33 @@ class EquivalentExpressions {
   /**
    * Adds the expression to this data structure recursively. Stops if a matching expression
    * is found. That is, if `expr` has already been added, its children are not added.
-   * If ignoreLeaf is true, leaf nodes are ignored.
    */
-  def addExprTree(
-      root: Expression,
-      ignoreLeaf: Boolean = true,
-      skipReferenceToExpressions: Boolean = true): Unit = {
-    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
+  def addExprTree(expr: Expression): Unit = {
+    val skip = expr.isInstanceOf[LeafExpression] ||
       // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
       // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
-      root.find(_.isInstanceOf[LambdaVariable]).isDefined
-    // There are some special expressions that we should not recurse into children.
+      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
+
+    // There are some special expressions that we should not recurse into all of its children.
     // 1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
-    // 2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
-    val shouldRecurse = root match {
-      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
-      // e.g. `CaseWhen`, we should support them.
-      case _: CodegenFallback => false
-      case _: ReferenceToExpressions if skipReferenceToExpressions => false
-      case _ => true
+    // 2. If: common subexpressions will always be evaluated at the beginning, but the true and
--- End diff --

this's cool.
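The hazard this change guards against can be shown in plain Python: hoisting a "common" subexpression that occurs only inside a conditional branch forces it to be evaluated before the predicate, even when the guard would have skipped it. A sketch of the idea, not Spark's actual codegen:

```python
def guarded(x, y):
    # Branch-local evaluation: the division runs only when the guard passes.
    return x / y if y != 0 else 0.0

def hoisted(x, y):
    # What naive common-subexpression elimination would do: evaluate the
    # shared subexpression up front, before the guard is checked.
    common = x / y  # raises ZeroDivisionError when y == 0
    return common if y != 0 else 0.0
```

This is why the diff above stops `addExprTree` from recursing into the branches of `If`: only subexpressions that are unconditionally evaluated are safe to hoist.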
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659

**[Test build #71759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71759/testReport)** for PR 16659 at commit [`9d50048`](https://github.com/apache/spark/commit/9d50048a47b5052a85faa16535535eb86c146aa3).
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16655

thanks for the review, merging to master!
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16655

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71754/
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16655

Merged build finished. Test PASSed.
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16655

**[Test build #71754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71754/testReport)** for PR 16655 at commit [`68f639e`](https://github.com/apache/spark/commit/68f639e468333faa9070cca639b3b491585b2e39).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16665

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71757/
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16665

**[Test build #71757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71757/consoleFull)** for PR 16665 at commit [`e847ab0`](https://github.com/apache/spark/commit/e847ab0a13534a3bc97cd37ab91a0be8ed838bfa).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16665

Merged build finished. Test FAILed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97191587

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -168,6 +168,43 @@ case class AlterTableRenameCommand(
 }

 /**
+ * A command that adds columns to a table.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+ */
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
+    // Invalidate the table last, otherwise uncaching the table would load the logical plan
+    // back into the hive metastore cache
+    catalog.refreshTable(table)
+
+    val newSchema = catalogTable.schema.copy(fields = catalogTable.schema.fields ++ columns)
--- End diff --

We support partitioned tables; the test cases added include this case. However, we don't support ALTER ADD COLUMNS on a particular partition, as Hive can do today, e.g. `ALTER TABLE T1 PARTITION(c3=1) ADD COLUMNS`. This is another potential feature to add if we maintain a schema per partition.
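The `run()` method quoted above follows a fixed order: verify the table is eligible, uncache it, refresh the catalog entry, then append the new fields to the existing schema. A toy Python model of the verify-then-append step (dict-based catalog; all names are hypothetical, not Spark's API):

```python
def add_columns(catalog, table, new_cols):
    """Append new_cols to a table's schema, refusing views -- a toy model of
    AlterTableAddColumnsCommand where `catalog` is just a dict."""
    meta = catalog[table]
    if meta["type"] == "VIEW":
        raise ValueError(f"{table} is a VIEW, which does not support ALTER ADD COLUMNS.")
    # mirror of: catalogTable.schema.copy(fields = catalogTable.schema.fields ++ columns)
    meta["schema"] = meta["schema"] + list(new_cols)
    return meta["schema"]

catalog = {"t1": {"type": "TABLE", "schema": [("c1", "int")]}}
```

The append-only semantics matter: existing fields keep their positions, so data written under the old schema still lines up with the first N columns.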
[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16663

**[Test build #71758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71758/testReport)** for PR 16663 at commit [`17d3226`](https://github.com/apache/spark/commit/17d32262252f6beac7abd2afd5fb266d092ed7c2).
[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16527

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71753/
[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16527

Merged build finished. Test PASSed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97191445

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +814,28 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table yet.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+    if (isDatasourceTable(catalogTable)) {
--- End diff --

Currently, the code paths for managing Hive serde tables and data source tables have been combined, so this can easily be handled together.
[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16527 **[Test build #71753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71753/testReport)** for PR 16527 at commit [`89721cd`](https://github.com/apache/spark/commit/89721cd7e8048ee72d37c18bd762d1ba7d73ef3b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @markhamstra Thanks a lot for your comment. I've refined it; please take another look ~
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97191366

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +814,28 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table yet.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+    if (isDatasourceTable(catalogTable)) {
```

--- End diff --

I am thinking that there are different ways to create a data source table, such as `df.write.saveAsTable`, or a "CREATE TABLE" DDL statement with or without a schema. Plus, JDBC data source tables may not be supported. I just want to spend more time trying different scenarios to see if there are any holes before claiming support. I will submit another PR once I am sure it is handled correctly.
[GitHub] spark issue #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attributes ...
Github user davies commented on the issue: https://github.com/apache/spark/pull/16581 Cherry-picked into 2.1 branch.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16659 The child expression in `Sum` is wrapped in `Coalesce`, which makes the `org.apache.spark.sql.SQLQuerySuite.Common subexpression elimination` test fail.
[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16667 Merged build finished. Test PASSed.
[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71756/ Test PASSed.
[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16667 **[Test build #71756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71756/testReport)** for PR 16667 at commit [`16a99fc`](https://github.com/apache/spark/commit/16a99fcff20a2527d95d54d94c1c348dbd638f26).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71752/ Test FAILed.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Merged build finished. Test FAILed.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71752/testReport)** for PR 16659 at commit [`cda9723`](https://github.com/apache/spark/commit/cda9723e8adc07142521cd5d17568f6e5ff3b709).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16667 **[Test build #71756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71756/testReport)** for PR 16667 at commit [`16a99fc`](https://github.com/apache/spark/commit/16a99fcff20a2527d95d54d94c1c348dbd638f26).
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16665 **[Test build #71757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71757/consoleFull)** for PR 16665 at commit [`e847ab0`](https://github.com/apache/spark/commit/e847ab0a13534a3bc97cd37ab91a0be8ed838bfa).
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16665 seems unrelated but... retest this please
[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16667 Argh, API not available in old Hadoop... fix coming.
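For context on the PR's title ("Avoid using `mapValues` when allocating"): in Scala 2.12 and earlier, `Map.mapValues` returns a lazy view that re-runs the mapping function on every lookup instead of materializing the result once. The sketch below is a toy illustration of that behavior, not the PR's code; the names are made up.

```scala
// Toy example: Map.mapValues (Scala 2.12 and earlier) returns a lazy view,
// so the mapping function runs again on every access.
var evaluations = 0
val containers = Map("host1" -> 2, "host2" -> 3)

// Lazy view: the function below is NOT invoked here.
val doubledView = containers.mapValues { n => evaluations += 1; n * 2 }
doubledView("host1")
doubledView("host1")
assert(evaluations == 2, "the function re-ran on each access")

// Materializing with map evaluates exactly once per key.
val doubledEager = containers.map { case (k, v) => k -> (v * 2) }
assert(doubledEager("host1") == 4)
```

Besides the repeated evaluation, the returned view is also not serializable in older Scala versions, which matters for Spark closures.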
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97189715

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```diff
@@ -168,6 +168,43 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+ */
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
+    // Invalidate the table last, otherwise uncaching the table would load the logical plan
+    // back into the hive metastore cache
+    catalog.refreshTable(table)
+
+    val newSchema = catalogTable.schema.copy(fields = catalogTable.schema.fields ++ columns)
```

--- End diff --

We are not supporting partitioned tables, right?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97189688

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +814,28 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table yet.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+    if (isDatasourceTable(catalogTable)) {
```

--- End diff --

What is the reason why data source tables are not supported?
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16660 **[Test build #71755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71755/testReport)** for PR 16660 at commit [`7ea9aa6`](https://github.com/apache/spark/commit/7ea9aa636f430a30b8d83ed2dda954fd06347d79).
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive tables
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71749/ Test PASSed.
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive tables
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16626 Merged build finished. Test PASSed.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16660 You can add a test case in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive tables
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16626 **[Test build #71749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71749/testReport)** for PR 16626 at commit [`73b0243`](https://github.com/apache/spark/commit/73b024309674dc6d76e853547ef2a64da4836ce8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16660 ok to test
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16660 LGTM too. @gmoehler Can you add a unit test?
[GitHub] spark pull request #15192: [SPARK-14536] [SQL] fix to handle null value in a...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15192
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15192 Thanks! Merging to master.
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16655 **[Test build #71754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71754/testReport)** for PR 16655 at commit [`68f639e`](https://github.com/apache/spark/commit/68f639e468333faa9070cca639b3b491585b2e39).
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16655 LGTM. After this is merged, I will continue the work in #16593. Thanks~
[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16655 LGTM pending test
[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16655#discussion_r97189276

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```diff
@@ -199,31 +199,52 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl
     // * can't use all table columns as partition columns.
     // * partition columns' type must be AtomicType.
     // * sort columns' type must be orderable.
+    // * reorder table schema or output of query plan, to put partition columns at the end.
     case c @ CreateTable(tableDesc, _, query) =>
-      val analyzedQuery = query.map { q =>
-        // Analyze the query in CTAS and then we can do the normalization and checking.
-        val qe = sparkSession.sessionState.executePlan(q)
+      if (query.isDefined) {
+        val qe = sparkSession.sessionState.executePlan(query.get)
         qe.assertAnalyzed()
-        qe.analyzed
-      }
-      val schema = if (analyzedQuery.isDefined) {
-        analyzedQuery.get.schema
-      } else {
-        tableDesc.schema
-      }
+        val analyzedQuery = qe.analyzed
+
+        val normalizedTable = normalizeCatalogTable(analyzedQuery.schema, tableDesc)
+
+        val output = analyzedQuery.output
+        val partitionAttrs = normalizedTable.partitionColumnNames.map { partCol =>
+          output.find(_.name == partCol).get
+        }
+        val newOutput = output.filterNot(partitionAttrs.contains) ++ partitionAttrs
+        val reorderedQuery = if (newOutput == output) {
+          analyzedQuery
+        } else {
+          Project(newOutput, analyzedQuery)
+        }
 
-      val columnNames = if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
-        schema.map(_.name)
+        c.copy(tableDesc = normalizedTable, query = Some(reorderedQuery))
```

--- End diff --

this should be guaranteed by the parser, but we can check it again here.
[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16655#discussion_r97189247

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```diff
@@ -199,31 +199,52 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl
     // * can't use all table columns as partition columns.
     // * partition columns' type must be AtomicType.
     // * sort columns' type must be orderable.
+    // * reorder table schema or output of query plan, to put partition columns at the end.
     case c @ CreateTable(tableDesc, _, query) =>
-      val analyzedQuery = query.map { q =>
-        // Analyze the query in CTAS and then we can do the normalization and checking.
-        val qe = sparkSession.sessionState.executePlan(q)
+      if (query.isDefined) {
+        val qe = sparkSession.sessionState.executePlan(query.get)
         qe.assertAnalyzed()
-        qe.analyzed
-      }
-      val schema = if (analyzedQuery.isDefined) {
-        analyzedQuery.get.schema
-      } else {
-        tableDesc.schema
-      }
+        val analyzedQuery = qe.analyzed
+
+        val normalizedTable = normalizeCatalogTable(analyzedQuery.schema, tableDesc)
+
+        val output = analyzedQuery.output
+        val partitionAttrs = normalizedTable.partitionColumnNames.map { partCol =>
+          output.find(_.name == partCol).get
+        }
+        val newOutput = output.filterNot(partitionAttrs.contains) ++ partitionAttrs
+        val reorderedQuery = if (newOutput == output) {
+          analyzedQuery
+        } else {
+          Project(newOutput, analyzedQuery)
+        }
 
-      val columnNames = if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
-        schema.map(_.name)
+        c.copy(tableDesc = normalizedTable, query = Some(reorderedQuery))
```

--- End diff --

How about adding one more check here?

```scala
assert(normalizedTable.schema.isEmpty,
  "Schema may not be specified in a Create Table As Select (CTAS) statement")
```
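The reordering logic quoted in the hunk above can be sketched standalone. The sketch below uses plain strings in place of Catalyst attributes, with made-up column names; it shows how partition columns are moved to the end of the query output while the remaining columns keep their order.

```scala
// Standalone sketch of the partition-column reordering in the diff above,
// with plain strings standing in for Catalyst attributes (names are made up).
val output = Seq("id", "name", "year", "value", "month")
val partitionColumnNames = Seq("year", "month")

// Find each partition column in the query output, in partition-column order...
val partitionAttrs = partitionColumnNames.map { partCol =>
  output.find(_ == partCol).get
}
// ...and append them at the end, keeping the order of the remaining columns.
val newOutput = output.filterNot(partitionAttrs.contains) ++ partitionAttrs

assert(newOutput == Seq("id", "name", "value", "year", "month"))
```

Note that when `newOutput == output` the diff reuses the analyzed plan unchanged, avoiding a redundant `Project` node.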
[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71748/ Test PASSed.
[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16582 Merged build finished. Test PASSed.
[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16582 **[Test build #71748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71748/testReport)** for PR 16582 at commit [`eb0fcb7`](https://github.com/apache/spark/commit/eb0fcb792b8130e9cbdf68eb18b15f3f49148d9b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16527 **[Test build #71753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71753/testReport)** for PR 16527 at commit [`89721cd`](https://github.com/apache/spark/commit/89721cd7e8048ee72d37c18bd762d1ba7d73ef3b).
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16660 is it possible to add a unit test? the change LGTM
[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16496
[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16496 thanks, merging to master!
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71752/testReport)** for PR 16659 at commit [`cda9723`](https://github.com/apache/spark/commit/cda9723e8adc07142521cd5d17568f6e5ff3b709).
[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16659#discussion_r97188584

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -181,19 +185,17 @@ case class SimpleTypedAggregateExpression(
         outputExternalType,
         bufferDeserializer :: Nil)
+    val serializeExprs = outputSerializer.map(_.transform {
--- End diff --

it's always used, no need to make it lazy val.
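The nit above rests on Scala's `lazy val` semantics: a `lazy val` defers its initializer to first access and pays a guard check on every read, so a field that is always used is better declared as a plain `val`. A minimal sketch of the difference (hypothetical demo class, not from the PR):

```scala
object LazyValDemo {
  private var initCount = 0

  // A `lazy val` runs its initializer only on first access,
  // behind a synchronized guard check paid on every read.
  class Lazy { lazy val x: Int = { initCount += 1; 42 } }

  // Returns (count right after construction, count after first access).
  def run(): (Int, Int) = {
    initCount = 0
    val l = new Lazy
    val afterNew = initCount // still 0: initializer has not run yet
    l.x                      // first access triggers initialization
    (afterNew, initCount)
  }

  def main(args: Array[String]): Unit = {
    assert(run() == ((0, 1)))
    println("lazy val initialized only on first access")
  }
}
```

If the field were a plain `val`, the initializer would run unconditionally at construction time, which is exactly what you want when the value is always used.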
[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16659#discussion_r97188544

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -143,9 +143,15 @@ case class SimpleTypedAggregateExpression(
   override lazy val aggBufferAttributes: Seq[AttributeReference] =
     bufferSerializer.map(_.toAttribute.asInstanceOf[AttributeReference])

+  private def deserializeToBuffer(expr: Expression): Seq[Expression] = {
+    bufferDeserializer.map(_.transform {
+      case _: BoundReference => expr
+    })
+  }
+
   override lazy val initialValues: Seq[Expression] = {
     val zero = Literal.fromObject(aggregator.zero, bufferExternalType)
-    bufferSerializer.map(ReferenceToExpressions(_, zero :: Nil))
--- End diff --

sorry, typo...
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71746/ Test PASSed.
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15192 Merged build finished. Test PASSed.
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16666 **[Test build #71750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71750/testReport)** for PR 16666 at commit [`2c1d02d`](https://github.com/apache/spark/commit/2c1d02d054fe1a8627b8610e8dd6de226b46af55).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71750/ Test PASSed.
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16666 Merged build finished. Test PASSed.
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15192 **[Test build #71746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71746/testReport)** for PR 15192 at commit [`d8cbe54`](https://github.com/apache/spark/commit/d8cbe54f0440dd4bf4d87ca934a0bdbbf2eaa862).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16213 @tdas ping
[GitHub] spark pull request #16627: [SPARK-19267][SS]Fix a race condition when stoppi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16627
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16665 Merged build finished. Test FAILed.
[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16665 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71747/ Test FAILed.