[GitHub] spark issue #16668: [SPARK-18788][SPARKR] Add API for getNumPartitions

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16668
  
**[Test build #71761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71761/testReport)** for PR 16668 at commit [`34f9aa5`](https://github.com/apache/spark/commit/34f9aa520770974be7d1417a11ffdd1e1118ddf2).


[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...

2017-01-20 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/16668

[SPARK-18788][SPARKR] Add API for getNumPartitions

## What changes were proposed in this pull request?

With a doc note to say this converts the SparkDataFrame into an RDD.

## How was this patch tested?

unit tests, manual tests
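
For context, a minimal sketch of what the new SparkR method could look like, assuming it reuses the DataFrame-to-RDD conversion the doc describes (the actual change is in commit `34f9aa5`; treat this as illustration only):

```r
#' Return the number of partitions of a SparkDataFrame.
#' Note: per the doc in this PR, this converts the SparkDataFrame into an RDD first.
setMethod("getNumPartitions",
          signature(x = "SparkDataFrame"),
          function(x) {
            # hypothetical bridge call; the real wiring is in the diff
            callJMethod(callJMethod(x@sdf, "rdd"), "getNumPartitions")
          })

# Usage:
# df <- createDataFrame(cars)
# getNumPartitions(df)
```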

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rgetnumpartitions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16668


commit 34f9aa520770974be7d1417a11ffdd1e1118ddf2
Author: Felix Cheung 
Date:   2017-01-21T07:53:30Z

getNumPartitions




[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71759/
Test FAILed.


[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16659
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16659
  
**[Test build #71759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71759/testReport)** for PR 16659 at commit [`9d50048`](https://github.com/apache/spark/commit/9d50048a47b5052a85faa16535535eb86c146aa3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...

2017-01-20 Thread admackin
Github user admackin commented on the issue:

https://github.com/apache/spark/pull/16652
  
Yes, the version in MLUtils had labels of zero in the test cases, so it was causing test cases
to fail after my patch. It didn't look like there was a way to fix this, so I thought it better
to make a patch that didn't affect potentially dozens of other packages. Any other thoughts on
how to achieve this? I could add a 'minLabel' param to the MLUtils methods, but that seems
overly specific for this one package.


[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16659
  
I reran the `DatasetBenchmark`; there is no performance regression.


[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16663
  
**[Test build #71760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71760/testReport)** for PR 16663 at commit [`73845cb`](https://github.com/apache/spark/commit/73845cb93be7692fe6954232583166d66d0bf8d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16663
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71760/
Test PASSed.


[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16609#discussion_r97192829
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) {
 
############################ SparkDataFrame Methods ############################
 
+#' storageName
+#'
+#' Return a SparkDataFrame's name.
+#'
+#' @param x The SparkDataFrame whose name is returned.
+#' @family SparkDataFrame functions
+#' @rdname storageName
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' storageName(df)
+#'}
+#' @aliases storageName,SparkDataFrame-method
+#' @export
+#' @note storageName since 2.2.0
+setMethod("storageName",
+  signature(x = "SparkDataFrame"),
+  function(x) {
+callJMethod(x@sdf, "name")
+  })
+
+#' storageName
+#'
+#' Set a SparkDataFrame's name. This will be displayed on the Storage tab in the UI if cached.
+#'
+#' @param x The SparkDataFrame whose name is to be set.
+#' @param name The SparkDataFrame name to be set.
+#' @family SparkDataFrame functions
+#' @return the SparkDataFrame renamed.
+#' @rdname storageName
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' storageName(df) <- "foo"
--- End diff --

Since it won't be useful unless the DataFrame is cached, I suggest adding `cache(df)` to the example here.
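
Something like this (a sketch of the revised example, assuming `storageName` lands as proposed in this PR):

```r
#'\dontrun{
#' sparkR.session()
#' path <- "path/to/file.json"
#' df <- read.json(path)
#' cache(df)                 # the name only shows on the Storage tab once cached
#' storageName(df) <- "foo"
#'}
```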


[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16609#discussion_r97192814
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) {
 
############################ SparkDataFrame Methods ############################
 
+#' storageName
+#'
+#' Return a SparkDataFrame's name.
+#'
+#' @param x The SparkDataFrame whose name is returned.
+#' @family SparkDataFrame functions
+#' @rdname storageName
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' storageName(df)
+#'}
+#' @aliases storageName,SparkDataFrame-method
+#' @export
+#' @note storageName since 2.2.0
+setMethod("storageName",
+  signature(x = "SparkDataFrame"),
+  function(x) {
+callJMethod(x@sdf, "name")
+  })
+
+#' storageName
+#'
+#' Set a SparkDataFrame's name. This will be displayed on the Storage tab in the UI if cached.
+#'
+#' @param x The SparkDataFrame whose name is to be set.
+#' @param name The SparkDataFrame name to be set.
--- End diff --

change this `name` to `value`


[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16609#discussion_r97192807
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) {
 
############################ SparkDataFrame Methods ############################
 
+#' storageName
+#'
+#' Return a SparkDataFrame's name.
+#'
+#' @param x The SparkDataFrame whose name is returned.
+#' @family SparkDataFrame functions
+#' @rdname storageName
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' storageName(df)
--- End diff --

Since they have the same `@rdname`, you should merge the two blocks of examples into one.
In that case it would make more sense to check `storageName` after setting it - checking it
without setting it first doesn't seem to make a lot of sense.
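
For example, the merged block could look like this (a sketch only, setting the name before reading it back):

```r
#' @examples
#'\dontrun{
#' sparkR.session()
#' df <- read.json("path/to/file.json")
#' storageName(df) <- "foo"
#' storageName(df)  # returns "foo"
#'}
```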


[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...

2017-01-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r97192776
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
@@ -107,7 +107,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
   footer = readFooter(configuration, file, range(split.getStart(), split.getEnd()));
   MessageType fileSchema = footer.getFileMetaData().getSchema();
   FilterCompat.Filter filter = getFilter(configuration);
-  blocks = filterRowGroups(filter, footer.getBlocks(), fileSchema);
+  try {
+blocks = filterRowGroups(filter, footer.getBlocks(), fileSchema);
+  } catch (IllegalArgumentException e) {
+// In the case where a particular parquet files does not contain
--- End diff --

Can we add a TODO? I think the newer Parquet can handle this issue, so once we
upgrade the Parquet version we won't need this.


[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16609#discussion_r97192728
  
--- Diff: R/pkg/R/generics.R ---
@@ -624,6 +624,14 @@ setGeneric("saveAsTable", function(df, tableName, source = NULL, mode = "error",
   standardGeneric("saveAsTable")
 })
 
+#' @rdname storageName
+#' @export
+setGeneric("storageName", function(x) { standardGeneric("storageName") })
+
+#' @rdname storageName
+#' @export
+setGeneric("storageName<-", function(x, name) { 
standardGeneric("storageName<-") })
--- End diff --

ditto
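
i.e. a sketch of the corrected generic, following the R convention that a replacement function's new-value argument is named `value`:

```r
#' @rdname storageName
#' @export
setGeneric("storageName<-", function(x, value) { standardGeneric("storageName<-") })
```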


[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16609#discussion_r97192721
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -78,6 +78,55 @@ dataFrame <- function(sdf, isCached = FALSE) {
 
############################ SparkDataFrame Methods ############################
 
+#' storageName
+#'
+#' Return a SparkDataFrame's name.
+#'
+#' @param x The SparkDataFrame whose name is returned.
+#' @family SparkDataFrame functions
+#' @rdname storageName
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' storageName(df)
+#'}
+#' @aliases storageName,SparkDataFrame-method
+#' @export
+#' @note name since 2.2.0
+setMethod("storageName",
+  signature(x = "SparkDataFrame"),
+  function(x) {
+callJMethod(x@sdf, "name")
+  })
+
+#' storageName
+#'
+#' Set a SparkDataFrame's name. This will be displayed on the Storage tab in the UI if cached.
+#'
+#' @param x The SparkDataFrame whose name is to be set.
+#' @param name The SparkDataFrame name to be set.
+#' @family SparkDataFrame functions
+#' @return the SparkDataFrame renamed.
+#' @rdname storageName
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' storageName(df) <- "foo"
+#'}
+#' @aliases name<-,SparkDataFrame-method
+#' @export
+#' @note name<- since 2.2.0
+setMethod("storageName<-",
+  signature(x = "SparkDataFrame", name = "character"),
+  function(x, name) {
+callJMethod(x@sdf, "setName", name)
+x
--- End diff --

for the setter (`something<-`) you have to name the new-value parameter `value`
(change this from the `name` you have here)
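
A sketch of the setter with that change applied (R dispatches `storageName(df) <- "foo"` on the `storageName<-` generic and binds the right-hand side to the argument named `value`):

```r
setMethod("storageName<-",
          signature(x = "SparkDataFrame", value = "character"),
          function(x, value) {
            callJMethod(x@sdf, "setName", value)
            x
          })
```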


[GitHub] spark pull request #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary retur...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16666#discussion_r97192703
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -225,10 +225,12 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"
 
 #' @param object a fitted k-means model.
 #' @return \code{summary} returns summary information of the fitted model, which is a list.
-#' The list includes the model's \code{k} (number of cluster centers),
+#' The list includes the model's \code{k} (the configured number of cluster centers),
 #' \code{coefficients} (model cluster centers),
-#' \code{size} (number of data points in each cluster), and \code{cluster}
-#' (cluster centers of the transformed data).
+#' \code{size} (number of data points in each cluster), \code{cluster}
+#' (cluster centers of the transformed data), and \code{clusterSize}
+#' (the actual number of cluster centers. When using initMode = "random",
--- End diff --

let's add `is.loaded` here


[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...

2017-01-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16666
  
ah - does bisecting kmeans have the same behavior?


[GitHub] spark issue #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper in Spar...

2017-01-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16566
  
A couple of last comments.
@yanboliang do you have any comments?


[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97192612
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/BisectingKMeansWrapper.scala ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel}
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class BisectingKMeansWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String],
+val size: Array[Long],
+val isLoaded: Boolean = false) extends MLWritable {
+  private val bisectingKmeansModel: BisectingKMeansModel =
+pipeline.stages.last.asInstanceOf[BisectingKMeansModel]
+
+  lazy val coefficients: Array[Double] = bisectingKmeansModel.clusterCenters.flatMap(_.toArray)
+
+  lazy val k: Int = bisectingKmeansModel.getK
+
+  lazy val cluster: DataFrame = bisectingKmeansModel.summary.cluster
--- End diff --

Ah, this is checked on the R side. Could you add a comment here?


[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97192608
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'          The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'                                or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#'                                Note that it is an expert parameter. The default value should be good enough
+#'                                for most cases.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", 
formula = "formula"),
+  function(data, formula, k = 4, maxIter = 20, seed = NULL, 
minDivisibleClusterSize = 1.0) {
+formula <- paste0(deparse(formula), collapse = "")
+if (!is.null(seed)) {
+  seed <- as.character(as.integer(seed))
+}
+jobj <- 
callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+data@sdf, formula, as.integer(k), 
as.integer(maxIter),
+seed, as.numeric(minDivisibleClusterSize))
+new("BisectingKMeansModel", jobj = jobj)
+  })
+
+#  Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, 
which is a list.
+#' The list includes the model's \code{k} (number of cluster 
centers),
+#' \code{coefficients} (model cluster centers),
+#' \code{size} (number of data points in each cluster), and 
\code{cluster}
+#' (cluster centers of the transformed data).
--- End diff --

also clarify that `cluster` is NULL if `is.loaded = TRUE`
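
For example, the `@return` text could read (a sketch of the clarified wording, matching the R-side `summary` implementation that sets `cluster` to NULL for a loaded model):

```r
#' \code{cluster} (cluster centers of the transformed data; NULL when
#' \code{is.loaded} is TRUE), and \code{is.loaded} (whether the model was
#' loaded from a saved file).
```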


[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97192589
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/BisectingKMeansWrapper.scala ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel}
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class BisectingKMeansWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String],
+val size: Array[Long],
+val isLoaded: Boolean = false) extends MLWritable {
+  private val bisectingKmeansModel: BisectingKMeansModel =
+pipeline.stages.last.asInstanceOf[BisectingKMeansModel]
+
+  lazy val coefficients: Array[Double] = bisectingKmeansModel.clusterCenters.flatMap(_.toArray)
+
+  lazy val k: Int = bisectingKmeansModel.getK
+
+  lazy val cluster: DataFrame = bisectingKmeansModel.summary.cluster
+
+  def fitted(method: String): DataFrame = {
+if (method == "centers") {
+      bisectingKmeansModel.summary.predictions.drop(bisectingKmeansModel.getFeaturesCol)
+} else if (method == "classes") {
+  bisectingKmeansModel.summary.cluster
+} else {
+  throw new UnsupportedOperationException(
+s"Method (centers or classes) required but $method found.")
+}
+  }
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+pipeline.transform(dataset).drop(bisectingKmeansModel.getFeaturesCol)
+  }
+
+  override def write: MLWriter = new BisectingKMeansWrapper.BisectingKMeansWrapperWriter(this)
+}
+
+private[r] object BisectingKMeansWrapper extends MLReadable[BisectingKMeansWrapper] {
+
+  def fit(
+  data: DataFrame,
+  formula: String,
+  k: Int,
+  maxIter: Int,
+  seed: String,
+  minDivisibleClusterSize: Double
+  ): BisectingKMeansWrapper = {
+
+val rFormula = new RFormula()
+  .setFormula(formula)
+  .setFeaturesCol("features")
+RWrapperUtils.checkDataColumns(rFormula, data)
+val rFormulaModel = rFormula.fit(data)
+
+// get feature names from output schema
+val schema = rFormulaModel.transform(data).schema
+    val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol))
+  .attributes.get
+val features = featureAttrs.map(_.name.get)
+
+val bisectingKmeans = new BisectingKMeans()
+  .setK(k)
+  .setMaxIter(maxIter)
+  .setMinDivisibleClusterSize(minDivisibleClusterSize)
+  .setFeaturesCol(rFormula.getFeaturesCol)
+
+    if (seed != null && seed.length > 0) bisectingKmeans.setSeed(seed.toInt)
+
+val pipeline = new Pipeline()
+  .setStages(Array(rFormulaModel, bisectingKmeans))
+  .fit(data)
+
+val bisectingKmeansModel: BisectingKMeansModel =
+  pipeline.stages(1).asInstanceOf[BisectingKMeansModel]
--- End diff --

let's be consistent here with L38 - use either `(1)` or `last` in both places


[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97192573
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/BisectingKMeansWrapper.scala ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.clustering.{BisectingKMeans, BisectingKMeansModel}
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class BisectingKMeansWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String],
+val size: Array[Long],
+val isLoaded: Boolean = false) extends MLWritable {
+  private val bisectingKmeansModel: BisectingKMeansModel =
+pipeline.stages.last.asInstanceOf[BisectingKMeansModel]
+
+  lazy val coefficients: Array[Double] = bisectingKmeansModel.clusterCenters.flatMap(_.toArray)
+
+  lazy val k: Int = bisectingKmeansModel.getK
+
+  lazy val cluster: DataFrame = bisectingKmeansModel.summary.cluster
--- End diff --

does this have valid values when the model is loaded?


[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97192502
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'          The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'                                or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#'                                Note that it is an expert parameter. The default value should be good enough
+#'                                for most cases.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", 
formula = "formula"),
+  function(data, formula, k = 4, maxIter = 20, seed = NULL, 
minDivisibleClusterSize = 1.0) {
+formula <- paste0(deparse(formula), collapse = "")
+if (!is.null(seed)) {
+  seed <- as.character(as.integer(seed))
+}
+jobj <- 
callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+data@sdf, formula, as.integer(k), 
as.integer(maxIter),
+seed, as.numeric(minDivisibleClusterSize))
+new("BisectingKMeansModel", jobj = jobj)
+  })
+
+#  Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, which is a list.
+#'         The list includes the model's \code{k} (number of cluster centers),
+#'         \code{coefficients} (model cluster centers),
+#'         \code{size} (number of data points in each cluster), and \code{cluster}
+#'         (cluster centers of the transformed data).
--- End diff --

let's add `is.loaded` here


[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97192351
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'          The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'                                or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#'                                Note that it is an expert parameter. The default value should be good enough
+#'                                for most cases.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", 
formula = "formula"),
+  function(data, formula, k = 4, maxIter = 20, seed = NULL, 
minDivisibleClusterSize = 1.0) {
+formula <- paste0(deparse(formula), collapse = "")
+if (!is.null(seed)) {
+  seed <- as.character(as.integer(seed))
+}
+jobj <- 
callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+data@sdf, formula, as.integer(k), 
as.integer(maxIter),
+seed, as.numeric(minDivisibleClusterSize))
+new("BisectingKMeansModel", jobj = jobj)
+  })
+
+#  Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, which is a list.
+#'         The list includes the model's \code{k} (number of cluster centers),
+#'         \code{coefficients} (model cluster centers),
+#'         \code{size} (number of data points in each cluster), and \code{cluster}
+#'         (cluster centers of the transformed data).
+#' @rdname spark.bisectingKmeans
+#' @export
+#' @note summary(BisectingKMeansModel) since 2.2.0
+setMethod("summary", signature(object = "BisectingKMeansModel"),
+  function(object) {
+jobj <- object@jobj
+is.loaded <- callJMethod(jobj, "isLoaded")
+features <- callJMethod(jobj, "features")
+coefficients <- callJMethod(jobj, "coefficients")
+k <- callJMethod(jobj, "k")
+size <- callJMethod(jobj, "size")
+coefficients <- t(matrix(coefficients, ncol = k))
+colnames(coefficients) <- unlist(features)
+rownames(coefficients) <- 1:k
+cluster <- if (is.loaded) {
+  NULL
+} else {
+  dataFrame(callJMethod(jobj, "cluster"))
+}
+list(k = k, coefficients = coefficients, size = size,
+cluster = cluster, is.loaded = is.loaded)
+  })
+
+#  Predicted values b

[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16663
  
**[Test build #71760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71760/testReport)** for PR 16663 at commit [`73845cb`](https://github.com/apache/spark/commit/73845cb93be7692fe6954232583166d66d0bf8d2).


[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16660
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71755/
Test PASSed.


[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16660
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16660
  
**[Test build #71755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71755/testReport)** for PR 16660 at commit [`7ea9aa6`](https://github.com/apache/spark/commit/7ea9aa636f430a30b8d83ed2dda954fd06347d79).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16663
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71758/
Test FAILed.


[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16663
  
**[Test build #71758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71758/testReport)** for PR 16663 at commit [`17d3226`](https://github.com/apache/spark/commit/17d32262252f6beac7abd2afd5fb266d092ed7c2).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16663
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

2017-01-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16516
  
looks good to me


[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...

2017-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16655


[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16659
  
LGTM


[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

2017-01-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16659#discussion_r97191894
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
@@ -67,28 +67,33 @@ class EquivalentExpressions {
   /**
    * Adds the expression to this data structure recursively. Stops if a matching expression
    * is found. That is, if `expr` has already been added, its children are not added.
-   * If ignoreLeaf is true, leaf nodes are ignored.
    */
-  def addExprTree(
-      root: Expression,
-      ignoreLeaf: Boolean = true,
-      skipReferenceToExpressions: Boolean = true): Unit = {
-    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
+  def addExprTree(expr: Expression): Unit = {
+    val skip = expr.isInstanceOf[LeafExpression] ||
       // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
       // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
-      root.find(_.isInstanceOf[LambdaVariable]).isDefined
-    // There are some special expressions that we should not recurse into children.
+      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
+
+    // There are some special expressions that we should not recurse into all of its children.
     //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
-    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
-    val shouldRecurse = root match {
-      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
-      // e.g. `CaseWhen`, we should support them.
-      case _: CodegenFallback => false
-      case _: ReferenceToExpressions if skipReferenceToExpressions => false
-      case _ => true
+    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
--- End diff --

this is cool.


[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16659
  
**[Test build #71759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71759/testReport)** for PR 16659 at commit [`9d50048`](https://github.com/apache/spark/commit/9d50048a47b5052a85faa16535535eb86c146aa3).


[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16655
  
thanks for the review, merging to master!


[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16655
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71754/
Test PASSed.


[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16655
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16655
  
**[Test build #71754 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71754/testReport)**
 for PR 16655 at commit 
[`68f639e`](https://github.com/apache/spark/commit/68f639e468333faa9070cca639b3b491585b2e39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16665
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71757/
Test FAILed.





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16665
  
**[Test build #71757 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71757/consoleFull)**
 for PR 16665 at commit 
[`e847ab0`](https://github.com/apache/spark/commit/e847ab0a13534a3bc97cd37ab91a0be8ed838bfa).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16665
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...

2017-01-20 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r97191587
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -168,6 +168,43 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+*/
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
+    // Invalidate the table last, otherwise uncaching the table would load the logical plan
+    // back into the hive metastore cache
+    catalog.refreshTable(table)
+
+    val newSchema = catalogTable.schema.copy(fields = catalogTable.schema.fields ++ columns)
--- End diff --

We support partitioned tables; the test cases added include this case. However, we don't support ALTER ADD COLUMNS on a particular partition, which Hive can do today, e.g. `ALTER TABLE T1 PARTITION(c3=1) ADD COLUMNS ...`. That is another potential feature to add if we maintain a schema per partition.
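
To make the distinction concrete, a hedged sketch (hypothetical table and column names, assuming a Hive-enabled `SparkSession`):

```Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local").appName("add-columns-demo")
  .enableHiveSupport().getOrCreate()

// A partitioned Hive table.
spark.sql("CREATE TABLE t1 (c1 INT, c2 STRING) PARTITIONED BY (c3 INT)")

// Covered by this PR: add a column at the table level.
spark.sql("ALTER TABLE t1 ADD COLUMNS (c4 STRING COMMENT 'new column')")

// Not covered here (Hive can do this today): altering one partition's schema.
// spark.sql("ALTER TABLE t1 PARTITION (c3 = 1) ADD COLUMNS (c5 STRING)")
```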





[GitHub] spark issue #16663: [SPARK-18823][SPARKR] add support for assigning to colum...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16663
  
**[Test build #71758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71758/testReport)**
 for PR 16663 at commit 
[`17d3226`](https://github.com/apache/spark/commit/17d32262252f6beac7abd2afd5fb266d092ed7c2).





[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16527
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71753/
Test PASSed.





[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16527
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r97191445
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +814,28 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table yet.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+    if (isDatasourceTable(catalogTable)) {
--- End diff --

Currently, the code paths for managing Hive serde tables and data source tables have been combined, so they can easily be handled together.





[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16527
  
**[Test build #71753 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71753/testReport)**
 for PR 16527 at commit 
[`89721cd`](https://github.com/apache/spark/commit/89721cd7e8048ee72d37c18bd762d1ba7d73ef3b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-01-20 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16620
  
@markhamstra 
Thanks a lot for your comment. I've refined the change; please take another look ~





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...

2017-01-20 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r97191366
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +814,28 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table yet.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+    if (isDatasourceTable(catalogTable)) {
--- End diff --

I am thinking that there are different ways to create a data source table, such as `df.write.saveAsTable`, or a `CREATE TABLE` DDL statement with or without a schema. Plus, JDBC data source tables may not be supported. I just want to spend more time trying different scenarios to see if there is any hole before claiming support. I will submit another PR once I am sure it is handled correctly.
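
The creation paths mentioned above, sketched with hypothetical table names (a rough illustration, not an exhaustive list):

```Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local").appName("ds-table-paths").getOrCreate()
val df = spark.range(3).toDF("a")

// 1. DataFrame API.
df.write.format("parquet").saveAsTable("ds_t1")

// 2. CREATE TABLE DDL with an explicit schema.
spark.sql("CREATE TABLE ds_t2 (a INT, b STRING) USING parquet")

// 3. CREATE TABLE DDL without a schema (CTAS).
spark.sql("CREATE TABLE ds_t3 USING parquet AS SELECT 1 AS a, 'x' AS b")

// 4. A JDBC-backed table keeps its schema in the remote database, which is
//    why support there is doubtful (connection options are placeholders):
// spark.sql("CREATE TABLE ds_t4 USING jdbc OPTIONS (url '...', dbtable 'tbl')")
```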





[GitHub] spark issue #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attributes ...

2017-01-20 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/16581
  
Cherry-picked into 2.1 branch.





[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16659
  
The child expression in `Sum` is wrapped in `Coalesce`, which makes the `org.apache.spark.sql.SQLQuerySuite` "Common subexpression elimination" test fail.





[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16667
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16667
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71756/
Test PASSed.





[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16667
  
**[Test build #71756 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71756/testReport)**
 for PR 16667 at commit 
[`16a99fc`](https://github.com/apache/spark/commit/16a99fcff20a2527d95d54d94c1c348dbd638f26).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71752/
Test FAILed.





[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16659
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16659
  
**[Test build #71752 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71752/testReport)**
 for PR 16659 at commit 
[`cda9723`](https://github.com/apache/spark/commit/cda9723e8adc07142521cd5d17568f6e5ff3b709).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16667
  
**[Test build #71756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71756/testReport)**
 for PR 16667 at commit 
[`16a99fc`](https://github.com/apache/spark/commit/16a99fcff20a2527d95d54d94c1c348dbd638f26).





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16665
  
**[Test build #71757 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71757/consoleFull)**
 for PR 16665 at commit 
[`e847ab0`](https://github.com/apache/spark/commit/e847ab0a13534a3bc97cd37ab91a0be8ed838bfa).





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16665
  
seems unrelated but... retest this please





[GitHub] spark issue #16667: [SPARK-18750][yarn] Avoid using "mapValues" when allocat...

2017-01-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16667
  
Argh, the API is not available in old Hadoop... fix coming.
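
For context, the Scala behavior that makes `mapValues` worth avoiding here (a minimal demonstration of the language-level gotcha, separate from the YARN allocator specifics in the PR):

```Scala
// In Scala 2.x, Map#mapValues returns a lazy view: the function re-runs on
// every lookup, so side effects and expensive work repeat per access.
var hits = 0
val lazyView = Map("a" -> 1).mapValues { v => hits += 1; v + 1 }
lazyView("a")
lazyView("a")
println(hits) // 2 -- recomputed on each access

// A strict transformation evaluates the function exactly once per key.
val strict = Map("a" -> 1).map { case (k, v) => k -> (v + 1) }
println(strict("a")) // 2, computed once
```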





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r97189715
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -168,6 +168,43 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+*/
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
+    // Invalidate the table last, otherwise uncaching the table would load the logical plan
+    // back into the hive metastore cache
+    catalog.refreshTable(table)
+
+    val newSchema = catalogTable.schema.copy(fields = catalogTable.schema.fields ++ columns)
--- End diff --

We are not supporting partitioned tables, right?





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r97189688
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +814,28 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table yet.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+    if (isDatasourceTable(catalogTable)) {
--- End diff --

What is the reason why data source tables are not supported?





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16660
  
**[Test build #71755 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71755/testReport)**
 for PR 16660 at commit 
[`7ea9aa6`](https://github.com/apache/spark/commit/7ea9aa636f430a30b8d83ed2dda954fd06347d79).





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive tables

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71749/
Test PASSed.





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive tables

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16626
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16660
  
You can add a test case in 
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive tables

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16626
  
**[Test build #71749 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71749/testReport)**
 for PR 16626 at commit 
[`73b0243`](https://github.com/apache/spark/commit/73b024309674dc6d76e853547ef2a64da4836ce8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16660
  
ok to test





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16660
  
LGTM too. @gmoehler Can you add a unit test?





[GitHub] spark pull request #15192: [SPARK-14536] [SQL] fix to handle null value in a...

2017-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15192





[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15192
  
Thanks! Merging to master.





[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16655
  
**[Test build #71754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71754/testReport)**
 for PR 16655 at commit 
[`68f639e`](https://github.com/apache/spark/commit/68f639e468333faa9070cca639b3b491585b2e39).





[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16655
  
LGTM. After this is merged, I will continue the work in #16593. Thanks~





[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16655
  
LGTM pending test





[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16655#discussion_r97189276
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -199,31 +199,52 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl
     //   * can't use all table columns as partition columns.
     //   * partition columns' type must be AtomicType.
     //   * sort columns' type must be orderable.
+    //   * reorder table schema or output of query plan, to put partition columns at the end.
     case c @ CreateTable(tableDesc, _, query) =>
-      val analyzedQuery = query.map { q =>
-        // Analyze the query in CTAS and then we can do the normalization and checking.
-        val qe = sparkSession.sessionState.executePlan(q)
+      if (query.isDefined) {
+        val qe = sparkSession.sessionState.executePlan(query.get)
         qe.assertAnalyzed()
-        qe.analyzed
-      }
-      val schema = if (analyzedQuery.isDefined) {
-        analyzedQuery.get.schema
-      } else {
-        tableDesc.schema
-      }
+        val analyzedQuery = qe.analyzed
+
+        val normalizedTable = normalizeCatalogTable(analyzedQuery.schema, tableDesc)
+
+        val output = analyzedQuery.output
+        val partitionAttrs = normalizedTable.partitionColumnNames.map { partCol =>
+          output.find(_.name == partCol).get
+        }
+        val newOutput = output.filterNot(partitionAttrs.contains) ++ partitionAttrs
+        val reorderedQuery = if (newOutput == output) {
+          analyzedQuery
+        } else {
+          Project(newOutput, analyzedQuery)
+        }
 
-      val columnNames = if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
-        schema.map(_.name)
+        c.copy(tableDesc = normalizedTable, query = Some(reorderedQuery))
--- End diff --

this should be guaranteed by the parser, but we can check it again here.
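
What the reordering buys, illustrated with hypothetical names (assumes a `SparkSession` named `spark`):

```Scala
// The partition column sits in the middle of the SELECT, but the rule above
// adds a Project so the stored schema puts it last: (a, b, p).
spark.sql(
  """CREATE TABLE t USING parquet PARTITIONED BY (p)
    |AS SELECT 1 AS a, 'x' AS p, 2.0 AS b""".stripMargin)
spark.table("t").printSchema()
```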





[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...

2017-01-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16655#discussion_r97189247
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -199,31 +199,52 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl
     //   * can't use all table columns as partition columns.
     //   * partition columns' type must be AtomicType.
     //   * sort columns' type must be orderable.
+    //   * reorder table schema or output of query plan, to put partition columns at the end.
     case c @ CreateTable(tableDesc, _, query) =>
-      val analyzedQuery = query.map { q =>
-        // Analyze the query in CTAS and then we can do the normalization and checking.
-        val qe = sparkSession.sessionState.executePlan(q)
+      if (query.isDefined) {
+        val qe = sparkSession.sessionState.executePlan(query.get)
         qe.assertAnalyzed()
-        qe.analyzed
-      }
-      val schema = if (analyzedQuery.isDefined) {
-        analyzedQuery.get.schema
-      } else {
-        tableDesc.schema
-      }
+        val analyzedQuery = qe.analyzed
+
+        val normalizedTable = normalizeCatalogTable(analyzedQuery.schema, tableDesc)
+
+        val output = analyzedQuery.output
+        val partitionAttrs = normalizedTable.partitionColumnNames.map { partCol =>
+          output.find(_.name == partCol).get
+        }
+        val newOutput = output.filterNot(partitionAttrs.contains) ++ partitionAttrs
+        val reorderedQuery = if (newOutput == output) {
+          analyzedQuery
+        } else {
+          Project(newOutput, analyzedQuery)
+        }
 
-      val columnNames = if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
-        schema.map(_.name)
+        c.copy(tableDesc = normalizedTable, query = Some(reorderedQuery))
--- End diff --

How about adding one more check here?
```Scala
assert(normalizedTable.schema.isEmpty,
  "Schema may not be specified in a Create Table As Select (CTAS) statement")
```





[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16582
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71748/
Test PASSed.





[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16582
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16582
  
**[Test build #71748 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71748/testReport)**
 for PR 16582 at commit 
[`eb0fcb7`](https://github.com/apache/spark/commit/eb0fcb792b8130e9cbdf68eb18b15f3f49148d9b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16527
  
**[Test build #71753 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71753/testReport)**
 for PR 16527 at commit 
[`89721cd`](https://github.com/apache/spark/commit/89721cd7e8048ee72d37c18bd762d1ba7d73ef3b).





[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16660
  
is it possible to add a unit test? the change LGTM





[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

2017-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16496





[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16496
  
thanks, merging to master!





[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16659
  
**[Test build #71752 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71752/testReport)**
 for PR 16659 at commit 
[`cda9723`](https://github.com/apache/spark/commit/cda9723e8adc07142521cd5d17568f6e5ff3b709).





[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16659#discussion_r97188584
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -181,19 +185,17 @@ case class SimpleTypedAggregateExpression(
       outputExternalType,
       bufferDeserializer :: Nil)
 
+    val serializeExprs = outputSerializer.map(_.transform {
--- End diff --

it's always used, so there's no need to make it a lazy val.





[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

2017-01-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16659#discussion_r97188544
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -143,9 +143,15 @@ case class SimpleTypedAggregateExpression(
   override lazy val aggBufferAttributes: Seq[AttributeReference] =
     bufferSerializer.map(_.toAttribute.asInstanceOf[AttributeReference])
 
+  private def deserializeToBuffer(expr: Expression): Seq[Expression] = {
+    bufferDeserializer.map(_.transform {
+      case _: BoundReference => expr
+    })
+  }
+
   override lazy val initialValues: Seq[Expression] = {
     val zero = Literal.fromObject(aggregator.zero, bufferExternalType)
-    bufferSerializer.map(ReferenceToExpressions(_, zero :: Nil))
--- End diff --

sorry, typo...
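
A hedged toy model of the substitution idiom in `deserializeToBuffer` above; `Node`, `Slot`, and `Ref` stand in for Catalyst's `Expression` and `BoundReference`, not the real classes:

```Scala
// Walk a tree and replace the single input slot with a supplied expression,
// like transform { case _: BoundReference => expr } does in the diff.
sealed trait Node
case object Slot extends Node                            // stand-in for BoundReference
case class Ref(name: String) extends Node                // a concrete input expression
case class Wrap(name: String, child: Node) extends Node  // any unary wrapper

def substitute(tree: Node, replacement: Node): Node = tree match {
  case Slot => replacement
  case Wrap(n, c) => Wrap(n, substitute(c, replacement))
  case other => other
}

// A deserializer shaped like decode(upcast(<slot>)), rebound to a buffer ref:
val deserializer = Wrap("decode", Wrap("upcast", Slot))
val rebound = substitute(deserializer, Ref("buffer"))
// rebound == Wrap("decode", Wrap("upcast", Ref("buffer")))
```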





[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15192
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71746/
Test PASSed.





[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15192
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16666
  
**[Test build #71750 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71750/testReport)**
 for PR 16666 at commit 
[`2c1d02d`](https://github.com/apache/spark/commit/2c1d02d054fe1a8627b8610e8dd6de226b46af55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71750/
Test PASSed.





[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16666
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15192
  
**[Test build #71746 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71746/testReport)**
 for PR 15192 at commit 
[`d8cbe54`](https://github.com/apache/spark/commit/d8cbe54f0440dd4bf4d87ca934a0bdbbf2eaa862).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-20 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16213
  
@tdas ping





[GitHub] spark pull request #16627: [SPARK-19267][SS]Fix a race condition when stoppi...

2017-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16627





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16665
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16665: [SPARK-13478][YARN] Use real user when fetching delegati...

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16665
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71747/
Test FAILed.




