[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22818 **[Test build #98504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98504/testReport)** for PR 22818 at commit [`361bf02`](https://github.com/apache/spark/commit/361bf02bbe3f78c7a68e93b77fbc9a0ccf39b47a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r231025592

--- Diff: R/pkg/R/functions.R ---
```diff
@@ -205,11 +205,18 @@ NULL
 #' also supported for the schema.
 #' \item \code{from_csv}: a DDL-formatted string
 #' }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'            this contains additional named properties to control how it is converted, accepts
-#'            the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'            the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'            this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#' \itemize{
+#' \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains
+#'       additional named properties to control how it is converted and accepts the
+#'       same options as the JSON data source.
+#' \item \code{to_json}: it supports the "pretty" option which enables pretty
```
--- End diff --

I know it's there from before, but I'd suggest giving an example (a doc or code example) below. It's a bit different from Python/Scala, I think.
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4788/ Test PASSed.
[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r231025282

--- Diff: R/pkg/R/functions.R ---
```diff
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
             column(jc)
           })

+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+          function(x, ...) {
+            if (class(x) == "character") {
+              col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+            } else {
+              col <- x@jc
```
--- End diff --

OK, but one use could be `select(df, schema_of_csv(df$schemaCol))`, i.e. an actual column, not a literal string?
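The quoted doc says `schema_of_json` parses a JSON string and infers its schema in DDL format. Spark does this inference on the JVM side; as a purely illustrative analogue (not Spark's implementation), a minimal pure-Python sketch of the same idea, covering only flat values, nested objects, and arrays:

```python
import json

def infer_ddl_schema(doc: str) -> str:
    """Infer a DDL-style schema string from a JSON document.

    Illustrative sketch only: the real schema_of_json handles many more
    cases (null handling, type widening, options, etc.).
    """
    def type_of(value):
        if isinstance(value, bool):  # bool before int: bool is a subclass of int
            return "BOOLEAN"
        if isinstance(value, int):
            return "BIGINT"
        if isinstance(value, float):
            return "DOUBLE"
        if isinstance(value, dict):
            fields = ", ".join(f"{k}: {type_of(v)}" for k, v in value.items())
            return f"STRUCT<{fields}>"
        if isinstance(value, list):
            inner = type_of(value[0]) if value else "STRING"
            return f"ARRAY<{inner}>"
        return "STRING"

    return type_of(json.loads(doc))

print(infer_ddl_schema('{"name":"Bob"}'))  # STRUCT<name: STRING>
```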
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Merged build finished. Test PASSed.
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r231024007

--- Diff: R/pkg/R/functions.R ---
```diff
@@ -1641,30 +1641,30 @@ setMethod("tanh",
           })

 #' @details
-#' \code{toDegrees}: Converts an angle measured in radians to an approximately equivalent angle
+#' \code{degrees}: Converts an angle measured in radians to an approximately equivalent angle
 #' measured in degrees.
 #'
 #' @rdname column_math_functions
-#' @aliases toDegrees toDegrees,Column-method
-#' @note toDegrees since 1.4.0
-setMethod("toDegrees",
+#' @aliases degrees degrees,Column-method
+#' @note degrees since 2.1.0
```
--- End diff --

yes.. (the version here is R API specific)
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r231023768

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { standardGeneric("add_months") })

 #' @rdname column_aggregate_functions
 #' @name NULL
-setGeneric("approxCountDistinct", function(x, ...) { standardGeneric("approxCountDistinct") })
+setGeneric("approx_count_distinct", function(x, ...) { standardGeneric("approx_count_distinct") })
```
--- End diff --

I think it's super lightweight to have an `approxCountDistinct` that calls `approx_count_distinct` with a deprecation warning? My thought was that the R API was not always in sync or complete compared to Python, and a breaking API change (i.e. the job will fail) seems a bit drastic even in a major release.
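The shim felixcheung suggests (a deprecated old name that warns and delegates to the new name) is a general pattern. A hedged pure-Python sketch of the idea, with a stand-in implementation rather than R/Spark's actual aggregate:

```python
import warnings

def approx_count_distinct(values):
    # Stand-in body for illustration: the real Spark function is an
    # approximate aggregate (HyperLogLog++), not an exact count.
    return len(set(values))

def approxCountDistinct(values):
    """Deprecated alias: warn once per call site, then delegate."""
    warnings.warn(
        "approxCountDistinct is deprecated; use approx_count_distinct instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return approx_count_distinct(values)
```

Existing callers keep working (`approxCountDistinct([1, 1, 2])` still returns a result) while the warning nudges them toward the new name, which is the "lightweight, non-breaking" option argued for above.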
[GitHub] spark issue #22502: [SPARK-25474][SQL]When the "fallBackToHdfsForStats= true...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22502 @cloud-fan Thanks. I will check and update the PR.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 cc. @zsxwing
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22305 I'll do a review too, hopefully this week. Sorry for the delay.
[GitHub] spark pull request #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22937#discussion_r231006006

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
```diff
@@ -418,8 +418,8 @@ private[spark] object RestSubmissionClient {
   private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
     env.filterKeys { k =>
       // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
-      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
-        k.startsWith("MESOS_")
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME"
+        && k != "SPARK_CONF_DIR") || k.startsWith("MESOS_")
```
--- End diff --

Could you add a test case in `StandaloneRestSubmitSuite.scala` in order to prevent a future regression?
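The predicate in the patched diff (propagate `SPARK_*` variables except a few that are wrong on the remote machine, plus all `MESOS_*` variables) is easy to mirror outside Scala. A small Python sketch of that filtering rule, assuming the behavior shown in the diff above:

```python
def filter_system_environment(env: dict) -> dict:
    """Keep SPARK_* vars except the excluded ones, plus all MESOS_* vars,
    mirroring the patched filterSystemEnvironment predicate."""
    excluded = {"SPARK_ENV_LOADED", "SPARK_HOME", "SPARK_CONF_DIR"}
    return {
        k: v
        for k, v in env.items()
        if (k.startswith("SPARK_") and k not in excluded) or k.startswith("MESOS_")
    }

env = {
    "SPARK_HOME": "/opt/spark",          # dropped: wrong on remote machine
    "SPARK_CONF_DIR": "/etc/spark",      # dropped by this patch
    "SPARK_LOCAL_IP": "10.0.0.1",        # kept
    "MESOS_NATIVE_JAVA_LIBRARY": "/usr/lib/libmesos.so",  # kept
    "PATH": "/usr/bin",                  # dropped: not SPARK_/MESOS_
}
print(filter_system_environment(env))
# {'SPARK_LOCAL_IP': '10.0.0.1', 'MESOS_NATIVE_JAVA_LIBRARY': '/usr/lib/libmesos.so'}
```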
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22823 Could you review and merge https://github.com/yucai/spark/pull/7, @yucai ?
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22305 Let me try to take a look this weekend. Sorry it's been delayed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98501/ Test PASSed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22823 Merged build finished. Test PASSed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22823 **[Test build #98501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98501/testReport)** for PR 22823 at commit [`f714cc8`](https://github.com/apache/spark/commit/f714cc8795568311dee3b0c93901d16eadc198fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to suppo...
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/22953
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 Thanks. Merged into master.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231002129

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

For option of product, I think it is due to typed select. I will address it at #21732.
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138 @zsxwing Given that the Spark 2.4 vote has passed, could we revisit and make progress on this?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231001655

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

For primitive type and product type, looks like it works:

```scala
test("typed aggregation on primitive data") {
  val ds = Seq(1, 2, 3).toDS()
  val agg = ds.select(expr("value").as("data").as[Int])
    .groupByKey(_ >= 2)
    .agg(sum("data").as[Long], sum($"data" + 1).as[Long])
  agg.show()
}
```

```
+-----+---------+---------------+
|value|sum(data)|sum((data + 1))|
+-----+---------+---------------+
|false|        1|              2|
| true|        5|              7|
+-----+---------+---------------+
```

```scala
test("typed aggregation on product data") {
  val ds = Seq((1, 2), (2, 3), (3, 4)).toDS()
  val agg = ds.select(expr("_1").as("a").as[Int], expr("_2").as("b").as[Int])
    .groupByKey(_._1).agg(sum("a").as[Int], sum($"b" + 1).as[Int])
  agg.show
}
```

```
[info] - typed aggregation on primitive data (192 milliseconds)
+-----+------+------------+
|value|sum(a)|sum((b + 1))|
+-----+------+------------+
|    3|     3|           5|
|    1|     1|           3|
|    2|     2|           4|
+-----+------+------------+
```
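The semantics the first test above exercises (group rows by a derived boolean key, then sum per group) can be mimicked outside Spark. A toy pure-Python analogue, not Spark's implementation, that reproduces the numbers in the first result table:

```python
from collections import defaultdict

def group_by_key_sum(rows, key_fn):
    """Group values by key_fn and sum each group: a toy analogue of
    Dataset.groupByKey(...).agg(sum(...)) from the test above."""
    sums = defaultdict(int)
    for v in rows:
        sums[key_fn(v)] += v
    return dict(sums)

data = [1, 2, 3]
print(group_by_key_sum(data, lambda v: v >= 2))
# {False: 1, True: 5}  -- matching the sum(data) column above
```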
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22823 Thank you, @yucai . Could you update the title because we are renaming it? Maybe, `[SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWideTable to use main method`?
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98498/ Test PASSed.
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Merged build finished. Test PASSed.
[GitHub] spark issue #22949: [minor] update known_translations
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22949 **[Test build #98498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98498/testReport)** for PR 22949 at commit [`9808e43`](https://github.com/apache/spark/commit/9808e434bbfd7f703becc48a7d4fe3628131ad58).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #98503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98503/testReport)** for PR 17086 at commit [`0ca102a`](https://github.com/apache/spark/commit/0ca102af8aad76259d8026b0fbeabe5b277e3962).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MulticlassMetrics @Since("3.0.0") (predAndLabelsWithOptWeight: RDD[_])`
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98503/ Test PASSed.
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r230998508

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
```diff
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from match {
     case StringType =>
-      buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+      buildCast[UTF8String](_, s => DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
```
--- End diff --

Ur, I'd prefer not to rename it. A one-line function doc will suffice.
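The behavior the patch above introduces (trim surrounding whitespace before parsing a timestamp string, yielding null on failure) can be sketched outside Spark. A hedged Python analogue using a single assumed format string rather than Spark's full `DateTimeUtils.stringToTimestamp` grammar:

```python
from datetime import datetime

def string_to_timestamp(s: str):
    """Trim surrounding whitespace, then parse; return None on failure,
    loosely mirroring the patched castToTimestamp path above."""
    try:
        return datetime.strptime(s.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

print(string_to_timestamp("  2018-11-05 12:00:00  "))  # 2018-11-05 12:00:00
print(string_to_timestamp("not a timestamp"))          # None
```

Without the `.strip()`, the first call would fail to parse and fall through to `None`, which is exactly the regression the one-character-class change in the diff avoids.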
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230997935

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

Is this a special case of option of product? Can you try primitive type and product type?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230995240

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

I just tried manually adding an alias. It doesn't seem to work as we expect:

```scala
val ds = Seq(Some(("a", 10)), Some(("a", 20)), Some(("b", 1)), Some(("b", 2)), Some(("c", 1)), None).toDS()
val newDS = ds.select(expr("value").as("opt").as[Option[(String, Int)]])
```

```
// schema
root
 |-- value: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: integer (nullable = false)

// physical plan
*(1) SerializeFromObject [if (isnull(unwrapoption(ObjectType(class scala.Tuple2), input[0, scala.Option, true]))) null else named_struct(_1, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(unwrapoption(ObjectType(class scala.Tuple2), input[0, scala.Option, true]))._1, true, false), _2, assertnotnull(unwrapoption(ObjectType(class scala.Tuple2), input[0, scala.Option, true]))._2) AS value#5482]
+- *(1) MapElements , obj#5481: scala.Option
   +- *(1) DeserializeToObject newInstance(class scala.Tuple1), obj#5480: scala.Tuple1
      +- *(1) Project [value#5473 AS opt#5476]
         +- LocalTableScan [value#5473]
```

So even if we add an alias before `groupByKey`, it can't change the dataset's output names.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98496/ Test PASSed.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22953 Merged build finished. Test PASSed.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22953 **[Test build #98496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98496/testReport)** for PR 22953 at commit [`7b19dc8`](https://github.com/apache/spark/commit/7b19dc8616670c8db4b853e7fcfd192f8f55e09a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4787/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #98503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98503/testReport)** for PR 17086 at commit [`0ca102a`](https://github.com/apache/spark/commit/0ca102af8aad76259d8026b0fbeabe5b277e3962).
[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230992722

--- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala ---
```diff
@@ -67,6 +68,10 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   @Since("1.5.0")
   def setLabelCol(value: String): this.type = set(labelCol, value)

+  /** @group setParam */
+  @Since("2.4.0")
```
--- End diff --

done!
[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230992384

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
```diff
@@ -27,10 +27,17 @@ import org.apache.spark.sql.DataFrame

 /**
  * Evaluator for multiclass classification.
  *
- * @param predictionAndLabels an RDD of (prediction, label) pairs.
+ * @param predAndLabelsWithOptWeight an RDD of (prediction, label, weight) or
+ *                                   (prediction, label) pairs.
  */
 @Since("1.1.0")
-class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Double)]) {
+class MulticlassMetrics @Since("2.4.0") (predAndLabelsWithOptWeight: RDD[_]) {
```
--- End diff --

thanks! yes, it is backwards compatible.
[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230992285

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
```diff
@@ -18,10 +18,14 @@
 package org.apache.spark.mllib.evaluation

 import org.apache.spark.SparkFunSuite
-import org.apache.spark.mllib.linalg.Matrices
+import org.apache.spark.ml.linalg.Matrices
+import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.util.MLlibTestSparkContext

 class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  val delta = 1e-7
```
--- End diff --

done!
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Merged build finished. Test PASSed.
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22087 **[Test build #98502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98502/testReport)** for PR 22087 at commit [`fb3c6d1`](https://github.com/apache/spark/commit/fb3c6d1f933b807c89e5892dadf8654ec280d3b5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98502/ Test PASSed.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230989493

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

Ok. Sounds good. Let me do the change.
[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r230989559

```diff
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala ---
@@ -46,17 +45,22 @@ import org.apache.spark.sql.types.StructType
  * scenarios, where some offsets after the specified initial ones can't be
  * properly read.
  */
-class KafkaContinuousReadSupport(
+class KafkaContinuousInputStream(
```

--- End diff --

Yea I'll separate this PR into 3 smaller ones, after we have agreed on the high-level design at https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit?usp=sharing
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230989073

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

We can improve the `CheckAnalysis` to detect this case, and improve the error message to ask users to do alias.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230986888

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

You mean we detect the conflicting case, and show some error messages to ask users for that?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230986226

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

If that is the case, I feel it is better to ask users to resolve the conflict manually, by adding an alias.
[GitHub] spark pull request #22919: [SPARK-25906][SHELL] Documents '-I' option (from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22919
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22953 Looks good to me.
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22919 Merged to master and branch-2.4.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Merged build finished. Test PASSed.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98494/ Test PASSed.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22937

**[Test build #98494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98494/testReport)** for PR 22937 at commit [`10da112`](https://github.com/apache/spark/commit/10da1120d52f61e98bcd929d7ad59220a93d59f7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230983077

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

Yea, but for such cases it seems more complicated, as we can't simply create aliases for serializer fields: in methods like `mapGroups`, we need to access the original fields of the key.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22823 @dongjoon-hyun Just pushed the rebased version, thanks!
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4786/ Test PASSed.
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22087 **[Test build #98502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98502/testReport)** for PR 22087 at commit [`fb3c6d1`](https://github.com/apache/spark/commit/fb3c6d1f933b807c89e5892dadf8654ec280d3b5).
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Merged build finished. Test PASSed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22823 **[Test build #98501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98501/testReport)** for PR 22823 at commit [`f714cc8`](https://github.com/apache/spark/commit/f714cc8795568311dee3b0c93901d16eadc198fb).
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98493/ Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952

**[Test build #98493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98493/testReport)** for PR 22952 at commit [`fb01c60`](https://github.com/apache/spark/commit/fb01c60624389ee432d0a23afd14e956453cd22e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230977212

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

So we may still fail if `T` and `U` are case classes and have conflicting field names?
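To make the conflict scenario under discussion concrete, here is a minimal sketch of the case being debated. None of this is from the PR: the case classes, session setup, and names are illustrative assumptions only.

```scala
// Hypothetical sketch of the conflict case raised above: the input type T and
// the key type U are case classes sharing the field name "id". AppendColumns
// places U's serializer columns next to T's columns, so duplicated names can
// make downstream resolution ambiguous unless the key columns are aliased.
import org.apache.spark.sql.SparkSession

object GroupByKeyConflictSketch {
  case class Event(id: Int, payload: String)   // input type T
  case class Key(id: Int)                      // key type U, conflicting "id"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupByKey-conflict-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Event(1, "a"), Event(1, "b"), Event(2, "c")).toDS()
    // Grouping by a case-class key whose field name collides with the input.
    val counts = ds.groupByKey(e => Key(e.id)).count()
    counts.show()
    spark.stop()
  }
}
```

As the thread suggests, a user-side workaround for such collisions is to alias conflicting columns explicitly before grouping.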
[GitHub] spark issue #22889: [SPARK-25882][SQL] Added a function to join two datasets...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22889 Yea good idea (prefer Array over Seq for short lists)
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/22683 It's ok. @ajbozarth
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98491/ Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952

**[Test build #98491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98491/testReport)** for PR 22952 at commit [`8a1d2e1`](https://github.com/apache/spark/commit/8a1d2e187c667833b2de8eb4cba2fa04dca9c6ff).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22928: [SPARK-25926][CORE] Move config entries in core module t...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22928 thanks, merging to master!
[GitHub] spark pull request #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertT...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22947#discussion_r230975661

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -66,6 +66,8 @@ case class AssertTrue(child: Expression) extends UnaryExpression with ImplicitCa

   override def nullable: Boolean = true

+  override lazy val deterministic: Boolean = false
```

--- End diff --

Because of this, I'm leaning towards creating a new flag instead of making them non-deterministic.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 ASM6 supports Java 9, while ASM7 supports Java 9, Java 10, and Java 11.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946

**[Test build #98500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98500/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98500/ Test FAILed.
[GitHub] spark issue #22945: [SPARK-24066][SQL]Add new optimization rule to eliminate...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/22945 ping @cloud-fan
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Merged build finished. Test FAILed.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946 **[Test build #98500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98500/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Merged build finished. Test FAILed.
[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r230973917

```diff
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala ---
@@ -46,17 +45,22 @@ import org.apache.spark.sql.types.StructType
  * scenarios, where some offsets after the specified initial ones can't be
  * properly read.
  */
-class KafkaContinuousReadSupport(
+class KafkaContinuousInputStream(
```

--- End diff --

Makes sense. I really consider this to be a blocker on getting this merged and approved. It's difficult to have confidence in a review over such a large change. Thoughts @cloud-fan @rdblue?
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98499/ Test FAILed.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946

**[Test build #98499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98499/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946 **[Test build #98499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98499/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22946 ok to test
[GitHub] spark pull request #22928: [SPARK-25926][CORE] Move config entries in core m...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22928
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22946 ah good catch! Can you also add a test?
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r230973235

```diff
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * TableChange subclasses represent requested changes to a table. These are passed to
+ * {@link TableCatalog#alterTable}. For example,
+ *
+ *   import TableChange._
+ *   val catalog = source.asInstanceOf[TableSupport].catalog()
+ *   catalog.alterTable(ident,
+ *       addColumn("x", IntegerType),
+ *       renameColumn("a", "b"),
+ *       deleteColumn("c")
+ *   )
+ */
+public interface TableChange {
```

--- End diff --

Would it be a valid operation to change the partitioning of the table without dropping the entire table and re-creating it? E.g. change the bucket size for such and such column. Seems pretty difficult to do in practice though, since the underlying data layout would have to change as part of the modification.
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r230972839

```diff
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link TableCatalog} or other table sources.
+ */
+public interface Table {
```

--- End diff --

The nomenclature here appears to conflict with @cloud-fan's refactor in https://github.com/apache/spark/pull/22547/files#diff-45399ef5eed5c873d5f12bf0f1671b8fR40. Maybe we can call this `TableMetadata` or `TableDescription`? Or perhaps we rename the other construct?
[GitHub] spark issue #22502: [SPARK-25474][SQL]When the "fallBackToHdfsForStats= true...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22502 @shahidki31 thanks for fixing it! Do you know where we currently read `fallBackToHdfsForStats`, and whether we can have a unified place to do it?
[GitHub] spark issue #22889: [SPARK-25882][SQL] Added a function to join two datasets...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22889 I think the problem is real, maybe we should not use `Seq` in the end-user API, but always use Array to be more Java-friendly. This can also avoid bugs like https://github.com/apache/spark/pull/22789 cc @rxin @hvanhovell what do you think?
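For readers following the Seq-vs-Array discussion, the Java-friendliness point can be illustrated with a hypothetical API sketch. None of these names are Spark APIs; this is only a sketch of the trade-off being discussed.

```scala
// Hypothetical API sketch, not Spark code.
object ApiSketch {
  // Seq-based signature: Java callers must construct a scala.collection.Seq
  // (e.g. via scala.collection.JavaConverters), which is verbose and a common
  // source of binary-compatibility surprises across Scala versions.
  def joinColumnsSeq(cols: Seq[String]): String = cols.mkString(",")

  // Varargs-based signature: with @varargs the Scala compiler also emits a
  // Java-friendly String... overload, so Java code can call
  // joinColumns("a", "b") directly; Array-based parameters work similarly.
  @scala.annotation.varargs
  def joinColumns(cols: String*): String = cols.mkString(",")
}
```

The design choice here is that `Array` (or annotated varargs) maps to a plain Java array parameter, whereas `Seq` leaks a Scala collection type into the public surface.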
[GitHub] spark issue #22949: [minor] update known_translations
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22949 **[Test build #98498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98498/testReport)** for PR 22949 at commit [`9808e43`](https://github.com/apache/spark/commit/9808e434bbfd7f703becc48a7d4fe3628131ad58).
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Merged build finished. Test PASSed.
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4785/ Test PASSed.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 cc @gatorsmile @srowen @HyukjinKwon
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r230970713

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from match {
     case StringType =>
-      buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+      buildCast[UTF8String](_, s => DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
```

--- End diff --

How about changing `stringToDate` to `trimStringToDate` and updating `trimStringToDate` to: ![image](https://user-images.githubusercontent.com/5399861/48036353-ec49a100-e1a2-11e8-80e6-b52b4a007493.png)
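The trimming behavior being debated can be sketched outside Spark as follows. `parseTimestamp` below is only a stand-in for `DateTimeUtils.stringToTimestamp`, not the real implementation, and the sample input is illustrative.

```scala
// Sketch of why castToTimestamp trims its input first: strings with
// surrounding whitespace otherwise fail to parse and the cast yields null.
import java.sql.Timestamp

object TrimCastSketch {
  // Stand-in parser: returns None when the string is not a valid timestamp,
  // mirroring stringToTimestamp's Option-based result.
  def parseTimestamp(s: String): Option[Timestamp] =
    try Some(Timestamp.valueOf(s))
    catch { case _: IllegalArgumentException => None }

  def main(args: Array[String]): Unit = {
    val padded = " 2018-11-06 12:00:00 "
    // Without trimming, the surrounding whitespace is likely to break parsing;
    // trimming first, as the patch proposes for castToTimestamp, succeeds.
    println(parseTimestamp(padded))
    println(parseTimestamp(padded.trim))
  }
}
```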
[GitHub] spark issue #22949: [minor] update known_translations
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22949 Note that these updates are generated by the script, not by me. If someone is not in the list, it means the script can figure out their full name without a translation.
[GitHub] spark pull request #22949: [minor] update known_translations
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22949#discussion_r230970453
--- Diff: dev/create-release/known_translations ---
@@ -203,3 +203,61 @@ shenh062326 - Shen Hong
 aokolnychyi - Anton Okolnychyi
 linbojin - Linbo Jin
 lw-lin - Liwei Lin
+10110346 - Xian Liu
+Achuth17 - Achuth Narayan Rajagopal
+Adamyuanyuan - Adam Wang
+DylanGuedes - Dylan Guedes
+JiahuiJiang - Jiahui Jiang
+KevinZwx - Kevin Zhang
+LantaoJin - Lantao Jin
+Lemonjing - Rann Tao
+LucaCanali - Luca Canali
+XD-DENG - Xiaodong Deng
+aai95 - Aleksei Izmalkin
+akonopko - Alexander Konopko
+ankuriitg - Ankur Gupta
+arucard21 - Riaas Mokiem
+attilapiros - Attila Zsolt Piros
+bravo-zhang - Bravo Zhang
+caneGuy - Kang Zhou
+chaoslawful - Xiaozhe Wang
+cluo512 - Chuan Luo
+codeatri - Neha Patil
+crafty-coder - Carlos Pena
+debugger87 - Chaozhong Yang
+e-dorigatti - Emilio Dorigatti
+eric-maynard - Eric Maynard
+felixalbani - Felix Albani
+fjh100456 - fjh100456
--- End diff --
ah I missed this one.
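For context, `known_translations` maps a GitHub username to a contributor's full name, one `username - Full Name` pair per line. The entry flagged above (`fjh100456 - fjh100456`) maps a username to itself and so carries no information. A minimal Scala sketch, with made-up sample lines, of spotting such identity entries:

```scala
// Sample lines in the known_translations format (illustrative only).
val lines = Seq(
  "lw-lin - Liwei Lin",
  "10110346 - Xian Liu",
  "fjh100456 - fjh100456")

// An entry is an "identity translation" when username and full name match.
val identityEntries = lines.flatMap { line =>
  line.split(" - ", 2) match {
    case Array(user, name) if user == name => Some(user)
    case _                                 => None
  }
}

println(identityEntries) // List(fjh100456)
```

Such entries are candidates for removal or for a real full-name translation in a follow-up.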
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Merged build finished. Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22760 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4784/
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4784/
[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22087#discussion_r230968204
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -155,4 +155,16 @@ trait MLTest extends StreamTest with TempDirectory { self: Suite =>
       assert(prediction === model.predict(features))
     }
   }
+
+  def testClusteringModelSinglePrediction(model: Model[_],
+      transform: Vector => Int,
+      dataset: Dataset[_],
+      input: String,
+      output: String): Unit = {
--- End diff --
I think we should use 2-space indentation?