[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22818 **[Test build #98504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98504/testReport)** for PR 22818 at commit [`361bf02`](https://github.com/apache/spark/commit/361bf02bbe3f78c7a68e93b77fbc9a0ccf39b47a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r231025592

--- Diff: R/pkg/R/functions.R ---
```diff
@@ -205,11 +205,18 @@ NULL
 #' also supported for the schema.
 #' \item \code{from_csv}: a DDL-formatted string
 #' }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'            this contains additional named properties to control how it is converted, accepts
-#'            the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'            the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'            this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#' \itemize{
+#' \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains
+#'       additional named properties to control how it is converted and accepts the
+#'       same options as the JSON data source.
+#' \item \code{to_json}: it supports the "pretty" option which enables pretty
```
--- End diff --

I know it's there from before, but I'd suggest giving an example (a doc or code example) below. It's a bit different from Python/Scala, I think.
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4788/ Test PASSed.
[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r231025282

--- Diff: R/pkg/R/functions.R ---
```diff
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
             column(jc)
           })

+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+          function(x, ...) {
+            if (class(x) == "character") {
+              col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+            } else {
+              col <- x@jc
```
--- End diff --

OK, but one use could be `select(df, schema_of_csv(df$schemaCol))`, i.e. an actual column, not a literal string?
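The quoted doc says `schema_of_json` parses a JSON string and infers its schema in DDL format. Spark does this inference on the JVM side; as a purely illustrative analogue (not Spark's implementation), a minimal pure-Python sketch of the same idea, covering only flat values, nested objects, and arrays:

```python
import json

def infer_ddl_schema(doc: str) -> str:
    """Infer a DDL-style schema string from a JSON document.

    Illustrative sketch only: the real schema_of_json handles many more
    cases (null handling, type widening, options, etc.).
    """
    def type_of(value):
        if isinstance(value, bool):  # bool before int: bool is a subclass of int
            return "BOOLEAN"
        if isinstance(value, int):
            return "BIGINT"
        if isinstance(value, float):
            return "DOUBLE"
        if isinstance(value, dict):
            fields = ", ".join(f"{k}: {type_of(v)}" for k, v in value.items())
            return f"STRUCT<{fields}>"
        if isinstance(value, list):
            inner = type_of(value[0]) if value else "STRING"
            return f"ARRAY<{inner}>"
        return "STRING"

    return type_of(json.loads(doc))

print(infer_ddl_schema('{"name":"Bob"}'))  # STRUCT<name: STRING>
```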
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Merged build finished. Test PASSed.
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r231024007

--- Diff: R/pkg/R/functions.R ---
```diff
@@ -1641,30 +1641,30 @@ setMethod("tanh",
           })

 #' @details
-#' \code{toDegrees}: Converts an angle measured in radians to an approximately equivalent angle
+#' \code{degrees}: Converts an angle measured in radians to an approximately equivalent angle
 #' measured in degrees.
 #'
 #' @rdname column_math_functions
-#' @aliases toDegrees toDegrees,Column-method
-#' @note toDegrees since 1.4.0
-setMethod("toDegrees",
+#' @aliases degrees degrees,Column-method
+#' @note degrees since 2.1.0
```
--- End diff --

yes.. (the version here is R API specific)
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r231023768

--- Diff: R/pkg/R/generics.R ---
```diff
@@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { standardGeneric("add_months") })

 #' @rdname column_aggregate_functions
 #' @name NULL
-setGeneric("approxCountDistinct", function(x, ...) { standardGeneric("approxCountDistinct") })
+setGeneric("approx_count_distinct", function(x, ...) { standardGeneric("approx_count_distinct") })
```
--- End diff --

I think it's super lightweight to have an `approxCountDistinct` that calls `approx_count_distinct` with a deprecation warning? My thought was that the R API was not always in sync or complete compared to Python, and a breaking API change (i.e. the job will fail) seems a bit drastic even in a major release.
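The shim felixcheung suggests (a deprecated old name that warns and delegates to the new name) is a general pattern. A hedged pure-Python sketch of the idea, with a stand-in implementation rather than R/Spark's actual aggregate:

```python
import warnings

def approx_count_distinct(values):
    # Stand-in body for illustration: the real Spark function is an
    # approximate aggregate (HyperLogLog++), not an exact count.
    return len(set(values))

def approxCountDistinct(values):
    """Deprecated alias: warn once per call site, then delegate."""
    warnings.warn(
        "approxCountDistinct is deprecated; use approx_count_distinct instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return approx_count_distinct(values)
```

Existing callers keep working (`approxCountDistinct([1, 1, 2])` still returns a result) while the warning nudges them toward the new name, which is the "lightweight, non-breaking" option argued for above.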
[GitHub] spark issue #22502: [SPARK-25474][SQL]When the "fallBackToHdfsForStats= true...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22502 @cloud-fan Thanks. I will check and update the PR.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 cc. @zsxwing
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22305 I'll do a review too, hopefully this week. Sorry for the delay.
[GitHub] spark pull request #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22937#discussion_r231006006

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
```diff
@@ -418,8 +418,8 @@ private[spark] object RestSubmissionClient {
   private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
     env.filterKeys { k =>
       // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
-      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
-        k.startsWith("MESOS_")
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME"
+        && k != "SPARK_CONF_DIR") || k.startsWith("MESOS_")
```
--- End diff --

Could you add a test case in `StandaloneRestSubmitSuite.scala` in order to prevent a future regression?
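The predicate in the patched diff (propagate `SPARK_*` variables except a few that are wrong on the remote machine, plus all `MESOS_*` variables) is easy to mirror outside Scala. A small Python sketch of that filtering rule, assuming the behavior shown in the diff above:

```python
def filter_system_environment(env: dict) -> dict:
    """Keep SPARK_* vars except the excluded ones, plus all MESOS_* vars,
    mirroring the patched filterSystemEnvironment predicate."""
    excluded = {"SPARK_ENV_LOADED", "SPARK_HOME", "SPARK_CONF_DIR"}
    return {
        k: v
        for k, v in env.items()
        if (k.startswith("SPARK_") and k not in excluded) or k.startswith("MESOS_")
    }

env = {
    "SPARK_HOME": "/opt/spark",          # dropped: wrong on remote machine
    "SPARK_CONF_DIR": "/etc/spark",      # dropped by this patch
    "SPARK_LOCAL_IP": "10.0.0.1",        # kept
    "MESOS_NATIVE_JAVA_LIBRARY": "/usr/lib/libmesos.so",  # kept
    "PATH": "/usr/bin",                  # dropped: not SPARK_/MESOS_
}
print(filter_system_environment(env))
# {'SPARK_LOCAL_IP': '10.0.0.1', 'MESOS_NATIVE_JAVA_LIBRARY': '/usr/lib/libmesos.so'}
```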
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22823 Could you review and merge https://github.com/yucai/spark/pull/7, @yucai ?
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22305 Let me try to take a look this weekend. Sorry it's been delayed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98501/ Test PASSed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22823 Merged build finished. Test PASSed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22823 **[Test build #98501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98501/testReport)** for PR 22823 at commit [`f714cc8`](https://github.com/apache/spark/commit/f714cc8795568311dee3b0c93901d16eadc198fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to suppo...
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/22953
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 Thanks. Merged into master.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231002129

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

For option of product, I think it is due to typed select. I will address it at #21732.
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138 @zsxwing Given that the Spark 2.4 vote has passed, could we revisit and make progress on this?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r231001655

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

For primitive type and product type, looks like it works:

```scala
test("typed aggregation on primitive data") {
  val ds = Seq(1, 2, 3).toDS()
  val agg = ds.select(expr("value").as("data").as[Int])
    .groupByKey(_ >= 2)
    .agg(sum("data").as[Long], sum($"data" + 1).as[Long])
  agg.show()
}
```

```
+-----+---------+---------------+
|value|sum(data)|sum((data + 1))|
+-----+---------+---------------+
|false|        1|              2|
| true|        5|              7|
+-----+---------+---------------+
```

```scala
test("typed aggregation on product data") {
  val ds = Seq((1, 2), (2, 3), (3, 4)).toDS()
  val agg = ds.select(expr("_1").as("a").as[Int], expr("_2").as("b").as[Int])
    .groupByKey(_._1).agg(sum("a").as[Int], sum($"b" + 1).as[Int])
  agg.show
}
```

```
[info] - typed aggregation on primitive data (192 milliseconds)
+-----+------+------------+
|value|sum(a)|sum((b + 1))|
+-----+------+------------+
|    3|     3|           5|
|    1|     1|           3|
|    2|     2|           4|
+-----+------+------------+
```
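The semantics the first test above exercises (group rows by a derived boolean key, then sum per group) can be mimicked outside Spark. A toy pure-Python analogue, not Spark's implementation, that reproduces the numbers in the first result table:

```python
from collections import defaultdict

def group_by_key_sum(rows, key_fn):
    """Group values by key_fn and sum each group: a toy analogue of
    Dataset.groupByKey(...).agg(sum(...)) from the test above."""
    sums = defaultdict(int)
    for v in rows:
        sums[key_fn(v)] += v
    return dict(sums)

data = [1, 2, 3]
print(group_by_key_sum(data, lambda v: v >= 2))
# {False: 1, True: 5}  -- matching the sum(data) column above
```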
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22823 Thank you, @yucai . Could you update the title because we are renaming it? Maybe, `[SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWideTable to use main method`?
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98498/ Test PASSed.
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Merged build finished. Test PASSed.
[GitHub] spark issue #22949: [minor] update known_translations
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22949 **[Test build #98498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98498/testReport)** for PR 22949 at commit [`9808e43`](https://github.com/apache/spark/commit/9808e434bbfd7f703becc48a7d4fe3628131ad58).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #98503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98503/testReport)** for PR 17086 at commit [`0ca102a`](https://github.com/apache/spark/commit/0ca102af8aad76259d8026b0fbeabe5b277e3962).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MulticlassMetrics @Since("3.0.0") (predAndLabelsWithOptWeight: RDD[_])`
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98503/ Test PASSed.
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r230998508

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
```diff
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from match {
     case StringType =>
-      buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+      buildCast[UTF8String](_, s => DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
```
--- End diff --

Ur, I'd prefer not to rename it. A one-line function doc will suffice.
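The behavior the patch above introduces (trim surrounding whitespace before parsing a timestamp string, yielding null on failure) can be sketched outside Spark. A hedged Python analogue using a single assumed format string rather than Spark's full `DateTimeUtils.stringToTimestamp` grammar:

```python
from datetime import datetime

def string_to_timestamp(s: str):
    """Trim surrounding whitespace, then parse; return None on failure,
    loosely mirroring the patched castToTimestamp path above."""
    try:
        return datetime.strptime(s.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

print(string_to_timestamp("  2018-11-05 12:00:00  "))  # 2018-11-05 12:00:00
print(string_to_timestamp("not a timestamp"))          # None
```

Without the `.strip()`, the first call would fail to parse and fall through to `None`, which is exactly the regression the one-character-class change in the diff avoids.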
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230997935

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

Is this a special case of option of product? Can you try primitive type and product type?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230995240

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

I just tried manually adding an alias. It doesn't seem to work as we expect:

```scala
val ds = Seq(Some(("a", 10)), Some(("a", 20)), Some(("b", 1)), Some(("b", 2)), Some(("c", 1)), None).toDS()
val newDS = ds.select(expr("value").as("opt").as[Option[(String, Int)]])
```

```
// schema
root
 |-- value: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: integer (nullable = false)

// physical plan
*(1) SerializeFromObject [if (isnull(unwrapoption(ObjectType(class scala.Tuple2), input[0, scala.Option, true]))) null else named_struct(_1, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(unwrapoption(ObjectType(class scala.Tuple2), input[0, scala.Option, true]))._1, true, false), _2, assertnotnull(unwrapoption(ObjectType(class scala.Tuple2), input[0, scala.Option, true]))._2) AS value#5482]
+- *(1) MapElements , obj#5481: scala.Option
   +- *(1) DeserializeToObject newInstance(class scala.Tuple1), obj#5480: scala.Tuple1
      +- *(1) Project [value#5473 AS opt#5476]
         +- LocalTableScan [value#5473]
```

So even if we add an alias before `groupByKey`, it can't change the dataset's output names.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98496/ Test PASSed.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22953 Merged build finished. Test PASSed.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22953 **[Test build #98496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98496/testReport)** for PR 22953 at commit [`7b19dc8`](https://github.com/apache/spark/commit/7b19dc8616670c8db4b853e7fcfd192f8f55e09a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4787/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #98503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98503/testReport)** for PR 17086 at commit [`0ca102a`](https://github.com/apache/spark/commit/0ca102af8aad76259d8026b0fbeabe5b277e3962).
[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230992722

--- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala ---
```diff
@@ -67,6 +68,10 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   @Since("1.5.0")
   def setLabelCol(value: String): this.type = set(labelCol, value)

+  /** @group setParam */
+  @Since("2.4.0")
```
--- End diff --

done!
[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230992384

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
```diff
@@ -27,10 +27,17 @@ import org.apache.spark.sql.DataFrame

 /**
  * Evaluator for multiclass classification.
  *
- * @param predictionAndLabels an RDD of (prediction, label) pairs.
+ * @param predAndLabelsWithOptWeight an RDD of (prediction, label, weight) or
+ *                                   (prediction, label) pairs.
  */
 @Since("1.1.0")
-class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Double)]) {
+class MulticlassMetrics @Since("2.4.0") (predAndLabelsWithOptWeight: RDD[_]) {
```
--- End diff --

thanks! yes, it is backwards compatible.
[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r230992285

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
```diff
@@ -18,10 +18,14 @@
 package org.apache.spark.mllib.evaluation

 import org.apache.spark.SparkFunSuite
-import org.apache.spark.mllib.linalg.Matrices
+import org.apache.spark.ml.linalg.Matrices
+import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.util.MLlibTestSparkContext

 class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  val delta = 1e-7
```
--- End diff --

done!
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Merged build finished. Test PASSed.
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22087 **[Test build #98502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98502/testReport)** for PR 22087 at commit [`fb3c6d1`](https://github.com/apache/spark/commit/fb3c6d1f933b807c89e5892dadf8654ec280d3b5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98502/ Test PASSed.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230989493

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
```diff
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```
--- End diff --

Ok. Sounds good. Let me do the change.
[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r230989559

```diff
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala ---
@@ -46,17 +45,22 @@ import org.apache.spark.sql.types.StructType
  * scenarios, where some offsets after the specified initial ones can't be
  * properly read.
  */
-class KafkaContinuousReadSupport(
+class KafkaContinuousInputStream(
```

--- End diff --

Yea I'll separate this PR into 3 smaller ones, after we have agreed on the high-level design at https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit?usp=sharing
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230989073

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

We can improve the `CheckAnalysis` to detect this case, and improve the error message to ask users to do alias.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230986888

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

You mean we detect the conflicting case, and show some error messages to ask users for that?
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230986226

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

If that is the case, I feel it is better to ask users to resolve the conflict manually, by adding an alias.
[GitHub] spark pull request #22919: [SPARK-25906][SHELL] Documents '-I' option (from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22919
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22953 Looks good to me.
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22919 Merged to master and branch-2.4.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Merged build finished. Test PASSed.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98494/ Test PASSed.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22937

**[Test build #98494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98494/testReport)** for PR 22937 at commit [`10da112`](https://github.com/apache/spark/commit/10da1120d52f61e98bcd929d7ad59220a93d59f7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230983077

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

Yea, but for such cases it seems more complicated, as we can't simply create aliases for serializer fields: in methods like `mapGroups`, we need to access the original fields of the key.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22823 @dongjoon-hyun Just pushed the rebased version, thanks!
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4786/ Test PASSed.
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22087 **[Test build #98502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98502/testReport)** for PR 22087 at commit [`fb3c6d1`](https://github.com/apache/spark/commit/fb3c6d1f933b807c89e5892dadf8654ec280d3b5).
[GitHub] spark issue #22087: [SPARK-25097][ML] Support prediction on single instance ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22087 Merged build finished. Test PASSed.
[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22823 **[Test build #98501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98501/testReport)** for PR 22823 at commit [`f714cc8`](https://github.com/apache/spark/commit/f714cc8795568311dee3b0c93901d16eadc198fb).
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98493/ Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952

**[Test build #98493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98493/testReport)** for PR 22952 at commit [`fb01c60`](https://github.com/apache/spark/commit/fb01c60624389ee432d0a23afd14e956453cd22e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Fix Dataset.groupByKey to make...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r230977212

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
@@ -262,25 +262,39 @@ object AppendColumns {

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
+    }
     new AppendColumns(
       func.asInstanceOf[Any => Any],
       implicitly[Encoder[T]].clsTag.runtimeClass,
       implicitly[Encoder[T]].schema,
       UnresolvedDeserializer(encoderFor[T].deserializer),
-      encoderFor[U].namedExpressions,
+      namedExpressions,
       child)
   }

   def apply[T : Encoder, U : Encoder](
       func: T => U,
       inputAttributes: Seq[Attribute],
       child: LogicalPlan): AppendColumns = {
+    val outputEncoder = encoderFor[U]
+    val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+      assert(outputEncoder.namedExpressions.length == 1)
+      outputEncoder.namedExpressions.map(Alias(_, "key")())
+    } else {
+      outputEncoder.namedExpressions
```

--- End diff --

So we may still fail if `T` and `U` are case classes and have conflicting field names?
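To make the conflict scenario under discussion concrete, here is a minimal sketch of the case being debated. None of this is from the PR: the case classes, session setup, and names are illustrative assumptions only.

```scala
// Hypothetical sketch of the conflict case raised above: the input type T and
// the key type U are case classes sharing the field name "id". AppendColumns
// places U's serializer columns next to T's columns, so duplicated names can
// make downstream resolution ambiguous unless the key columns are aliased.
import org.apache.spark.sql.SparkSession

object GroupByKeyConflictSketch {
  case class Event(id: Int, payload: String)   // input type T
  case class Key(id: Int)                      // key type U, conflicting "id"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupByKey-conflict-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Event(1, "a"), Event(1, "b"), Event(2, "c")).toDS()
    // Grouping by a case-class key whose field name collides with the input.
    val counts = ds.groupByKey(e => Key(e.id)).count()
    counts.show()
    spark.stop()
  }
}
```

As the thread suggests, a user-side workaround for such collisions is to alias conflicting columns explicitly before grouping.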
[GitHub] spark issue #22889: [SPARK-25882][SQL] Added a function to join two datasets...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22889 Yea good idea (prefer Array over Seq for short lists)
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/22683 It's ok. @ajbozarth
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98491/ Test PASSed.
[GitHub] spark issue #22952: [SPARK-20568][SS] Rename files which are completed in pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952

**[Test build #98491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98491/testReport)** for PR 22952 at commit [`8a1d2e1`](https://github.com/apache/spark/commit/8a1d2e187c667833b2de8eb4cba2fa04dca9c6ff).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22928: [SPARK-25926][CORE] Move config entries in core module t...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22928 thanks, merging to master!
[GitHub] spark pull request #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertT...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22947#discussion_r230975661

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -66,6 +66,8 @@ case class AssertTrue(child: Expression) extends UnaryExpression with ImplicitCa

   override def nullable: Boolean = true

+  override lazy val deterministic: Boolean = false
```

--- End diff --

Because of this, I'm leaning towards creating a new flag instead of making them non-deterministic.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 ASM6 supports Java 9, while ASM7 supports Java 9, Java 10, and Java 11.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946

**[Test build #98500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98500/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98500/ Test FAILed.
[GitHub] spark issue #22945: [SPARK-24066][SQL]Add new optimization rule to eliminate...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/22945 ping @cloud-fan
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Merged build finished. Test FAILed.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946 **[Test build #98500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98500/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Merged build finished. Test FAILed.
[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r230973917

```diff
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala ---
@@ -46,17 +45,22 @@ import org.apache.spark.sql.types.StructType
  * scenarios, where some offsets after the specified initial ones can't be
  * properly read.
  */
-class KafkaContinuousReadSupport(
+class KafkaContinuousInputStream(
```

--- End diff --

Makes sense. I really consider this to be a blocker on getting this merged and approved. It's difficult to have confidence in a review over such a large change. Thoughts @cloud-fan @rdblue?
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98499/ Test FAILed.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946

**[Test build #98499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98499/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22946 **[Test build #98499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98499/testReport)** for PR 22946 at commit [`63dd40f`](https://github.com/apache/spark/commit/63dd40f47ab8e8e9c120a9801b2f037336001ea6).
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22946 ok to test
[GitHub] spark pull request #22928: [SPARK-25926][CORE] Move config entries in core m...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22928
[GitHub] spark issue #22946: [SPARK-25943][SQL] Fail if mismatching nested struct fie...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22946 ah good catch! Can you also add a test?
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r230973235

```diff
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * TableChange subclasses represent requested changes to a table. These are passed to
+ * {@link TableCatalog#alterTable}. For example,
+ *
+ *   import TableChange._
+ *   val catalog = source.asInstanceOf[TableSupport].catalog()
+ *   catalog.alterTable(ident,
+ *       addColumn("x", IntegerType),
+ *       renameColumn("a", "b"),
+ *       deleteColumn("c")
+ *   )
+ */
+public interface TableChange {
```

--- End diff --

Would it be a valid operation to change the partitioning of the table without dropping the entire table and re-creating it? E.g. change the bucket size for such and such column. Seems pretty difficult to do in practice though, since the underlying data layout would have to change as part of the modification.
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r230972839

```diff
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link TableCatalog} or other table sources.
+ */
+public interface Table {
```

--- End diff --

The nomenclature here appears to conflict with @cloud-fan's refactor in https://github.com/apache/spark/pull/22547/files#diff-45399ef5eed5c873d5f12bf0f1671b8fR40. Maybe we can call this `TableMetadata` or `TableDescription`? Or perhaps we rename the other construct?
[GitHub] spark issue #22502: [SPARK-25474][SQL]When the "fallBackToHdfsForStats= true...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22502 @shahidki31 thanks for fixing it! Do you know where we currently read `fallBackToHdfsForStats`, and whether we can have a unified place to do it?
[GitHub] spark issue #22889: [SPARK-25882][SQL] Added a function to join two datasets...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22889 I think the problem is real, maybe we should not use `Seq` in the end-user API, but always use Array to be more Java-friendly. This can also avoid bugs like https://github.com/apache/spark/pull/22789 cc @rxin @hvanhovell what do you think?
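For readers following the Seq-vs-Array discussion, the Java-friendliness point can be illustrated with a hypothetical API sketch. None of these names are Spark APIs; this is only a sketch of the trade-off being discussed.

```scala
// Hypothetical API sketch, not Spark code.
object ApiSketch {
  // Seq-based signature: Java callers must construct a scala.collection.Seq
  // (e.g. via scala.collection.JavaConverters), which is verbose and a common
  // source of binary-compatibility surprises across Scala versions.
  def joinColumnsSeq(cols: Seq[String]): String = cols.mkString(",")

  // Varargs-based signature: with @varargs the Scala compiler also emits a
  // Java-friendly String... overload, so Java code can call
  // joinColumns("a", "b") directly; Array-based parameters work similarly.
  @scala.annotation.varargs
  def joinColumns(cols: String*): String = cols.mkString(",")
}
```

The design choice here is that `Array` (or annotated varargs) maps to a plain Java array parameter, whereas `Seq` leaks a Scala collection type into the public surface.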
[GitHub] spark issue #22949: [minor] update known_translations
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22949 **[Test build #98498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98498/testReport)** for PR 22949 at commit [`9808e43`](https://github.com/apache/spark/commit/9808e434bbfd7f703becc48a7d4fe3628131ad58).
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Merged build finished. Test PASSed.
[GitHub] spark issue #22949: [minor] update known_translations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4785/ Test PASSed.
[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 cc @gatorsmile @srowen @HyukjinKwon
[GitHub] spark pull request #22943: [SPARK-25098][SQL] Trim the string when cast stri...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22943#discussion_r230970713

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -359,7 +359,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   // TimestampConverter
   private[this] def castToTimestamp(from: DataType): Any => Any = from match {
     case StringType =>
-      buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+      buildCast[UTF8String](_, s => DateTimeUtils.stringToTimestamp(s.trim(), timeZone).orNull)
```

--- End diff --

How about changing `stringToDate` to `trimStringToDate` and updating `trimStringToDate` to: ![image](https://user-images.githubusercontent.com/5399861/48036353-ec49a100-e1a2-11e8-80e6-b52b4a007493.png)
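The trimming behavior being debated can be sketched outside Spark as follows. `parseTimestamp` below is only a stand-in for `DateTimeUtils.stringToTimestamp`, not the real implementation, and the sample input is illustrative.

```scala
// Sketch of why castToTimestamp trims its input first: strings with
// surrounding whitespace otherwise fail to parse and the cast yields null.
import java.sql.Timestamp

object TrimCastSketch {
  // Stand-in parser: returns None when the string is not a valid timestamp,
  // mirroring stringToTimestamp's Option-based result.
  def parseTimestamp(s: String): Option[Timestamp] =
    try Some(Timestamp.valueOf(s))
    catch { case _: IllegalArgumentException => None }

  def main(args: Array[String]): Unit = {
    val padded = " 2018-11-06 12:00:00 "
    // Without trimming, the surrounding whitespace is likely to break parsing;
    // trimming first, as the patch proposes for castToTimestamp, succeeds.
    println(parseTimestamp(padded))
    println(parseTimestamp(padded.trim))
  }
}
```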
[GitHub] spark issue #22949: [minor] update known_translations
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22949 Note that these updates are generated by the script, not by me. If someone is not in the list, it means the script can figure out their full name without a translation.
[GitHub] spark pull request #22949: [minor] update known_translations
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22949#discussion_r230970453
--- Diff: dev/create-release/known_translations ---
@@ -203,3 +203,61 @@ shenh062326 - Shen Hong
 aokolnychyi - Anton Okolnychyi
 linbojin - Linbo Jin
 lw-lin - Liwei Lin
+10110346 - Xian Liu
+Achuth17 - Achuth Narayan Rajagopal
+Adamyuanyuan - Adam Wang
+DylanGuedes - Dylan Guedes
+JiahuiJiang - Jiahui Jiang
+KevinZwx - Kevin Zhang
+LantaoJin - Lantao Jin
+Lemonjing - Rann Tao
+LucaCanali - Luca Canali
+XD-DENG - Xiaodong Deng
+aai95 - Aleksei Izmalkin
+akonopko - Alexander Konopko
+ankuriitg - Ankur Gupta
+arucard21 - Riaas Mokiem
+attilapiros - Attila Zsolt Piros
+bravo-zhang - Bravo Zhang
+caneGuy - Kang Zhou
+chaoslawful - Xiaozhe Wang
+cluo512 - Chuan Luo
+codeatri - Neha Patil
+crafty-coder - Carlos Pena
+debugger87 - Chaozhong Yang
+e-dorigatti - Emilio Dorigatti
+eric-maynard - Eric Maynard
+felixalbani - Felix Albani
+fjh100456 - fjh100456
--- End diff --
ah I missed this one.
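For context, `known_translations` maps a GitHub username to a contributor's full name, one `username - Full Name` pair per line. The entry flagged above (`fjh100456 - fjh100456`) maps a username to itself and so carries no information. A minimal Scala sketch, with made-up sample lines, of spotting such identity entries:

```scala
// Sample lines in the known_translations format (illustrative only).
val lines = Seq(
  "lw-lin - Liwei Lin",
  "10110346 - Xian Liu",
  "fjh100456 - fjh100456")

// An entry is an "identity translation" when username and full name match.
val identityEntries = lines.flatMap { line =>
  line.split(" - ", 2) match {
    case Array(user, name) if user == name => Some(user)
    case _                                 => None
  }
}

println(identityEntries) // List(fjh100456)
```

Such entries are candidates for removal or for a real full-name translation in a follow-up.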
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Merged build finished. Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22760 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4784/
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4784/
[GitHub] spark pull request #22087: [SPARK-25097][ML] Support prediction on single in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22087#discussion_r230968204
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -155,4 +155,16 @@ trait MLTest extends StreamTest with TempDirectory { self: Suite =>
       assert(prediction === model.predict(features))
     }
   }
+
+  def testClusteringModelSinglePrediction(model: Model[_],
+      transform: Vector => Int,
+      dataset: Dataset[_],
+      input: String,
+      output: String): Unit = {
--- End diff --
I think we should use 2-space indentation?