[GitHub] spark issue #22934: [INFRA] Close stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22934 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22626 Merged to master.
[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230577431 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala --- @@ -174,3 +176,68 @@ case class SchemaOfCsv( override def prettyName: String = "schema_of_csv" } + +/** + * Converts a [[StructType]] to a CSV output string. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr[, options]) - Returns a CSV string with a given struct value", + examples = """ +Examples: + > SELECT _FUNC_(named_struct('a', 1, 'b', 2)); + 1,2 + > SELECT _FUNC_(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); + "26/08/2015" + """, + since = "3.0.0") +// scalastyle:on line.size.limit +case class StructsToCsv( + options: Map[String, String], + child: Expression, + timeZoneId: Option[String] = None) + extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes { + override def nullable: Boolean = true + + def this(options: Map[String, String], child: Expression) = this(options, child, None) + + // Used in `FunctionRegistry` + def this(child: Expression) = this(Map.empty, child, None) + + def this(child: Expression, options: Expression) = +this( + options = ExprUtils.convertToMapData(options), + child = child, + timeZoneId = None) + + @transient + lazy val writer = new CharArrayWriter() + + @transient + lazy val inputSchema: StructType = child.dataType match { +case st: StructType => st +case other => + throw new IllegalArgumentException(s"Unsupported input type ${other.catalogString}") + } + + @transient + lazy val gen = new UnivocityGenerator( +inputSchema, writer, new CSVOptions(options, columnPruning = true, timeZoneId.get)) --- End diff -- nit: We wouldn't need `lazy val writer` then but just `new CharArrayWriter()` here.
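For readers who want to try the function discussed in this thread, the following is a minimal sketch of calling `to_csv` from the Scala API. It assumes Spark 3.0.0 or later on the classpath (the `since` value in the diff above) and a local `SparkSession`; the object name `ToCsvSketch` and the column names are illustrative only and do not come from the pull request.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_csv}

object ToCsvSketch {
  // Builds a one-row DataFrame and renders the struct (a, b) as a CSV string.
  def render(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val df = Seq((1, 2)).toDF("a", "b")
    df.select(to_csv(struct(col("a"), col("b"))).as("csv"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for a demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to-csv-sketch")
      .getOrCreate()
    try render(spark).foreach(println)
    finally spark.stop()
  }
}
```

The struct mirrors the first SQL example in the diff, `named_struct('a', 1, 'b', 2)`, so the rendered row should match the `1,2` shown there.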
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22919 Let me get this into master and branch-2.4 in a few days if there are no more comments.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98434/ Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Merged build finished. Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22754 **[Test build #98434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98434/testReport)** for PR 22754 at commit [`c883f4b`](https://github.com/apache/spark/commit/c883f4b1b26139e3cb2053b8efa0b0bb0f902be0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user arman1371 commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230575855 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- Yes, for the Scala API it's just 5 characters, but in the Java API it's very hard to change a `String` to a `Seq[String]`.
[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...
Github user gss2002 commented on the issue: https://github.com/apache/spark/pull/22867 @vanzin, can you please review the latest patch? Thanks!
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22914 **[Test build #98436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98436/testReport)** for PR 22914 at commit [`64aa7a7`](https://github.com/apache/spark/commit/64aa7a7ab2f4d2653aa4310d30884b26eb712ed1).
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22914 retest this please
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Merged build finished. Test PASSed.
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22818 **[Test build #98435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98435/testReport)** for PR 22818 at commit [`ca3efd8`](https://github.com/apache/spark/commit/ca3efd8f636706abf8c994cb75c14432f4e4939a).
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4744/ Test PASSed.
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22818 retest this please
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22754 **[Test build #98434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98434/testReport)** for PR 22754 at commit [`c883f4b`](https://github.com/apache/spark/commit/c883f4b1b26139e3cb2053b8efa0b0bb0f902be0).
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4743/ Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Merged build finished. Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22754 retest this please
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22754 LGTM
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Merged build finished. Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98433/ Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22921 **[Test build #98433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98433/testReport)** for PR 22921 at commit [`df92f0f`](https://github.com/apache/spark/commit/df92f0fd48883adfd1c4881c03d86665b93d0831). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98432/ Test PASSed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Merged build finished. Test PASSed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #98432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98432/testReport)** for PR 22683 at commit [`9e45697`](https://github.com/apache/spark/commit/9e45697296039e55e85dd204788e287c9c60fceb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230571447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. --- End diff -- Please fix the indentation here.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230571437 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- @arman1371, we understand that this PR is trying to add syntactic sugar. But you need only 5 characters, `Seq(` and `)`, to use the existing general API. Personally, I agree with @wangyum; I prefer not to add this. Historically, 1. The `Seq[String]` version was added later, in Spark 1.4, to support PySpark (SPARK-7990). 2. Spark 1.6 added the join type to the `Seq[String]` version (SPARK-10446). That was a long time ago. Given that, I guess the Apache Spark community intentionally didn't add the `String` version, in order to keep `Dataset` simple in terms of the number of APIs.
Anyway, since you need an answer, let's ask for the general opinion again to make a decision. Hi, @rxin, @cloud-fan, @gatorsmile. Did we explicitly decide not to add this API? It seems that @arman1371 wants to add this for feature parity with PySpark in Spark 3.0.0.
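The "5 characters" workaround mentioned in this comment can be sketched as follows. This is only an illustration, assuming Spark on the classpath and a local `SparkSession`; the object name `JoinUsingSketch`, the helper `leftJoinOnUserId`, and the sample data are hypothetical. The call itself uses the existing `join(right, usingColumns: Seq[String], joinType: String)` overload on `Dataset`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinUsingSketch {
  // Existing general API: wrap the single column name in Seq(...).
  def leftJoinOnUserId(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, Seq("user_id"), "left")

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for a demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-using-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      val df1 = Seq((1, "alice"), (2, "bob")).toDF("user_id", "name")
      val df2 = Seq((1, 10)).toDF("user_id", "score")
      // "user_id" appears once in the output, as with SQL's JOIN USING.
      leftJoinOnUserId(df1, df2).show()
    } finally spark.stop()
  }
}
```

From Java the same overload needs a Scala `Seq` to be constructed by hand, which is the inconvenience raised elsewhere in this thread.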
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230570164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- cc @dongjoon-hyun
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22914 retest this please
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Merged build finished. Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4742/ Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22921 **[Test build #98433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98433/testReport)** for PR 22921 at commit [`df92f0f`](https://github.com/apache/spark/commit/df92f0fd48883adfd1c4881c03d86665b93d0831).
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22921 Yeah, it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. Given that it's a reasonably niche method, I thought it would be best to go ahead and be consistent here?
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568378 --- Diff: R/pkg/R/functions.R --- @@ -319,23 +319,23 @@ setMethod("acos", }) #' @details -#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group. +#' \code{approx_count_distinct}: Returns the approximate number of distinct items in a group. #' #' @rdname column_aggregate_functions -#' @aliases approxCountDistinct approxCountDistinct,Column-method +#' @aliases approx_count_distinct approx_count_distinct,Column-method #' @examples #' #' \dontrun{ -#' head(select(df, approxCountDistinct(df$gear))) -#' head(select(df, approxCountDistinct(df$gear, 0.02))) +#' head(select(df, approx_count_distinct(df$gear))) +#' head(select(df, approx_count_distinct(df$gear, 0.02))) #' head(select(df, countDistinct(df$gear, df$cyl))) #' head(select(df, n_distinct(df$gear))) #' head(distinct(select(df, "gear")))} -#' @note approxCountDistinct(Column) since 1.4.0 -setMethod("approxCountDistinct", +#' @note approx_count_distinct(Column) since 2.0.0 --- End diff -- Right, will fix that one too if I missed it, per https://github.com/apache/spark/pull/22921#discussion_r230449173
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #98432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98432/testReport)** for PR 22683 at commit [`9e45697`](https://github.com/apache/spark/commit/9e45697296039e55e85dd204788e287c9c60fceb).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22683 Jenkins, ok to test
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568079 --- Diff: R/pkg/R/functions.R --- @@ -1641,30 +1641,30 @@ setMethod("tanh", }) #' @details -#' \code{toDegrees}: Converts an angle measured in radians to an approximately equivalent angle +#' \code{degrees}: Converts an angle measured in radians to an approximately equivalent angle #' measured in degrees. #' #' @rdname column_math_functions -#' @aliases toDegrees toDegrees,Column-method -#' @note toDegrees since 1.4.0 -setMethod("toDegrees", +#' @aliases degrees degrees,Column-method +#' @note degrees since 3.0.0 +setMethod("degrees", --- End diff -- `degrees` and `radians` will need to be added to the NAMESPACE file for export.
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568058 --- Diff: R/pkg/R/generics.R --- @@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { standardGeneric("add_months") }) #' @rdname column_aggregate_functions #' @name NULL -setGeneric("approxCountDistinct", function(x, ...) { standardGeneric("approxCountDistinct") }) +setGeneric("approx_count_distinct", function(x, ...) { standardGeneric("approx_count_distinct") }) --- End diff -- My concern is that these are breaking changes in a version without having them deprecated first... Could we leave the old one in place as a redirect and add `.Deprecated`?
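The redirect-then-deprecate approach suggested in this comment can be sketched roughly as below. The comment concerns SparkR, where R's `.Deprecated` would be the tool; this sketch shows the same idea in Scala (the language of the other diffs in this digest) using the `@deprecated` annotation. The object `AggregateNames`, the toy implementation over `Seq[Int]`, and the version string are all hypothetical and are not Spark's code.

```scala
object AggregateNames {
  // New snake_case name. Toy stand-in: an exact distinct count, not an approximation.
  def approx_count_distinct(values: Seq[Int]): Long =
    values.distinct.size.toLong

  // Old camelCase name kept for one release: it only forwards to the new name,
  // and callers get a compiler warning pointing them at the replacement.
  @deprecated("Use approx_count_distinct instead", "3.0.0")
  def approxCountDistinct(values: Seq[Int]): Long =
    approx_count_distinct(values)
}
```

Existing callers keep compiling and see a deprecation warning, and the old name can then be removed in a later release without a silent break.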
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568088 --- Diff: R/pkg/R/functions.R --- @@ -319,23 +319,23 @@ setMethod("acos", }) #' @details -#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group. +#' \code{approx_count_distinct}: Returns the approximate number of distinct items in a group. #' #' @rdname column_aggregate_functions -#' @aliases approxCountDistinct approxCountDistinct,Column-method +#' @aliases approx_count_distinct approx_count_distinct,Column-method #' @examples #' #' \dontrun{ -#' head(select(df, approxCountDistinct(df$gear))) -#' head(select(df, approxCountDistinct(df$gear, 0.02))) +#' head(select(df, approx_count_distinct(df$gear))) +#' head(select(df, approx_count_distinct(df$gear, 0.02))) #' head(select(df, countDistinct(df$gear, df$cyl))) #' head(select(df, n_distinct(df$gear))) #' head(distinct(select(df, "gear")))} -#' @note approxCountDistinct(Column) since 1.4.0 -setMethod("approxCountDistinct", +#' @note approx_count_distinct(Column) since 2.0.0 --- End diff -- it's actually new in R for 3.0.0 then
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98431/ Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Merged build finished. Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98431/testReport)** for PR 22626 at commit [`1895cdc`](https://github.com/apache/spark/commit/1895cdc3540f67ad562e10488ac7ffe7012d9ccc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Can one of the admins verify this patch?
[GitHub] spark pull request #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_...
GitHub user mpmolek opened a pull request: https://github.com/apache/spark/pull/22937 [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR from spark submit ## What changes were proposed in this pull request? Don't propagate SPARK_CONF_DIR to the driver in mesos cluster mode. ## How was this patch tested? I built the 2.3.2 tag with this patch added and deployed a test job to a mesos cluster to confirm that the incorrect SPARK_CONF_DIR was no longer passed from the submit command. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mpmolek/spark fix-conf-dir Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22937.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22937 commit 10da1120d52f61e98bcd929d7ad59220a93d59f7 Author: Matt Molek Date: 2018-11-03T19:33:02Z [SPARK-25934] Don't propagate SPARK_CONF_DIR from spark submit
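The idea described in this pull request, keeping a machine-local variable out of the environment handed to the driver, can be illustrated with a small sketch. This is not the patch itself, whose diff is not shown in this digest; the object `SubmitEnv`, the set `localOnly`, and the sample variables are hypothetical names chosen for illustration.

```scala
object SubmitEnv {
  // Hypothetical: variables that only make sense on the submitting machine.
  val localOnly: Set[String] = Set("SPARK_CONF_DIR")

  // Returns the environment to forward to the driver, minus local-only entries.
  def driverEnv(submitEnv: Map[String, String]): Map[String, String] =
    submitEnv.filter { case (name, _) => !localOnly.contains(name) }
}
```

A call such as `SubmitEnv.driverEnv(sys.env)` would then forward everything except `SPARK_CONF_DIR`, so a path that exists only on the submitting host is not imposed on the driver.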
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98430/ Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)** for PR 22932 at commit [`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Merged build finished. Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98428/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98428/testReport)** for PR 22626 at commit [`6969b49`](https://github.com/apache/spark/commit/6969b49812acd2664bca724378a3739cb7846a6a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22932 The last commit will pass the test. The previous one fails due to `spaces at the end`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98429/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)** for PR 22932 at commit [`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r230564513 --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out --- @@ -93,7 +93,7 @@ Partition Values [ds=2017-08-01, hr=10] Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10 Created Time [not included in comparison] Last Access [not included in comparison] -Partition Statistics 1121 bytes, 3 rows +Partition Statistics 1229 bytes, 3 rows --- End diff -- Right, @gatorsmile . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r230563752 --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out --- @@ -93,7 +93,7 @@ Partition Values [ds=2017-08-01, hr=10] Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10 Created Time [not included in comparison] Last Access [not included in comparison] -Partition Statistics 1121 bytes, 3 rows +Partition Statistics 1229 bytes, 3 rows --- End diff -- This is caused by adding `org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22934 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98427/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22934 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22934 **[Test build #98427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98427/testReport)** for PR 22934 at commit [`322e21c`](https://github.com/apache/spark/commit/322e21c29919cb7dcfc2e088cd5d605e1f4bb5a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22914 @srowen @gengliangwang There is one more place where the WEBUI can throw an exception. https://github.com/apache/spark/blob/1a7abf3f453f7d6012d7e842cf05f29f3afbb3bc/core/src/main/scala/org/apache/spark/ui/PagedTable.scala#L36-L38 I will update the PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22933 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() referenc...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22933 Merged to master/2.4/2.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98431/testReport)** for PR 22626 at commit [`1895cdc`](https://github.com/apache/spark/commit/1895cdc3540f67ad562e10488ac7ffe7012d9ccc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22914 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98426/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22914 **[Test build #98426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98426/testReport)** for PR 22914 at commit [`2e39c4a`](https://github.com/apache/spark/commit/2e39c4a2cbf1db82b37795b2b568985fda2ff903). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22920 @dongjoon-hyun Thank you for re-running the benchmarks on EC2, and @HyukjinKwon for review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22919 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98425/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22919 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22919 **[Test build #98425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98425/testReport)** for PR 22919 at commit [`5f3cb87`](https://github.com/apache/spark/commit/5f3cb87c8798e72cc6852e71c02ffc2077c748d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22920 Thank you, @MaxGekk and @HyukjinKwon ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)** for PR 22932 at commit [`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4741/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user arman1371 commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230560687 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- No, it's a simpler implementation. As I said, we already have both `def join(right: Dataset[_], usingColumn: String)` and `def join(right: Dataset[_], usingColumns: Seq[String])`. By your reasoning, the first function should be removed too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
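For readers following the overload discussion in this thread, here is a minimal sketch of how a single-column join with an explicit join type can already be written with the existing `join(right, usingColumns: Seq[String], joinType: String)` overload. The object name `JoinUsingSketch`, the helper `joinUsing`, and the sample DataFrames are hypothetical and exist only for illustration; they are not part of Spark or of the patch under review. The sketch assumes Spark SQL is on the classpath and runs against a local session.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinUsingSketch {
  // Hypothetical helper, not a Spark API: wraps the single column name in a Seq
  // and delegates to the existing Seq-based overload of Dataset.join.
  def joinUsing(
      left: DataFrame,
      right: DataFrame,
      usingColumn: String,
      joinType: String): DataFrame =
    left.join(right, Seq(usingColumn), joinType)

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only; master and app name are arbitrary choices.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-using-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data sharing the "user_id" column.
    val df1 = Seq((1, "a"), (2, "b")).toDF("user_id", "name")
    val df2 = Seq((1, 10)).toDF("user_id", "score")

    // Left join on "user_id"; the join column appears once in the output,
    // as with SQL's JOIN USING.
    joinUsing(df1, df2, "user_id", "left").show()

    spark.stop()
  }
}
```

Whether a dedicated `String` overload is worth adding on top of this is the design question debated in the comments; the sketch only shows that the behaviour is reachable today by passing `Seq("user_id")`.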
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22934 Thank you for taking care of this, @wangyum . nit: We use either the `[BUILD]` or the `[INFRA]` tag for this kind of work. Can we use `[INFRA]` consistently? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)** for PR 22932 at commit [`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4740/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22930: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22930 cc @gatorsmile @gengliangwang @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22920: [SPARK-25931][SQL] Benchmarking creation of Jacks...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22920 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22936: Support WITH clause (CTE) in subqueries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22936 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22936: Support WITH clause (CTE) in subqueries
GitHub user gbloisi opened a pull request: https://github.com/apache/spark/pull/22936 Support WITH clause (CTE) in subqueries Because of SPARK-17590, supporting the WITH clause (CTE) in subqueries requires only grammar support. A test for the augmented syntax is provided. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gbloisi/spark SPARK-19799 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22936.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22936 commit 66cd5379a17e05707ae162bb20e9c64812737d78 Author: Giambattista Bloisi Date: 2018-11-03T16:04:09Z Because of SPARK-17590 support of WITH clause in subqueries requires only grammar support. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230559647 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- Could we close this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user arman1371 commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230559472 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- The answer to both questions is yes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230559020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala --- @@ -15,18 +15,17 @@ * limitations under the License. */ -package org.apache.spark.sql.execution.datasources.csv +package org.apache.spark.sql.catalyst.csv import java.io.Writer import com.univocity.parsers.csv.CsvWriter import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.csv.CSVOptions import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.types._ -private[csv] class UnivocityGenerator( +private[sql] class UnivocityGenerator( --- End diff -- removed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230559006 --- Diff: sql/core/src/test/resources/sql-tests/inputs/csv-functions.sql --- @@ -15,3 +15,10 @@ CREATE TEMPORARY VIEW csvTable(csvField, a) AS SELECT * FROM VALUES ('1,abc', 'a SELECT schema_of_csv(csvField) FROM csvTable; -- Clean up DROP VIEW IF EXISTS csvTable; +-- to_csv +select to_csv(named_struct('a', 1, 'b', 2)); +select to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); +-- Check if errors handled +select to_csv(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE')); +select to_csv(named_struct('a', 1, 'b', 2), map('mode', 1)); --- End diff -- I removed `select to_csv()` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98428/testReport)** for PR 22626 at commit [`6969b49`](https://github.com/apache/spark/commit/6969b49812acd2664bca724378a3739cb7846a6a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > OK, the Spark part doesn't seem relevant. The input might be more realistic here, yes. I was commenting that your test code doesn't show what you're testing, though I understand you manually modified it. Because the test is so central here I think it's important to understand exactly what you're measuring and exactly what you're running. > > This doesn't show an improvement, right? I agree with you. The test shows no influence for the sparse case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22935: Branch 2.2
Github user litao1223 closed the pull request at: https://github.com/apache/spark/pull/22935 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230557937 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- @arman1371 What do you think? ```def join(right: Dataset[_], usingColumn: String, joinType: String)``` only supports one column, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22935: Branch 2.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22935 @litao1223 Please close this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22935: Branch 2.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22935 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org