[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user arman1371 commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230575855 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- Yes, for scala api it's just 5 characters but in java api it's very hard to change `String` to `Seq[String]` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...
Github user gss2002 commented on the issue: https://github.com/apache/spark/pull/22867 @vanzin can you please review latest patch thanks!
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22914 **[Test build #98436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98436/testReport)** for PR 22914 at commit [`64aa7a7`](https://github.com/apache/spark/commit/64aa7a7ab2f4d2653aa4310d30884b26eb712ed1).
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22914 retest this please
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Merged build finished. Test PASSed.
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22818 **[Test build #98435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98435/testReport)** for PR 22818 at commit [`ca3efd8`](https://github.com/apache/spark/commit/ca3efd8f636706abf8c994cb75c14432f4e4939a).
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4744/ Test PASSed.
[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22818 retest this please
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22754 **[Test build #98434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98434/testReport)** for PR 22754 at commit [`c883f4b`](https://github.com/apache/spark/commit/c883f4b1b26139e3cb2053b8efa0b0bb0f902be0).
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4743/ Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22754 Merged build finished. Test PASSed.
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22754 retest this please
[GitHub] spark issue #22754: [SPARK-25776][CORE]The disk write buffer size must be gr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22754 LGTM
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Merged build finished. Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98433/ Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22921 **[Test build #98433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98433/testReport)** for PR 22921 at commit [`df92f0f`](https://github.com/apache/spark/commit/df92f0fd48883adfd1c4881c03d86665b93d0831). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98432/ Test PASSed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Merged build finished. Test PASSed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #98432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98432/testReport)** for PR 22683 at commit [`9e45697`](https://github.com/apache/spark/commit/9e45697296039e55e85dd204788e287c9c60fceb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230571447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. --- End diff -- Please fix the indentation here.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230571437 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- @arman1371. We understand that this PR is trying to add syntactic sugar. But you need only 5 extra characters, `Seq(` and `)`, to use the existing general API. Personally, I agree with @wangyum; I prefer not to add this. Historically: 1. Spark 1.4 added the `Seq[String]` version to support PySpark (SPARK-7990). 2. Spark 1.6 added a `join` type parameter to the `Seq[String]` version (SPARK-10446). That was a long time ago. Given that, I guess the Apache Spark community intentionally didn't add the `String` version, in order to keep the number of `Dataset` APIs small. 
Anyway, since you need an answer, let's ask for general opinions again to make a decision. Hi, @rxin, @cloud-fan, @gatorsmile. Did we explicitly decide not to add this API? It seems that @arman1371 wants to add this for feature parity with PySpark at Spark 3.0.0.
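The overload being debated is pure delegation, and it can be sketched in plain Scala without Spark itself. In the sketch below, `MiniDataset` and `joinUsing` are hypothetical names used only to illustrate the pattern: the proposed `String` variant just wraps the column name in `Seq(...)` and forwards to the existing `Seq[String]` variant, which is also the "5 extra characters" a caller could type today.

```scala
// Hypothetical, simplified model of the API under discussion; not Spark's
// actual Dataset class.
case class MiniDataset(columns: Seq[String]) {
  // Existing general API: join on one or more columns (cf. SPARK-7990 / SPARK-10446).
  def joinUsing(right: MiniDataset, usingColumns: Seq[String], joinType: String): String =
    s"$joinType join using ${usingColumns.mkString(", ")}"

  // The proposed sugar: forwards to the general variant, nothing more.
  def joinUsing(right: MiniDataset, usingColumn: String, joinType: String): String =
    joinUsing(right, Seq(usingColumn), joinType)
}

object Demo extends App {
  val df1 = MiniDataset(Seq("user_id", "name"))
  val df2 = MiniDataset(Seq("user_id", "age"))
  // Today's call site: the caller writes Seq("user_id") explicitly.
  println(df1.joinUsing(df2, Seq("user_id"), "left"))
  // With the sugar, both calls resolve to the same underlying join.
  println(df1.joinUsing(df2, "user_id", "left"))
}
```

In Scala the cost of the status quo is small; the Java-interop argument is that constructing a `scala.collection.Seq` from Java is considerably more awkward than passing a plain `String`.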
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230570164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- cc @dongjoon-hyun
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22914 retest this please
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Merged build finished. Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4742/ Test PASSed.
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22921 **[Test build #98433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98433/testReport)** for PR 22921 at commit [`df92f0f`](https://github.com/apache/spark/commit/df92f0fd48883adfd1c4881c03d86665b93d0831).
[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22921 Yeah, it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. Given that it's a reasonably niche method, I thought it would be best to go ahead and be consistent here?
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568378 --- Diff: R/pkg/R/functions.R --- @@ -319,23 +319,23 @@ setMethod("acos", }) #' @details -#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group. +#' \code{approx_count_distinct}: Returns the approximate number of distinct items in a group. #' #' @rdname column_aggregate_functions -#' @aliases approxCountDistinct approxCountDistinct,Column-method +#' @aliases approx_count_distinct approx_count_distinct,Column-method #' @examples #' #' \dontrun{ -#' head(select(df, approxCountDistinct(df$gear))) -#' head(select(df, approxCountDistinct(df$gear, 0.02))) +#' head(select(df, approx_count_distinct(df$gear))) +#' head(select(df, approx_count_distinct(df$gear, 0.02))) #' head(select(df, countDistinct(df$gear, df$cyl))) #' head(select(df, n_distinct(df$gear))) #' head(distinct(select(df, "gear")))} -#' @note approxCountDistinct(Column) since 1.4.0 -setMethod("approxCountDistinct", +#' @note approx_count_distinct(Column) since 2.0.0 --- End diff -- Right, will fix that one too if I missed it, per https://github.com/apache/spark/pull/22921#discussion_r230449173
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #98432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98432/testReport)** for PR 22683 at commit [`9e45697`](https://github.com/apache/spark/commit/9e45697296039e55e85dd204788e287c9c60fceb).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22683 Jenkins, ok to test
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568058 --- Diff: R/pkg/R/generics.R --- @@ -748,7 +748,7 @@ setGeneric("add_months", function(y, x) { standardGeneric("add_months") }) #' @rdname column_aggregate_functions #' @name NULL -setGeneric("approxCountDistinct", function(x, ...) { standardGeneric("approxCountDistinct") }) +setGeneric("approx_count_distinct", function(x, ...) { standardGeneric("approx_count_distinct") }) --- End diff -- my concern is that these are breaking changes in a version without having them deprecated first... could we leave the old one to redirect and add `.Deprecated`?
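The deprecate-and-forward approach suggested above (keep the old name as a redirect, plus R's `.Deprecated` warning) has a direct analogue on the Scala side via the `@deprecated` annotation. A minimal sketch, with a hypothetical `MathFuncs` object standing in for the real API:

```scala
// Hypothetical sketch of deprecate-and-forward: the old name keeps compiling
// (with a deprecation warning) and simply delegates to the new name, so callers
// get at least one release cycle to migrate before the old name is removed.
object MathFuncs {
  /** New name, mirroring the SQL function name. */
  def degrees(radians: Double): Double = math.toDegrees(radians)

  /** Old name, retained only as a deprecated forwarder. */
  @deprecated("Use degrees instead", "3.0.0")
  def toDegrees(radians: Double): Double = degrees(radians)
}
```

Calling `MathFuncs.toDegrees(1.0)` still works but emits a compile-time deprecation warning, which is the behavior felixcheung is asking to preserve for the R wrappers.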
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568079 --- Diff: R/pkg/R/functions.R --- @@ -1641,30 +1641,30 @@ setMethod("tanh", }) #' @details -#' \code{toDegrees}: Converts an angle measured in radians to an approximately equivalent angle +#' \code{degrees}: Converts an angle measured in radians to an approximately equivalent angle #' measured in degrees. #' #' @rdname column_math_functions -#' @aliases toDegrees toDegrees,Column-method -#' @note toDegrees since 1.4.0 -setMethod("toDegrees", +#' @aliases degrees degrees,Column-method +#' @note degrees since 3.0.0 +setMethod("degrees", --- End diff -- `degrees` and `radians` will need to be added to the NAMESPACE file for export
[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230568088 --- Diff: R/pkg/R/functions.R --- @@ -319,23 +319,23 @@ setMethod("acos", }) #' @details -#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group. +#' \code{approx_count_distinct}: Returns the approximate number of distinct items in a group. #' #' @rdname column_aggregate_functions -#' @aliases approxCountDistinct approxCountDistinct,Column-method +#' @aliases approx_count_distinct approx_count_distinct,Column-method #' @examples #' #' \dontrun{ -#' head(select(df, approxCountDistinct(df$gear))) -#' head(select(df, approxCountDistinct(df$gear, 0.02))) +#' head(select(df, approx_count_distinct(df$gear))) +#' head(select(df, approx_count_distinct(df$gear, 0.02))) #' head(select(df, countDistinct(df$gear, df$cyl))) #' head(select(df, n_distinct(df$gear))) #' head(distinct(select(df, "gear")))} -#' @note approxCountDistinct(Column) since 1.4.0 -setMethod("approxCountDistinct", +#' @note approx_count_distinct(Column) since 2.0.0 --- End diff -- it's actually new in R for 3.0.0 then
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98431/ Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Merged build finished. Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98431/testReport)** for PR 22626 at commit [`1895cdc`](https://github.com/apache/spark/commit/1895cdc3540f67ad562e10488ac7ffe7012d9ccc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22937 Can one of the admins verify this patch?
[GitHub] spark pull request #22937: [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_...
GitHub user mpmolek opened a pull request: https://github.com/apache/spark/pull/22937 [SPARK-25934] [Mesos] Don't propagate SPARK_CONF_DIR from spark submit ## What changes were proposed in this pull request? Don't propagate SPARK_CONF_DIR to the driver in mesos cluster mode. ## How was this patch tested? I built the 2.3.2 tag with this patch added and deployed a test job to a mesos cluster to confirm that the incorrect SPARK_CONF_DIR was no longer passed from the submit command. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mpmolek/spark fix-conf-dir Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22937.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22937 commit 10da1120d52f61e98bcd929d7ad59220a93d59f7 Author: Matt Molek Date: 2018-11-03T19:33:02Z [SPARK-25934] Don't propagate SPARK_CONF_DIR from spark submit
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98430/ Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)** for PR 22932 at commit [`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Merged build finished. Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98428/ Test PASSed.
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98428/testReport)** for PR 22626 at commit [`6969b49`](https://github.com/apache/spark/commit/6969b49812acd2664bca724378a3739cb7846a6a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22932 The last commit will pass the test. The previous one failed due to `spaces at the end`.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98429/ Test FAILed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test FAILed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)** for PR 22932 at commit [`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r230564513 --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out --- @@ -93,7 +93,7 @@ Partition Values [ds=2017-08-01, hr=10] Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10 Created Time [not included in comparison] Last Access [not included in comparison] -Partition Statistics 1121 bytes, 3 rows +Partition Statistics 1229 bytes, 3 rows --- End diff -- Right, @gatorsmile.
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r230563752 --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out --- @@ -93,7 +93,7 @@ Partition Values [ds=2017-08-01, hr=10] Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10 Created Time [not included in comparison] Last Access [not included in comparison] -Partition Statistics 1121 bytes, 3 rows +Partition Statistics 1229 bytes, 3 rows --- End diff -- This is caused by adding `org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT`?
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22934 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98427/ Test PASSed.
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22934 Merged build finished. Test PASSed.
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22934 **[Test build #98427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98427/testReport)** for PR 22934 at commit [`322e21c`](https://github.com/apache/spark/commit/322e21c29919cb7dcfc2e088cd5d605e1f4bb5a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22914 @srowen @gengliangwang There is one more place where the Web UI can throw an exception. https://github.com/apache/spark/blob/1a7abf3f453f7d6012d7e842cf05f29f3afbb3bc/core/src/main/scala/org/apache/spark/ui/PagedTable.scala#L36-L38 I will update the PR.
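The kind of guard being discussed, rendering a valid page instead of throwing when the requested page number is out of range, can be sketched in isolation. This is an illustrative snippet only, and `clampPage` is a hypothetical helper name, not a function in `PagedTable.scala`:

```scala
// Hedged sketch: clamp a requested page number into [1, totalPages] instead of
// letting an out-of-range value surface as an exception in the Web UI.
def clampPage(requested: Int, totalPages: Int): Int =
  math.min(math.max(requested, 1), math.max(totalPages, 1))
```

Under this sketch, requesting page 7 of a 3-page table renders page 3, and requesting page 0 renders page 1.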
[GitHub] spark pull request #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22933
[GitHub] spark issue #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() referenc...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22933 Merged to master/2.4/2.3
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98431/testReport)** for PR 22626 at commit [`1895cdc`](https://github.com/apache/spark/commit/1895cdc3540f67ad562e10488ac7ffe7012d9ccc).
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22914 Merged build finished. Test PASSed.
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98426/ Test PASSed.
[GitHub] spark issue #22914: [SPARK-25900][WEBUI]When the page number is more than th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22914 **[Test build #98426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98426/testReport)** for PR 22914 at commit [`2e39c4a`](https://github.com/apache/spark/commit/2e39c4a2cbf1db82b37795b2b568985fda2ff903). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22920 @dongjoon-hyun Thank you for re-running the benchmarks on EC2, and @HyukjinKwon for review.
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22919 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98425/ Test PASSed.
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22919 Merged build finished. Test PASSed.
[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22919 **[Test build #98425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98425/testReport)** for PR 22919 at commit [`5f3cb87`](https://github.com/apache/spark/commit/5f3cb87c8798e72cc6852e71c02ffc2077c748d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22920 Thank you, @MaxGekk and @HyukjinKwon !
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)** for PR 22932 at commit [`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32).
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4741/ Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user arman1371 commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230560687 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- No, it's a simpler implementation. As I said, we already have both `def join(right: Dataset[_], usingColumn: String)` and `def join(right: Dataset[_], usingColumns: Seq[String])`. By that reasoning, the first function should be removed as well.
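The `JOIN USING` semantics the proposed overload documents, where the join column appears only once in the output, can be sketched over plain Scala collections. This is a toy model with hypothetical `Row` and `joinUsing` names, not Spark internals:

```scala
// Toy model of SQL's JOIN USING: rows are plain maps, and the using column
// is emitted once, taken from the left side. Illustrative only; it models
// just the inner-join case, not the full joinType list from the scaladoc.
case class Row(fields: Map[String, Any])

def joinUsing(left: Seq[Row], right: Seq[Row], usingColumn: String): Seq[Row] =
  for {
    l <- left
    r <- right
    if l.fields(usingColumn) == r.fields(usingColumn)
  } yield Row(l.fields ++ (r.fields - usingColumn)) // join key kept once

val df1 = Seq(Row(Map("user_id" -> 1, "name" -> "a")))
val df2 = Seq(Row(Map("user_id" -> 1, "age" -> 30)))
val joined = joinUsing(df1, df2, "user_id")
```

As in the scaladoc example, `df1.join(df2, "user_id", "left")` would likewise produce a single `user_id` column rather than one per side.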
[GitHub] spark issue #22934: [BUILD] Close stale PRs
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22934 Thank you for taking care of this, @wangyum. Nit: we are using the `[BUILD]` or `[INFRA]` tag for this kind of work. Maybe we can use `[INFRA]` consistently?
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)** for PR 22932 at commit [`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590).
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4740/ Test PASSed.
[GitHub] spark issue #22930: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22930 cc @gatorsmile @gengliangwang @maropu
[GitHub] spark pull request #22920: [SPARK-25931][SQL] Benchmarking creation of Jacks...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22920
[GitHub] spark issue #22936: Support WITH clause (CTE) in subqueries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22936 Can one of the admins verify this patch?
[GitHub] spark pull request #22936: Support WITH clause (CTE) in subqueries
GitHub user gbloisi opened a pull request: https://github.com/apache/spark/pull/22936 Support WITH clause (CTE) in subqueries Because of SPARK-17590, supporting the WITH clause (CTE) in subqueries requires only grammar changes. A test for the augmented syntax is provided. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gbloisi/spark SPARK-19799 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22936.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22936 commit 66cd5379a17e05707ae162bb20e9c64812737d78 Author: Giambattista Bloisi Date: 2018-11-03T16:04:09Z Because of SPARK-17590, support of the WITH clause in subqueries requires only grammar support.
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230559647 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- Could we close this?
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user arman1371 commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230559472 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- The answer to both questions is yes.
[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230559020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala --- @@ -15,18 +15,17 @@ * limitations under the License. */ -package org.apache.spark.sql.execution.datasources.csv +package org.apache.spark.sql.catalyst.csv import java.io.Writer import com.univocity.parsers.csv.CsvWriter import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.csv.CSVOptions import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.types._ -private[csv] class UnivocityGenerator( +private[sql] class UnivocityGenerator( --- End diff -- removed
[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230559006 --- Diff: sql/core/src/test/resources/sql-tests/inputs/csv-functions.sql --- @@ -15,3 +15,10 @@ CREATE TEMPORARY VIEW csvTable(csvField, a) AS SELECT * FROM VALUES ('1,abc', 'a SELECT schema_of_csv(csvField) FROM csvTable; -- Clean up DROP VIEW IF EXISTS csvTable; +-- to_csv +select to_csv(named_struct('a', 1, 'b', 2)); +select to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); +-- Check if errors handled +select to_csv(named_struct('a', 1, 'b', 2), named_struct('mode', 'PERMISSIVE')); +select to_csv(named_struct('a', 1, 'b', 2), map('mode', 1)); --- End diff -- I removed `select to_csv()`
[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22626 **[Test build #98428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98428/testReport)** for PR 22626 at commit [`6969b49`](https://github.com/apache/spark/commit/6969b49812acd2664bca724378a3739cb7846a6a).
[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 > OK, the Spark part doesn't seem relevant. The input might be more realistic here, yes. I was commenting that your test code doesn't show what you're testing, though I understand you manually modified it. Because the test is so central here I think it's important to understand exactly what you're measuring and exactly what you're running. > > This doesn't show an improvement, right? Regarding the test, I agree with you. There is no influence for the sparse case.
[GitHub] spark pull request #22935: Branch 2.2
Github user litao1223 closed the pull request at: https://github.com/apache/spark/pull/22935
[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22889#discussion_r230557937 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -883,6 +883,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** +* Equi-join with another `DataFrame` using the given column. +* +* Different from other join functions, the join column will only appear once in the output, +* i.e. similar to SQL's `JOIN USING` syntax. +* +* {{{ +* // Left join of df1 and df2 using the column "user_id" +* df1.join(df2, "user_id", "left") +* }}} +* +* @param right Right side of the join operation. +* @param usingColumn Name of the column to join on. This column must exist on both sides. +* @param joinType Type of join to perform. Default `inner`. Must be one of: +* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`, +* `right`, `right_outer`, `left_semi`, `left_anti`. +* @note If you perform a self-join using this function without aliasing the input +* `DataFrame`s, you will NOT be able to reference any columns after the join, since +* there is no way to disambiguate which side of the join you would like to reference. +* @group untypedrel +*/ + def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = { --- End diff -- @arman1371 What do you think? `def join(right: Dataset[_], usingColumn: String, joinType: String)` only supports one column, right?
[GitHub] spark issue #22935: Branch 2.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22935 @litao1223 Please close this.
[GitHub] spark issue #22935: Branch 2.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22935 Can one of the admins verify this patch?
[GitHub] spark pull request #22935: Branch 2.2
GitHub user litao1223 opened a pull request: https://github.com/apache/spark/pull/22935 Branch 2.2 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/litao1223/spark branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22935.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22935 commit 399aa016e8f44fea4e5ef4b71a9a80484dd755f8 Author: Xingbo Jiang Date: 2017-07-11T13:52:54Z [SPARK-21366][SQL][TEST] Add sql test for window functions ## What changes were proposed in this pull request? Add sql test for window functions, also remove unnecessary test cases in `WindowQuerySuite`. ## How was this patch tested? Added `window.sql` and the corresponding output file. Author: Xingbo Jiang Closes #18591 from jiangxb1987/window. (cherry picked from commit 66d21686556681457aab6e44e19f5614c5635f0c) Signed-off-by: Wenchen Fan commit cb6fc89ba20a427fa7d66fa5036b17c1a5d5d87f Author: Eric Vandenberg Date: 2017-07-12T06:49:15Z [SPARK-21219][CORE] Task retry occurs on same executor due to race condition with blacklisting There's a race condition in the current TaskSetManager where a failed task is added for retry (addPendingTask), and can asynchronously be assigned to an executor *prior* to the blacklist state (updateBlacklistForFailedTask); the result is the task might re-execute on the same executor.
This is particularly problematic if the executor is shutting down since the retry task immediately becomes a lost task (ExecutorLostFailure). Another side effect is that the actual failure reason gets obscured by the retry task which never actually executed. There are sample logs showing the issue in the https://issues.apache.org/jira/browse/SPARK-21219 The fix is to change the ordering of the addPendingTask and updatingBlackListForFailedTask calls in TaskSetManager.handleFailedTask Implemented a unit test that verifies the task is black listed before it is added to the pending task. Ran the unit test without the fix and it fails. Ran the unit test with the fix and it passes. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Eric Vandenberg Closes #18427 from ericvandenbergfb/blacklistFix. ## What changes were proposed in this pull request? This is a backport of the fix to SPARK-21219, already checked in as 96d58f2. ## How was this patch tested? Ran TaskSetManagerSuite tests locally. Author: Eric Vandenberg Closes #18604 from jsoltren/branch-2.2. commit 39eba3053ac99f03d9df56471bae5fc5cc9f4462 Author: Kohki Nishio Date: 2017-07-13T00:22:40Z [SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader ## What changes were proposed in this pull request? `ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used. This is take over of #17074, the primary author of this PR is taroplus . Should close #17074 after this PR get merged. ## How was this patch tested? Add test case in `ExecutorClassLoaderSuite`. Author: Kohki Nishio Author: Xingbo Jiang Closes #18614 from jiangxb1987/executor_classloader.
(cherry picked from commit e08d06b37bc96cc48fec1c5e40f73e0bca09c616) Signed-off-by: Wenchen Fan commit cf0719b5e99333b28bb4066b304dbcf8400c80ea Author: Wenchen Fan Date: 2017-07-13T00:34:42Z Revert "[SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader" This reverts commit 39eba3053ac99f03d9df56471bae5fc5cc9f4462. commit bfe3ba86936ffaabff9f89d03018eb368d246b4d Author: jerryshao Date: 2017-07-13T22:25:38Z [SPARK-21376][YARN] Fix yarn client token expire issue when cleaning the staging files in long running scenario ## What changes were proposed in this pull request? This issue happens in long running application with
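The delegation behavior the SPARK-18646 commit message relies on, that a `ClassLoader` consults its parent before its own `findClass`, can be shown with a minimal stand-in. This is a toy class under stated assumptions, not Spark's `ExecutorClassLoader`: with a `null` parent only the bootstrap loader is consulted first, so a failed lookup falls through to the overridden `findClass`, giving the subclass full control.

```scala
// Toy demonstration of ClassLoader delegation: with parent = null, a lookup
// that the bootstrap loader cannot satisfy lands in our overridden findClass.
class ChildFirstLoader extends ClassLoader(null: ClassLoader) {
  override def findClass(name: String): Class[_] =
    throw new ClassNotFoundException(s"resolved by findClass: $name")
}

val loader = new ChildFirstLoader
// Bootstrap classes still resolve; anything else reaches findClass.
val fellThrough =
  try { loader.loadClass("no.such.Clazz"); false }
  catch { case _: ClassNotFoundException => true }
```

Standard library classes such as `java.lang.String` still load via the bootstrap loader, which is why setting the parent to `null` is safe for the REPL use case described above.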
[GitHub] spark issue #22089: [SPARK-25098][SQL]‘Cast’ will return NULL when input...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22089 ping @bingbai0912
[GitHub] spark issue #22088: [SPARK-24931][CORE]CoarseGrainedExecutorBackend send wro...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22088 cc @jiangxb1987
[GitHub] spark issue #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() referenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22933 **[Test build #4411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4411/testReport)** for PR 22933 at commit [`6061b76`](https://github.com/apache/spark/commit/6061b766e6d62936dc39967b8ad21441b04bbfec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 OK, the Spark part doesn't seem relevant. The input might be more realistic here, yes. I was commenting that your test code doesn't show what you're testing, though I understand you manually modified it. Because the test is so central here I think it's important to understand exactly what you're measuring and exactly what you're running. This doesn't show an improvement, right?
[GitHub] spark pull request #22894: [SPARK-25885][Core][Minor] HighlyCompressedMapSta...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22894#discussion_r230556818 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -189,13 +188,12 @@ private[spark] class HighlyCompressedMapStatus private ( emptyBlocks.readExternal(in) avgSize = in.readLong() val count = in.readInt() -val hugeBlockSizesArray = mutable.ArrayBuffer[Tuple2[Int, Byte]]() +hugeBlockSizes = mutable.Map.empty[Int, Byte] (0 until count).foreach { _ => val block = in.readInt() val size = in.readByte() - hugeBlockSizesArray += Tuple2(block, size) + hugeBlockSizes.asInstanceOf[mutable.Map[Int, Byte]].update(block, size) --- End diff -- Why cast it? it is used as a mutable map and its type is a mutable map, so the type on line 151 is wrong. Also, just `hugeBlockSizes(block) = size`, no?
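The review point can be reproduced in miniature: if the field is declared as `mutable.Map[Int, Byte]`, no `asInstanceOf` cast is needed and the update-sugar form works directly. The names below are stand-ins for the `HighlyCompressedMapStatus` fields, not the real class:

```scala
import scala.collection.mutable

// Stand-ins for the deserialization fields under review; declaring the field
// with the mutable type removes the need for the asInstanceOf cast.
var hugeBlockSizes: mutable.Map[Int, Byte] = mutable.Map.empty[Int, Byte]

val block = 3
val size: Byte = 7
hugeBlockSizes(block) = size // sugar for hugeBlockSizes.update(block, size)
```

The sugared form compiles because Scala rewrites `m(k) = v` to `m.update(k, v)` on any type with an `update` method, which `mutable.Map` has.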
[GitHub] spark issue #22933: [SPARK-25933][DOCUMENTATION] Fix pstats.Stats() referenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22933 **[Test build #4411 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4411/testReport)** for PR 22933 at commit [`6061b76`](https://github.com/apache/spark/commit/6061b766e6d62936dc39967b8ad21441b04bbfec).