Repository: spark
Updated Branches:
  refs/heads/master 1f6ded645 -> 19d9d4c85
[SPARK-19126][DOCS] Update Join Documentation Across Languages

## What changes were proposed in this pull request?

- [X] Make sure all join types are clearly mentioned
- [X] Make join labeling/style consistent
- [X] Make join label ordering docs the same
- [X] Improve join documentation according to above for Scala
- [X] Improve join documentation according to above for Python
- [X] Improve join documentation according to above for R

## How was this patch tested?

No tests b/c docs.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <wac.chamb...@gmail.com>

Closes #16504 from anabranch/SPARK-19126.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/19d9d4c8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/19d9d4c8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/19d9d4c8

Branch: refs/heads/master
Commit: 19d9d4c855eab8f647a5ec66b079172de81221d0
Parents: 1f6ded6
Author: anabranch <wac.chamb...@gmail.com>
Authored: Sun Jan 8 20:37:46 2017 -0800
Committer: Felix Cheung <felixche...@apache.org>
Committed: Sun Jan 8 20:37:46 2017 -0800

----------------------------------------------------------------------
 R/pkg/R/DataFrame.R                              | 19 +++++++++++--------
 python/pyspark/sql/dataframe.py                  |  5 +++--
 .../scala/org/apache/spark/sql/Dataset.scala     | 16 ++++++++++++----
 3 files changed, 26 insertions(+), 14 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/19d9d4c8/R/pkg/R/DataFrame.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 7737ffe..c56648a 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2313,9 +2313,9 @@ setMethod("dropDuplicates",
 #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
 #' Column expression. If joinExpr is omitted, the default, inner join is attempted and an error is
 #' thrown if it would be a Cartesian Product. For Cartesian join, use crossJoin instead.
-#' @param joinType The type of join to perform. The following join types are available:
-#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left_outer', 'left',
-#' 'right_outer', 'rightouter', 'right', and 'leftsemi'. The default joinType is "inner".
+#' @param joinType The type of join to perform, default 'inner'.
+#' Must be one of: 'inner', 'cross', 'outer', 'full', 'full_outer',
+#' 'left', 'left_outer', 'right', 'right_outer', 'left_semi', or 'left_anti'.
 #' @return A SparkDataFrame containing the result of the join operation.
 #' @family SparkDataFrame functions
 #' @aliases join,SparkDataFrame,SparkDataFrame-method
@@ -2344,15 +2344,18 @@ setMethod("join",
               if (is.null(joinType)) {
                 sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc)
               } else {
-                if (joinType %in% c("inner", "outer", "full", "fullouter",
-                    "leftouter", "left_outer", "left",
-                    "rightouter", "right_outer", "right", "leftsemi")) {
+                if (joinType %in% c("inner", "cross",
+                    "outer", "full", "fullouter", "full_outer",
+                    "left", "leftouter", "left_outer",
+                    "right", "rightouter", "right_outer",
+                    "left_semi", "leftsemi", "left_anti", "leftanti")) {
                   joinType <- gsub("_", "", joinType)
                   sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
                 } else {
                   stop("joinType must be one of the following types: ",
-                       "'inner', 'outer', 'full', 'fullouter', 'leftouter', 'left_outer', 'left',
-                       'rightouter', 'right_outer', 'right', 'leftsemi'")
+                       "'inner', 'cross', 'outer', 'full', 'full_outer',",
+                       "'left', 'left_outer', 'right', 'right_outer',",
+                       "'left_semi', or 'left_anti'.")
                 }
               }
             }

http://git-wip-us.apache.org/repos/asf/spark/blob/19d9d4c8/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index b9d9038..10e42d0 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -730,8 +730,9 @@ class DataFrame(object):
             a join expression (Column), or a list of Columns.
             If `on` is a string or a list of strings indicating the name of the join column(s),
             the column(s) must exist on both sides, and this performs an equi-join.
-        :param how: str, default 'inner'.
-            One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+        :param how: str, default ``inner``. Must be one of: ``inner``, ``cross``, ``outer``,
+            ``full``, ``full_outer``, ``left``, ``left_outer``, ``right``, ``right_outer``,
+            ``left_semi``, and ``left_anti``.

         The following performs a full outer join between ``df1`` and ``df2``.

http://git-wip-us.apache.org/repos/asf/spark/blob/19d9d4c8/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index fd75d51..1a7a5ba 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -750,14 +750,18 @@ class Dataset[T] private[sql](
   }

   /**
-   * Equi-join with another `DataFrame` using the given columns.
+   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
    *
    * Different from other join functions, the join columns will only appear once in the output,
    * i.e. similar to SQL's `JOIN USING` syntax.
    *
    * @param right Right side of the join operation.
    * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
-   * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+   *                 `right`, `right_outer`, `left_semi`, `left_anti`.
    *
    * @note If you perform a self-join using this function without aliasing the input
    * `DataFrame`s, you will NOT be able to reference any columns after the join, since
@@ -812,7 +816,9 @@ class Dataset[T] private[sql](
    *
    * @param right Right side of the join.
    * @param joinExprs Join expression.
-   * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+   *                 `right`, `right_outer`, `left_semi`, `left_anti`.
    *
    * @group untypedrel
    * @since 2.0.0
@@ -889,7 +895,9 @@ class Dataset[T] private[sql](
    *
    * @param other Right side of the join.
    * @param condition Join expression.
-   * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+   *                 `right`, `right_outer`, `left_semi`, `left_anti`.
    *
    * @group typedrel
    * @since 1.6.0
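
Note on the R/pkg/R/DataFrame.R hunk: the updated check accepts both the underscore and the non-underscore spelling of each join type, then strips underscores with gsub before the name is passed on to the JVM join method. The block below is a minimal Python sketch of that accept-then-normalize step, for illustration only. It is not Spark code: ACCEPTED_JOIN_TYPES and normalize_join_type are hypothetical names introduced here, and the accepted set is copied from the added lines of the hunk.

```python
# Illustrative sketch only, not part of Spark or of this commit.
# ACCEPTED_JOIN_TYPES and normalize_join_type are hypothetical names.

# Spellings accepted by the updated R check (copied from the added diff lines).
ACCEPTED_JOIN_TYPES = frozenset([
    "inner", "cross",
    "outer", "full", "fullouter", "full_outer",
    "left", "leftouter", "left_outer",
    "right", "rightouter", "right_outer",
    "left_semi", "leftsemi", "left_anti", "leftanti",
])


def normalize_join_type(join_type):
    """Return join_type with underscores removed; raise ValueError if not accepted."""
    # The membership test is case-sensitive, like R's %in% on character vectors.
    if join_type not in ACCEPTED_JOIN_TYPES:
        raise ValueError(
            "joinType must be one of the following types: "
            "'inner', 'cross', 'outer', 'full', 'full_outer', "
            "'left', 'left_outer', 'right', 'right_outer', "
            "'left_semi', or 'left_anti'."
        )
    # Same effect as R's gsub("_", "", joinType).
    return join_type.replace("_", "")
```

Under these assumptions, normalize_join_type("left_semi") and normalize_join_type("leftsemi") both yield "leftsemi", so the documented spelling and the older spelling reach the JVM side as the same string, while an unlisted value such as "LEFT" is rejected.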