Repository: spark
Updated Branches:
  refs/heads/master 1426a0805 -> 6ee40d2cc

[DOC] improve python doc for rdd.histogram and dataframe.join

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

Author: Mortada Mehyar <mortada.meh...@gmail.com>

Closes #14253 from mortada/histogram_typos.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6ee40d2c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6ee40d2c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6ee40d2c

Branch: refs/heads/master
Commit: 6ee40d2cc5f467c78be662c1639fc3d5b7f796cf
Parents: 1426a08
Author: Mortada Mehyar <mortada.meh...@gmail.com>
Authored: Mon Jul 18 23:49:47 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Jul 18 23:49:47 2016 -0700

----------------------------------------------------------------------
 python/pyspark/rdd.py           | 18 +++++++++---------
 python/pyspark/sql/dataframe.py | 10 +++++-----
 2 files changed, 14 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/6ee40d2c/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 6afe769..0508235 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -1027,20 +1027,20 @@ class RDD(object):
 
         If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
         this can be switched from an O(log n) inseration to O(1) per
-        element(where n = # buckets).
+        element (where n is the number of buckets).
 
-        Buckets must be sorted and not contain any duplicates, must be
+        Buckets must be sorted, not contain any duplicates, and have
         at least two elements.
 
-        If `buckets` is a number, it will generates buckets which are
+        If `buckets` is a number, it will generate buckets which are
         evenly spaced between the minimum and maximum of the RDD. For
-        example, if the min value is 0 and the max is 100, given buckets
-        as 2, the resulting buckets will be [0,50) [50,100]. buckets must
-        be at least 1 If the RDD contains infinity, NaN throws an exception
-        If the elements in RDD do not vary (max == min) always returns
-        a single bucket.
+        example, if the min value is 0 and the max is 100, given `buckets`
+        as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+        be at least 1. An exception is raised if the RDD contains infinity.
+        If the elements in the RDD do not vary (max == min), a single bucket
+        will be used.
 
-        It will return a tuple of buckets and histogram.
+        The return value is a tuple of buckets and histogram.
 
         >>> rdd = sc.parallelize(range(51))
         >>> rdd.histogram(2)


http://git-wip-us.apache.org/repos/asf/spark/blob/6ee40d2c/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index adf549d..8ff9403 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -613,16 +613,16 @@ class DataFrame(object):
     def join(self, other, on=None, how=None):
         """Joins with another :class:`DataFrame`, using the given join expression.
 
-        The following performs a full outer join between ``df1`` and ``df2``.
-
         :param other: Right side of the join
-        :param on: a string for join column name, a list of column names,
-            , a join expression (Column) or a list of Columns.
-            If `on` is a string or a list of string indicating the name of the join column(s),
+        :param on: a string for the join column name, a list of column names,
+            a join expression (Column), or a list of Columns.
+            If `on` is a string or a list of strings indicating the name of the join column(s),
             the column(s) must exist on both sides, and this performs an equi-join.
         :param how: str, default 'inner'.
             One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
 
+        The following performs a full outer join between ``df1`` and ``df2``.
+
         >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
         [Row(name=None, height=80), Row(name=u'Bob', height=85), Row(name=u'Alice', height=None)]
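The histogram docstring compares two ways of locating an element's bucket. With arbitrary sorted boundaries, the lookup costs O(log n). With evenly spaced boundaries, it costs O(1). The docstring also sets the convention that every bucket is half-open except the last one, which is closed (for example, [0,50) [50,100]).

Below is a minimal pure-Python sketch of both lookup strategies under those rules. It is not PySpark's implementation. The names `bucket_index_sorted`, `bucket_index_even`, `histogram`, and `even_histogram` are hypothetical. As a simplifying assumption, the sketch rejects NaN in addition to infinity.

```python
import bisect
import math
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical helpers for illustration; this is not PySpark's RDD.histogram code.


def bucket_index_sorted(buckets: Sequence[float], x: float) -> Optional[int]:
    """Binary search over sorted, unique boundaries: O(log n) per element.

    Buckets are [b0, b1), [b1, b2), ..., [b(n-1), bn]; only the last is closed.
    Returns None for values outside [buckets[0], buckets[-1]].
    """
    if x < buckets[0] or x > buckets[-1]:
        return None
    if x == buckets[-1]:
        # The last bucket is closed on the right, so the maximum lands there.
        return len(buckets) - 2
    # bisect_right places x after an equal boundary, matching half-open buckets.
    return bisect.bisect_right(buckets, x) - 1


def bucket_index_even(lo: float, hi: float, count: int, x: float) -> Optional[int]:
    """Direct arithmetic for `count` evenly spaced buckets: O(1) per element.

    Assumes hi > lo. Returns None for values outside [lo, hi].
    """
    if x < lo or x > hi:
        return None
    # Clamp so x == hi (and float rounding just below hi) stays in the last bucket.
    return min(int((x - lo) / (hi - lo) * count), count - 1)


def histogram(values: Iterable[float],
              buckets: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Count values per bucket with the O(log n) lookup."""
    if len(buckets) < 2 or any(a >= b for a, b in zip(buckets, buckets[1:])):
        raise ValueError("buckets must be sorted, unique, and have at least two elements")
    counts = [0] * (len(buckets) - 1)
    for v in values:
        i = bucket_index_sorted(buckets, v)
        if i is not None:
            counts[i] += 1
    return list(buckets), counts


def even_histogram(values: Iterable[float],
                   count: int) -> Tuple[List[float], List[int]]:
    """Build `count` evenly spaced buckets between min and max, then use the O(1) lookup."""
    if count < 1:
        raise ValueError("count must be at least 1")
    values = list(values)
    if not all(math.isfinite(v) for v in values):
        # Assumption: reject infinity and, in this sketch, NaN as well.
        raise ValueError("values must be finite")
    lo, hi = min(values), max(values)
    if lo == hi:
        # No variation: a single bucket holds every element.
        return [lo, hi], [len(values)]
    step = (hi - lo) / count
    buckets = [lo + i * step for i in range(count)] + [hi]
    counts = [0] * count
    for v in values:
        # Every value lies in [lo, hi], so the index is never None here.
        counts[bucket_index_even(lo, hi, count, v)] += 1
    return buckets, counts


if __name__ == "__main__":
    # [0, 50) holds 0 and 49; the closed last bucket [50, 100] holds 50, 99, 100; 101 is ignored.
    print(histogram([0, 49, 50, 99, 100, 101], [0, 50, 100]))  # ([0, 50, 100], [2, 3])
    print(even_histogram(range(51), 2))  # ([0.0, 25.0, 50], [25, 26])
```

The binary-search lookup only requires the boundaries to be sorted and free of duplicates. The arithmetic lookup additionally relies on equal spacing, and that extra assumption is what makes the O(1) cost possible.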