Repository: spark
Updated Branches:
  refs/heads/branch-2.0 ef2a6f131 -> 504aa6f7a


[DOC] improve python doc for rdd.histogram and dataframe.join

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

Author: Mortada Mehyar <mortada.meh...@gmail.com>

Closes #14253 from mortada/histogram_typos.

(cherry picked from commit 6ee40d2cc5f467c78be662c1639fc3d5b7f796cf)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/504aa6f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/504aa6f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/504aa6f7

Branch: refs/heads/branch-2.0
Commit: 504aa6f7a87973de0955aa8c124e2a036f8b3369
Parents: ef2a6f1
Author: Mortada Mehyar <mortada.meh...@gmail.com>
Authored: Mon Jul 18 23:49:47 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Jul 18 23:50:01 2016 -0700

----------------------------------------------------------------------
 python/pyspark/rdd.py           | 18 +++++++++---------
 python/pyspark/sql/dataframe.py | 10 +++++-----
 2 files changed, 14 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/504aa6f7/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 6afe769..0508235 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -1027,20 +1027,20 @@ class RDD(object):
 
         If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
         this can be switched from an O(log n) inseration to O(1) per
-        element(where n = # buckets).
+        element (where n is the number of buckets).
 
-        Buckets must be sorted and not contain any duplicates, must be
+        Buckets must be sorted, not contain any duplicates, and have
         at least two elements.
 
-        If `buckets` is a number, it will generates buckets which are
+        If `buckets` is a number, it will generate buckets which are
         evenly spaced between the minimum and maximum of the RDD. For
-        example, if the min value is 0 and the max is 100, given buckets
-        as 2, the resulting buckets will be [0,50) [50,100]. buckets must
-        be at least 1 If the RDD contains infinity, NaN throws an exception
-        If the elements in RDD do not vary (max == min) always returns
-        a single bucket.
+        example, if the min value is 0 and the max is 100, given `buckets`
+        as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+        be at least 1. An exception is raised if the RDD contains infinity.
+        If the elements in the RDD do not vary (max == min), a single bucket
+        will be used.
 
-        It will return a tuple of buckets and histogram.
+        The return value is a tuple of buckets and histogram.
 
         >>> rdd = sc.parallelize(range(51))
         >>> rdd.histogram(2)

http://git-wip-us.apache.org/repos/asf/spark/blob/504aa6f7/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index c7d704a..b9f50ff 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -601,16 +601,16 @@ class DataFrame(object):
     def join(self, other, on=None, how=None):
         """Joins with another :class:`DataFrame`, using the given join 
expression.
 
-        The following performs a full outer join between ``df1`` and ``df2``.
-
         :param other: Right side of the join
-        :param on: a string for join column name, a list of column names,
-            , a join expression (Column) or a list of Columns.
-            If `on` is a string or a list of string indicating the name of the join column(s),
+        :param on: a string for the join column name, a list of column names,
+            a join expression (Column), or a list of Columns.
+            If `on` is a string or a list of strings indicating the name of the join column(s),
             the column(s) must exist on both sides, and this performs an equi-join.
         :param how: str, default 'inner'.
             One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
 
+        The following performs a full outer join between ``df1`` and ``df2``.
+
        >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
        [Row(name=None, height=80), Row(name=u'Bob', height=85), Row(name=u'Alice', height=None)]
 

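The full-outer-join semantics the doctest above demonstrates can be sketched in plain Python (hypothetical helper, not pyspark): matched rows are merged, and unmatched rows from either side appear with the other side's columns absent.

```python
def full_outer_join(left, right, key):
    """Full outer equi-join of two lists of dicts on `key` (illustrative sketch)."""
    rows, matched_right = [], set()
    for l in left:
        hits = [r for r in right if r[key] == l[key]]
        if hits:
            matched_right.update(id(r) for r in hits)
            for r in hits:
                rows.append({**l, **r})     # matched: merge columns from both sides
        else:
            rows.append(dict(l))            # unmatched left row: right columns absent
    for r in right:
        if id(r) not in matched_right:
            rows.append(dict(r))            # unmatched right row: left columns absent
    return rows

# Toy data loosely mirroring the df/df2 doctest above
df = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
df2 = [{"name": "Bob", "height": 85}, {"name": "Tom", "height": 80}]
print(full_outer_join(df, df2, "name"))
```

Here "Bob" appears once with both `age` and `height`, while "Alice" and "Tom" each appear with only their own side's columns, matching the `Row(name=None, height=80)`-style output of the outer-join doctest.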
