[spark] Git Push Summary

2015-12-27 Thread marmbrus
Repository: spark
Updated Tags:  refs/tags/v1.6.0-rc4 [deleted] 4062cda30

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] Git Push Summary

2015-12-27 Thread marmbrus
Repository: spark
Updated Tags:  refs/tags/v1.6.0 [created] 4062cda30




spark git commit: [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join

2015-12-27 Thread davies
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 865dd8bcc -> b8da77ef7


[SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join

After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I
double-checked the code.

For example, users can perform an equi-join like this:
  ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
- There is a bug in 1.5 and 1.4: the code ignores the third parameter (the
join type) that users pass. The join actually performed is always `Inner`,
even if the user specifies another type (e.g., `Outer`).
- Since PR https://github.com/apache/spark/pull/8600, 1.6 no longer has this
issue, but the parameter description has not been updated.

I plan to submit another PR to fix 1.5 so that it issues an error message if
users specify a non-inner join type for an equi-join.

Author: gatorsmile 

Closes #10477 from gatorsmile/pyOuterJoin.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b8da77ef
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b8da77ef
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b8da77ef

Branch: refs/heads/branch-1.6
Commit: b8da77ef776ab9cdc130a70293d75e7bdcdf95b0
Parents: 865dd8b
Author: gatorsmile 
Authored: Sun Dec 27 23:18:48 2015 -0800
Committer: Davies Liu 
Committed: Sun Dec 27 23:19:50 2015 -0800

--
 python/pyspark/sql/dataframe.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b8da77ef/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index b15b8d7..a0fdaf3 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -608,13 +608,16 @@ class DataFrame(object):
 :param on: a string for join column name, a list of column names,
 , a join expression (Column) or a list of Columns.
 If `on` is a string or a list of string indicating the name of the join column(s),
-the column(s) must exist on both sides, and this performs an inner equi-join.
+the column(s) must exist on both sides, and this performs an equi-join.
 :param how: str, default 'inner'.
 One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
 
 >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
 [Row(name=None, height=80), Row(name=u'Alice', height=None), Row(name=u'Bob', height=85)]
 
+>>> df.join(df2, 'name', 'outer').select('name', 'height').collect()
+[Row(name=u'Tom', height=80), Row(name=u'Alice', height=None), Row(name=u'Bob', height=85)]
+
 >>> cond = [df.name == df3.name, df.age == df3.age]
 >>> df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
 [Row(name=u'Bob', age=5), Row(name=u'Alice', age=2)]

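The two outer-join doctests above differ in the row with height=80: selecting
df.name after the expression join gives name=None, while the new join on the
column name 'name' gives name=u'Tom'. The pure-Python sketch below (no Spark
required) models both results. The fixtures are inferred from the doctest
outputs, the helper names are hypothetical, and treating the merged key column
as "left key, else right key" is an assumption consistent with those outputs,
not a description of how Spark implements it. Row order is not modeled.

```python
def full_outer_join(left, right, key):
    """Nested-loop full outer join of two lists of dicts on `key`.

    Returns (left_row, right_row) pairs; an unmatched side is None.
    """
    matched = [False] * len(right)  # which right rows found a partner
    pairs = []
    for l_row in left:
        hit = False
        for i, r_row in enumerate(right):
            if l_row[key] == r_row[key]:
                pairs.append((l_row, r_row))
                matched[i] = True
                hit = True
        if not hit:
            pairs.append((l_row, None))
    # Right rows with no partner appear with a missing left side.
    for i, r_row in enumerate(right):
        if not matched[i]:
            pairs.append((None, r_row))
    return pairs


def get(row, column):
    """Read a column from a side that may be missing (None)."""
    return None if row is None else row[column]


# Fixtures inferred from the doctest outputs (assumption).
df = [{"name": "Alice"}, {"name": "Bob"}]
df2 = [{"name": "Tom", "height": 80}, {"name": "Bob", "height": 85}]
pairs = full_outer_join(df, df2, "name")

# on=df.name == df2.name, select(df.name, df2.height):
# only the left key is read, so the unmatched right row shows None.
by_expression = [(get(lft, "name"), get(rgt, "height")) for lft, rgt in pairs]

# on='name', select('name', 'height'): modeled as a single merged key
# column that falls back to the right key when the left row is missing.
by_name = [(get(lft, "name") if lft is not None else get(rgt, "name"),
            get(rgt, "height")) for lft, rgt in pairs]

print(by_expression)  # [('Alice', None), ('Bob', 85), (None, 80)]
print(by_name)        # [('Alice', None), ('Bob', 85), ('Tom', 80)]
```

In a full outer join either side of a row can be missing, so reading the key
from only one side loses the value for rows that exist only on the other side.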




spark git commit: [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join

2015-12-27 Thread davies
Repository: spark
Updated Branches:
  refs/heads/master 1e9781395 -> 9ab296ecd


[SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join

After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I
double-checked the code.

For example, users can perform an equi-join like this:
  ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
- There is a bug in 1.5 and 1.4: the code ignores the third parameter (the
join type) that users pass. The join actually performed is always `Inner`,
even if the user specifies another type (e.g., `Outer`).
- Since PR https://github.com/apache/spark/pull/8600, 1.6 no longer has this
issue, but the parameter description has not been updated.

I plan to submit another PR to fix 1.5 so that it issues an error message if
users specify a non-inner join type for an equi-join.

Author: gatorsmile 

Closes #10477 from gatorsmile/pyOuterJoin.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ab296ec
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9ab296ec
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9ab296ec

Branch: refs/heads/master
Commit: 9ab296ecdceef88ebca523ed62848fbeb5df353b
Parents: 1e97813
Author: gatorsmile 
Authored: Sun Dec 27 23:18:48 2015 -0800
Committer: Davies Liu 
Committed: Sun Dec 27 23:18:48 2015 -0800

--
 python/pyspark/sql/dataframe.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9ab296ec/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 4b3791e..ad621df 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -608,13 +608,16 @@ class DataFrame(object):
 :param on: a string for join column name, a list of column names,
 , a join expression (Column) or a list of Columns.
 If `on` is a string or a list of string indicating the name of the join column(s),
-the column(s) must exist on both sides, and this performs an inner equi-join.
+the column(s) must exist on both sides, and this performs an equi-join.
 :param how: str, default 'inner'.
 One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
 
 >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
 [Row(name=None, height=80), Row(name=u'Alice', height=None), Row(name=u'Bob', height=85)]
 
+>>> df.join(df2, 'name', 'outer').select('name', 'height').collect()
+[Row(name=u'Tom', height=80), Row(name=u'Alice', height=None), Row(name=u'Bob', height=85)]
+
 >>> cond = [df.name == df3.name, df.age == df3.age]
 >>> df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
 [Row(name=u'Bob', age=5), Row(name=u'Alice', age=2)]





spark git commit: [SPARK-12520] [PYSPARK] [1.5] Ensure the join type is `inner` for equi-Join.

2015-12-27 Thread davies
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 86161a4f7 -> 42286feb6


[SPARK-12520] [PYSPARK] [1.5] Ensure the join type is `inner` for equi-Join.

This PR adds an `assert` to ensure that the join type is `inner` for equi-joins.

JIRA: https://issues.apache.org/jira/browse/SPARK-12520

In the JIRA, users specify the join type `outer` when using an equi-join.
However, the result returned is that of an `inner` join, the only type Spark
1.5 supports for equi-joins. (Note: starting from Spark 1.6, other join types
are supported for equi-joins.)

For example,
```python
joined_table = left_table.join(right_table, "joining_column", "outer")
```

Should we also backport this to 1.4? davies JoshRosen Thanks!

Author: gatorsmile 

Closes #10484 from gatorsmile/pythonEquiOuterJoin.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42286feb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42286feb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42286feb

Branch: refs/heads/branch-1.5
Commit: 42286feb676f52b366c7be3f9ace4bfde55d72a9
Parents: 86161a4
Author: gatorsmile 
Authored: Sun Dec 27 23:23:57 2015 -0800
Committer: Davies Liu 
Committed: Sun Dec 27 23:23:57 2015 -0800

--
 python/pyspark/sql/dataframe.py | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/42286feb/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 2b23815..eb2c6e5 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -570,6 +570,7 @@ class DataFrame(object):
         if on is None or len(on) == 0:
             jdf = self._jdf.join(other._jdf)
         elif isinstance(on[0], basestring):
+            assert how is None or how == 'inner', "Equi-join does not support: %s" % how
             jdf = self._jdf.join(other._jdf, self._jseq(on))
         else:
             assert isinstance(on[0], Column), "on should be Column or list of Column"

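The effect of the 1.5 change can be modeled in plain Python to show which
calls now fail. This is a hypothetical sketch, not the PySpark source:
`plan_join` and `Expr` are illustrative names, the returned strings stand in
for the underlying JVM join calls, `str` replaces Python 2's `basestring`, and
wrapping a single `on` value into a list is assumed rather than shown in the
diff above.

```python
class Expr:
    """Hypothetical stand-in for a pyspark.sql.Column join expression."""


def plan_join(on=None, how=None):
    """Describe which join the 1.5 code path would run (illustrative only)."""
    # Assumed: a single column name or expression is wrapped in a list.
    if on is not None and not isinstance(on, list):
        on = [on]
    if on is None or len(on) == 0:
        # No join condition: a Cartesian product.
        return "cartesian"
    if isinstance(on[0], str):
        # Equi-join on column names: only inner is supported in 1.5, so
        # any other requested type is rejected instead of being ignored.
        assert how is None or how == "inner", "Equi-join does not support: %s" % how
        return "inner equi-join on " + ", ".join(on)
    # Join expressions: the requested type is honored (default inner).
    return (how or "inner") + " join on expression"


print(plan_join("name"))          # inner equi-join on name
print(plan_join(Expr(), "outer"))  # outer join on expression
try:
    plan_join("joining_column", "outer")
except AssertionError as err:
    print("rejected:", err)       # rejected: Equi-join does not support: outer
```

With this guard, the call from the JIRA, `left_table.join(right_table,
"joining_column", "outer")`, fails loudly instead of silently returning an
inner join. One caveat of using `assert` for the check: Python skips assert
statements when run with `-O`, in which case the join type would again be
ignored silently.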
