[ 
https://issues.apache.org/jira/browse/SPARK-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257672#comment-15257672
 ] 

Bijay Kumar Pathak commented on SPARK-14761:
--------------------------------------------

[~joshrosen] Hi Josh, I gave it a try and created the pull request. Do we need to
add tests for this?



> PySpark DataFrame.join should reject invalid join methods even when join 
> columns are not specified
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14761
>                 URL: https://issues.apache.org/jira/browse/SPARK-14761
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Josh Rosen
>            Priority: Minor
>              Labels: starter
>
> In PySpark, the following invalid DataFrame join will not result in an error:
> {code}
> df1.join(df2, how='not-a-valid-join-type')
> {code}
> The signature for `join` is
> {code}
>     def join(self, other, on=None, how=None):
> {code}
> and its code ends up completely skipping handling of the `how` parameter when 
> `on` is `None`:
> {code}
>         if on is not None and not isinstance(on, list):
>             on = [on]
>         if on is None or len(on) == 0:
>             jdf = self._jdf.join(other._jdf)
>         elif isinstance(on[0], basestring):
>             if how is None:
>                 jdf = self._jdf.join(other._jdf, self._jseq(on), "inner")
>             else:
>                 assert isinstance(how, basestring), "how should be basestring"
>                 jdf = self._jdf.join(other._jdf, self._jseq(on), how)
>         else:
> {code}
> Given that this behavior can mask user errors (as in the above example), I 
> think that we should refactor this to first process all arguments and then 
> call the three-argument {{_jdf.join}}. This would handle the above invalid
> example by passing all arguments to the JVM DataFrame for analysis.
> I'm not planning to work on this myself, so this bugfix (+ regression test!) 
> is up for grabs in case someone else wants to do it.
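
A minimal, Spark-free sketch of the difference, assuming a hypothetical `FakeJvmDataFrame` stub that plays the role of the JVM DataFrame and rejects join types outside an illustrative `VALID_JOIN_TYPES` set (neither is a PySpark API). `original_join` mirrors the quoted branches, using `str` in place of Python 2's `basestring` and leaving out the elided `Column` branch; `refactored_join` normalizes `on` and `how` first and always makes the three-argument call.

```python
# Spark-free sketch; FakeJvmDataFrame and VALID_JOIN_TYPES are hypothetical
# stand-ins for the JVM side, not PySpark APIs.

# Illustrative subset of join-type names (an assumption, not Spark's full list).
VALID_JOIN_TYPES = {"inner", "outer", "left_outer", "right_outer", "leftsemi"}


class FakeJvmDataFrame:
    """Records join calls and, like JVM-side analysis, rejects unknown types."""

    def __init__(self):
        self.calls = []

    def join(self, other, *args):
        # Only the three-argument form (other, on, how) carries a join type.
        if len(args) == 2 and args[1] not in VALID_JOIN_TYPES:
            raise ValueError("Unsupported join type: %r" % args[1])
        self.calls.append(args)
        return self


def original_join(jdf, other_jdf, on=None, how=None):
    """Mirrors the quoted branches (Column-based `on` is not modeled)."""
    if on is not None and not isinstance(on, list):
        on = [on]
    if on is None or len(on) == 0:
        # `how` is never consulted on this path.
        return jdf.join(other_jdf)
    if how is None:
        return jdf.join(other_jdf, on, "inner")
    assert isinstance(how, str), "how should be a string"
    return jdf.join(other_jdf, on, how)


def refactored_join(jdf, other_jdf, on=None, how=None):
    """Process every argument first, then make one three-argument call."""
    if on is not None and not isinstance(on, list):
        on = [on]
    if on is not None and len(on) == 0:
        on = None  # an empty list means "no join columns"
    if how is None:
        how = "inner"
    assert isinstance(how, str), "how should be a string"
    # `how` now always reaches the JVM side, so an invalid value is rejected.
    return jdf.join(other_jdf, on, how)


left, right = FakeJvmDataFrame(), FakeJvmDataFrame()
original_join(left, right, how="not-a-valid-join-type")
print(left.calls)  # [()]: the join type never left Python

try:
    refactored_join(left, right, how="not-a-valid-join-type")
except ValueError as exc:
    print(exc)  # Unsupported join type: 'not-a-valid-join-type'
```

The stub accepts `None` as the join condition; whether the real JVM `join` overload does is not established here, so an actual fix may still need care for the no-columns case.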



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
