[ https://issues.apache.org/jira/browse/SPARK-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257672#comment-15257672 ]
Bijay Kumar Pathak commented on SPARK-14761:
--------------------------------------------

[~joshrosen] Hi Josh, I gave it a try and created the pull request. Do we need to add tests for this?

> PySpark DataFrame.join should reject invalid join methods even when join
> columns are not specified
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14761
>                 URL: https://issues.apache.org/jira/browse/SPARK-14761
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Josh Rosen
>            Priority: Minor
>              Labels: starter
>
> In PySpark, the following invalid DataFrame join will not result in an error:
> {code}
> df1.join(df2, how='not-a-valid-join-type')
> {code}
> The signature for `join` is
> {code}
> def join(self, other, on=None, how=None):
> {code}
> and its code ends up completely skipping handling of the `how` parameter when
> `on` is `None`:
> {code}
> if on is not None and not isinstance(on, list):
>     on = [on]
> if on is None or len(on) == 0:
>     jdf = self._jdf.join(other._jdf)
> elif isinstance(on[0], basestring):
>     if how is None:
>         jdf = self._jdf.join(other._jdf, self._jseq(on), "inner")
>     else:
>         assert isinstance(how, basestring), "how should be basestring"
>         jdf = self._jdf.join(other._jdf, self._jseq(on), how)
> else:
> {code}
> Given that this behavior can mask user errors (as in the above example), I
> think that we should refactor this to first process all arguments and then
> call the three-argument {{self._jdf.join}}. This would handle the above invalid
> example by passing all arguments to the JVM DataFrame for analysis.
> I'm not planning to work on this myself, so this bugfix (+ regression test!)
> is up for grabs in case someone else wants to do it.
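The refactor Josh describes (normalize all arguments first, then always make one three-argument join call so the join type is validated even when `on` is `None`) can be sketched in plain Python. The `FakeJDF` class below is a hypothetical stand-in for the JVM DataFrame, not part of Spark; it rejects unknown join types the way the JVM analyzer would:

{code}
# Hedged sketch of the proposed argument-normalization refactor.
# FakeJDF is a stub standing in for the JVM-side DataFrame; the real
# fix would route through py4j to Spark's analyzer instead.
VALID_JOIN_TYPES = {"inner", "outer", "left_outer", "right_outer", "leftsemi"}

class FakeJDF:
    def join(self, other, on=None, how=None):
        # The JVM analyzer raises on unsupported join types; mimic that here.
        if how not in VALID_JOIN_TYPES:
            raise ValueError("Unsupported join type: %r" % how)
        return FakeJDF()

def join(self_jdf, other_jdf, on=None, how=None):
    """Normalize `on` and `how` up front, then make a single call,
    so an invalid `how` is rejected instead of silently ignored."""
    if on is not None and not isinstance(on, list):
        on = [on]
    if how is None:
        how = "inner"
    assert isinstance(how, str), "how should be a string"
    # `how` is always passed through, even when `on` is None.
    return self_jdf.join(other_jdf, on=on, how=how)
{code}

With this shape, the invalid example from the description, `join(FakeJDF(), FakeJDF(), how='not-a-valid-join-type')`, raises an error rather than silently performing a default join.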
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org