[ https://issues.apache.org/jira/browse/SPARK-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bijay Kumar Pathak updated SPARK-14761:
---------------------------------------
    Comment: was deleted

(was: Hi [~joshrosen], how do we handle {{on=None}} when passing to the JVM API? Passing {{None}} throws {{java.lang.NullPointerException}} in my regression test.)

> PySpark DataFrame.join should reject invalid join methods even when join
> columns are not specified
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14761
>                 URL: https://issues.apache.org/jira/browse/SPARK-14761
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Josh Rosen
>            Priority: Minor
>              Labels: starter
>
> In PySpark, the following invalid DataFrame join will not result in an error:
> {code}
> df1.join(df2, how='not-a-valid-join-type')
> {code}
> The signature for {{join}} is
> {code}
> def join(self, other, on=None, how=None):
> {code}
> and its code ends up completely skipping handling of the {{how}} parameter when
> {{on}} is {{None}}:
> {code}
> if on is not None and not isinstance(on, list):
>     on = [on]
> if on is None or len(on) == 0:
>     jdf = self._jdf.join(other._jdf)
> elif isinstance(on[0], basestring):
>     if how is None:
>         jdf = self._jdf.join(other._jdf, self._jseq(on), "inner")
>     else:
>         assert isinstance(how, basestring), "how should be basestring"
>         jdf = self._jdf.join(other._jdf, self._jseq(on), how)
> else:
> {code}
> Given that this behavior can mask user errors (as in the above example), I
> think we should refactor this to first process all arguments and then
> call the three-argument {{self._jdf.join}}. This would handle the above invalid
> example by passing all arguments to the JVM DataFrame for analysis.
> I'm not planning to work on this myself, so this bugfix (+ regression test!)
> is up for grabs in case someone else wants to do it.
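The refactor proposed above can be sketched in plain Python with the JVM bridge stubbed out. This is only an illustration of the argument-processing order, not Spark internals: `_VALID_HOW`, `fake_jdf_join`, and the standalone `join` function are hypothetical stand-ins, and the set of accepted join types here is a partial, assumed list. The key change is that `how` is normalized and validated before dispatch, so an invalid join type raises even when `on` is `None`.

```python
# Hypothetical sketch of the proposed refactor: normalize and validate
# *all* arguments first, then make a single three-argument join call.
# `_VALID_HOW` and `fake_jdf_join` stand in for the JVM-side analysis
# that would reject a bad join type; they are not real Spark APIs.

_VALID_HOW = {"inner", "outer", "left_outer", "right_outer", "leftsemi"}


def fake_jdf_join(on, how):
    """Stand-in for the JVM DataFrame's three-argument join."""
    if how not in _VALID_HOW:
        raise ValueError("Unsupported join type: %r" % how)
    return ("joined", tuple(on), how)


def join(on=None, how=None):
    # Normalize `on` to a list, mirroring the PySpark code above.
    if on is not None and not isinstance(on, list):
        on = [on]
    # Process `how` unconditionally -- this is the fix: the join type
    # is no longer silently ignored when `on` is None.
    if how is None:
        how = "inner"
    assert isinstance(how, str), "how should be a string"
    # Pass an empty column list for a cross-style join rather than None,
    # sidestepping the NullPointerException mentioned in the deleted
    # comment (how the real JVM call should receive "no columns" is an
    # open question in the issue, so this choice is an assumption).
    if on is None:
        on = []
    return fake_jdf_join(on, how)
```

With this ordering, `join(how='not-a-valid-join-type')` raises instead of silently performing a default join, which is exactly the regression test the issue asks for.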
--
This message was sent by Atlassian JIRA (v6.3.4#6332)