Hyukjin Kwon created SPARK-21264:
------------------------------------

             Summary: Omitting columns in join in PySpark throws NPE
                 Key: SPARK-21264
                 URL: https://issues.apache.org/jira/browse/SPARK-21264
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.1.0, 2.2.0
            Reporter: Hyukjin Kwon
            Priority: Minor


{code}
>>> spark.conf.set("spark.sql.crossJoin.enabled", "false")
>>> spark.range(1).join(spark.range(1), how="inner").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o66.join.
: java.lang.NullPointerException
        at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...

>>> spark.conf.set("spark.sql.crossJoin.enabled", "true")
>>> spark.range(1).join(spark.range(1), how="inner").show()
...
py4j.protocol.Py4JJavaError: An error occurred while calling o84.join.
: java.lang.NullPointerException
        at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
{code}

Omitting the join columns as above throws a NullPointerException.

This works in 2.0.2:

{code}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>> spark.range(1).join(spark.range(1), how="inner").show()
+---+---+
| id| id|
+---+---+
|  0|  0|
+---+---+
{code}

but apparently not since Spark 2.1.0.

This looks like a small but clear regression.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
