Hyukjin Kwon created SPARK-21264:
------------------------------------
Summary: Omitting columns in join in PySpark throws NPE
Key: SPARK-21264
URL: https://issues.apache.org/jira/browse/SPARK-21264
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.1.0, 2.2.0
Reporter: Hyukjin Kwon
Priority: Minor
{code}
>>> spark.conf.set("spark.sql.crossJoin.enabled", "false")
>>> spark.range(1).join(spark.range(1), how="inner").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o66.join.
: java.lang.NullPointerException
at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
>>> spark.conf.set("spark.sql.crossJoin.enabled", "true")
>>> spark.range(1).join(spark.range(1), how="inner").show()
...
py4j.protocol.Py4JJavaError: An error occurred while calling o84.join.
: java.lang.NullPointerException
at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
{code}
Omitting the join columns as above throws a NullPointerException.
This works in 2.0.2:
{code}
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.0.2
/_/
Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>> spark.range(1).join(spark.range(1), how="inner").show()
+---+---+
| id| id|
+---+---+
| 0| 0|
+---+---+
{code}
but it no longer works from Spark 2.1.0 onwards.
It looks like a small regression.