[ 
https://issues.apache.org/jira/browse/SPARK-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254939#comment-15254939
 ] 

Apache Spark commented on SPARK-14757:
--------------------------------------

User 'arashn' has created a pull request for this issue:
https://github.com/apache/spark/pull/12628

> Incorrect behavior of Join operation in Spqrk SQL JOIN : "false" in the left 
> table is joined to "null" on the right table
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14757
>                 URL: https://issues.apache.org/jira/browse/SPARK-14757
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Hong Huang
>
> Content of table a:
> |outgoing_0|
> |     false     |
> |      true     |
> |      null      |
> a has only one field: outgoing_0 
> Content of table b:
> |outgoing_1|
> |     false      |
> |      true      |
> |      null       |
> b has only one filed: outgoing_1
> After running this query:
> select * from a FULL JOIN b ON ( outgoing_0<=>outgoing_1)
> I got the following result:
> |outgoing_0|outgoing_1|
> |      true      |      true      |
> |     false      |     false     |
> |     false      |      null      |
> |      null       |      null      |
> The row with "false" as outgoing_0 and "null" as outgoing_1 is unexpected. 
> The operator <=> should match null with null. 
> While left "false" is matched with right "null", it is also strange to find 
> that the "false" on the right table does not match with "null" on the left 
> table (no row with "null" as outgoing_0 and "false" as outgoing_1)
> You can easily reproduce this bug by pasting the following code fragment:
> case class A( outgoing_0: Option[Boolean] )
> case class B( outgoing_1: Option[Boolean] )
> val a = sc.parallelize( Seq(
>       A( Some( false ) ),
>       A( Some( true ) ),
>       A( None )
>     ) ).toDF()
>     a.show
>     val b = sc.parallelize( Seq(
>       B( Some( false ) ),
>       B( Some( true ) ),
>       B( None )
>     ) ).toDF()
>     b.show
>     a.registerTempTable( "a" )
>     b.registerTempTable( "b" )
>     sqlContext.sql( "select * from a FULL JOIN b ON ( 
> outgoing_0<=>outgoing_1)" ).show()
>     



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to