[jira] [Commented] (SPARK-26336) left_anti join with Na Values

2018-12-18 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724098#comment-16724098
 ] 

Marco Gaido commented on SPARK-26336:
-

[~csevilla] the point is always the same, i.e. the presence of {{NULL}} 
(Python's {{None}} is SQL's {{NULL}}). {{NULL = NULL}} returns {{NULL}}, not 
{{true}}, so an equality join condition never matches those rows. This is how 
every DB works. You can try it in MySQL, Postgres, or whatever you prefer.
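
A quick way to check this without a Spark cluster is sketched below. It assumes SQLite (through Python's standard {{sqlite3}} module) as a stand-in for the databases named above; it is not Spark itself, only an illustration of the same SQL comparison rule.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient SQL engine.
conn = sqlite3.connect(":memory:")

# Compare NULL with NULL using "=" and test it with "IS NULL".
eq, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
conn.close()

print(eq)       # None: the "=" comparison yields NULL, not true
print(is_null)  # 1: IS NULL is the test that yields true
```

Python receives the SQL {{NULL}} as {{None}}, which is the same mapping that applies to the {{None}} values in the reported dataframes.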

> left_anti join with Na Values
> -
>
> Key: SPARK-26336
> URL: https://issues.apache.org/jira/browse/SPARK-26336
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Carlos
>Priority: Major
>
> When I join two dataframes whose data contains NA values, the left_anti 
> join does not work as expected, because it does not detect records with NA values.
> Example:  
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import *
> spark = SparkSession.builder.appName('test').enableHiveSupport().getOrCreate()
> data = [(1,"Test"),(2,"Test"),(3,None)]
> df1 = spark.createDataFrame(data,("id","columndata"))
> df2 = spark.createDataFrame(data,("id","columndata"))
> df_joined = df1.join(df2, df1.columns,'left_anti'){code}
> df_joined contains data even though the two dataframes are the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26336) left_anti join with Na Values

2018-12-18 Thread Carlos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724081#comment-16724081
 ] 

Carlos commented on SPARK-26336:


[~mgaido] I think I chose bad objects for my example.

data1 = {
    'id': 1,
    'name': 'Carlos',
    'surname': 'Sevilla',
    'address': None,
    'Country': 'ESP'
}

data2 = {
    'id': 1,
    'name': 'Carlos',
    'surname': 'Sevilla',
    'address': None,
    'Country': 'ESP'
}

Those 2 variables contain the SAME data.

If I do a left_anti join (an inner join does not work either), it should return 
no results, no rows, because both dataframes have exactly the same data.




[jira] [Commented] (SPARK-26336) left_anti join with Na Values

2018-12-18 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724052#comment-16724052
 ] 

Marco Gaido commented on SPARK-26336:
-

That's correct, because NULLs do not match. The usual implementation of an anti 
join in other DBs (e.g. Postgres) is to do a left join and filter for the column 
on the right side being NULL. If you do so in your example, 1 row is returned.
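
The left-join-and-filter formulation can be sketched as follows. This is a minimal illustration that assumes SQLite (Python's standard {{sqlite3}} module) in place of Postgres or Spark; the table names {{t1}} and {{t2}} are hypothetical, and the rows are the ones from the issue's example. The second query uses SQLite's {{IS}} operator as a null-safe comparison to show the result the reporter expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same rows as the issue's example; None is stored as SQL NULL.
rows = [(1, "Test"), (2, "Test"), (3, None)]
for table in ("t1", "t2"):  # hypothetical table names
    conn.execute(f"CREATE TABLE {table} (id INTEGER, columndata TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

# Anti join written as a left join plus a filter on the right side being NULL.
# Plain "=" never matches the row whose columndata is NULL.
anti_eq = conn.execute("""
    SELECT t1.id, t1.columndata
    FROM t1 LEFT JOIN t2
      ON t1.id = t2.id AND t1.columndata = t2.columndata
    WHERE t2.id IS NULL
""").fetchall()

# Same query with a null-safe comparison (SQLite's IS treats NULL IS NULL as true).
anti_null_safe = conn.execute("""
    SELECT t1.id, t1.columndata
    FROM t1 LEFT JOIN t2
      ON t1.id = t2.id AND t1.columndata IS t2.columndata
    WHERE t2.id IS NULL
""").fetchall()
conn.close()

print(anti_eq)         # [(3, None)]
print(anti_null_safe)  # []
```

With plain equality the single row containing NULL is returned, which matches the 1 row mentioned above. In Spark SQL the null-safe equality operator is written {{<=>}}; whether that fits the reporter's use case is a separate question from whether the current behavior is a bug.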
