[ 
https://issues.apache.org/jira/browse/SPARK-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500383#comment-16500383
 ] 

Li Yuanjian commented on SPARK-24210:
-------------------------------------

I think this may not be a bug.
#KO: returns r1 and r3
ex.filter(('c1 = 1') and ('c2 = 1')).show()

This is caused by Python's own "and" operator on plain strings, which is 
evaluated before the call: for two non-empty strings it returns the second 
operand, so df.filter only receives 'c2 = 1'.
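
The Python side of this can be checked without Spark. A minimal sketch in 
plain Python (the variable name condition is only for illustration) showing 
which value actually reaches filter:

```python
# Python evaluates `and` before any function call sees its arguments.
# When both operands are truthy, `and` returns the second operand unchanged.
condition = ('c1 = 1') and ('c2 = 1')
print(condition)  # c2 = 1

# A falsy first operand (here the empty string) short-circuits
# and is returned instead.
print(repr('' and 'c2 = 1'))  # ''
```

So ex.filter(('c1 = 1') and ('c2 = 1')) is the same call as 
ex.filter('c2 = 1'), which matches the rows r1 and r3.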
#KO: returns r0 and r3
ex.filter('c1 = 1 & c2 = 1').show()
#KO: returns r0 and r3
ex.filter('c1 == 1 & c2 == 1').show()

As you mentioned, [https://github.com/apache/spark/pull/6961] fixes the '&' 
between columns, but not inside a string expression like 'c1 = 1 & c2 = 1'. 
For ex.filter('c1 = 1 & c2 = 1'), Spark parses the '&' as part of a 
valueExpression (bitwise AND), giving a plan like: 'Filter 
(('a = (1 & 'b)) = 1). I think this makes sense here.
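
To see why such a plan returns r0 and r3, here is a rough sketch in plain 
Python that evaluates ((c1 = (1 & c2)) = 1) on the four rows from the issue. 
It assumes the plan's 'a and 'b stand for c1 and c2. Python's True == 1 is 
used as an approximation of how Spark coerces the boolean in the outer 
comparison. The helper names parsed_filter and intended_filter are 
hypothetical, not Spark APIs.

```python
# The four rows from the issue: (row, c1, c2).
rows = [('r0', 0, 0), ('r1', 0, 1), ('r2', 1, 0), ('r3', 1, 1)]


def parsed_filter(c1, c2):
    # Mirrors the plan ((c1 = (1 & c2)) = 1):
    # '&' is bitwise AND and binds tighter than '='.
    return (c1 == (1 & c2)) == 1


def intended_filter(c1, c2):
    # What the reporter meant: c1 = 1 AND c2 = 1.
    return c1 == 1 and c2 == 1


kept_parsed = [name for name, c1, c2 in rows if parsed_filter(c1, c2)]
kept_intended = [name for name, c1, c2 in rows if intended_filter(c1, c2)]

print(kept_parsed)    # ['r0', 'r3']
print(kept_intended)  # ['r3']
```

The first list matches the "KO" results reported for the '&' variants, and 
the second matches the "OK" results for the 'and' variants.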

> incorrect handling of boolean expressions when using column in expressions in 
> pyspark.sql.DataFrame filter function
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24210
>                 URL: https://issues.apache.org/jira/browse/SPARK-24210
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.2
>            Reporter: Michael H
>            Priority: Major
>
> {code:python}
> ex = spark.createDataFrame([
>     ('r0', 0, 0),
>     ('r1', 0, 1),
>     ('r2', 1, 0),
>     ('r3', 1, 1)]\
>   , "row: string, c1: int, c2: int")
> #KO: returns r1 and r3
> ex.filter(('c1 = 1') and ('c2 = 1')).show()
> #OK, raises an exception
> ex.filter(('c1 == 1') & ('c2 == 1')).show()
> #KO: returns r0 and r3
> ex.filter('c1 = 1 & c2 = 1').show()
> #KO: returns r0 and r3
> ex.filter('c1 == 1 & c2 == 1').show()
> #OK: returns r3 only
> ex.filter('c1 = 1 and c2 = 1').show()
> #OK: returns r3 only
> ex.filter('c1 == 1 and c2 == 1').show()
> {code}
> When building the expressions using {code}ex.c1{code} or {code}ex['c1']{code}, 
> we don't have this problem.
> The issue seems related to
> https://github.com/apache/spark/pull/6961



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

