[ 
https://issues.apache.org/jira/browse/SPARK-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270450#comment-15270450
 ] 

Sagar commented on SPARK-15063:
-------------------------------

Yes but where we are using seq, there we can use t1 to get required results for 
each filter as mentioned above. 
In order to refer its column we need to involve t1.

val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
val t2s = t2a.filter(t2a("type") <=> "savings")

t1.
  join(t2c, t1("uid") <=> t2c("uid"), "left").
  join(t2s, t1("uid") <=> t2s("uid"), "left").
  take(10)

> filtering and joining back doesn't work
> ---------------------------------------
>
>                 Key: SPARK-15063
>                 URL: https://issues.apache.org/jira/browse/SPARK-15063
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>            Reporter: Neville Kadwa
>
> I'm trying to filter and join to do a simple pivot but getting very odd 
> results.
> {quote} {noformat}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
> val accounts = Array(
>   (1, "checking", 100.0),
>   (1, "savings", 300.0),
>   (2, "savings", 1000.0),
>   (3, "carloan", 12000.0),
>   (3, "checking", 400.0)
> )
> val t1 = sc.makeRDD(people).toDF("uid", "name")
> val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2c = t2.filter(t2("type") <=> "checking")
> val t2s = t2.filter(t2("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are wrong:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [1,sam,1,checking,100.0,2,savings,1000.0],
>   [2,joe,null,null,null,null,null,null],
>   [3,sally,3,checking,400.0,1,savings,300.0],
>   [3,sally,3,checking,400.0,2,savings,1000.0],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
> The way I can force it to work properly is to create a new df for each filter:
> {quote} {noformat}
> val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2s = t2a.filter(t2a("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are right:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [2,joe,null,null,null,2,savings,1000.0],
>   [3,sally,3,checking,400.0,null,null,null],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to