[ https://issues.apache.org/jira/browse/SPARK-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272439#comment-15272439 ]
Neville Kadwa commented on SPARK-15063: --------------------------------------- Marking duplicate > filtering and joining back doesn't work > --------------------------------------- > > Key: SPARK-15063 > URL: https://issues.apache.org/jira/browse/SPARK-15063 > Project: Spark > Issue Type: Bug > Affects Versions: 1.6.1 > Reporter: Neville Kadwa > > I'm trying to filter and join to do a simple pivot but getting very odd > results. > {quote} {noformat} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.implicits._ > val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna")) > val accounts = Array( > (1, "checking", 100.0), > (1, "savings", 300.0), > (2, "savings", 1000.0), > (3, "carloan", 12000.0), > (3, "checking", 400.0) > ) > val t1 = sc.makeRDD(people).toDF("uid", "name") > val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount") > val t2c = t2.filter(t2("type") <=> "checking") > val t2s = t2.filter(t2("type") <=> "savings") > t1. > join(t2c, t1("uid") <=> t2c("uid"), "left"). > join(t2s, t1("uid") <=> t2s("uid"), "left"). > take(10) > {noformat} {quote} > The results are wrong: > {quote} {noformat} > Array( > [1,sam,1,checking,100.0,1,savings,300.0], > [1,sam,1,checking,100.0,2,savings,1000.0], > [2,joe,null,null,null,null,null,null], > [3,sally,3,checking,400.0,1,savings,300.0], > [3,sally,3,checking,400.0,2,savings,1000.0], > [4,joanna,null,null,null,null,null,null] > ) > {noformat} {quote} > The way I can force it to work properly is to create a new df for each filter: > {quote} {noformat} > val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount") > val t2s = t2a.filter(t2a("type") <=> "savings") > t1. > join(t2c, t1("uid") <=> t2c("uid"), "left"). > join(t2s, t1("uid") <=> t2s("uid"), "left"). > take(10) > {noformat} {quote} > The results are right: > {quote} {noformat} > Array( > [1,sam,1,checking,100.0,1,savings,300.0], > [2,joe,null,null,null,2,savings,1000.0], > [3,sally,3,checking,400.0,null,null,null], > [4,joanna,null,null,null,null,null,null] > ) > {noformat} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org