Re: how to filter value in spark
you could join; it gives you the keys common to both RDDs, along with the label under which each value was found:

scala> a.join(b).collect
res0: Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))

best,
matt
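A minimal plain-Scala sketch (no Spark; the object and method names here are illustrative, not Spark API) of what the pair-RDD join above computes: an inner join on keys, pairing each left value with each matching right value, after which a map can keep just the left label.

```scala
// Sketch of pair-RDD join semantics using plain Scala collections.
object JoinSketch {
  // Inner join on the key: emit (k, (v, w)) for every matching pair.
  def join[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (V, W))] =
    for ((k, v) <- a; (k2, w) <- b if k == k2) yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    val a = Seq(("1", "a"), ("2", "a"), ("3", "a"), ("4", "a"))
    val b = Seq(("3", "b"), ("4", "b"), ("5", "b"), ("6", "b"))
    val joined = join(a, b)
    println(joined)                                    // List((3,(a,b)), (4,(a,b)))
    println(joined.map { case (k, (l, _)) => (k, l) }) // List((3,a), (4,a))
  }
}
```

The second `map` is how you would recover (3,a) and (4,a) from the join result, keeping only the left-hand label.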
RE: how to filter value in spark
You could use cogroup to combine the RDDs into one RDD for cross-reference processing, e.g.:

a.cogroup(b)
 .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
 .map { case (k, (l, r)) => (k, l) }

Best Regards,
Raymond Liu
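A plain-Scala sketch (no Spark; `CogroupSketch` and its helper are illustrative, not Spark API) of the cogroup-then-filter idea above: group both sides by key, keep only keys that appear on both sides, and emit the left-side values.

```scala
// Sketch of cogroup semantics: for every key in either input,
// collect the values from each side into two sequences.
object CogroupSketch {
  def cogroup[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (a.map(_._1) ++ b.map(_._1)).distinct
    keys.map { k =>
      k -> (a.collect { case (`k`, v) => v }, b.collect { case (`k`, w) => w })
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(("3", "a"), ("4", "a"), ("7", "a"))
    val b = Seq(("3", "b"), ("4", "b"), ("5", "b"))
    val both = cogroup(a, b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty } // keys on both sides
      .map { case (k, (l, _)) => (k, l) }                      // keep left values
    println(both) // only keys 3 and 4 survive, each keeping its left values
  }
}
```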
Re: how to filter value in spark
I see, it works well, thank you!!!

But in the following situation, how do I do it?

var a = sc.textFile("/sparktest/1/").map((_, "a"))
var b = sc.textFile("/sparktest/2/").map((_, "b"))

How do I get (3,a) and (4,a)?
Re: how to filter value in spark
On 08/28/2014 07:20 AM, marylucy wrote:

fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
I want to get 3 and 4

var a = sc.textFile("/sparktest/1/").map((_, 1))
var b = sc.textFile("/sparktest/2/").map((_, 1))
a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)

Error thrown: scala.MatchError: null at PairRDDFunctions.lookup...

the issue is nesting of the b rdd inside a transformation of the a rdd.

consider using intersection, it's more idiomatic:

a.intersection(b).foreach(println)

but note that intersection will remove duplicates.

best,
matt

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
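A plain-Scala sketch (no Spark; `IntersectionSketch` is illustrative, not Spark API) of the deduplication caveat above: RDD.intersection keeps each common element once, even if it appears several times on either side, which the set-based version below mimics.

```scala
// Sketch of RDD.intersection semantics: common elements, duplicates removed.
object IntersectionSketch {
  def rddStyleIntersection[T](a: Seq[T], b: Seq[T]): Set[T] =
    a.toSet intersect b.toSet

  def main(args: Array[String]): Unit = {
    val a = Seq("1", "2", "3", "3", "4") // note the duplicate "3"
    val b = Seq("3", "4", "5", "6")
    println(rddStyleIntersection(a, b))  // only one copy of "3" survives
  }
}
```

If duplicates matter, the cogroup approach discussed earlier in the thread preserves them, since it keeps all left-side values per key.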