Re: how to filter value in spark

2014-09-01 Thread Matthew Farrellee
you could join; it'll give you the intersection, along with the labels
where each value was found.

 a.join(b).collect
Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))
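
As a local illustration (plain Scala Seqs standing in for RDDs, so not Spark code; Spark's shuffle may return the pairs in a different order, as the Array above shows), the join semantics look like:

```scala
// Plain-Scala sketch of RDD join semantics; Seqs stand in for RDDs.
// Data mirrors the thread: keys 1-4 labeled "a", keys 3-6 labeled "b".
val a = Seq("1", "2", "3", "4").map((_, "a"))
val b = Seq("3", "4", "5", "6").map((_, "b"))

// join keeps only keys present on both sides and pairs up their values.
val joined = for {
  (ka, va) <- a
  (kb, vb) <- b
  if ka == kb
} yield (ka, (va, vb))

println(joined) // List((3,(a,b)), (4,(a,b))) -- Spark may order them differently
```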

best,


matt

On 08/31/2014 09:23 PM, Liu, Raymond wrote:
 You could use cogroup to combine RDDs in one RDD for cross reference 
 processing.
 
 e.g.
 
 a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }
 
 Best Regards,
 Raymond Liu
 
 -Original Message-
 From: marylucy [mailto:qaz163wsx_...@hotmail.com]
 Sent: Friday, August 29, 2014 9:26 PM
 To: Matthew Farrellee
 Cc: user@spark.apache.org
 Subject: Re: how to filter value in spark
 
 I see, it works well, thank you!!!
 
 But in the following situation, how do I do it?
 
 var a = sc.textFile("/sparktest/1/").map((_, "a"))
 var b = sc.textFile("/sparktest/2/").map((_, "b"))
 How do I get (3,a) and (4,a)?
 
 
 On Aug 28, 2014, at 19:54, Matthew Farrellee m...@redhat.com wrote:
 
 On 08/28/2014 07:20 AM, marylucy wrote:
 fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
 fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
 I want to get 3 and 4

 var a = sc.textFile("/sparktest/1/").map((_, 1))
 var b = sc.textFile("/sparktest/2/").map((_, 1))

 a.filter(param => { b.lookup(param._1).length > 0 }).map(_._1).foreach(println)

 Error thrown:
 scala.MatchError: null
 PairRDDFunctions.lookup...

 the issue is nesting of the b rdd inside a transformation of the a rdd

 consider using intersection, it's more idiomatic

 a.intersection(b).foreach(println)

 but note that intersection will remove duplicates

 best,


 matt

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For
 additional commands, e-mail: user-h...@spark.apache.org

 
 





RE: how to filter value in spark

2014-08-31 Thread Liu, Raymond
You could use cogroup to combine RDDs in one RDD for cross reference processing.

e.g.

a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) }
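
For readers following along without a cluster, here is a rough local simulation of that cogroup recipe using plain Scala collections (an illustration of the semantics only, not Spark's implementation):

```scala
// Seqs stand in for the two keyed RDDs from the thread.
val a = Seq("1", "2", "3", "4").map((_, "a"))
val b = Seq("3", "4", "5", "6").map((_, "b"))

// cogroup: for every key seen on either side, gather all left values
// and all right values for that key.
val keys = (a.map(_._1) ++ b.map(_._1)).distinct
val cogrouped = keys.map { k =>
  (k, (a.filter(_._1 == k).map(_._2), b.filter(_._1 == k).map(_._2)))
}

// Keep keys that appear on both sides, then drop the right-hand values.
val result = cogrouped
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, _)) => (k, l) }

println(result) // List((3,List(a)), (4,List(a)))
```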

Best Regards,
Raymond Liu

-Original Message-
From: marylucy [mailto:qaz163wsx_...@hotmail.com] 
Sent: Friday, August 29, 2014 9:26 PM
To: Matthew Farrellee
Cc: user@spark.apache.org
Subject: Re: how to filter value in spark

I see, it works well, thank you!!!

But in the following situation, how do I do it?

var a = sc.textFile("/sparktest/1/").map((_, "a"))
var b = sc.textFile("/sparktest/2/").map((_, "b"))
How do I get (3,a) and (4,a)?


On Aug 28, 2014, at 19:54, Matthew Farrellee m...@redhat.com wrote:

 On 08/28/2014 07:20 AM, marylucy wrote:
 fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
 fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
 I want to get 3 and 4
 
 var a = sc.textFile("/sparktest/1/").map((_, 1))
 var b = sc.textFile("/sparktest/2/").map((_, 1))
 
 a.filter(param => { b.lookup(param._1).length > 0 }).map(_._1).foreach(println)
 
 Error thrown:
 scala.MatchError: null
 PairRDDFunctions.lookup...
 
 the issue is nesting of the b rdd inside a transformation of the a rdd
 
 consider using intersection, it's more idiomatic
 
 a.intersection(b).foreach(println)
 
 but note that intersection will remove duplicates
 
 best,
 
 
 matt
 
 


Re: how to filter value in spark

2014-08-29 Thread marylucy
I see, it works well, thank you!!!

But in the following situation, how do I do it?

var a = sc.textFile("/sparktest/1/").map((_, "a"))
var b = sc.textFile("/sparktest/2/").map((_, "b"))
How do I get (3,a) and (4,a)?
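
One way to get exactly (3,a) and (4,a), sketched here with plain Scala Seqs rather than RDDs (in Spark the equivalent shape would be a.join(b) followed by a map that keeps only the first label):

```scala
// Local stand-ins for the two labeled datasets in the question.
val a = Seq("1", "2", "3", "4").map((_, "a"))
val b = Seq("3", "4", "5", "6").map((_, "b"))

// Join on the key, then keep only the label from the first dataset.
val shared = for {
  (ka, va) <- a
  (kb, _)  <- b
  if ka == kb
} yield (ka, va)

println(shared) // List((3,a), (4,a))
```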


On Aug 28, 2014, at 19:54, Matthew Farrellee m...@redhat.com wrote:

 On 08/28/2014 07:20 AM, marylucy wrote:
 fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
 fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
 I want to get 3 and 4
 
 var a = sc.textFile("/sparktest/1/").map((_, 1))
 var b = sc.textFile("/sparktest/2/").map((_, 1))
 
 a.filter(param => { b.lookup(param._1).length > 0 }).map(_._1).foreach(println)
 
 Error thrown:
 scala.MatchError: null
 PairRDDFunctions.lookup...
 
 the issue is nesting of the b rdd inside a transformation of the a rdd
 
 consider using intersection, it's more idiomatic
 
 a.intersection(b).foreach(println)
 
 but note that intersection will remove duplicates
 
 best,
 
 
 matt
 
 


Re: how to filter value in spark

2014-08-28 Thread Matthew Farrellee

On 08/28/2014 07:20 AM, marylucy wrote:

fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
I want to get 3 and 4

var a = sc.textFile("/sparktest/1/").map((_, 1))
var b = sc.textFile("/sparktest/2/").map((_, 1))

a.filter(param => { b.lookup(param._1).length > 0 }).map(_._1).foreach(println)

Error thrown:
scala.MatchError: null
PairRDDFunctions.lookup...


the issue is nesting of the b rdd inside a transformation of the a rdd

consider using intersection, it's more idiomatic

a.intersection(b).foreach(println)

but note that intersection will remove duplicates
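
The duplicate-removal caveat can be seen locally with plain Scala collections (set semantics standing in for what intersection does on RDDs; this is an illustration, not Spark code):

```scala
// Duplicates on either side collapse: intersection behaves like a set operation.
val a = Seq("3", "3", "4")
val b = Seq("3", "4", "4")

// Simulate RDD intersection with sets; sorted only to make output deterministic.
val inter = a.toSet.intersect(b.toSet).toList.sorted

println(inter) // List(3, 4)
```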

best,


matt
