While it does feel like filtering is what you want, a common way to handle this is to map each element to a bucket key.
Using your rddList example it becomes something like this (Scala style):

---
val rddSplit: RDD[(Int, Any)] = rdd.map(x => (createKey(x), x))
val rddBuckets: RDD[(Int, Iterable[Any])] = rddSplit.groupByKey()
---

You write createKey to do the equivalent work of your filters, and then you have a single RDD containing all of your buckets.

On Wed, Jun 10, 2015 at 5:56 AM dgoldenberg <dgoldenberg...@gmail.com> wrote:

> Hi,
>
> I'm gathering that the typical approach to splitting an RDD is to apply
> several filters to it:
>
> rdd1 = rdd.filter(func1);
> rdd2 = rdd.filter(func2);
> ...
>
> Is there/should there be a way to create 'buckets' like these in one go?
>
> List<RDD> rddList = rdd.filter(func1, func2, ..., funcN)
>
> Another angle here is: when applying a filter(func), is there a way to get
> two RDDs back, one containing the elements of the original RDD (the one
> being filtered) for which func returned true, and the other containing the
> elements for which func returned false?
>
> Pair<RDD> pair = rdd.filterTrueFalse(func);
>
> Right now I'm doing
>
> RDD x = rdd.filter(func);
> RDD y = rdd.filter(reverseOfFunc);
>
> This seems a bit tautological to me, though Spark must be optimizing this
> out (?)
>
> Thanks.
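For completeness, here is a minimal self-contained sketch of the idea. The bucketing function createKey here is hypothetical (it just keys integers by sign); substitute whatever conditions your filters test:

---
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SplitByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("split-example").setMaster("local[*]"))

    val rdd: RDD[Int] = sc.parallelize(Seq(-3, -1, 0, 2, 5))

    // createKey plays the role of your filter predicates: one key per bucket.
    // Here: 0 = negative, 1 = zero, 2 = positive (purely illustrative).
    val createKey: Int => Int =
      x => if (x < 0) 0 else if (x == 0) 1 else 2

    val rddSplit: RDD[(Int, Int)] = rdd.map(x => (createKey(x), x))
    val rddBuckets: RDD[(Int, Iterable[Int])] = rddSplit.groupByKey()

    rddBuckets.collect().foreach { case (k, vs) =>
      println(s"bucket $k: ${vs.mkString(", ")}")
    }

    sc.stop()
  }
}
---

Your filterTrueFalse case is just the two-key instance of this: have createKey return func(x) as a Boolean and you get the true bucket and the false bucket in a single pass.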