Hi Andy, thanks for your response. I had already considered filtering twice; that was what I meant by "that would be equivalent to applying filter twice". But I was wondering whether I could do it in a single pass, so that it could later be generalized to an arbitrary number of classes. I would also like to be able to generate separate RDDs instead of partitions of a single RDD, so I could use RDD methods like stats() on the fragments. But I think there is currently no RDD method that returns more than one RDD for a single input RDD, so maybe there is some design limitation in Spark that prevents this?
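[Editor's note: a minimal sketch of the workaround discussed above. `splitBy` is a hypothetical helper name, not a Spark API; Spark has no single-pass primitive that returns two RDDs, but persisting the input gets most of the benefit, since the upstream lineage is computed only once and both filters then read from the cached blocks.]

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: split an RDD in two by a predicate.
// Not a single pass in the strict sense (two filter jobs run),
// but the source RDD's lineage is evaluated only once thanks
// to the cache, and each half is a real RDD, so methods like
// stats() can be called on the fragments.
def splitBy[A](rdd: RDD[A], p: A => Boolean): (RDD[A], RDD[A]) = {
  val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
  (cached.filter(p), cached.filter(a => !p(a)))
}
```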
Again, thanks for your answer.

Greetings,

Juan

On 17/12/2014 18:15, "andy petrella" <andy.petre...@gmail.com> wrote:

> yo,
>
> First, here is the scala version:
> http://www.scala-lang.org/api/current/index.html#scala.collection.Seq@partition(p:A=>Boolean):(Repr,Repr)
>
> Second: RDD is distributed, so what you'll have to do is either partition
> each partition (:-D) or create two RDDs by filtering twice; hence tasks
> will be scheduled distinctly, and the data read twice. Choose what's best
> for you!
>
> hth,
> andy
>
> On Wed Dec 17 2014 at 5:57:56 PM Juan Rodríguez Hortalá <
> juan.rodriguez.hort...@gmail.com> wrote:
>
>> Hi all,
>>
>> I would like to be able to split an RDD in two pieces according to a
>> predicate. That would be equivalent to applying filter twice, with the
>> predicate and its complement, which is also similar to Haskell's partition
>> list function (
>> http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-List.html).
>> Is there currently any way to do this in Spark? Or maybe anyone has a
>> suggestion about how to implement this by modifying the Spark source. I
>> think this is valuable because sometimes I need to split an RDD into
>> several groups that are too big to fit in the memory of a single thread,
>> so pair RDDs are not a solution for those cases. A generalization to n
>> parts of Haskell's partition would do the job.
>>
>> Thanks a lot for your help.
>>
>> Greetings,
>>
>> Juan Rodriguez
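[Editor's note: for readers unfamiliar with the method Andy's scaladoc link points at, `partition` on local Scala collections does exactly the single-pass split by predicate that the thread is asking Spark to provide for RDDs. A quick illustration:]

```scala
// Seq.partition returns a pair: (elements satisfying the predicate,
// elements that do not), built in a single traversal of the sequence.
val (evens, odds) = Seq(1, 2, 3, 4, 5).partition(_ % 2 == 0)
// evens == List(2, 4), odds == List(1, 3, 5)
```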