Something like this?

(2 to alphabetLength).toList
  .map(shift => Foo.myFunction(inputRDD, shift).map(c => shift -> c))
  .foldLeft(Foo.myFunction(inputRDD, 1).map(c => 1 -> c))(_ union _)

which is an RDD[(Int, Char)]

The problem is that you can't operate on an RDD from inside another RDD's
transformations: the nested structure breaks the Spark programming model.
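The expression above stays legal because the loop over shifts runs on the driver in plain Scala, and RDD.union is cheap driver-side plan construction, so only alphabetLength lineages are created. Here is a minimal, self-contained sketch of that shape using ordinary collections in place of RDDs; myFunction is not shown in the thread, so purely for illustration it is assumed here to be a lowercase Caesar shift:

```scala
object ShiftDemo {
  val alphabetLength = 26

  // Hypothetical stand-in for Foo.myFunction: Caesar-shift lowercase letters.
  // The real function in the thread is not shown; this only matches its shape.
  def myFunction(input: Seq[Char], shift: Int): Seq[Char] =
    input.map(c => ('a' + (c - 'a' + shift) % 26).toChar)

  // Same structure as the RDD expression: tag each shifted stream with its
  // shift, then fold the per-shift results into one flat (shift, char) list.
  def allShifts(input: Seq[Char]): Seq[(Int, Char)] =
    (2 to alphabetLength)
      .map(shift => myFunction(input, shift).map(c => shift -> c))
      .foldLeft(myFunction(input, 1).map(c => 1 -> c))(_ ++ _)
}
```

Swapping Seq for RDD and ++ for union gives back the driver-side version, and the result is the flat RDD[(Int, Char)] rather than an RDD of RDDs.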

On Sat, Mar 21, 2015 at 10:26 AM, Michael Lewis <lewi...@me.com> wrote:

> Hi,
>
> I wonder if someone can help suggest a solution to my problem. I had a
> simple process working using Strings and now want to convert to
> RDD[Char]. The problem is that I end up with a nested call, as follows:
>
>
> 1) Load a text file into an RDD[Char]
>
>         val inputRDD = sc.textFile("myFile.txt").flatMap(_.toIterator)
>
>
> 2) I have a method that takes two parameters:
>
>         object Foo
>         {
>                 def myFunction(inputRDD: RDD[Char], shift: Int): RDD[Char]
> ...
>
>
> 3) I have a method 'bar' that the driver process calls once it has loaded
> the inputRDD, as follows:
>
> def bar(inputRDD: RDD[Char]): Int = {
>
>          val solutionSet = sc.parallelize((1 to alphabetLength).toList)
>                 .map(shift => (shift, Foo.myFunction(inputRDD, shift)))
>
>
>
> What I’m trying to do is take the list 1..26 and generate a set of tuples
> { (1, RDD(1)), …, (26, RDD(26)) }, where each RDD is the inputRDD passed
> through the function above with a different shift parameter.
>
> In my original version I could parallelise the algorithm fine, but my
> input had to be in a String variable; I’d rather it be an RDD, since the
> string could be large. I think the way I’m trying to do it above won’t
> work because it’s a nested RDD call.
>
> Can anybody suggest a solution?
>
> Regards,
> Mike Lewis
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
