Dear fellow Sparkers,

I am barely dipping my toes into the Spark world, and I was wondering whether
the following workflow can be implemented in Spark (a rough code sketch
follows the outline below):

    1. Initialize a custom data structure DS<i> on each executor <i>.
       These data structures should live until the end of the
       program.

    2. While (some_boolean):

        2.1. Read data into an RDD
        2.2. Partition the RDD with a custom partitioner
        2.3. Loop over the partitions (alternatively, the executors):
            2.3.1 Process each RDD partition using data structure DS<i>,
                  updating DS<i> in the process.

    3. Collect the results. This will need access to each DS<i>.
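
To make the question more concrete, here is a rough Scala sketch of the
kind of thing I have in mind. Everything specific in it is my own guess:
the input path, the keying of the records, the HashPartitioner standing in
for a real custom partitioner, and above all the trick of keeping DS<i> in
a singleton object so that it lives for the lifetime of each executor JVM.
I do not know whether that is a sound Spark idiom.

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
    import scala.collection.mutable

    // Stand-in for DS<i>: a singleton object, so each executor JVM holds one
    // mutable map that survives across jobs (step 1). Thread safety between
    // concurrent tasks on the same executor is ignored here.
    object ExecutorState {
      lazy val ds: mutable.Map[String, Long] =
        mutable.Map.empty[String, Long].withDefaultValue(0L)
    }

    object Workflow {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("per-executor-state"))
        val numPartitions = 8                     // made-up value

        var someBoolean = true
        while (someBoolean) {
          // 2.1 read data into an RDD (path and key extraction are placeholders)
          val rdd = sc.textFile("hdfs:///some/input")
            .map(line => (line.split(",")(0), line))
            // 2.2 HashPartitioner stands in for the custom partitioner
            .partitionBy(new HashPartitioner(numPartitions))

          // 2.3 / 2.3.1 process each partition, updating the executor-local DS<i>
          rdd.foreachPartition { iter =>
            iter.foreach { case (key, _) => ExecutorState.ds(key) += 1L }
          }

          someBoolean = false                     // real loop condition goes here
        }

        // 3. run a dummy job wide enough to touch every executor and emit each
        //    local map; I am not sure this visits every executor exactly once.
        val collected = sc.parallelize(0 until numPartitions, numPartitions)
          .mapPartitions(_ => ExecutorState.ds.iterator)
          .collect()

        collected.take(10).foreach(println)
        sc.stop()
      }
    }

In particular I am unsure whether the final step is a reasonable way to
reach each executor's copy of DS<i>, and how concurrent tasks on the same
executor should coordinate their updates.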

This question is probably very naive, and yes, I am aware this looks more
like something one would do with MPI than with Spark.

Best regards,

Jeroen
