Dear fellow Sparkers,

I am barely dipping my toes into the Spark world, and I was wondering whether the following workflow can be implemented in Spark:
1. Initialize a custom data structure DS<i> on each executor <i>. These data structures DS<i> should live until the end of the program.
2. While (some_boolean):
   2.1. Read data into an RDD.
   2.2. Partition the RDD with a custom partitioner.
   2.3. Loop over the partitions (or, alternatively, over the executors):
        2.3.1. Process each RDD partition using the local data structure DS<i>. These data structures will need to be updated in place.
3. Collect the results. This step needs access to each DS<i>.

This question is probably very naive, and yes, I am aware this looks more like something one would do with MPI than with Spark. A rough sketch of what I have in mind is included below my signature.

Best regards,
Jeroen
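
P.S. To make the question a bit more concrete, here is a rough, untested sketch of what I have in mind, written against the RDD API in Scala. Everything specific in it is my own guess: ExecutorState is a plain JVM singleton that I hope would be created once per executor and survive across jobs, the input path, the keying, the use of HashPartitioner, and the termination condition are all placeholders, and the final collection step is exactly the part I am least sure is possible.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import scala.collection.mutable

// Placeholder for my per-executor data structure DS<i>.
// A JVM-wide singleton object is instantiated once per executor JVM,
// so (I assume) it would live for the whole application.
object ExecutorState {
  val ds: mutable.Map[String, Long] = mutable.HashMap.empty
}

object Workflow {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("per-executor-state"))

    var someBoolean = true
    var iteration = 0
    while (someBoolean) {
      // 2.1 Read data into an RDD (path is a placeholder)
      val rdd = sc.textFile(s"/data/input-$iteration")

      // 2.2 Partition with a custom partitioner (HashPartitioner as a stand-in)
      val partitioned = rdd
        .map(line => (line.split(",")(0), line))
        .partitionBy(new HashPartitioner(sc.defaultParallelism))

      // 2.3 / 2.3.1 Process each partition, updating the executor-local DS<i>.
      // Several tasks may run concurrently in one executor, hence the synchronized block.
      partitioned.foreachPartition { iter =>
        iter.foreach { case (key, _) =>
          ExecutorState.ds.synchronized {
            ExecutorState.ds(key) = ExecutorState.ds.getOrElse(key, 0L) + 1L
          }
        }
      }

      iteration += 1
      someBoolean = iteration < 3 // placeholder termination condition
    }

    // 3. Collect the results by running one more job that reads each executor's DS<i>.
    // This is the shaky part: Spark gives no guarantee that these tasks will cover
    // every executor exactly once, so entries could be missed or duplicated.
    val collected = sc
      .parallelize(0 until sc.defaultParallelism, sc.defaultParallelism)
      .mapPartitions(_ => ExecutorState.ds.synchronized(ExecutorState.ds.toVector).iterator)
      .collect()

    collected.foreach(println)
    sc.stop()
  }
}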