Hi all, I have a process that runs some calculations on each column of a DataFrame. Intrinsically, I iterate over the columns with a for loop. On the other hand, the per-column computation itself is not fully distributable.
To speed up the process, I would like to submit a Spark job for each column. Any suggestions? I was thinking of plain threads sharing a single SparkContext. Thank you, Saif
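For what it's worth, Spark's scheduler is thread-safe, so threads in the driver can share one SparkContext and submit independent jobs concurrently. Below is a minimal sketch of that pattern using Python's `concurrent.futures`; the `process_column` body and the toy `data` dict are stand-ins for the real per-column Spark computation, which would instead trigger an action (e.g. `df.select(...).agg(...).collect()`) on a shared SparkSession.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the per-column computation. With Spark, this body would
# run an action on a shared SparkSession/SparkContext; Spark's scheduler
# is thread-safe, so concurrent submissions from threads are supported.
def process_column(name, values):
    return name, sum(values) / len(values)  # e.g. a column mean

# Toy data standing in for the DataFrame's columns.
data = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}

# Submit one task per column; the tasks run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_column, n, v) for n, v in data.items()]
    results = dict(f.result() for f in futures)

print(results)  # {'a': 2.0, 'b': 5.0}
```

If the concurrent jobs should share cluster resources evenly rather than run FIFO, setting `spark.scheduler.mode=FAIR` on the context is worth a look.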