Hi all, I'd like some insight. I am currently processing huge databases and experimenting with monitoring and tuning.
When monitoring the multiple cores I have, I see that even when RDDs are parallelized, computation on an RDD jumps from core to core sporadically (I guess depending on where the chunk lives). So I see one core at 100% usage while the others sit idle; after some time, when the task completes, the processing jumps to another core, and so on. Can you share any general insight on this situation? Does it depend on the computation? I have tried serialization and different setups, but I never see more than one core working during a spark-submit run.

Note: this is not cluster mode, just local processors.

Thanks,
Saif
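P.S. For context, a simplified sketch of how I launch (the script name is a placeholder; my actual job and data paths are omitted). I understand that in local mode the master URL controls how many worker threads Spark uses, so I have been trying variations like these:

```shell
# Placeholder job name; real arguments omitted.
# Default local master -- I believe this may mean a single worker thread:
spark-submit --master local my_job.py

# Requesting one worker thread per available core:
spark-submit --master "local[*]" my_job.py

# Requesting an explicit number of worker threads (e.g. 8):
spark-submit --master "local[8]" my_job.py
```

Even with these variations I still only ever see one core busy at a time, which is what prompted the question above.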