Hello Rajat,
Look up the Spark *pipelining* concept: any sequence of operations that feeds data directly from one into the next, without needing a shuffle, will be packed into a single stage, e.g. select -> filter -> select (Spark SQL) or map -> filter -> map (RDD). Any operation that does require a shuffle (a wide transformation such as groupByKey, reduceByKey, or join) marks a stage boundary and starts a new stage.
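To make the idea concrete, here is a minimal plain-Python sketch (illustrative only, not the Spark API) of how narrow transformations get fused into one pass over each partition, while a shuffle-style group-by-key has to see all partitions' output first:

```python
from collections import defaultdict

# Two input partitions; in Spark, each partition becomes one task per stage.
partitions = [[1, 2, 3], [4, 5, 6]]

def stage1(part):
    """map -> filter -> map pipelined into a single pass over one partition."""
    out = []
    for x in part:
        x = x + 1             # map: increment
        if x % 2 == 0:        # filter: keep even values
            out.append(x * 10)  # map: scale
    return out

# Stage 1 runs independently per partition (no data exchange needed).
stage1_out = [stage1(p) for p in partitions]

# A shuffle boundary: records are redistributed across partitions by key,
# so the next stage cannot be fused with the previous one.
shuffled = defaultdict(list)
for part in stage1_out:
    for x in part:
        shuffled[x % 3].append(x)  # key = x mod 3, like a groupByKey
```

The fused `stage1` corresponds to one Spark stage; the key-based regrouping is what forces a second stage, because each output partition needs records from every input partition.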
Hello Members,
I have a query about Spark stages:
Why does every stage have a different number of tasks/partitions in Spark, and how is that number determined?
Also, where can I see the improvements made in Spark 3+?
Thanks in advance
Rajat