Re: deciding Spark tasks & optimization resource

2022-08-29 Thread Gibson
Hello Rajat, Look up the Spark *Pipelining* concept: any sequence of operations that feed data directly into each other without the need for shuffling will be packed into a single stage, i.e. select -> filter -> select (SparkSQL); map -> filter -> map (RDD), for any operation that requires shuffling
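
To make the pipelining idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's actual scheduler: the `split_into_stages` helper and the `WIDE_OPS` set are hypothetical names made up for illustration, and the model assumes a simple linear chain of operations in which every shuffle-inducing operation opens a new stage.

```python
# Toy model only: this is NOT Spark's scheduler, just an illustration of
# how narrow operations are pipelined and how shuffles split stages.

# Assumption: a hand-picked set of operation names treated as
# shuffle-inducing ("wide"). Everything else is treated as narrow.
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "repartition", "distinct"}


def split_into_stages(ops):
    """Group a linear chain of operation names into stages.

    Narrow operations are appended to the current stage; a wide
    operation closes the current (non-empty) stage and opens a new one.
    """
    stages = [[]]
    for op in ops:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


if __name__ == "__main__":
    chain = ["map", "filter", "map", "reduceByKey", "map"]
    for i, stage in enumerate(split_into_stages(chain)):
        print(f"stage {i}: {' -> '.join(stage)}")
```

In this model, map -> filter -> map stays in one stage, while a chain containing reduceByKey is cut at that point. Since each stage runs one task per partition, and a shuffle can produce a different number of partitions than its input had, the task count can differ from stage to stage.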

deciding Spark tasks & optimization resource

2022-08-29 Thread rajat kumar
Hello Members, I have a question about Spark stages: why does every stage have a different number of tasks/partitions in Spark, and how is that number determined? Also, where can I see the improvements made in Spark 3+? Thanks in advance, Rajat