Hi,

Regarding 2): the end results are sent back to the driver; the shuffles are the transmission of intermediate results between nodes, such as between the steps joined by -> in your pipeline, which are all intermediate transformations.
More precisely, since flatMap and map are narrow dependencies, meaning they can usually run on the local node, I bet the shuffle is just sending the textFile data out to a few nodes to distribute the partitions.

________________________________
From: Kartik Mathur <kar...@bluedata.com>
Sent: Thursday, October 1, 2015 12:42 AM
To: user
Subject: Problem understanding spark word count execution

Hi All,

I tried running the Spark word count example and I have a couple of questions. I am analyzing stage 0, i.e. sc.textFile -> flatMap -> map (word count example).

1) In the stage logs under the application UI details, for every task I see Shuffle write as 2.7 KB. How can I find out where this task wrote its data, i.e. how many bytes went to which executor?

2) In the executor's log, when I look up the same task, it says 2000 bytes of result were sent to the driver. If the results were sent directly to the driver, what is this shuffle write?

Thanks,
Kartik
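
To make the "shuffle write" from question 1 a bit more concrete, here is a minimal pure-Python sketch of what a stage-0 word count task does conceptually. It is not Spark code: the names `partition_for`, `map_task`, `reduce_task` and `word_count` are hypothetical, `zlib.crc32` is only a deterministic stand-in for whatever hash the real partitioner uses, and the choice of two reducers is arbitrary. As far as I understand, in Spark the map-side tasks write such per-reducer buckets to local files on their own executor and the next stage fetches them, so the small result sent to the driver would be bookkeeping about that output rather than the word counts themselves.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    """Pick a target reducer for a key (crc32 is a stand-in, not Spark's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(lines, num_reducers):
    """Simulate one stage-0 task: flatMap -> map, then bucket pairs per reducer."""
    buckets = defaultdict(list)
    for line in lines:              # narrow steps: only this task's partition is read
        for word in line.split():   # flatMap: one line -> many words
            pair = (word, 1)        # map: word -> (word, 1)
            buckets[partition_for(word, num_reducers)].append(pair)
    return dict(buckets)            # this bucketed output plays the role of the shuffle write


def reduce_task(fetched_pairs):
    """Simulate one next-stage task: sum the counts for the keys it fetched."""
    counts = defaultdict(int)
    for word, n in fetched_pairs:
        counts[word] += n
    return dict(counts)


def word_count(partitions, num_reducers=2):
    """Run all map tasks, then let each reducer fetch its bucket from every map output."""
    map_outputs = [map_task(p, num_reducers) for p in partitions]
    results = {}
    for r in range(num_reducers):
        fetched = [pair for out in map_outputs for pair in out.get(r, [])]
        results.update(reduce_task(fetched))
    return results


if __name__ == "__main__":
    partitions = [["a b a"], ["b c"]]   # two input partitions, one line each
    for i, p in enumerate(partitions):
        sizes = {r: len(b) for r, b in map_task(p, 2).items()}
        print(f"map task {i}: pairs written per reducer = {sizes}")
    print(word_count(partitions))
```

In this toy model, the per-reducer bucket sizes printed for each map task are the analogue of "how many bytes went where": the data is addressed to the reducers of the next stage, not to the driver.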