Hi, 

I have seen in a video from Spark Summit that (when using HDFS) the data is 
usually distributed across the whole cluster, and the computation usually goes 
to the data.

My question is: how does this work when I read the data from Amazon S3? Is the 
whole input dataset read by the master node and then distributed to the slave 
nodes? Or does the master node only determine which slave should read what, 
with the reading then performed independently by each of the slaves? 

Thank you in advance for the clarification. 
 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
