I am a PhD student trying to understand the internals of Spark so that I can make some modifications to it. I am trying to understand how the aggregation of the distributed datasets (over the network) onto the driver node works. I would very much appreciate it if someone could point me towards the source code involved in the aggregation over the network. An explanation of how it works would also be appreciated.
So far, I have followed the code and identified that the handleJobSubmitted() function in DAGScheduler.scala is invoked when a job is scheduled. Since I am trying to run it on a cluster, I then reach listenerBus.post(SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties)) on line 759 of DAGScheduler.scala. I am not sure where to go from here.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Aggregation-of-distributed-datasets-tp22048.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
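For what it's worth, the pattern I am trying to trace can be sketched in plain Scala (this is a hypothetical illustration, not actual Spark source): an action like RDD.reduce runs the combining function once inside each partition on the executors, and the partial results are then sent back and merged again on the driver. The names below (partitions, reduceLike) are my own and do not exist in Spark.

```scala
// Hypothetical sketch of driver-side aggregation, NOT Spark source code.
object DriverMergeSketch {
  // Pretend each inner Seq[Int] is one partition held by a different executor.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))

  // Mimics the two-level combine performed by an action such as RDD.reduce.
  def reduceLike(f: (Int, Int) => Int): Int = {
    // Step 1: each "executor" reduces its own partition locally...
    val partials: Seq[Int] = partitions.map(_.reduce(f))
    // Step 2: ...the partial results travel back, and the driver merges them.
    partials.reduce(f)
  }

  def main(args: Array[String]): Unit = {
    println(reduceLike(_ + _)) // prints 21, i.e. 1+2+3+4+5+6
  }
}
```

If this matches what happens in Spark, the network transfer I am interested in would be the step where the partial results come back to the driver, so pointers to that code path would be ideal.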