I’ve been trying to produce an updated box diagram to refresh:
http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617/26


… after SPARK-3129 and other changes (a surprising number of comments 
still mention NetworkReceiver).


Here’s what I have so far:
https://www.dropbox.com/s/q79taoce2ywdmf1/SparkStreaming.pdf?dl=0


The diagram does not follow any particular convention (ER, ORM, …). Data 
flow, up to just before RDD creation, is shown with bold arrows; metadata 
flow is shown with normal-width arrows.


This diagram is still very much a WIP (see the todo list below), but I wanted 
to share it to ask:
- what’s wrong?
- what are the glaring omissions?
- how can I make this better (i.e. what should I add first to the todo list 
below)?


I’ll be happy to share this (including sources) with whoever asks for it. 


Todo:
- mark private/public classes
- mark queues in Receiver, ReceivedBlockHandler, BlockManager
- mark the type of info on each transport, e.g. Actor message, ReceivedBlockInfo



—
François Garillot
