I’ve been trying to produce an updated box diagram to refresh: http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617/26
… after SPARK-3129 and other switches (a surprising number of comments still mention NetworkReceiver).

Here’s what I have so far: https://www.dropbox.com/s/q79taoce2ywdmf1/SparkStreaming.pdf?dl=0

This is not supposed to respect any particular convention (ER, ORM, …). Data flow, up to right before RDD creation, is in bold arrows; metadata flow is in normal-width arrows.

This diagram is still very much a WIP (see the todo list below), but I wanted to share it to ask:
- what’s wrong?
- what are the glaring omissions?
- how can I make this better (i.e. what should I add first to the todo list below)?

I’ll be happy to share this (including sources) with whoever asks for it.

Todo:
- mark private/public classes
- mark queues in Receiver, ReceivedBlockHandler, BlockManager
- mark the type of info on transport, e.g. Actor message, ReceivedBlockInfo

François Garillot