AFAIK Spark Streaming cannot work in a way like this. Transformations are
made on DStreams, where a DStream basically holds (time,
allocatedBlocksForBatch) pairs. Blocks are allocated to batches by the
JobGenerator; unallocated blocks (infos) are collected by the
ReceivedBlockTracker.
In Spark there is a BlockGenerator on each worker node, next to the
ReceiverSupervisorImpl, which generates blocks out of an ArrayBuffer in
each interval (block_interval). These blocks are passed to the
ReceiverSupervisorImpl, which pushes them into the BlockManager
for storage. BlockInfos are passed back to the driver's ReceivedBlockTracker.
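The buffer-swap mechanism can be sketched roughly like this. This is a simplified, hypothetical SimpleBlockGenerator, not the real spark-streaming class; it only shows the idea of records accumulating in an ArrayBuffer that gets swapped out wholesale on each block interval:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: records accumulate in a mutable buffer, and on each
// block interval the whole buffer is frozen into one immutable block.
class SimpleBlockGenerator[T] {
  private var currentBuffer = new ArrayBuffer[T]
  private val blocks = new ArrayBuffer[List[T]]

  // Called for every incoming record (the real BlockGenerator calls this addData).
  def add(record: T): Unit = synchronized {
    currentBuffer += record
  }

  // In the real code a timer fires this every spark.streaming.blockInterval;
  // here it is invoked manually to show the swap.
  def rollBlock(): Unit = synchronized {
    if (currentBuffer.nonEmpty) {
      blocks += currentBuffer.toList   // freeze the buffer into a block
      currentBuffer = new ArrayBuffer[T]
    }
  }

  def generatedBlocks: List[List[T]] = synchronized { blocks.toList }
}
```

In the real implementation the frozen block is then handed to the ReceiverSupervisorImpl for storage rather than kept in a local list, but the swap-and-replace of the buffer is the same.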
The block interval is configurable, so I think you can reduce it to keep a
block in memory only for a limited interval. Is that what you are looking for?
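For reference, the setting is spark.streaming.blockInterval (default 200ms), e.g. in spark-defaults.conf:

```
spark.streaming.blockInterval  100ms
```

Note it is a trade-off, not a free win: a smaller interval means smaller, more frequent blocks, and since each block becomes one partition of the batch RDD, it also increases the number of tasks per batch.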
On Tue, Mar 24, 2015 at 1:38 PM, Bin Wang <wbi...@gmail.com> wrote:
Hi,
I'm learning Spark and I think there could be some optimization of the current
streaming implementation. Correct me if I'm wrong.
The current streaming implementation puts the data of one batch into memory
(as an RDD), but that seems unnecessary.
For example, if I want to count the lines which