Is it possible to merge delayed batches in streaming?

2015-09-23 Thread Bin Wang
I'm using Spark Streaming and there may be some delays between batches. Is
it possible to merge the delayed batches into one batch for processing?

For example, the batch interval is set to 5 min but the first batch takes
1 hour, so many batches are delayed. At the end of processing for each
batch, I save the data into a database. So if all the delayed batches were
merged into one big batch, it would save a lot of resources. I'd like to
know if this is possible. Thanks.


Re: Is it possible to merge delayed batches in streaming?

2015-09-23 Thread Tathagata Das
It's not possible. And it's actually fundamentally challenging to do so in
the general case because it becomes hard to reason about the processing
semantics - especially when there are per-batch aggregations.
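
To illustrate the point about per-batch aggregations, here is a minimal
sketch in plain Python (not Spark code). The helper names and the sample
data are hypothetical and chosen only for this example. It compares
aggregating each batch on its own with aggregating one merged batch.

```python
from collections import Counter

def per_batch_counts(batches):
    # One aggregate per batch interval, as a per-batch aggregation would produce.
    return [Counter(batch) for batch in batches]

def merged_counts(batches):
    # Hypothetical "merged" behaviour: all delayed batches collapsed into one.
    merged = [event for batch in batches for event in batch]
    return [Counter(merged)]

# Three delayed batches of made-up event keys.
batches = [["a", "a", "b"], ["b", "b"], ["b"]]

separate = per_batch_counts(batches)
merged = merged_counts(batches)

# Most frequent key reported for each output batch.
top_separate = [counts.most_common(1)[0][0] for counts in separate]
top_merged = [counts.most_common(1)[0][0] for counts in merged]

print(len(separate), top_separate)
print(len(merged), top_merged)
```

With separate batches there are three results, and the first one reports
"a" as the most frequent key. After merging there is a single result and
that per-interval information is gone, so downstream logic that expects
one result per interval would see different output.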

On Wed, Sep 23, 2015 at 12:17 AM, Bin Wang wrote:

> I'm using Spark Streaming and there may be some delays between batches. Is
> it possible to merge the delayed batches into one batch for processing?
>
> For example, the batch interval is set to 5 min but the first batch takes
> 1 hour, so many batches are delayed. At the end of processing for each
> batch, I save the data into a database. So if all the delayed batches were
> merged into one big batch, it would save a lot of resources. I'd like to
> know if this is possible. Thanks.
>