Re: load based stream partitioning

Amol Kekre Thu, 11 Feb 2016 12:07:44 -0800

Gaurav,
It would not be idempotent per partition, but it would be across all
partitions combined. In this case the user would have explicitly asked for
such a pattern.


Thanks,
Amol


On Thu, Feb 11, 2016 at 12:04 PM, Gaurav Gupta <[email protected]>
wrote:

> Pramod,
>
> How would it work with recovery? There could be cases where a tuple went to
> P1 and, post recovery, goes to P2.
>
> Gaurav
>
> On Thu, Feb 11, 2016 at 11:56 AM, Pramod Immaneni <[email protected]>
> wrote:
>
> > Hi,
> >
> > There are scenarios where the downstream partitions of an upstream
> > operator do not perform uniformly, so overall performance is
> > sub-optimal and dictated by the slowest partitions. The reasons could
> > be data related, such as some partitions receiving more data to process
> > than others, or environment related, such as some partitions running
> > slower than others because they are on heavily loaded nodes.
> >
> > A solution based on functionality currently available in the engine
> > would be to write a StreamCodec implementation that distributes data
> > among the partitions so that each partition receives a similar amount
> > of data to process. We should consider adding StreamCodecs like these
> > to the library, but they do not solve the problem when it is
> > environment related.
> >
> > For that, a better and more comprehensive approach would be to look at
> > how data is being consumed by the downstream partitions from the
> > BufferServer and use that information to decide how to send future
> > data. If some partitions are behind others in consuming data, then data
> > can be directed to the other partitions. One way to do this would be to
> > relay this type of statistical and positional information from the
> > BufferServer to the upstream publishers. The publishers can use this
> > information in ways such as making it available to StreamCodecs to
> > affect the destination of future data.
> >
> > What do you think?
> >
> > Thanks
> >
>
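
For reference, the feedback-driven routing proposed in the original message
could look roughly like the sketch below. It is only an illustration under
stated assumptions: LoadAwarePartitioner, onConsumed, nextPartition and
backlog are hypothetical names, not part of the engine or of the StreamCodec
interface, and the sketch simply assumes that the BufferServer can relay a
per-partition consumed count to the publisher.

```java
/**
 * Hypothetical sketch (not an existing engine class): routes each tuple to
 * the partition with the smallest backlog, where backlog is the number of
 * tuples published to a partition minus the number it has consumed.
 */
final class LoadAwarePartitioner {
    private final long[] published; // tuples sent to each partition so far
    private final long[] consumed;  // latest consumed count reported per partition

    LoadAwarePartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        published = new long[partitionCount];
        consumed = new long[partitionCount];
    }

    /** Records feedback; assumed to be relayed from the buffer server. */
    void onConsumed(int partition, long totalConsumed) {
        // Keep the highest count seen so stale feedback cannot move it backwards.
        consumed[partition] = Math.max(consumed[partition], totalConsumed);
    }

    /** Picks the least backlogged partition; ties go to the lowest index. */
    int nextPartition() {
        int best = 0;
        for (int p = 1; p < published.length; p++) {
            if (backlog(p) < backlog(best)) {
                best = p;
            }
        }
        published[best]++;
        return best;
    }

    /** Tuples published to the partition but not yet reported as consumed. */
    long backlog(int partition) {
        return published[partition] - consumed[partition];
    }
}
```

Because the choice depends on feedback that arrives at runtime, replaying the
same tuples after recovery can send a tuple to a different partition than
before, which is the concern raised above. The set of tuples processed across
all partitions combined stays the same.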
