Gaurav,

It would not be idempotent per partition, but it would be across all partitions combined. In this case the user would have explicitly asked for such a pattern.

Thanks,
Amol

On Thu, Feb 11, 2016 at 12:04 PM, Gaurav Gupta <[email protected]> wrote:

> Pramod,
>
> How would it work with recovery? There could be cases where a tuple went
> to P1 and post recovery it can go to P2.
>
> Gaurav
>
> On Thu, Feb 11, 2016 at 11:56 AM, Pramod Immaneni <[email protected]>
> wrote:
>
> > Hi,
> >
> > There are scenarios where the downstream partitions of an upstream
> > operator are not performing uniformly, resulting in overall sub-optimal
> > performance dictated by the slowest partitions. The reasons could be
> > data related, such as some partitions receiving more data to process
> > than the others, or environment related, such as some partitions
> > running slower than others because they are on heavily loaded nodes.
> >
> > A solution based on currently available functionality in the engine
> > would be to write a StreamCodec implementation to distribute data among
> > the partitions such that each partition receives a similar amount of
> > data to process. We should consider adding StreamCodecs like these to
> > the library, but they do not solve the problem when it is environment
> > related.
> >
> > For that, a better and more comprehensive approach would be to look at
> > how data is being consumed by the downstream partitions from the
> > BufferServer and use that information to make decisions on how to send
> > future data. If some partitions are behind others in consuming data,
> > then data can be directed to the other partitions. One way to do this
> > would be to relay this type of statistical and positional information
> > from the BufferServer to the upstream publishers. The publishers can
> > use this information in ways such as making it available to
> > StreamCodecs to affect the destination of future data.
> >
> > What do you think?
> >
> > Thanks
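
To make the proposal in the quoted message concrete, here is a minimal sketch of the routing decision only, assuming the publisher somehow receives per-partition backlog counts. The class LeastBacklogChooser and its methods are hypothetical names for illustration; they are not part of the engine or of the StreamCodec interface, and the feedback channel from the BufferServer is assumed rather than shown.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical helper (not an engine API): remembers how far behind each
 * downstream partition is and sends the next tuple to the least-backlogged one.
 */
public class LeastBacklogChooser {
    // backlog.get(i) = tuples sent to partition i that are not yet consumed,
    // as of the last report plus anything routed since then.
    private final AtomicLongArray backlog;

    public LeastBacklogChooser(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.backlog = new AtomicLongArray(partitionCount);
    }

    /** Assumed to be called when consumption statistics arrive for a partition. */
    public void reportBacklog(int partition, long unconsumed) {
        backlog.set(partition, unconsumed);
    }

    /** Returns the partition with the smallest backlog; ties go to the lowest index. */
    public int choosePartition() {
        int best = 0;
        for (int i = 1; i < backlog.length(); i++) {
            if (backlog.get(i) < backlog.get(best)) {
                best = i;
            }
        }
        // Count the tuple being routed so that a burst arriving between two
        // reports is spread out instead of all landing on one partition.
        backlog.incrementAndGet(best);
        return best;
    }

    public static void main(String[] args) {
        LeastBacklogChooser chooser = new LeastBacklogChooser(3);
        chooser.reportBacklog(0, 50);
        chooser.reportBacklog(1, 5);
        chooser.reportBacklog(2, 20);
        System.out.println("next tuple goes to partition " + chooser.choosePartition());
    }
}
```

Because the choice depends on runtime state rather than on the tuple itself, replaying the same tuple after recovery may pick a different partition. That is the per-partition idempotency concern raised above, and the sketch does nothing to address it.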
