Hello Jungtaek,

+1 for distributed snapshot support in Storm!

Regarding breaking worker compatibility, on my side that wouldn't be a big
deal, as we do not yet do "rolling upgrades" of our Storm clusters.

Even if we were doing rolling upgrades for normal upgrades, getting an
improvement as great as distributed snapshots would be a good reason to
do a "cold upgrade" of our clusters.

Thanks,
Alexandre Vermeerbergen


2018-01-08 0:53 GMT+01:00 Jungtaek Lim <[email protected]>:

> Hi devs,
>
> We added support for old Storm workers in Storm 2.0.0 via STORM-2448 [1].
> That was OK with me before the metrics issue came up, but now I think it is
> worth discussing.
>
> STORM-2448 assumes backward-compatible interaction between the daemons
> (Nimbus/Supervisor/etc.) and workers in Storm 2.0.0. This covers not only
> interaction via Thrift, but also interaction by any other means, including
> ZooKeeper.
>
> STORM-2693 [2] came in as a nice improvement that changes the heartbeat
> mechanism (replacing ZK with Thrift RPC for interprocess heartbeat transfer),
> and it is not compatible with old Storm workers. (We could still make it
> backward compatible by letting Nimbus also support old-style heartbeats,
> reading ZK periodically, but that clearly reduces the performance gain.)
>
> Now I can see a patch for STORM-2156 [3], which stores metrics in RocksDB,
> but worker metrics are not addressed yet. I guess it will depend on Metrics
> V2 (STORM-2153) [4]. Regardless of that dependency, if STORM-2156 wants to
> change how workers publish metrics (to publish via Thrift RPC), it will
> also be backward incompatible (for the same reason as STORM-2693).
>
> We should break backward compatibility eventually to enjoy the full
> benefits of this (and of other similar improvements), and I'm not sure why
> it can't be in Storm 2.0.0 (a major release, nearly 2 years after 1.0.0).
> Some users might be upset by the backward incompatibility, but I don't think
> they would be any less upset if we postpone the breaking changes and finally
> bring them in Storm 3.0.0.
>
> I would like to hear everyone's opinions on how to handle this
> situation. We might have some workarounds that let us bring in both
> features, but with reduced benefits.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/STORM-2448
> 2. https://issues.apache.org/jira/browse/STORM-2693
> 3. https://issues.apache.org/jira/browse/STORM-2156
> 4. https://issues.apache.org/jira/browse/STORM-2153
>
> ps. I wonder how our consensus would go in this situation: what if we could
> bring large improvements, but only in a backward-incompatible way? One
> possible change would be dropping the Acker mechanism and adopting
> distributed snapshots. I have been thinking this is worth doing, and JStorm
> has already made this change to gain performance, which also helps with
> windowing.
>
