Hello Jungtaek,

+1 for distributed snapshot support in Storm!
Regarding breaking worker compatibility, on my side that wouldn't be a big deal, as we do not yet do "rolling upgrades" of our Storm clusters. Even if we were doing rolling upgrades for normal upgrades, getting an improvement as great as distributed snapshots would be a good reason to do a "cold upgrade" of our clusters.

Thanks,
Alexandre Vermeerbergen

2018-01-08 0:53 GMT+01:00 Jungtaek Lim <[email protected]>:

> Hi devs,
>
> We have added a feature to support old Storm workers in Storm 2.0.0
> via STORM-2448 [1], which was OK to me before addressing the metrics issue,
> but for now I think it is worth discussing.
>
> STORM-2448 assumes we have backward-compatible interaction between daemons
> (Nimbus/Supervisor/etc.) and workers in Storm 2.0.0. This applies not only
> to interaction via thrift, but also to interaction by any other means,
> including Zookeeper.
>
> STORM-2693 [2] came in as a nice improvement, which changes the heartbeat
> mechanism (replacing ZK with thrift RPC for interprocess heartbeat
> transfer), and it is not compatible with old Storm workers. (We are still
> able to make it backward compatible by letting Nimbus also support the
> old-style heartbeat, reading ZK periodically, but that clearly reduces the
> performance gain.)
>
> Now I can see a patch for STORM-2156 [3], which stores metrics in RocksDB,
> but worker metrics are not addressed yet. I guess it will depend on Metrics
> V2 (STORM-2153) [4], and regardless of that dependency, if STORM-2156 wants
> to change the approach of publishing metrics from workers (via thrift RPC),
> it will also be backward incompatible (same reason as STORM-2693).
>
> We should break backward compatibility eventually to enjoy the full
> benefits of this (and of others if we have similar improvements), and I'm
> not sure why it can't be at Storm 2.0.0 (major release, nearly 2 years
> after 1.0.0).
> Some users might be upset with backward incompatibility, but I don't think
> they would be any less upset if we postpone the breaking changes and
> finally bring them in Storm 3.0.0.
>
> I would like to hear everyone's opinions regarding how to handle this
> situation. We might have some workarounds that let us bring in both
> features, but with reduced benefits.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/STORM-2448
> 2. https://issues.apache.org/jira/browse/STORM-2693
> 3. https://issues.apache.org/jira/browse/STORM-2156
> 4. https://issues.apache.org/jira/browse/STORM-2153
>
> ps. I wonder how our consensus would go in this situation: what if we could
> bring large improvements, but only in a backward-incompatible way? One
> possible change would be dropping the Acker mechanism and adopting
> distributed snapshots: I have been thinking this is worth doing, and JStorm
> already made that change to gain performance and also get an advantage
> with windowing.
