Glad that you reproduced and fixed the problem. Happy to help!

- Jungtaek Lim (HeartSaVioR)
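(For anyone hitting the same symptom: the fix discussed below boils down to pointing storm.local.dir at a durable location in storm.yaml rather than somewhere under /tmp, which the OS may clean at any time. A minimal sketch of the setting, with an illustrative path:

    storm.local.dir: "/var/storm-local"

Nimbus keeps its local blobstore under this directory; if it disappears while Zookeeper still lists the blob keys, getClusterInfo starts failing with KeyNotFoundException, as described in the thread below.)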
On Sat, Aug 26, 2017 at 11:13 PM, Alexandre Vermeerbergen <avermeerber...@gmail.com> wrote:
> Hello Jungtaek,
>
> Your answers were very useful: I was able to reproduce the issue simply by
> deleting the storm.local.dir contents, and I found traces showing that the
> machine that suffered this issue had indeed lost this directory. It confirms
> your diagnosis, thank you very much for this!
>
> I have filed a JIRA: https://issues.apache.org/jira/browse/STORM-2707
> hoping that this will help make Storm even more robust once it's fixed.
>
> In the meantime, I'm moving our storm.local.dir to a place outside /tmp
> in order to reduce the risk of this directory being deleted.
>
> Best regards,
> Alexandre Vermeerbergen
>
> 2017-08-25 9:44 GMT+02:00 Jungtaek Lim <kabh...@gmail.com>:
> > I'm not sure. Topology code can't be restored, so my best bet would be
> > detecting it (periodically, or reacting on failure) and giving up leadership.
> > If my memory is right, the leader Nimbus doesn't pull blobs from followers, so
> > if it doesn't have some blobs and needs to sync, it just needs to become a
> > follower and let the other nimbuses take a chance to be the leader.
> >
> > That would not help a non-HA Nimbus cluster anyway. What's next? Maybe we
> > could remove the related topology (from ZK), but it feels a bit dangerous to me
> > to do that automatically, so I'd rather lean on a (maybe) CLI tool so that
> > operators can do it manually.
> >
> > Please file an issue regarding this when you want to have a solution.
> >
> > Best regards,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Fri, Aug 25, 2017 at 6:29 AM, Alexandre Vermeerbergen <avermeerber...@gmail.com> wrote:
> > > Hello Jungtaek,
> > >
> > > I confirm that we currently do not have multiple Nimbus nodes.
> > >
> > > I want to clarify that the Nimbus process never crashed: it keeps printing
> > > this error in its log:
> > >
> > > 2017-08-06 03:44:01.777 o.a.s.t.ProcessFunction pool-14-thread-1 [ERROR]
> > > Internal error processing getClusterInfo
> > > org.apache.storm.generated.KeyNotFoundException: null
> > > ... (rest of the stack trace)
> > >
> > > However I forgot to mention that:
> > >
> > > * During this problem our topologies are unaffected.
> > > * We notice the problem because we have a self-healing check on the Nimbus UI
> > > process, which consists in calling its web services to get a few stats on
> > > running topologies. When Nimbus UI is unresponsive, we kill it and
> > > restart it automatically.
> > > * The period of our Nimbus UI self-healing cron is 1 minute.
> > > * When the getClusterInfo stack trace occurs in the Nimbus log, Nimbus UI is
> > > unable to restart.
> > > * Here's the kind of exception we see in ui.log (Nimbus UI's trace):
> > >
> > > 2017-08-24 21:24:21.777 o.a.s.u.core main [INFO] Starting ui server for storm version '1.1.0'
> > > 2017-08-24 21:24:21.788 o.a.s.d.m.MetricsUtils main [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
> > > 2017-08-24 21:24:21.790 o.a.s.d.m.r.JmxPreparableReporter main [INFO] Preparing...
> > > 2017-08-24 21:24:21.801 o.a.s.d.common main [INFO] Started statistics report plugin...
> > > 2017-08-24 21:24:21.878 o.a.s.s.o.e.j.s.Server main [INFO] jetty-7.x.y-SNAPSHOT
> > > 2017-08-24 21:24:21.929 o.a.s.s.o.e.j.s.h.ContextHandler main [INFO] started o.a.s.s.o.e.j.s.ServletContextHandler{/,null}
> > > 2017-08-24 21:24:21.965 o.a.s.s.o.e.j.s.AbstractConnector main [INFO] Started SelectChannelConnector@0.0.0.0:8070
> > > 2017-08-24 21:25:02.446 o.a.s.u.NimbusClient qtp2142536057-19 [WARN] Ignoring exception while trying to get leader nimbus info from ec2-52-48-59-151.eu-west-1.compute.amazonaws.com. will retry with a different seed host.
> > > org.apache.storm.thrift.transport.TTransportException: null
> > >         at org.apache.storm.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.transport.TTransport.readAll(TTransport.java:86) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.transport.TTransport.readAll(TTransport.java:86) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.generated.Nimbus$Client.recv_getLeader(Nimbus.java:1193) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.generated.Nimbus$Client.getLeader(Nimbus.java:1181) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:84) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.ui.core$cluster_summary.invoke(core.clj:355) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.ui.core$fn__9556.invoke(core.clj:1113) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.shade.compojure.core$make_route$fn__5976.invoke(core.clj:100) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.shade.compojure.core$if_route$fn__5964.invoke(core.clj:46) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.shade.compojure.core$if_method$fn__5957.invoke(core.clj:31) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at org.apache.storm.shade.compojure.core$routing$fn__5982.invoke(core.clj:113) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at clojure.core$some.invoke(core.clj:2570) ~[clojure-1.7.0.jar:?]
> > >         at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:113) ~[storm-core-1.1.0.jar:1.1.0]
> > >         at clojure.lang.RestFn.applyTo(RestFn.java:139) ~[clojure-1.7.0.jar:?]
> > >         at clojure.core$apply.invoke(core.clj:632) ~[clojure-1.7.0.jar:?]
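> > >
> > > (The self-healing probe mentioned above is nothing fancy; it boils down to
> > > something like the following sketch. Host, port and timeout are illustrative,
> > > and /api/v1/cluster/summary is one of the UI REST endpoints we poll:
> > >
> > > import java.io.IOException;
> > > import java.net.HttpURLConnection;
> > > import java.net.URL;
> > >
> > > /** Returns true when the Nimbus UI answers its REST API within the deadline. */
> > > public final class NimbusUiProbe {
> > >     public static boolean isHealthy(String uiHost, int uiPort, int timeoutMs) {
> > >         try {
> > >             // Ask the UI for the cluster summary, the same stats our cron checks.
> > >             URL url = new URL("http://" + uiHost + ":" + uiPort + "/api/v1/cluster/summary");
> > >             HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > >             conn.setConnectTimeout(timeoutMs);
> > >             conn.setReadTimeout(timeoutMs);
> > >             int status = conn.getResponseCode(); // blocks until response or timeout
> > >             conn.disconnect();
> > >             return status == 200;
> > >         } catch (IOException unresponsive) {
> > >             return false; // the cron then kills and restarts the UI process
> > >         }
> > >     }
> > > }
> > >
> > > When this probe fails, the cron kills and restarts Nimbus UI, as described above.)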
> > >
> > > So our problem is that Nimbus stays forever in an inconsistent state without
> > > crashing. Even if it did crash, that would not help: we have automatic restart
> > > for it too (we also have self-healing on Nimbus), but in such a case we still
> > > need to manually clean up the contents of the ZKs to be able to restart
> > > Nimbus in a clean state.
> > >
> > > My question is the following one: is there a way to make Nimbus more
> > > resilient to such issues?
> > >
> > > Best regards,
> > > Alexandre
> > >
> > > 2017-08-24 23:06 GMT+02:00 Jungtaek Lim <kabh...@gmail.com>:
> > > > Blob files (meta, data) are in the storm local directory. ZK only has the list
> > > > of blob keys and which alive nimbuses have each file. So if you lose the storm
> > > > local directory, you just can't restore the blobs, unless other nimbuses have
> > > > these blobs so the current nimbus can pull them.
> > > > (I guess you have only one nimbus, since the replication factor was 1 in the
> > > > nimbus log.)
> > > >
> > > > If a FileNotFound occurs while finding the meta file, we just throw
> > > > KeyNotFoundException because it is just the same as not existing.
> > > >
> > > > Anyway you have to clear up everything because there's an inconsistency
> > > > between ZK and local, and no way to recover the missing blob files, so no way
> > > > to make it consistent.
> > > >
> > > > Btw, in this case, my expectation is that after crashing and restarting,
> > > > nimbus can't be a leader and just waits for missing blobs. If you saw
> > > > Nimbus crash again while relaunching, it could be a bug.
> > > >
> > > > - Jungtaek Lim (HeartSaVioR)
> > > >
> > > > On Fri, 25 Aug 2017 at 3:35 AM Alexandre Vermeerbergen <avermeerber...@gmail.com> wrote:
> > > > > Hello Jungtaek,
> > > > >
> > > > > I can do what you suggest (i.e. moving the storm local dir to a place which
> > > > > isn't in /tmp), but since the issue occurs rarely (once per month), I doubt
> > > > > I'll be able to give feedback soon.
> > > > >
> > > > > What is puzzling to me is that in order to recover from such an issue, we
> > > > > have to stop everything, then clean the Zookeepers, then restart all Storm
> > > > > processes and finally restart our topologies. We don't clean the /tmp
> > > > > directory.
> > > > >
> > > > > Are you sure there's no relationship between the Zookeeper contents and
> > > > > what is causing the following message to happen?
> > > > > "Internal error processing getClusterInfo"
> > > > >
> > > > > Best regards,
> > > > > Alexandre
> > > > >
> > > > > 2017-08-24 13:41 GMT+02:00 Jungtaek Lim <kabh...@gmail.com>:
> > > > > > Sorry, I meant reproduce, not replicate. :)
> > > > > >
> > > > > > On Thu, Aug 24, 2017 at 8:34 PM, Jungtaek Lim <kabh...@gmail.com> wrote:
> > > > > > > Alexandre,
> > > > > > >
> > > > > > > I found that your storm local dir is set to "/tmp/storm", parts or all
> > > > > > > of which could be removed at any time.
> > > > > > > Could you move the path to a non-temporary place and try to replicate?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jungtaek Lim (HeartSaVioR)
> > > > > > >
> > > > > > > On Thu, Aug 24, 2017 at 6:40 PM, Alexandre Vermeerbergen <avermeerber...@gmail.com> wrote:
> > > > > > >> Hello Jungtaek,
> > > > > > >>
> > > > > > >> Thank you very much for your answer.
> > > > > > >>
> > > > > > >> Please find attached the full Nimbus log (gzipped) related to this issue.
> > > > > > >>
> > > > > > >> Please note that the last ERROR repeats forever until we "repair" Storm.
> > > > > > >>
> > > > > > >> From the logs, it could be that the issue began close to when a topology
> > > > > > >> was restarted (killed, then started).
> > > > > > >>
> > > > > > >> Maybe this caused a corruption in Zookeeper. Is there anything which I
> > > > > > >> can collect in our Zookeeper nodes/logs to help the analysis?
> > > > > > >>
> > > > > > >> Best regards,
> > > > > > >> Alexandre
> > > > > > >>
> > > > > > >> 2017-08-24 9:29 GMT+02:00 Jungtaek Lim <kabh...@gmail.com>:
> > > > > > >>> Hi Alexandre, I missed this mail since I was on vacation.
> > > > > > >>>
> > > > > > >>> I followed the stack trace, but it's hard to analyze without context.
> > > > > > >>> Do you mind providing the full nimbus log?
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>> Jungtaek Lim (HeartSaVioR)
> > > > > > >>>
> > > > > > >>> On Wed, Aug 16, 2017 at 4:12 AM, Alexandre Vermeerbergen <avermeerber...@gmail.com> wrote:
> > > > > > >>> > Hello,
> > > > > > >>> >
> > > > > > >>> > Tomorrow I will have to restart the cluster on which I have this issue
> > > > > > >>> > with Storm 1.1.0.
> > > > > > >>> > Is anybody interested in my running some commands to get more logs
> > > > > > >>> > before I repair this cluster?
> > > > > > >>> >
> > > > > > >>> > Best regards,
> > > > > > >>> > Alexandre Vermeerbergen
> > > > > > >>> >
> > > > > > >>> > 2017-08-13 16:14 GMT+02:00 Alexandre Vermeerbergen <avermeerber...@gmail.com>:
> > > > > > >>> > > Hello,
> > > > > > >>> > >
> > > > > > >>> > > I think it might be of interest for you Storm developers to learn that
> > > > > > >>> > > I currently have a case of an issue with Storm 1.1.0 which was supposed
> > > > > > >>> > > to be resolved in this release according to
> > > > > > >>> > > https://issues.apache.org/jira/browse/STORM-1977 ; I can look for any
> > > > > > >>> > > more information which you'd need to make a diagnosis of why this issue
> > > > > > >>> > > can still happen.
> > > > > > >>> > >
> > > > > > >>> > > Indeed, I have a Storm UI process which can't get any information on
> > > > > > >>> > > its Storm cluster, and I see many occurrences of the following
> > > > > > >>> > > exception in nimbus.log:
> > > > > > >>> > >
> > > > > > >>> > > 2017-08-02 05:11:15.971 o.a.s.d.nimbus pool-14-thread-21 [INFO] Created download session for statefulAlerting_ec2-52-51-199-56-eu-west-1-compute-amazonaws-com_defaultStormTopic-29-1501650673-stormcode.ser with id d5120ad7-a81c-4c39-afc5-a7f876b04c73
> > > > > > >>> > > 2017-08-02 05:11:15.978 o.a.s.d.nimbus pool-14-thread-27 [INFO] Created download session for statefulAlerting_ec2-52-51-199-56-eu-west-1-compute-amazonaws-com_defaultStormTopic-29-1501650673-stormconf.ser with id aba18011-3258-4023-bbef-14d21a7066e1
> > > > > > >>> > > 2017-08-02 06:20:59.208 o.a.s.d.nimbus timer [INFO] Cleaning inbox ... deleted: stormjar-fbdadeab-105d-4510-9beb-0f0d87e1a77d.jar
> > > > > > >>> > > 2017-08-06 03:42:02.854 o.a.s.t.ProcessFunction pool-14-thread-34 [ERROR] Internal error processing getClusterInfo
> > > > > > >>> > > org.apache.storm.generated.KeyNotFoundException: null
> > > > > > >>> > >         at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:147) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:299) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at sun.reflect.GeneratedMethodAccessor78.invoke(Unknown Source) ~[?:?]
> > > > > > >>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
> > > > > > >>> > >         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
> > > > > > >>> > >         at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at org.apache.storm.daemon.nimbus$get_blob_replication_count.invoke(nimbus.clj:489) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.daemon.nimbus$get_cluster_info$iter__10687__10691$fn__10692.invoke(nimbus.clj:1550) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at clojure.lang.RT.seq(RT.java:507) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at clojure.core$seq__4128.invoke(core.clj:137) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at clojure.core$dorun.invoke(core.clj:3009) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at clojure.core$doall.invoke(core.clj:3025) ~[clojure-1.7.0.jar:?]
> > > > > > >>> > >         at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1524) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782.getClusterInfo(nimbus.clj:1971) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.0.jar:1.1.0]
> > > > > > >>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
> > > > > > >>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
> > > > > > >>> > >         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> > > > > > >>> > >
> > > > > > >>> > > What is surprising is that I got this same issue on two Storm clusters
> > > > > > >>> > > running on different VMs; they just share the same data in their Kafka
> > > > > > >>> > > Broker cluster (one cluster is the production one, which was quickly
> > > > > > >>> > > fixed, and the other one is the "backup" cluster to be used if the
> > > > > > >>> > > production one fails, for a quick "back to production").
> > > > > > >>> > >
> > > > > > >>> > > I left one of these clusters with this behavior because I felt that it
> > > > > > >>> > > could be interesting for Storm developers to have more information on
> > > > > > >>> > > this issue, if needed to properly diagnose it.
> > > > > > >>> > >
> > > > > > >>> > > I can keep this cluster as is for 2 days max.
> > > > > > >>> > >
> > > > > > >>> > > Is there anything useful which I can collect on it to help Storm
> > > > > > >>> > > developers understand the cause (and hopefully use it to make Storm
> > > > > > >>> > > more robust)?
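> > > > > > >>> > >
> > > > > > >>> > > For instance, I could dump the list of znodes Storm keeps in Zookeeper
> > > > > > >>> > > before we wipe them. A minimal sketch (assuming Storm's default /storm
> > > > > > >>> > > root; the ZK host is illustrative):
> > > > > > >>> > >
> > > > > > >>> > > import java.util.List;
> > > > > > >>> > > import java.util.concurrent.CountDownLatch;
> > > > > > >>> > > import org.apache.zookeeper.Watcher.Event.KeeperState;
> > > > > > >>> > > import org.apache.zookeeper.ZooKeeper;
> > > > > > >>> > >
> > > > > > >>> > > public final class StormZkDump {
> > > > > > >>> > >     public static void main(String[] args) throws Exception {
> > > > > > >>> > >         CountDownLatch connected = new CountDownLatch(1);
> > > > > > >>> > >         // Connect to one of the Zookeeper nodes (30s session timeout).
> > > > > > >>> > >         ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000,
> > > > > > >>> > >                 event -> { if (event.getState() == KeeperState.SyncConnected) connected.countDown(); });
> > > > > > >>> > >         connected.await(); // wait until the session is established
> > > > > > >>> > >         try {
> > > > > > >>> > >             // "/storm" is the default storm.zookeeper.root; adjust if overridden.
> > > > > > >>> > >             for (String child : zk.getChildren("/storm", false)) {
> > > > > > >>> > >                 List<String> entries = zk.getChildren("/storm/" + child, false);
> > > > > > >>> > >                 System.out.println("/storm/" + child + " -> " + entries);
> > > > > > >>> > >             }
> > > > > > >>> > >         } finally {
> > > > > > >>> > >             zk.close();
> > > > > > >>> > >         }
> > > > > > >>> > >     }
> > > > > > >>> > > }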
> > > > > > >>> > >
> > > > > > >>> > > A few details:
> > > > > > >>> > >
> > > > > > >>> > > * Storm 1.1.0 cluster with Nimbus & Nimbus UI running on one VM, plus
> > > > > > >>> > > 4 Supervisor VMs + 3 Zookeeper VMs
> > > > > > >>> > > * Running with Java Server JRE 1.8.0_121
> > > > > > >>> > > * Running on AWS EC2 instances
> > > > > > >>> > > * We run about 10 topologies, with automatic self-healing on them (if
> > > > > > >>> > > they stop consuming Kafka items, our self-healer calls "kill topology"
> > > > > > >>> > > and then eventually restarts the topology)
> > > > > > >>> > > * We have self-healing on Nimbus UI based on calling its REST services.
> > > > > > >>> > > If it's not responding fast enough, we restart Nimbus UI
> > > > > > >>> > > * We figured out the issue because Nimbus UI was being restarted every
> > > > > > >>> > > 2 minutes
> > > > > > >>> > > * To fix our production server which had the same symptom, we had to
> > > > > > >>> > > stop all Storm processes, then stop all Zookeepers, then remove all data
> > > > > > >>> > > in the Zookeeper "snapshot files", then restart all Zookeepers, then
> > > > > > >>> > > restart all Storm processes, then re-submit all our topologies
> > > > > > >>> > >
> > > > > > >>> > > Please be as clear as possible about which commands we should run to
> > > > > > >>> > > give you more details if needed.
> > > > > > >>> > >
> > > > > > >>> > > Best regards,
> > > > > > >>> > > Alexandre Vermeerbergen