Dear Apache Storm Community,

I am currently managing an Apache Storm cluster with 38 nodes: 3 dedicated to ZooKeeper, 1 to Nimbus and the UI, and 34 running Supervisor and Logviewer processes. Each Supervisor node runs 2 Workers.

At present, our topology update process involves the following steps:

1. Killing the existing topology.
2. Changing dependency JARs under the external-lib directory and restarting Nimbus.
3. Changing dependency JARs under the external-lib directory and restarting Supervisors.
4. Submitting the new topology.

Each operation takes about 2–3 minutes. As the number of Supervisor nodes increases, the overall time for topology updates is becoming a concern.

I am reaching out to seek advice on how to optimize this process, as I believe there are more efficient ways to handle topology updates in large-scale Storm deployments. Specifically:

- Is there a more efficient process to handle code changes without having to manually restart Nimbus and Supervisors?
- How can I reduce the overall time for topology updates, especially as our cluster continues to grow?
- Are there industry-standard practices for implementing rolling updates or automating the deployment process?

Any insights, recommendations, or best practices that could help streamline our update process would be greatly appreciated.

Thank you for your time, and I look forward to your suggestions!
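
P.S. For reference, the four update steps described above can be sketched roughly as the script below. This is only an illustration, not our exact tooling: the host names, the staging and install paths, the systemd unit names (`storm-nimbus`, `storm-supervisor`), the 30-second kill wait, and the idea that the topology's main class takes the topology name as its only argument are all assumptions made for the example. Only `storm kill <name> -w <secs>` and `storm jar <jar> <class> [args]` are the standard Storm CLI commands. The sketch restarts Supervisors a few at a time instead of one by one, and it defaults to a dry run that only prints the commands.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# All names below are illustrative assumptions, not a real configuration.
NIMBUS = "nimbus-01"
SUPERVISORS = [f"supervisor-{i:02d}" for i in range(1, 35)]  # 34 nodes
LOCAL_LIB = "build/external-lib/"         # hypothetical local staging dir
REMOTE_LIB = "/opt/storm/external-lib/"   # hypothetical install path


def node_commands(host, service):
    """Commands that refresh the JARs on one node and restart one daemon."""
    return [
        ["rsync", "-a", "--delete", LOCAL_LIB, f"{host}:{REMOTE_LIB}"],
        # Assumes the daemon is managed as a systemd unit with this name.
        ["ssh", host, "sudo", "systemctl", "restart", service],
    ]


def run(commands, dry_run=True):
    """Print each command (dry run) or execute it, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


def update(topology, jar, main_class, dry_run=True, parallelism=8):
    # Step 1: kill the running topology with a 30 s wait time.
    # A real script would likely also need to wait until the topology
    # is actually gone before resubmitting under the same name.
    run([["storm", "kill", topology, "-w", "30"]], dry_run)

    # Step 2: refresh JARs on Nimbus and restart it.
    run(node_commands(NIMBUS, "storm-nimbus"), dry_run)

    # Step 3: refresh JARs and restart Supervisors, several nodes at a time.
    def refresh(host):
        run(node_commands(host, "storm-supervisor"), dry_run)

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(refresh, SUPERVISORS))  # list() surfaces any exception

    # Step 4: submit the new topology (argument convention is assumed).
    run([["storm", "jar", jar, main_class, topology]], dry_run)
```

Calling `update("my-topology", "my-topology.jar", "com.example.MyTopology")` (all hypothetical names) prints the full command sequence without touching the cluster; passing `dry_run=False` would execute it.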
