[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 @zd-project I'm happy to take a look at implementing https://github.com/apache/storm/pull/2764 once the non-static PR is merged. I don't think it makes sense to mix more stuff into the current PR, and if it doesn't get approved, it'll be wasted work. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 #2764 also depends on resolution of this issue. Please consider merging that as well as part of the patch. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 The change is here https://github.com/apache/storm/pull/2805 and hasn't been merged yet. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user agresch commented on the issue: https://github.com/apache/storm/pull/2789 what was the status of this? I think work @srdo was doing addressed STORM-3173 possibly? Was that change committed? ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user Ethanlm commented on the issue: https://github.com/apache/storm/pull/2789 sure. Take your time please. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 Once the two PRs with new metrics are merged, I'll try to update the non-static branch and get a PR opened. It might take me a little while. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user Ethanlm commented on the issue: https://github.com/apache/storm/pull/2789 That's right. Will take a look at non-static registry after #2764 and #2764 ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 Before spending too long on it, consider if we should fix this issue and https://github.com/apache/storm/pull/2714 by making the metrics registry non-static instead (https://github.com/apache/storm/pull/2783). I think it will be much easier to do. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user Ethanlm commented on the issue: https://github.com/apache/storm/pull/2789 Tested it manually on a single-node cluster. `nimbus:num-shutdown-calls` was flushed now with this patch. Still need to figure out travis build failure before merging this in. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user Ethanlm commented on the issue: https://github.com/apache/storm/pull/2789 I am also getting ` org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in starting fork, check output in log` on my VM. And the tests passed without this patch. Still need to investigate the issue ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 I've added calls to report to the branch, the tests are running at https://travis-ci.org/srdo/storm/builds/413332618. Unless the test specifies another configuration file, I would expect us to be using https://github.com/apache/storm/blob/master/conf/defaults.yaml. I also don't understand why heartbeatTimer would still be running in AsyncLocalizerTest. JUnit doesn't shut down the JVM between tests, so it would make sense that a thread could leak from another test into AsyncLocalizerTest, but it would require someone opening a Supervisor in a test somewhere and forgetting to close it. Happy to help out. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 Thanks again for helping me out of this issue though. Hope this don't create too much trouble for you. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 Also, I just dug into the error code you suggested. It turned out to be caused by heartbeatTimer (instance of StormTimer) in Supervisor in AsyncLocalizerTest.testKeyNotFoundException. But I don't see where Supervisor is created. Super weird. > 2018-08-07 16:38:11.217 [heartbeatTimer] ERROR org.apache.storm.daemon.supervisor.DefaultUncaughtExceptionHandler - Error when processing event on thread Thread[heartbeatTimer ,10,main] ... > 2018-08-07 16:38:11.218 [heartbeatTimer] ERROR org.apache.storm.utils.Utils - val 202 2018-08-07 16:38:11.218 [heartbeatTimer] ERROR org.apache.storm.utils.Utils - Halting process: Error when processing an event java.lang.RuntimeException: Halting process: Error when processing an event at org.apache.storm.utils.Utils.exitProcess(Utils.java:474) [storm-client-2.0.0-SNAPSHOT.jar:?] at org.apache.storm.daemon.supervisor.DefaultUncaughtExceptionHandler.uncaughtException(DefaultUncaughtExceptionHandler.java:25) [classes/:?] at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:253) [storm-client-2.0.0-SNAPSHOT.jar:?] ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 Okay that alerted me one thing. What's our config file for the testing? If it doesn't use ScheduledReporter for testing, this PR shouldn't make a matter at all. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 I can't see changes in your link, so I looked into your latest change in branch `non-static-metrics`. Please correct me if I interpret your code wrong. Your implementation didn't actually flush the metrics for ScheduledReporter on exit. You want to add `reporter.report()` in ConsolePreparableReporter or CsvPreparableReporter and test it out. See my commit [STORM-3173: Addressing nits](https://github.com/apache/storm/pull/2789/commits/e5fb9d80f4af8fee2b3cff915c5401a1e0e0cde2#diff-65a7d8c48dacfbb072a99f50df9775c6R40) ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 It should. Take a look at https://github.com/srdo/storm/compare/892179527b5a...0916fae11cb2#diff-9c4945e8e924565ca1f071435f4711e9. If you compare the daemon classes (Nimbus, Supervisor etc.) there should be a call to stopMetricsReporters in all of them, that should stop the reporters on shutdown. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 Interesting. Does your implementation also guarantee flushing all metrics upon exit? ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 Unless it turns out to be easy to fix the test errors, I'd like to propose not merging this and fixing this issue as part of making the metrics registry non-static. I added reporter shutdowns to all the daemons on the non-static-metrics branch, and it works with no need for bandaid fixes https://travis-ci.org/srdo/storm/builds/413283316. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 Right, sorry, didn't recheck the logs. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 The lastest failure didn't provide an error code though. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 No, but like I said, I suspect that the exit code is being set by a call to `Utils.exitProcess`. If you try changing some of the exit codes to be unique, it's likely you'll be able to tell which call to `Utils.exitProcess` is responsible for killing the JVM, which could help you tell what's happening. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 I managed to fix the original timeout error by wrapping around a nimbus gauge with exception handling code. However the PR still failed the storm-server test due to error in starting fork. I don't really know what caused this. @srdo Have you seen anything like this before? Additionally, I think our tests have some flaws in manipulating zookeeper as I constantly see a component tries to connect to Zookeeper that is not set up in the test, causing timeout warning in https://issues.apache.org/jira/browse/STORM-3128 ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user srdo commented on the issue: https://github.com/apache/storm/pull/2789 The exit code is 20. Try looking for uses of `Utils.exitProcess` with `20` as the argument. If you change the values to be unique and rerun the tests, you should be able to nail down where the exit is happening. ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 Build logs report test crash [INFO] Running org.apache.storm.scheduler.resource.TestResourceAwareScheduler, but it's passing on my local machine. Don't know what's happening. It'll be great if you guys can offer some insight. @Ethanlm @srdo @HeartSaVioR ---
[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...
Github user zd-project commented on the issue: https://github.com/apache/storm/pull/2789 https://issues.apache.org/jira/browse/STORM-3128 This bug in test appeared again and broke the test in this PR but I don't actually know the cause of it. ---