[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user pnowojski commented on the issue: https://github.com/apache/flink/pull/5718 Thanks! ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user zentol commented on the issue: https://github.com/apache/flink/pull/5718 merging. ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5718 Having a custom timeout like @zentol suggests could be helpful in the long run. For now, though, I personally agree that the timeouts are counter productive (I actually never understood what they are supposed to solve). I prefer a deadlocking test to stay around, so I can jstack to the process, attach a debugger, pull a heap dump, whatever. The only environment where we cannot do that is the CI server, which has a timeout and thread dump already... ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user pnowojski commented on the issue: https://github.com/apache/flink/pull/5718 In that case can we merge it as it is and if you want @zentol, we can start a discussion about future direction of this? ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user zentol commented on the issue: https://github.com/apache/flink/pull/5718 I don't want to make a fuss over a single test, so for the purposes of this test it's fine to just remote the timeout. In general however i think it would be pretty neat if we would add such a `Timeout` to the `TestLogger` class. I see the problem of debugging tests that have timeouts, but removing timeouts altogether is the wrong conclusion imo. Instead, we could implement the `Timeout` such that it doesn't fail the test if a certain profile/property is set. With this we keep the benefits of test timeouts (CI service independence, less special behavior locally vs CI, fixed upper bound for test times which is particularly useful for new tests, "guarantee" that the test terminates) while still allowing debugging. In fact we may end up improving the debugging situation by consolidating how timeouts are implemented instead of each test rolling their own solution that you can't disable. The travis watchdog _cannot_ be removed as it covers the entire maven process that from time to time locks up outside of tests. That said, the fact that we have to rely on the travis watchdog _to ensure that tests terminate_ is a bad sign. Not to mention that it already forced us to introduce workarounds to make tests "compatible", like the kafka tests and others that print stuff for the sole purposes of not triggering it. ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/5718 I agree with @pnowojski. The watchdog and Travis itself should time out the test for us if things take too long. ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user pnowojski commented on the issue: https://github.com/apache/flink/pull/5718 I think that timeouts in IDE are only making things worse: 1. you can not debug, since longer debugging session will fail 2. in case of deadlocks, locally you can always manually dump/view stack traces but you also can do full thread/memory dump and/or attach a debugger. On travis you can not debug it in any way, thus automated actions make sense (and they preserve some travis resources for other builds). Furthermore, such rule would have to be implemented in all of our tests, not only here. Also IMO if you want to replace/remove `travis_mvn_watchdog` with such rule, this is out of scope of this ticket and it should be discussed separately (I would be against it), so that it doesn't hinder this ticket. When I originally created those tests (with `@Test(timeout = )`) I did it by copying without thinking from a test, that was printing something to logs once every checkpoint, thus it was interrupting travis watchdog and one could argue that it required this special case timeout. This commit makes it more consistent with rest of the tests. ---
[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests
Github user zentol commented on the issue: https://github.com/apache/flink/pull/5718 This change appears to be focus mostly around Travis which i dislike a bit. I'd rather add a custom `Timeout` that prints the thread-dump so we still get nice behavior when running things locally/in the IDE. ``` @Rule public final Timeout timeout = new MyTimeOut(10, TimeUnit.MILLISECONDS); public static class MyTimeOut extends Timeout { public MyTimeOut(long timeout, TimeUnit timeUnit) { super(timeout, timeUnit); } @Override public Statement apply(Statement base, Description description) { Statement fail = super.apply(base, description); return new Statement() { @Override public void evaluate() throws Throwable { System.out.println(crunchifyGenerateThreadDump()); fail.evaluate(); } }; } } public static String crunchifyGenerateThreadDump() { final StringBuilder dump = new StringBuilder(); final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean(); final ThreadInfo[] threadInfos = threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100); for (ThreadInfo threadInfo : threadInfos) { dump.append('"'); dump.append(threadInfo.getThreadName()); dump.append("\" "); final Thread.State state = threadInfo.getThreadState(); dump.append("\n java.lang.Thread.State: "); dump.append(state); final StackTraceElement[] stackTraceElements = threadInfo.getStackTrace(); for (final StackTraceElement stackTraceElement : stackTraceElements) { dump.append("\nat "); dump.append(stackTraceElement); } dump.append("\n\n"); } return dump.toString(); } ``` ---