[ https://issues.apache.org/jira/browse/KAFKA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242461#comment-15242461 ]
Ewen Cheslack-Postava commented on KAFKA-3549:
----------------------------------------------

[~granthenke] [~ijuma] So, I verified that this doesn't clean up the transient failures, but committed it anyway as the cleanup was definitely worthwhile.

However, I also thought a bit more about how this could get triggered. I thought of 2 things:

1. We have some background threads that run consumer polling. If any of those are not shut down, they could continue connecting to any clusters that were given the same port and mess up partition assignment (if the groups use the same ID). I checked this out and it looks like they are all being shut down properly.

2. Some tests try to use fixed ports because they need to restart brokers and we can't guarantee tests will pass properly without a consistent set of ports to seed consumer metadata with. These tests need to override generateConfigs() to use FixedPortTestUtils. This was never guaranteed to work perfectly; it's basically just the solution we used given that we don't have a better way to run this style of test when we can't guarantee certain resources (i.e. ports) will be available. In particular, while the broker is shut down, at a minimum it's possible that another test allocates the port and starts using it for its own broker. Depending on certain timeouts on tests and how long they all take to run, it's possible some assertions (e.g. those using TestUtils.waitUntilTrue()) might be able to pass even if another broker manages to bind the port and temporarily get in the way, and meanwhile the extra consumer connecting will interfere with the other test.

If FixedPortTestUtils *isn't* properly used, I think there are other ways things can fail too. For example, we might bind using port = 0, then restarting the broker will result in it listening on a different port, but another test can be given that same port and the consumers in the first test will connect to brokers in the second test.
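To make the port = 0 scenario concrete, here is a minimal sketch using plain java.net.ServerSocket rather than real Kafka brokers. The class and method names (PortReuseSketch, bindAnyPort, tryBindFixedPort, release) are made up for illustration and are not part of the Kafka test utilities; it only shows how a released port can be picked up by someone else while the original owner comes back on a different one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration only: plain sockets stand in for brokers.
final class PortReuseSketch {

    /** Binds to port 0 so the OS picks any free port, like a broker configured with port = 0. */
    static ServerSocket bindAnyPort() {
        try {
            ServerSocket socket = new ServerSocket();
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tries to bind one specific port; returns null if the port is already held. */
    static ServerSocket tryBindFixedPort(int port) {
        ServerSocket socket = null;
        try {
            socket = new ServerSocket();
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return socket;
        } catch (IOException e) {
            release(socket);
            return null;
        }
    }

    /** Closes the socket, which gives the port back to the OS. */
    static void release(ServerSocket socket) {
        if (socket == null) {
            return;
        }
        try {
            socket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Test A" starts a broker on an OS-assigned port and seeds its consumers with it.
        ServerSocket testABroker = bindAnyPort();
        int seededPort = testABroker.getLocalPort();

        // Test A shuts its broker down: nothing reserves the port any more.
        release(testABroker);

        // "Test B" may now be handed the very same port for its own broker.
        ServerSocket testBBroker = tryBindFixedPort(seededPort);
        System.out.println("test B took test A's old port: " + (testBBroker != null));

        // Test A restarts with port 0 and lands somewhere else, while its
        // consumers still point at seededPort, i.e. possibly at test B's broker.
        ServerSocket restarted = bindAnyPort();
        System.out.println("seeded port " + seededPort
                + ", restarted on " + restarted.getLocalPort());

        release(restarted);
        release(testBBroker);
    }
}
```

The same sketch also shows why fixed ports are only a partial fix: tryBindFixedPort returns null whenever another process grabbed the port during the shutdown window, which is the failure mode described in point 2.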
At a minimum, I think ProducerFailureHandlingTest, maybe BaseTopicMetadataTest.testIsrAfterBrokerShutDownAndJoinsBack, RollingBounceTest, UncleanLeaderElectionTest, ProducerTest.testSendWithDeadBroker, and probably more are buggy in that they do broker restarts but don't seem to use fixed ports.

I'm not sure why the consumer tests would be triggering assertions way more frequently than any other test. But given the way tests are run, it doesn't look likely to be an issue with state being accidentally held across multiple tests. Ports, on the other hand, are a resource that we can't guarantee we hold onto across broker restarts (i.e. they may not be valid across the entire test), yet consumers keep the ports they were given at initialization, so we could potentially see cross-test contamination.

> Close consumers instantiated in consumer tests
> ----------------------------------------------
>
>                 Key: KAFKA-3549
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3549
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Grant Henke
>            Assignee: Grant Henke
>             Fix For: 0.10.1.0
>
>
> Close consumers instantiated in consumer tests. Since these consumers often
> use the default group.id of "", they could cause transient failures like
> those seen in KAFKA-3117 and KAFKA-2933. I have not been able to prove that
> this change will fix those failures, but closing the consumers is a good
> practice regardless.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)