[jira] [Comment Edited] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396022#comment-14396022 ]

Gwen Shapira edited comment on KAFKA-1501 at 4/5/15 1:04 AM:
-------------------------------------------------------------

Not to take away from this awesome improvement, but is there any reason ProducerFailureHandlingTest.testBrokerFailure() was removed?

(I mean, other than the fact that it is consistently failing / hanging after this patch... auto-merge left the test in and I found out the hard way. I admit I failed to find out why producers hang on lack of memory after this patch is added, but I'm a bit concerned.)

was (Author: gwenshap):
Not to take from this awesome improvement, but any reason ProducerFailureHandlingTest.testBrokerFailure() was removed?

> transient unit tests failures due to port already in use
>
>
>                 Key: KAFKA-1501
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1501
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Ewen Cheslack-Postava
>              Labels: newbie
>         Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch,
> KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch,
> KAFKA-1501_2015-03-09_11:41:07.patch, KAFKA-1501_2015-03-25_00:44:50.patch,
> test-100.out, test-100.out, test-27.out, test-29.out, test-32.out,
> test-35.out, test-38.out, test-4.out, test-42.out, test-45.out, test-46.out,
> test-51.out, test-55.out, test-58.out, test-59.out, test-60.out, test-69.out,
> test-72.out, test-74.out, test-76.out, test-84.out, test-87.out, test-91.out,
> test-92.out
>
>
> Saw the following transient failures.
> kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
>     kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
>         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
>         at kafka.network.Acceptor.<init>(SocketServer.scala:141)
>         at kafka.network.SocketServer.startup(SocketServer.scala:68)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
>         at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
>         at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236147#comment-14236147 ]

Jay Kreps edited comment on KAFKA-1501 at 12/5/14 9:29 PM:
-----------------------------------------------------------

Yeah, that Stack Overflow article I linked to previously indicates that as of Java 7 the only really reliable way to check is to try to create a socket and see if that works. That should be a super non-invasive change too--just updating the implementation for choosePorts.

http://stackoverflow.com/questions/434718/sockets-discover-port-availability-using-java

They give a checkAvailable method something like the below. So the new approach would be to choose a random port in some range, and then check that it is available.

{code}
private static boolean checkAvailable(int port) {
    Socket s = null;
    try {
        // A successful connect means something is already listening on this port.
        s = new Socket("localhost", port);
        return false;
    } catch (IOException e) {
        // Nothing accepted the connection, so treat the port as available.
        return true;
    } finally {
        if (s != null) {
            try {
                s.close();
            } catch (IOException e) {
                throw new RuntimeException("You should handle this error.", e);
            }
        }
    }
}
{code}

was (Author: jkreps):
Yeah that Stack Overflow article I linked to previously indicates that as of Java 7 the only really reliable way to check is to try to create a socket and see if that works. That should be a super non-invasive change too--just updating the implementation for choosePorts.

http://stackoverflow.com/questions/434718/sockets-discover-port-availability-using-java

They give a checkAvailable method something like the below. So the new approach would be to choose a random port in some range, and then check that it is available.

{code}
private static boolean checkAvailable(int port) {
    Socket s = null;
    try {
        s = new Socket("localhost", port);
        return false;
    } catch (IOException e) {
        System.out.println("--Port " + port + " is available");
        return true;
    } finally {
        if (s != null) {
            try {
                s.close();
            } catch (IOException e) {
                throw new RuntimeException("You should handle this error.", e);
            }
        }
    }
}
{code}

> transient unit tests failures due to port already in use
>
>
>                 Key: KAFKA-1501
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1501
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Guozhang Wang
>              Labels: newbie
>         Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch,
> KAFKA-1501.patch, KAFKA-1501.patch, test-100.out, test-100.out, test-27.out,
> test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, test-42.out,
> test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, test-59.out,
> test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, test-84.out,
> test-87.out, test-91.out, test-92.out
>
>
> Saw the following transient failures.
> kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
>     kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
>         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
>         at kafka.network.Acceptor.<init>(SocketServer.scala:141)
>         at kafka.network.SocketServer.startup(SocketServer.scala:68)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
>         at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
>         at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)
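
A minimal sketch of the "choose a random port in some range, then check that it is available" idea from the comment above. Note that it swaps in a bind-based check (opening a ServerSocket on the candidate port) for the client-connect check quoted from Stack Overflow. The class name PortPicker, the 20000-30000 range, and the attempt limit are illustrative assumptions, not what any KAFKA-1501 patch actually does. A window remains between the check and the moment the broker binds the port, so this narrows the race rather than removing it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Random;

// Hypothetical helper, not part of Kafka's TestUtils.
final class PortPicker {

    /** Returns true if a server socket can be bound to the port on loopback right now. */
    static boolean canBind(int port) {
        try (ServerSocket probe = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
            return true;   // bind succeeded; the probe socket is closed on exit
        } catch (IOException e) {
            return false;  // bind failed, e.g. "Address already in use"
        }
    }

    /** Tries up to maxAttempts random ports in [low, high] and returns the first that binds. */
    static int choosePort(int low, int high, int maxAttempts, Random rnd) {
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = low + rnd.nextInt(high - low + 1);
            if (canBind(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException(
            "No free port found in [" + low + ", " + high + "] after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Range and attempt count are arbitrary example values.
        int port = choosePort(20000, 30000, 50, new Random());
        System.out.println("picked port " + port);
    }
}
```

An alternative that avoids the check-then-use gap entirely is to let the server bind to port 0 and read back the port the OS assigned, but that requires the code under test to expose the bound port.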
[jira] [Comment Edited] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128653#comment-14128653 ]

Chris Cope edited comment on KAFKA-1501 at 9/10/14 4:10 PM:
------------------------------------------------------------

I agree, [~absingh]. I'm running some more tests and I think the best way to handle this unlikely event is to catch it specifically, rerun the entire test class *one* time, and note this in the test log. This bug does not affect the core Kafka code, and is simply exposed here because Kafka has such great unit tests, and we just happen to run them A LOT for our purposes. I'm proposing this solution instead of hunting down and fixing the underlying issue in choosePorts(), which, judging by what other projects do, does seem like a decent implementation. The probability of a test class failing twice in a row should be very low (<0.0001%), so any test class failure should occur in less than 1% of `./gradlew test` runs. Is this approach sound?

was (Author: copester):
I agree, [~absingh]. I'm running some more tests and I think the best way to handle this unlikely event is to catch is specifically, and then have it rerun the entire test class *one* time, and noting this in the test log. This bug does not affect the core Kafka code, and is simple exposed here because Kafka has such great unit tests, and we just happen to run them A LOT of our purposes. I'm proposing this solution instead of hunting and fixing the underlying issue in choosePorts(), which when looking around at other projects does seem like a decent implementation. The probability of a test class failing twice in a row should be very low (<0.0001%) and should result in any test class failure less than 1% of the time `./gradlew test` is run. Is this approach sound?
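
The "catch it specifically and rerun one time" proposal above could look roughly like the following sketch. It assumes the bind failure surfaces as a java.net.BindException somewhere in the cause chain of the thrown exception (the thread only shows a KafkaException with an "Address already in use" message, so this is a guess), and it uses a plain Supplier to stand in for running a test class. RetryOnce and its methods are hypothetical names, not Kafka or Gradle APIs.

```java
import java.net.BindException;
import java.util.function.Supplier;

// Hypothetical helper illustrating "retry exactly once on a port collision".
final class RetryOnce {

    /** Walks the cause chain looking for a BindException (assumed marker of a port collision). */
    static boolean causedByBind(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof BindException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the task. If it fails because of a bind error, logs the event and
     * runs it one more time; any other failure, or a second failure, propagates.
     */
    static <T> T runWithOneRetry(Supplier<T> task) {
        try {
            return task.get();
        } catch (RuntimeException first) {
            if (!causedByBind(first)) {
                throw first;
            }
            System.err.println("Port already in use, rerunning once: " + first);
            return task.get();
        }
    }
}
```

Restricting the retry to bind errors keeps genuine test failures visible, and the log line preserves a record of how often the collision occurs.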
> transient unit tests failures due to port already in use
>
>
>                 Key: KAFKA-1501
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1501
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Jun Rao
>              Labels: newbie
>
> Saw the following transient failures.
> kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
>     kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
>         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
>         at kafka.network.Acceptor.<init>(SocketServer.scala:141)
>         at kafka.network.SocketServer.startup(SocketServer.scala:68)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
>         at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
>         at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)
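
The back-of-the-envelope estimate in Chris Cope's comment above can be checked with a few lines of arithmetic. This sketch assumes port collisions are independent between attempts and between test classes. The per-class collision probability (0.1%, chosen so that its square matches the quoted 0.0001% bound) and the class count (500) are made-up placeholders, not measurements from the Kafka build.

```java
// Hypothetical calculator for the retry-once failure estimate.
final class RetryOdds {

    /** Probability that one test class hits a port collision on both the first run and the retry. */
    static double failsTwice(double p) {
        return p * p;
    }

    /** Probability that at least one of `classes` test classes fails twice in a single build. */
    static double anyClassFails(double p, int classes) {
        return 1.0 - Math.pow(1.0 - failsTwice(p), classes);
    }

    public static void main(String[] args) {
        double p = 0.001;   // assumed per-class collision probability (placeholder)
        int classes = 500;  // assumed number of test classes (placeholder)
        System.out.printf("P(class fails twice)        = %.6f%%%n", failsTwice(p) * 100);
        System.out.printf("P(any class fails in a run) = %.4f%%%n", anyClassFails(p, classes) * 100);
    }
}
```

Under these assumptions the per-run failure rate stays below 1% as long as the number of test classes is well under ten thousand.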