[ https://issues.apache.org/jira/browse/CASSANDRA-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272271#comment-16272271 ]
Jaydeepkumar Chovatia commented on CASSANDRA-14078:
---------------------------------------------------

Hi [~KurtG]

Thanks for the review!

{quote}
My understanding was that the max # connections was configured so that {{COPY TO}} would always exceed the max and fail-over.
{quote}

Yes, it is designed to fail over to a peer node, but the configuration {{'native_transport_max_concurrent_connections': '12'}} applies to all the nodes in the cluster, not just a few. So the client tries to fail over to a peer and finds that the peer is also busy; as a result, {{COPY TO}} times out on the client side and the test fails.

In this [run|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/353/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_blogposts_with_max_connections/] we can see that the {{COPY FROM}} command tries to fail over to a peer node, but all the nodes have exhausted their connections, so it cannot fail over. It retries after sleeping and finally gives up.

{code}
All replicas busy, sleeping for 4 second(s)...
...
All replicas busy, sleeping for 1 second(s)...
...
All replicas busy, sleeping for 23 second(s)...
Replicas too busy, given up
{code}

{quote}
We've had no record of this particular failure before (at least in JIRA), seems like it could actually be something that needs fixing.
{quote}

[This run|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/353/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_blogposts_with_max_connections/] on {{trunk}} shows a 7/30 failure rate.

{quote}
To some degree all tests kind of depend on what hardware you run, this is the nature of the beast with C* and dtests. Can you elaborate on why it's not deterministic?
{quote}

I agree that many of the dtests depend on the hardware we run on, but this particular test is both hardware dependent and *timing related*.
If threads are less busy, the test may not need more connections and will pass; if threads are busier, connections pile up and result in timeouts, etc. We can try tweaking {{INGESTRATE}}, {{'native_transport_max_concurrent_connections': '<>'}}, etc. to make this test work, but in my opinion it would be difficult to find a tuning that always works.

Jaydeep

> Fix dTest test_bulk_round_trip_blogposts_with_max_connections
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-14078
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14078
>             Project: Cassandra
>          Issue Type: Test
>          Components: Testing
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Minor
>
> This ticket is regarding the following dTest:
> {{cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_blogposts_with_max_connections}}
> This test limits the number of client connections and assumes that once the
> connection limit has been reached, the client will fail over to another node
> and issue the request there. The problem is that it is not a deterministic
> test case, as it depends entirely on the hardware you run on, timing, etc.
> For example, if we look at
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/353/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_blogposts_with_max_connections/
> {quote}
> ...
> Processed: 5000 rows; Rate: 2551 rows/s; Avg. rate: 2551 rows/s
> All replicas busy, sleeping for 4 second(s)...
> Processed: 10000 rows; Rate: 2328 rows/s; Avg. rate: 2307 rows/s
> All replicas busy, sleeping for 1 second(s)...
> Processed: 15000 rows; Rate: 2137 rows/s; Avg. rate: 2173 rows/s
> All replicas busy, sleeping for 11 second(s)...
> Processed: 20000 rows; Rate: 2138 rows/s; Avg. rate: 2164 rows/s
> Processed: 25000 rows; Rate: 2403 rows/s; Avg. rate: 2249 rows/s
> Processed: 30000 rows; Rate: 2582 rows/s; Avg. rate: 2321 rows/s
> Processed: 35000 rows; Rate: 2835 rows/s; Avg. rate: 2406 rows/s
> Processed: 40000 rows; Rate: 2867 rows/s; Avg. rate: 2458 rows/s
> Processed: 45000 rows; Rate: 3163 rows/s; Avg. rate: 2540 rows/s
> Processed: 50000 rows; Rate: 3200 rows/s; Avg. rate: 2596 rows/s
> Processed: 50234 rows; Rate: 2032 rows/s; Avg. rate: 2572 rows/s
> All replicas busy, sleeping for 23 second(s)...
> Replicas too busy, given up
> ...
> {quote}
> Here we can see the request timing out: sometimes it resumes after 1 second,
> another time after 11 seconds, and sometimes it does not recover at all.
> In my opinion this test is not a good fit for dTest, as dTest(s) should be
> deterministic.
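To illustrate the point about the connection cap applying to every node, here is a rough toy model in Python. It is only a sketch and is not how cqlsh's {{COPY}} is implemented: the {{Node}} class, the {{connect_with_failover}} function, the retry count and the doubling backoff schedule are all hypothetical names and values chosen for illustration. It only shows that failover helps when some peer still has a free connection slot, and cannot help when every node has hit the same limit.

```python
# Toy model only: NOT cqlsh's actual COPY implementation.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    max_connections: int       # stands in for native_transport_max_concurrent_connections
    open_connections: int = 0  # connections currently held on this node

    def try_connect(self) -> bool:
        """Accept a connection only while the node is below its cap."""
        if self.open_connections >= self.max_connections:
            return False
        self.open_connections += 1
        return True


def connect_with_failover(nodes, max_attempts=3):
    """Try every node in turn; back off and retry if all of them are full.

    Returns (node_name, backoffs), where node_name is None if the client
    gave up. Backoff delays are only recorded, not slept, so the example
    runs instantly.
    """
    backoffs = []
    for attempt in range(max_attempts):
        for node in nodes:
            if node.try_connect():
                return node.name, backoffs
        backoffs.append(2 ** attempt)  # hypothetical backoff schedule (seconds)
    return None, backoffs


# Only node1 is at its cap: failover to node2 succeeds immediately.
partial = [Node("node1", 12, open_connections=12),
           Node("node2", 12, open_connections=3)]
print(connect_with_failover(partial))    # ('node2', [])

# Every node is at the same cap: there is nothing to fail over to.
saturated = [Node(f"node{i}", 12, open_connections=12) for i in (1, 2, 3)]
print(connect_with_failover(saturated))  # (None, [1, 2, 4])
```

In the real test, whether the cluster ends up in the first or the second situation depends on how quickly connections are released, which is why the outcome varies with hardware and timing.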