FlorianHockmann commented on code in PR #1882: URL: https://github.com/apache/tinkerpop/pull/1882#discussion_r1048686076
########## docs/src/reference/gremlin-variants.asciidoc: ########## @@ -658,6 +658,31 @@ social.persons("marko").knows("josh"); NOTE: Using Maven, as shown in the `gremlin-archetype-dsl` module, makes developing DSLs with the annotation processor straightforward in that it sets up appropriate paths to the generated code automatically. +[[gremlin-java-troubleshooting]] +=== Troubleshooting + +*Max frame length of 65536 has been exceeded* + +This error occurs when the driver attempts to process a request/response that exceeds the configured maximum size. +The most direct way to fix this problem is to increase the `maxContentLength` setting in the driver. Ideally, the +`maxContentLength` set for the driver should match the setting defined on the server. + +*TimeoutException* + +A `TimeoutException` is thrown by the driver when the time limit assigned by the `maxWaitForConnection` is exceeded +when trying to borrow a connection from the connection pool for a particular host. There are generally two scenarios +where this occurs: + +1. The server has actually reached its maximum capacity or the driver has just learned that the server is unreachable. +2. The client is throttling requests. Review Comment: To be honest, when I just read this once it was not directly clear to me what that means. Why is it throttling requests? Simply adding "[...] because the pool is exhausted" would already help. ########## gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java: ########## @@ -334,10 +362,13 @@ private boolean addConnectionIfUnderMaximum() { } try { - connections.add(new Connection(host.getHostUri(), this, settings().maxInProcessPerConnection)); - } catch (ConnectionException ce) { - logger.error("Connections were under max, but there was an error creating the connection.", ce); + connections.add(connectionFactory.create(this)); + } catch (Exception ce) { Review Comment: Really just a tiny nit, but this should probably be `e` now that it's no longer a `ConnectionException`. ########## docs/src/upgrade/release-3.5.x.asciidoc: ########## @@ -50,6 +50,48 @@ var client = new GremlinClient(new GremlinServer("localhost", 8182), loggerFacto See: link:https://issues.apache.org/jira/browse/TINKERPOP-2471[TINKERPOP-2471] +==== gremlin-driver Host Availability + +The Java drivers approach to determine the health of the host to which it is connected is important to how it behaves. +The driver has generally taken a pessimistic approach to make this determination, where a failure to borrow a +connection from the pool would constitute enough evidence to mark the host as dead. Unfortunately a failure to borrow +from the pool is not the best indicator for a dead host. Intermittent network failure, an overly busy client, or other +temporary issues could have been at play while the server was perfectly available. + +The consequences for marking a host dead are fairly severe as the connection pool is shutdown. While it is shutdown +in an orderly fashion so as to allow for existing requests to complete if at all possible, it immediately blocks new +requests producing an immediate `NoHostAvailableException` (assuming there are actually no other hosts to route +requests to). There is then some delay to reconnect to the server and re-establish each of the connections in the pool, +which might take some time especially if there was a large pool. If an application were sending hundreds of request Review Comment: ```suggestion which might take some time especially if there was a large pool. If an application were sending hundreds of requests ``` ########## docs/src/reference/gremlin-variants.asciidoc: ########## @@ -658,6 +658,31 @@ social.persons("marko").knows("josh"); NOTE: Using Maven, as shown in the `gremlin-archetype-dsl` module, makes developing DSLs with the annotation processor straightforward in that it sets up appropriate paths to the generated code automatically. +[[gremlin-java-troubleshooting]] +=== Troubleshooting + +*Max frame length of 65536 has been exceeded* + +This error occurs when the driver attempts to process a request/response that exceeds the configured maximum size. +The most direct way to fix this problem is to increase the `maxContentLength` setting in the driver. Ideally, the +`maxContentLength` set for the driver should match the setting defined on the server. + +*TimeoutException* + +A `TimeoutException` is thrown by the driver when the time limit assigned by the `maxWaitForConnection` is exceeded +when trying to borrow a connection from the connection pool for a particular host. There are generally two scenarios +where this occurs: + +1. The server has actually reached its maximum capacity or the driver has just learned that the server is unreachable. +2. The client is throttling requests. + +The latter of the two can be addressed from the driver side in the following ways: Review Comment: Not sure if that's necessary at all, but how about mentioning that the exception message / the error log message should contain information about the state of the `ConnectionPool` and its `Connections` to understand which of the two cases caused the problem? ########## gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/driver/ClientConnectionIntegrateTest.java: ########## @@ -110,4 +124,100 @@ public void shouldCloseConnectionDeadDueToUnRecoverableError() throws Exception assertThat(recordingAppender.logContainsAny("^(?!.*(isDead=false)).*isDead=true.*destroyed successfully.$"), is(true)); } + + /** + * Added for TINKERPOP-2813 - this scenario would have previously thrown tons of + * {@link NoHostAvailableException}. + */ + @Test + public void shouldSucceedWithJitteryConnection() throws Exception { + final Cluster cluster = TestClientFactory.build().minConnectionPoolSize(1).maxConnectionPoolSize(4). + reconnectInterval(1000). + maxWaitForConnection(4000).validationRequest("g.inject()").create(); + final Client.ClusteredClient client = cluster.connect(); + + client.init(); + + // every 10 connections let's have some problems + final JitteryConnectionFactory connectionFactory = new JitteryConnectionFactory(3); + client.hostConnectionPools.forEach((h, pool) -> pool.connectionFactory = connectionFactory); + + // get an initial connection which marks the host as available + assertEquals(2, client.submit("1+1").all().join().get(0).getInt()); + + // network is gonna get fishy - ConnectionPool should try to grow during the workload below and when it + // does some connections will fail to create in the background which should log some errors but not tank + // the submit() as connections that are currently still working and active should be able to handle the load. + connectionFactory.jittery = true; + + // load up a hella ton of requests + final int requests = 1000; + final CountDownLatch latch = new CountDownLatch(requests); + final AtomicBoolean hadFailOtherThanTimeout = new AtomicBoolean(false); + + new Thread(() -> { + IntStream.range(0, requests).forEach(i -> { + try { + client.submitAsync("1 + " + i); + } catch (Exception ex) { + // we could catch a TimeoutException here in some cases if the jitters cause a borrow of a + // connection to take too long. submitAsync() will wrap in a RuntimeException. can't assert + // this condition inside this thread or the test locks up + hadFailOtherThanTimeout.compareAndSet(false, !(ex.getCause() instanceof TimeoutException)); + } finally { + latch.countDown(); + } + }); + }, "worker-shouldSucceedWithJitteryConnection").start(); + + // wait long enough for the jitters to kick in at least a little + while (latch.getCount() > 500) { Review Comment: (nitpick) Is this `500` because then half the requests have been executed? In that case, I'd change this to `requests / 2` to make this clear. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@tinkerpop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org