FlorianHockmann commented on code in PR #1882:
URL: https://github.com/apache/tinkerpop/pull/1882#discussion_r1048686076


##########
docs/src/reference/gremlin-variants.asciidoc:
##########
@@ -658,6 +658,31 @@ social.persons("marko").knows("josh");
 NOTE: Using Maven, as shown in the `gremlin-archetype-dsl` module, makes developing DSLs with the annotation processor
 straightforward in that it sets up appropriate paths to the generated code automatically.
 
+[[gremlin-java-troubleshooting]]
+=== Troubleshooting
+
+*Max frame length of 65536 has been exceeded*
+
+This error occurs when the driver attempts to process a request/response that exceeds the configured maximum size.
+The most direct way to fix this problem is to increase the `maxContentLength` setting in the driver. Ideally, the
+`maxContentLength` set for the driver should match the setting defined on the server.
+
+*TimeoutException*
+
+A `TimeoutException` is thrown by the driver when the time limit assigned by the `maxWaitForConnection` is exceeded
+when trying to borrow a connection from the connection pool for a particular host. There are generally two scenarios
+where this occurs:
+
+1. The server has actually reached its maximum capacity or the driver has just learned that the server is unreachable.
+2. The client is throttling requests.

Review Comment:
   To be honest, when I first read this it was not immediately clear to me what that means. Why is it throttling requests?
   Simply adding "[...] because the pool is exhausted" would already help.
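   
   For context, a minimal sketch of the pool-related `Cluster` settings that govern this throttling case. The host, port, and values below are placeholders, not recommendations:
   ```java
   import org.apache.tinkerpop.gremlin.driver.Cluster;
   
   // A larger pool and a longer borrow timeout make it less likely that a busy
   // client exhausts the pool and throttles itself into a TimeoutException.
   final Cluster cluster = Cluster.build().
           addContactPoint("localhost").port(8182).
           maxConnectionPoolSize(8).       // more connections available to borrow
           maxInProcessPerConnection(16).  // more concurrent requests per connection
           maxWaitForConnection(8000).     // ms to wait on a borrow before TimeoutException
           create();
   ```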



##########
gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java:
##########
@@ -334,10 +362,13 @@ private boolean addConnectionIfUnderMaximum() {
         }
 
         try {
-            connections.add(new Connection(host.getHostUri(), this, settings().maxInProcessPerConnection));
-        } catch (ConnectionException ce) {
-            logger.error("Connections were under max, but there was an error 
creating the connection.", ce);
+            connections.add(connectionFactory.create(this));
+        } catch (Exception ce) {

Review Comment:
   Really just a tiny nit, but this should probably be `e` now that it's no longer a `ConnectionException`.
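   
   In suggestion form, with only the variable renamed:
   ```suggestion
           } catch (Exception e) {
   ```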



##########
docs/src/upgrade/release-3.5.x.asciidoc:
##########
@@ -50,6 +50,48 @@ var client = new GremlinClient(new GremlinServer("localhost", 8182), loggerFacto
 
 See: link:https://issues.apache.org/jira/browse/TINKERPOP-2471[TINKERPOP-2471]
 
+==== gremlin-driver Host Availability
+
+The Java driver's approach to determining the health of the host to which it is connected is important to how it behaves.
+The driver has generally taken a pessimistic approach to making this determination, where a failure to borrow a
+connection from the pool would constitute enough evidence to mark the host as dead. Unfortunately, a failure to borrow
+from the pool is not the best indicator of a dead host. Intermittent network failure, an overly busy client, or other
+temporary issues could have been at play while the server was perfectly available.
+
+The consequences of marking a host dead are fairly severe, as the connection pool is shut down. While it is shut down
+in an orderly fashion so as to allow existing requests to complete if at all possible, it immediately blocks new
+requests, producing a `NoHostAvailableException` (assuming there are actually no other hosts to route
+requests to). There is then some delay to reconnect to the server and re-establish each of the connections in the pool,
+which might take some time especially if there was a large pool. If an application were sending hundreds of request

Review Comment:
   ```suggestion
   which might take some time especially if there was a large pool. If an application were sending hundreds of requests
   ```
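   
   Unrelated to the typo itself, a sketch of what this failure mode means for callers. The retry helper below is purely illustrative (name, retry count, and backoff are made up) and not part of the driver API:
   ```java
   import org.apache.tinkerpop.gremlin.driver.Client;
   import org.apache.tinkerpop.gremlin.driver.ResultSet;
   import org.apache.tinkerpop.gremlin.driver.exception.NoHostAvailableException;
   
   // While the pool for a host marked dead is being re-established, submits fail
   // fast with NoHostAvailableException, so a caller may back off and retry.
   static ResultSet submitWithRetry(final Client client, final String gremlin) throws InterruptedException {
       for (int attempt = 1; ; attempt++) {
           try {
               return client.submit(gremlin);
           } catch (NoHostAvailableException nhae) {
               if (attempt >= 3) throw nhae;  // give up after a few attempts
               Thread.sleep(1000);            // crude backoff near the reconnectInterval
           }
       }
   }
   ```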



##########
docs/src/reference/gremlin-variants.asciidoc:
##########
@@ -658,6 +658,31 @@ social.persons("marko").knows("josh");
 NOTE: Using Maven, as shown in the `gremlin-archetype-dsl` module, makes developing DSLs with the annotation processor
 straightforward in that it sets up appropriate paths to the generated code automatically.
 
+[[gremlin-java-troubleshooting]]
+=== Troubleshooting
+
+*Max frame length of 65536 has been exceeded*
+
+This error occurs when the driver attempts to process a request/response that exceeds the configured maximum size.
+The most direct way to fix this problem is to increase the `maxContentLength` setting in the driver. Ideally, the
+`maxContentLength` set for the driver should match the setting defined on the server.
+
+*TimeoutException*
+
+A `TimeoutException` is thrown by the driver when the time limit assigned by the `maxWaitForConnection` is exceeded
+when trying to borrow a connection from the connection pool for a particular host. There are generally two scenarios
+where this occurs:
+
+1. The server has actually reached its maximum capacity or the driver has just learned that the server is unreachable.
+2. The client is throttling requests.
+
+The latter of the two can be addressed from the driver side in the following ways:

Review Comment:
   Not sure if that's necessary at all, but how about mentioning that the exception message / the error log message should contain information about the state of the `ConnectionPool` and its `Connections` to understand which of the two cases caused the problem?
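   
   On the `maxContentLength` entry quoted above, a minimal sketch of matching the driver to the server limit. Host, port, and the 1 MB value are placeholders; `gremlin-server.yaml` ships with `maxContentLength: 65536` by default, which is where the "Max frame length of 65536" figure comes from:
   ```java
   import org.apache.tinkerpop.gremlin.driver.Cluster;
   
   // Raise the driver-side frame limit; ideally it matches the server's
   // maxContentLength so neither side rejects the other's larger messages.
   final Cluster cluster = Cluster.build().
           addContactPoint("localhost").port(8182).
           maxContentLength(1024 * 1024). // 1 MB, illustrative only
           create();
   ```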



##########
gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/driver/ClientConnectionIntegrateTest.java:
##########
@@ -110,4 +124,100 @@ public void shouldCloseConnectionDeadDueToUnRecoverableError() throws Exception
         assertThat(recordingAppender.logContainsAny("^(?!.*(isDead=false)).*isDead=true.*destroyed successfully.$"), is(true));
 
     }
+
+    /**
+     * Added for TINKERPOP-2813 - this scenario would have previously thrown tons of
+     * {@link NoHostAvailableException}.
+     */
+    @Test
+    public void shouldSucceedWithJitteryConnection() throws Exception {
+        final Cluster cluster = TestClientFactory.build().minConnectionPoolSize(1).maxConnectionPoolSize(4).
+                reconnectInterval(1000).
+                maxWaitForConnection(4000).validationRequest("g.inject()").create();
+        final Client.ClusteredClient client = cluster.connect();
+
+        client.init();
+
+        // every 10 connections let's have some problems
+        final JitteryConnectionFactory connectionFactory = new JitteryConnectionFactory(3);
+        client.hostConnectionPools.forEach((h, pool) -> pool.connectionFactory = connectionFactory);
+
+        // get an initial connection which marks the host as available
+        assertEquals(2, client.submit("1+1").all().join().get(0).getInt());
+
+        // network is gonna get fishy - ConnectionPool should try to grow during the workload below and when it
+        // does some connections will fail to create in the background which should log some errors but not tank
+        // the submit() as connections that are currently still working and active should be able to handle the load.
+        connectionFactory.jittery = true;
+
+        // load up a hella ton of requests
+        final int requests = 1000;
+        final CountDownLatch latch = new CountDownLatch(requests);
+        final AtomicBoolean hadFailOtherThanTimeout = new AtomicBoolean(false);
+
+        new Thread(() -> {
+            IntStream.range(0, requests).forEach(i -> {
+                try {
+                    client.submitAsync("1 + " + i);
+                } catch (Exception ex) {
+                    // we could catch a TimeoutException here in some cases if the jitters cause a borrow of a
+                    // connection to take too long. submitAsync() will wrap in a RuntimeException. can't assert
+                    // this condition inside this thread or the test locks up
+                    hadFailOtherThanTimeout.compareAndSet(false, !(ex.getCause() instanceof TimeoutException));
+                } finally {
+                    latch.countDown();
+                }
+            });
+        }, "worker-shouldSucceedWithJitteryConnection").start();
+
+        // wait long enough for the jitters to kick in at least a little
+        while (latch.getCount() > 500) {

Review Comment:
   (nitpick) Is this `500` because then half the requests have been executed? In that case, I'd change this to `requests / 2` to make this clear.
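   
   In suggestion form:
   ```suggestion
           while (latch.getCount() > requests / 2) {
   ```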



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
