absurdfarce commented on code in PR #1902:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1902#discussion_r1429553321


##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -225,22 +228,40 @@ protected TrustManagerFactory createTrustManagerFactory(
   @NonNull
   protected BufferedReader fetchProxyMetadata(
       @NonNull URL metadataServiceUrl, @NonNull SSLContext sslContext) throws 
IOException {
-    try {
-      HttpsURLConnection connection = (HttpsURLConnection) 
metadataServiceUrl.openConnection();
-      connection.setSSLSocketFactory(sslContext.getSocketFactory());
-      connection.setRequestMethod("GET");
-      connection.setRequestProperty("host", "localhost");
-      return new BufferedReader(
-          new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));
-    } catch (ConnectException e) {
-      throw new IllegalStateException(
-          "Unable to connect to cloud metadata service. Please make sure your 
cluster is not parked or terminated",
-          e);
-    } catch (UnknownHostException e) {
-      throw new IllegalStateException(
-          "Unable to resolve host for cloud metadata service. Please make sure 
your cluster is not terminated",
-          e);
+    HttpsURLConnection connection = null;
+    int attempt = 0;
+    while(attempt < METADATA_RETRY_MAX_ATTEMPTS){
+      try {
+        connection = (HttpsURLConnection) metadataServiceUrl.openConnection();
+        connection.setSSLSocketFactory(sslContext.getSocketFactory());
+        connection.setRequestMethod("GET");
+        connection.setRequestProperty("host", "localhost");
+        attempt++;
+        // if this is the last attempt, throw
+        // else if the response code is not 200, retry
+        // else, throw
+        if (attempt != METADATA_RETRY_MAX_ATTEMPTS && 
connection.getResponseCode() != 200) {

Review Comment:
   HttpURLConnection.HTTP_OK is probably preferred here to the literal 200



##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -225,22 +228,40 @@ protected TrustManagerFactory createTrustManagerFactory(
   @NonNull
   protected BufferedReader fetchProxyMetadata(
       @NonNull URL metadataServiceUrl, @NonNull SSLContext sslContext) throws 
IOException {
-    try {
-      HttpsURLConnection connection = (HttpsURLConnection) 
metadataServiceUrl.openConnection();
-      connection.setSSLSocketFactory(sslContext.getSocketFactory());
-      connection.setRequestMethod("GET");
-      connection.setRequestProperty("host", "localhost");
-      return new BufferedReader(
-          new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));
-    } catch (ConnectException e) {
-      throw new IllegalStateException(
-          "Unable to connect to cloud metadata service. Please make sure your 
cluster is not parked or terminated",
-          e);
-    } catch (UnknownHostException e) {
-      throw new IllegalStateException(
-          "Unable to resolve host for cloud metadata service. Please make sure 
your cluster is not terminated",
-          e);
+    HttpsURLConnection connection = null;
+    int attempt = 0;
+    while(attempt < METADATA_RETRY_MAX_ATTEMPTS){
+      try {
+        connection = (HttpsURLConnection) metadataServiceUrl.openConnection();
+        connection.setSSLSocketFactory(sslContext.getSocketFactory());
+        connection.setRequestMethod("GET");
+        connection.setRequestProperty("host", "localhost");
+        attempt++;

Review Comment:
   It's not a problem _logically_ but I'd argue it's a bit cleaner to increment 
the number of attempts at the very top of this loop.  You can probably do it 
even before going into the try block honestly.



##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -225,22 +228,40 @@ protected TrustManagerFactory createTrustManagerFactory(
   @NonNull
   protected BufferedReader fetchProxyMetadata(
       @NonNull URL metadataServiceUrl, @NonNull SSLContext sslContext) throws 
IOException {
-    try {
-      HttpsURLConnection connection = (HttpsURLConnection) 
metadataServiceUrl.openConnection();
-      connection.setSSLSocketFactory(sslContext.getSocketFactory());
-      connection.setRequestMethod("GET");
-      connection.setRequestProperty("host", "localhost");
-      return new BufferedReader(
-          new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));
-    } catch (ConnectException e) {
-      throw new IllegalStateException(
-          "Unable to connect to cloud metadata service. Please make sure your 
cluster is not parked or terminated",
-          e);
-    } catch (UnknownHostException e) {
-      throw new IllegalStateException(
-          "Unable to resolve host for cloud metadata service. Please make sure 
your cluster is not terminated",
-          e);
+    HttpsURLConnection connection = null;
+    int attempt = 0;
+    while(attempt < METADATA_RETRY_MAX_ATTEMPTS){
+      try {
+        connection = (HttpsURLConnection) metadataServiceUrl.openConnection();
+        connection.setSSLSocketFactory(sslContext.getSocketFactory());
+        connection.setRequestMethod("GET");
+        connection.setRequestProperty("host", "localhost");
+        attempt++;
+        // if this is the last attempt, throw
+        // else if the response code is not 200, retry
+        // else, throw
+        if (attempt != METADATA_RETRY_MAX_ATTEMPTS && 
connection.getResponseCode() != 200) {
+          try {
+            Thread.sleep(METADATA_RETRY_INITIAL_DELAY_MS);
+            continue;
+          } catch (InterruptedException interruptedException) {
+            Thread.currentThread().interrupt();

Review Comment:
   You shouldn't need to do an explicit interrupt of the current thread here; 
that's already happened (in order to get the InterruptedException to you). :)



##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -58,6 +59,8 @@
 @ThreadSafe
 public class CloudConfigFactory {
   private static final Logger LOG = 
LoggerFactory.getLogger(CloudConfigFactory.class);
+  private static final int METADATA_RETRY_MAX_ATTEMPTS = 4;
+  private static final int METADATA_RETRY_INITIAL_DELAY_MS = 250;

Review Comment:
   Note: per an earlier conversation @SiyaoIsHiding and I decided to leave 
these as constants for now.  We can move them to configurable values if an 
explicit need prevents itself (or perhaps as a future enhancement).



##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -225,22 +228,40 @@ protected TrustManagerFactory createTrustManagerFactory(
   @NonNull
   protected BufferedReader fetchProxyMetadata(
       @NonNull URL metadataServiceUrl, @NonNull SSLContext sslContext) throws 
IOException {
-    try {
-      HttpsURLConnection connection = (HttpsURLConnection) 
metadataServiceUrl.openConnection();
-      connection.setSSLSocketFactory(sslContext.getSocketFactory());
-      connection.setRequestMethod("GET");
-      connection.setRequestProperty("host", "localhost");
-      return new BufferedReader(
-          new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));
-    } catch (ConnectException e) {
-      throw new IllegalStateException(
-          "Unable to connect to cloud metadata service. Please make sure your 
cluster is not parked or terminated",
-          e);
-    } catch (UnknownHostException e) {
-      throw new IllegalStateException(
-          "Unable to resolve host for cloud metadata service. Please make sure 
your cluster is not terminated",
-          e);
+    HttpsURLConnection connection = null;
+    int attempt = 0;
+    while(attempt < METADATA_RETRY_MAX_ATTEMPTS){
+      try {
+        connection = (HttpsURLConnection) metadataServiceUrl.openConnection();
+        connection.setSSLSocketFactory(sslContext.getSocketFactory());
+        connection.setRequestMethod("GET");
+        connection.setRequestProperty("host", "localhost");
+        attempt++;
+        // if this is the last attempt, throw
+        // else if the response code is not 200, retry
+        // else, throw
+        if (attempt != METADATA_RETRY_MAX_ATTEMPTS && 
connection.getResponseCode() != 200) {
+          try {
+            Thread.sleep(METADATA_RETRY_INITIAL_DELAY_MS);
+            continue;
+          } catch (InterruptedException interruptedException) {
+            Thread.currentThread().interrupt();
+            throw new IOException("Interrupted while waiting to retry metadata 
fetch", interruptedException);
+          }
+        }
+        return new BufferedReader(
+                new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));

Review Comment:
   We know that connection.getInputStream() can throw an IOException under 
various cases (including some non-200 response codes).  Presumably if we add 
checks above to make sure we only hit this code if we have a 200 response code 
we can avoid that issue... but any such code is probably a bit more brittle 
than we'd like.  While we're here it may be best to just wrap this line of code 
in it's own try/catch and either (a) return an IllegalStateException if we get 
an IOException (like we do in the other catch blocks here) or (b) fold any such 
error into our retry logic.
   
   Note that this issue is really what 
[JAVA-3033](https://datastax-oss.atlassian.net/browse/JAVA-3033) is about.



##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -225,22 +228,40 @@ protected TrustManagerFactory createTrustManagerFactory(
   @NonNull
   protected BufferedReader fetchProxyMetadata(
       @NonNull URL metadataServiceUrl, @NonNull SSLContext sslContext) throws 
IOException {
-    try {
-      HttpsURLConnection connection = (HttpsURLConnection) 
metadataServiceUrl.openConnection();
-      connection.setSSLSocketFactory(sslContext.getSocketFactory());
-      connection.setRequestMethod("GET");
-      connection.setRequestProperty("host", "localhost");
-      return new BufferedReader(
-          new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));
-    } catch (ConnectException e) {
-      throw new IllegalStateException(
-          "Unable to connect to cloud metadata service. Please make sure your 
cluster is not parked or terminated",
-          e);
-    } catch (UnknownHostException e) {
-      throw new IllegalStateException(
-          "Unable to resolve host for cloud metadata service. Please make sure 
your cluster is not terminated",
-          e);
+    HttpsURLConnection connection = null;
+    int attempt = 0;
+    while(attempt < METADATA_RETRY_MAX_ATTEMPTS){
+      try {
+        connection = (HttpsURLConnection) metadataServiceUrl.openConnection();
+        connection.setSSLSocketFactory(sslContext.getSocketFactory());
+        connection.setRequestMethod("GET");
+        connection.setRequestProperty("host", "localhost");
+        attempt++;
+        // if this is the last attempt, throw
+        // else if the response code is not 200, retry
+        // else, throw
+        if (attempt != METADATA_RETRY_MAX_ATTEMPTS && 
connection.getResponseCode() != 200) {

Review Comment:
   We might need to make this a bit more granular.  For example: 300 errors 
(3xx) indicate that a resource has been moved (or that we're being redirected 
someplace else).  We should probably follow those if URLConnection doesn't do 
so already.
   
   4xx errors are all client errors; my guess is that in most cases we won't 
want to retry these, although we have seen at least one case with a 421 
(misdirected request) response which could benefit from a retry.  
[JAVA-3033](https://datastax-oss.atlassian.net/browse/JAVA-3033) also makes 
reference to a case in which 401 errors can be returned apparently due to a 
hibernated Astra instance.
   
   5xx errors indicate a failure server-side; these should probably be retried 
in case there's a transient server error of some kind.
   
   It might be worth changing this to an if test only here and then having a 
set of if tests underneath that to check for the various error codes; that way 
you can handle each case independently without making the code too complex.



##########
core/src/main/java/com/datastax/oss/driver/internal/core/config/cloud/CloudConfigFactory.java:
##########
@@ -225,22 +228,40 @@ protected TrustManagerFactory createTrustManagerFactory(
   @NonNull
   protected BufferedReader fetchProxyMetadata(
       @NonNull URL metadataServiceUrl, @NonNull SSLContext sslContext) throws 
IOException {
-    try {
-      HttpsURLConnection connection = (HttpsURLConnection) 
metadataServiceUrl.openConnection();
-      connection.setSSLSocketFactory(sslContext.getSocketFactory());
-      connection.setRequestMethod("GET");
-      connection.setRequestProperty("host", "localhost");
-      return new BufferedReader(
-          new InputStreamReader(connection.getInputStream(), 
StandardCharsets.UTF_8));
-    } catch (ConnectException e) {
-      throw new IllegalStateException(
-          "Unable to connect to cloud metadata service. Please make sure your 
cluster is not parked or terminated",
-          e);
-    } catch (UnknownHostException e) {
-      throw new IllegalStateException(
-          "Unable to resolve host for cloud metadata service. Please make sure 
your cluster is not terminated",
-          e);
+    HttpsURLConnection connection = null;
+    int attempt = 0;
+    while(attempt < METADATA_RETRY_MAX_ATTEMPTS){
+      try {
+        connection = (HttpsURLConnection) metadataServiceUrl.openConnection();
+        connection.setSSLSocketFactory(sslContext.getSocketFactory());
+        connection.setRequestMethod("GET");
+        connection.setRequestProperty("host", "localhost");
+        attempt++;
+        // if this is the last attempt, throw
+        // else if the response code is not 200, retry
+        // else, throw
+        if (attempt != METADATA_RETRY_MAX_ATTEMPTS && 
connection.getResponseCode() != 200) {

Review Comment:
   This should be `attempt <= METADATA_RETRY_MAX_ATTEMPTS`.  You're 
incrementing "attempt" at the very top of this block so it will always point to 
the _current_ attempt (i.e. it'll be one on the first attempt, two on the 
second, etc.).  You want to check here that the current attempt is less than or 
equal to the max number of attempts allowed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to