[ https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804148#comment-17804148 ]
ASF GitHub Bot commented on HADOOP-18883:
-----------------------------------------

saxenapranav commented on code in PR #6022:
URL: https://github.com/apache/hadoop/pull/6022#discussion_r1444240890


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsHttpOperation.java:
##########
@@ -340,8 +344,11 @@ public void sendRequest(byte[] buffer, int offset, int length) throws IOExceptio
        If expect header is not enabled, we throw back the exception.
        */
       String expectHeader = getConnProperty(EXPECT);
-      if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE)) {
+      if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE)
+          && e instanceof ProtocolException
+          && EXPECT_100_JDK_ERROR.equals(e.getMessage())) {

Review Comment:
   At `httpUrlConnection.getOutputStream`, the error can either be an IOException (including connect-timeout and read-timeout) or an expect-100 failure (which raises a ProtocolException, a subclass of IOException). Server errors, if any, are caught in `processResponse` and get the same treatment as all other APIs (analysed to decide whether a retry is needed, after which RestOperation retries).

   In the JDK's implementation of `getOutputStream`, the connection is killed on an IOException. So, if further APIs were allowed to go ahead, they would fire an entirely new server call; other APIs, such as `getHeaderField()`, would then return data from that new call, which is undesirable. Also, `httpUrlConnection` is implemented such that those other APIs (like `getHeaderField()`) internally call `getInputStream()`, which first calls `getOutputStream()` (if the sendData flag is true and it does not hold a strOutputStream object). At that point two things can happen:
   1. Expect-100 failure: no data is captured, and any subsequent API on the httpUrlConnection fires yet another new call.
   2. Status 100: execution is no longer in the block where data can be written to the output stream, so the stream is closed, which raises an IOException, and from there control goes back to the retry loop. Ref: https://github.com/openjdk/jdk8/blob/master/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1463-L1471

   Hence, any further API call is prevented on the HttpUrlConnection object once it has got an IOException in `getOutputStream`.

> Expect-100 JDK bug resolution: prevent multiple server calls
> ------------------------------------------------------------
>
>                 Key: HADOOP-18883
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18883
>             Project: Hadoop Common
>          Issue Type: Sub-task
>      Components: fs/azure
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> This is in line with JDK bug https://bugs.openjdk.org/browse/JDK-8314978.
>
> With the current implementation of HttpURLConnection, if the server rejects
> the "Expect: 100-continue" header, a 'java.net.ProtocolException' is thrown
> from the 'expect100Continue()' method.
>
> After the exception is thrown, if we call any other method on the same
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally
> call getOutputStream(), which invokes writeRequests(), which makes the
> actual server call.
>
> In AbfsHttpOperation, after sendRequest() we call the processResponse()
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to
> the expect-100 error, we consume the exception and let the code go ahead.
> So we can have getHeaderField() / getHeaderFields() / getHeaderFieldLong()
> triggered after getOutputStream() has failed. These invocations will lead
> to server calls.
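The guard discussed in the review comment distinguishes the JDK's expect-100 rejection from other IOExceptions (such as timeouts), so that only the former is swallowed and everything else propagates to the retry logic. A minimal standalone sketch of that classification, assuming the Hadoop constant EXPECT_100_JDK_ERROR holds the message string "Server rejected operation" that the JDK's HttpURLConnection uses for this case (the class and method names here are illustrative, not from the patch):

```java
import java.io.IOException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;

public class Expect100Check {
    // Assumed value of Hadoop's EXPECT_100_JDK_ERROR constant: the message
    // the JDK's HttpURLConnection attaches to the ProtocolException it
    // throws when the server rejects "Expect: 100-continue".
    static final String EXPECT_100_JDK_ERROR = "Server rejected operation";
    static final String HUNDRED_CONTINUE = "100-continue";

    // Mirrors the condition added in the PR: swallow the exception only when
    // it is specifically the JDK's expect-100 rejection, not a timeout or
    // any other IOException.
    static boolean isExpect100Rejection(String expectHeader, IOException e) {
        return HUNDRED_CONTINUE.equals(expectHeader)
            && e instanceof ProtocolException
            && EXPECT_100_JDK_ERROR.equals(e.getMessage());
    }

    public static void main(String[] args) {
        // Expect-100 rejection: matched, so it may be consumed.
        System.out.println(isExpect100Rejection(HUNDRED_CONTINUE,
            new ProtocolException("Server rejected operation")));
        // Read timeout: not matched, so it is rethrown for the retry loop.
        System.out.println(isExpect100Rejection(HUNDRED_CONTINUE,
            new SocketTimeoutException("Read timed out")));
    }
}
```

With the original check (header comparison only), a timeout during getOutputStream would also have been swallowed, letting later calls like getHeaderField() trigger a fresh server request on a killed connection.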
--
This message was sent by Atlassian Jira
(v8.20.10#820010)