[
https://issues.apache.org/jira/browse/KNOX-2095?focusedWorklogId=341469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341469
]
ASF GitHub Bot logged work on KNOX-2095:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Nov/19 20:41
Start Date: 11/Nov/19 20:41
Worklog Time Spent: 10m
Work Description: risdenk commented on pull request #177: [WIP] KNOX-2095
- Adding in DefaultDispatch code and tests to handle 504 errors
URL: https://github.com/apache/knox/pull/177#discussion_r344900248
##########
File path:
gateway-spi/src/main/java/org/apache/knox/gateway/dispatch/DefaultDispatch.java
##########
@@ -138,6 +144,10 @@ protected HttpResponse executeOutboundRequest(
HttpUriRequest outboundRequest )
}
}
auditor.audit( Action.DISPATCH, outboundRequest.getURI().toString(),
ResourceType.URI, ActionOutcome.SUCCESS, RES.responseStatus( statusCode ) );
+ } catch( SocketTimeoutException e ) {
+ //Set a 504 instead of throwing an IO exception in the event of a
socket timeout,
+ //as an IO exception forces a 500 to be displayed.
Review comment:
Yes one server could be dying or overloaded where the second one isn't. That
would cause socket timeout on one server but failover would work correctly.
There can definitely be socket timeouts in active/passive and active/active
case.
active passive - active server gets overloaded
active/active - could be something wrong on one of the two hosts
Yes you could failover only to have to wait again but that is what failover
is designed to do. You can't just skip failover in some cases since you don't
have enough information to say whether the failover will succeed or not. The
fix for this would be some sort of active list of backend hosts, but Knox
doesn't have that today. That is a bigger feature for sure.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 341469)
Remaining Estimate: 166h (was: 166h 10m)
Time Spent: 2h (was: 1h 50m)
> Many errors (E.G. 504s) being masked as 500 errors
> --------------------------------------------------
>
> Key: KNOX-2095
> URL: https://issues.apache.org/jira/browse/KNOX-2095
> Project: Apache Knox
> Issue Type: Improvement
> Affects Versions: 1.2.0, 1.3.0
> Reporter: James Chen
> Assignee: James Chen
> Priority: Minor
> Labels: easyfix
> Fix For: 1.4.0
>
> Attachments: KNOX-2095.patch, jamchen504patch.patch
>
> Original Estimate: 168h
> Time Spent: 2h
> Remaining Estimate: 166h
>
> When errors occur while accessing the Knox gateway, errors are forcibly
> overridden and represented as 500 errors, rather than whatever errors they
> should be.
> For example, when the timeout value under gateway.httpclient.socketTimeout is
> set to a very low timeout value (E.G. 1 ms) under gateway-site.xml, a socket
> timeout exception is produced by the getHttpClient().execute(
> outboundRequest) call. However, this is caught by the surrounding try-catch
> block and thrown again as an IOException. This results in a generic 500
> error, rather than a 504 error one would normally expect from this sort of
> interaction.
>
> For these sorts of scenarios, I believe it would be prudent to create a dummy
> HttpResponse using a HttpResponseFactory object for the inboundResponse with
> the corresponding error code (E.G. HttpStatus.SC_GATEWAY_TIMEOUT in the event
> of a SocketTimeoutException) and return that instead to trigger the
> appropriate 504 error. I suspect there are other sorts of potential error
> code triggers that get this same IOException treatment that would be better
> off receiving their own error codes.
>
> Judging from the source code, this issue likely affects version 1.3.0, though
> this has not been tested.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)