[ 
https://issues.apache.org/jira/browse/KNOX-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967107#comment-16967107
 ] 

James Chen commented on KNOX-2095:
----------------------------------

Sounds good; I'll try setting up a test case. As for the "different catch 
block" issue: from testing the code, it seems that when a socket timeout 
occurs, the call that fails is "inboundResponse = getHttpClient().execute( 
outboundRequest );" in executeOutboundRequest, and the delay before the error 
is controlled by the socketTimeout setting in the gateway-site.xml file. (The 
logic in between is a bit murky, but from what I've seen, it's not really 
possible to dig much deeper without going into the other Apache packages.) I 
don't think a socket timeout would occur in any other try-catch block when 
accessing the Knox gateway, though admittedly I could be better informed 
about what else the socketTimeout parameter controls.

I do agree that this patch doesn't seem ideal and that there should be a 
better way of handling this. I also haven't used Knox long enough to know 
whether other errors are masked as 500 errors, though I'd assume that some 
are. Complicating things is the fact that this particular fix doesn't go to 
the failover nodes, while other errors that occur while setting the 
inboundResponse may benefit from trying other failover nodes. Still, it'd be 
nice if we could at least distinguish 504 errors; I'll set up the PR after 
the test case(s) are written.
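
For reference, the status mapping being discussed can be sketched roughly as 
below. This is only an illustration, not the attached patch: it uses plain JDK 
classes, and OutboundCall, statusFor and the class name are hypothetical 
stand-ins for the getHttpClient().execute( outboundRequest ) call and its 
surrounding try-catch block. It also ignores failover entirely.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical sketch: none of these names come from the Knox code base.
final class TimeoutStatusSketch {
    static final int SC_INTERNAL_SERVER_ERROR = 500;
    static final int SC_GATEWAY_TIMEOUT = 504;

    // Stand-in for the outbound call; returns the upstream HTTP status code.
    interface OutboundCall {
        int execute() throws IOException;
    }

    // Maps a socket timeout to 504 and any other I/O failure to 500.
    static int statusFor(OutboundCall call) {
        try {
            return call.execute();
        } catch (SocketTimeoutException e) {
            // Must be caught before IOException, which is its superclass.
            return SC_GATEWAY_TIMEOUT;
        } catch (IOException e) {
            return SC_INTERNAL_SERVER_ERROR;
        }
    }

    public static void main(String[] args) {
        // Simulated read timeout: prints 504
        System.out.println(statusFor(() -> {
            throw new SocketTimeoutException("Read timed out");
        }));
        // Simulated generic I/O failure: prints 500
        System.out.println(statusFor(() -> {
            throw new IOException("Connection reset");
        }));
        // Successful call: prints 200
        System.out.println(statusFor(() -> 200));
    }
}
```

The only point of the sketch is the catch ordering: the more specific 
SocketTimeoutException has to be handled before the generic IOException, 
otherwise the timeout is folded into the 500 case again.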

> Many errors (E.G. 504s) being masked as 500 errors
> --------------------------------------------------
>
>                 Key: KNOX-2095
>                 URL: https://issues.apache.org/jira/browse/KNOX-2095
>             Project: Apache Knox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: James Chen
>            Assignee: James Chen
>            Priority: Minor
>              Labels: easyfix
>             Fix For: 1.4.0
>
>         Attachments: jamchen504patch.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When errors occur while accessing the Knox gateway, they are forcibly 
> overridden and reported as 500 errors, rather than with the status codes 
> they should carry.
> For example, when gateway.httpclient.socketTimeout is set to a very low 
> value (e.g. 1 ms) in gateway-site.xml, the getHttpClient().execute( 
> outboundRequest ) call throws a socket timeout exception. However, this is 
> caught by the surrounding try-catch block and rethrown as an IOException. 
> This results in a generic 500 error, rather than the 504 error one would 
> normally expect from this sort of interaction.
>  
> For these sorts of scenarios, I believe it would be prudent to create a dummy 
> HttpResponse for the inboundResponse using an HttpResponseFactory object, 
> with the corresponding status code (e.g. HttpStatus.SC_GATEWAY_TIMEOUT in the 
> event of a SocketTimeoutException), and return that instead to trigger the 
> appropriate 504 error. I suspect other failures get this same IOException 
> treatment and would be better off receiving their own error codes.
>  
> Judging from the source code, this issue likely affects version 1.3.0, though 
> this has not been tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
