[ https://issues.apache.org/jira/browse/KNOX-3058?focusedWorklogId=931031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-931031 ]
ASF GitHub Bot logged work on KNOX-3058:
----------------------------------------

Author: ASF GitHub Bot
Created on: 20/Aug/24 21:45
Start Date: 20/Aug/24 21:45
Worklog Time Spent: 10m
Work Description: pzampino opened a new pull request, #929:
URL: https://github.com/apache/knox/pull/929

## What changes were proposed in this pull request?

Modified error handling for requests to a topology that is being redeployed, so that the response is HTTP 503 'Service Unavailable' rather than HTTP 404 'Not Found'. Clients are much more likely to retry a 503 response than a 404 response.

Added a Jetty ErrorHandler that checks whether the requested topology is in a Set of topologies marked as inactive. Topology names are added to this inactive set when the associated topology is deactivated, and removed from it when the topology is reactivated. When a topology is deleted, it is marked as inactive but then removed from the inactive set, because it is known to be going away; requests for a deleted topology therefore still receive a 404.

## How was this patch tested?

I deployed a test topology with a demo LDAP provider and the Knox Token service. I then ran the following script with 'https://localhost:8443/gateway/demo/knoxtoken/api/v2/token' and one of the demo LDAP username/password combinations, redirecting the output to a file. After printing the endpoint, the script outputs only the HTTP response status code of each request.

```
#!/bin/sh
#
# Usage: resp-test.sh <endpoint> [user] [password]
# Prints the HTTP status code of each request, one per line.
ENDPOINT=$1
echo "Endpoint: $ENDPOINT"
# USER defaults to the login user if no username argument is given.
if [ -n "$2" ] ; then
  USER=$2
fi
# Use PASSWORD, not PWD: PWD is the shell's current-directory variable.
if [ -n "$3" ] ; then
  PASSWORD=$3
fi
# POSIX sh has no {1..N} brace expansion, so count with a while loop.
i=1
while [ "$i" -le 100000 ]
do
  curl -o /dev/null -s -w "%{http_code}\n" -k -u "${USER}:${PASSWORD}" "${ENDPOINT}"
  i=$((i + 1))
done
```

Example: `~/bin/resp-test.sh 'https://localhost:8443/gateway/demo/knoxtoken/api/v2/token' sam sam-password > ~/response-code-test.txt &`

While this script was running, I "touched" the test topology many times over several minutes to trigger redeployment. Finally, I deleted the test topology.
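The inactive-set bookkeeping described in the PR can be modelled as a toy shell sketch to make the redeploy and delete transitions concrete. This is not Knox's implementation (the actual change is a Jetty ErrorHandler in Java); the functions `mark_inactive`, `unmark_inactive`, `delete_topology` and `status_for` are hypothetical, and topology names are assumed to contain no spaces.

```shell
#!/bin/sh
# Toy model (hypothetical, not Knox code) of the inactive-topology set.
# INACTIVE is a space-padded list such as " a b "; names must not contain spaces.
INACTIVE=" "

is_inactive() {
  case "$INACTIVE" in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Topology deactivated (e.g. at the start of a redeployment): add its name.
mark_inactive() {
  is_inactive "$1" || INACTIVE="${INACTIVE}$1 "
}

# Topology reactivated: drop its name from the set.
unmark_inactive() {
  if is_inactive "$1"; then
    before=${INACTIVE%%" $1 "*}
    after=${INACTIVE#*" $1 "}
    INACTIVE="$before $after"
  fi
}

# Deletion: marked inactive, then removed, since the topology is going away.
delete_topology() {
  mark_inactive "$1"
  unmark_inactive "$1"
}

# Status the modelled error handler returns for a request that matched no
# deployed topology: 503 while the name is in the inactive set, else 404.
status_for() {
  if is_inactive "$1"; then echo 503; else echo 404; fi
}
```

For example, after `mark_inactive demo`, `status_for demo` prints 503; after `delete_topology demo`, it prints 404, which matches the issue's requirement that a deleted topology still yields a 404.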
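The manual review of the captured status codes could also be scripted. The helper below is a hypothetical sketch, not part of the PR. It assumes the file format written by the test script (an `Endpoint:` line followed by one status code per line) and assumes the topology is deleted once, at the end of the run. It prints a histogram of the lines and fails if any non-404 response follows the first 404, since that would mean a 404 was returned before the deletion.

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: checks a file of HTTP status codes
# written by resp-test.sh. Assumes the topology is deleted once, at the end of
# the run, so every 404 should come after the last non-404 response.
check_codes() {
  file=$1
  if [ ! -r "$file" ]; then
    echo "Cannot read $file" >&2
    return 2
  fi
  # Histogram of the lines seen (includes the leading "Endpoint:" line).
  sort "$file" | uniq -c
  # Line number of the first 404, or empty if there is none.
  first_404=$(grep -n '^404$' "$file" | head -n 1 | cut -d: -f1)
  if [ -z "$first_404" ]; then
    echo "OK: no 404 responses."
    return 0
  fi
  # Any non-404 line after the first 404 means a 404 was returned too early.
  if tail -n "+$first_404" "$file" | grep -qv '^404$'; then
    echo "FAIL: 404 at line $first_404 was followed by other responses."
    return 1
  fi
  echo "OK: 404 responses begin at line $first_404 and continue to the end."
  return 0
}

# Run only when given a file argument.
if [ "$#" -gt 0 ]; then
  check_codes "$1"
fi
```

If saved under a name such as `check-codes.sh` (hypothetical), it could be run as `sh check-codes.sh ~/response-code-test.txt`; the histogram also shows how many 503 and 200 responses were recorded.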
After deleting the topology, I reviewed the output to verify that there were no 404 responses before the deletion. I also confirmed the expected periodic 503 responses, with normal 200 responses in between.

Issue Time Tracking
-------------------

Worklog Id: (was: 931031)
Remaining Estimate: 0h
Time Spent: 10m

> Avoid 404 When Topology Is Being Redeployed
> -------------------------------------------
>
> Key: KNOX-3058
> URL: https://issues.apache.org/jira/browse/KNOX-3058
> Project: Apache Knox
> Issue Type: Improvement
> Components: Server
> Reporter: Philip Zampino
> Assignee: Philip Zampino
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> If a topology is requested while it is being redeployed, the client receives
> an HTTP 404 response. Most clients will not retry after receiving a 404, so
> the interaction fails.
> If Knox instead responded with a more retry-friendly status (e.g., HTTP 503),
> clients could overcome these short windows of unavailability by retrying.
> The difficult part may be distinguishing topology removal from topology
> inactivity. I think a deleted topology should still result in a 404.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)