Hi all,

fixing the subtle bugs in HttpUtils has led to quite an odyssey. We had to roll back the changes, mostly because the tests in the main repository caught latency regressions.
These regressions seemed to apply only to runtimes other than Node.js, so I investigated those. My theory is that neither our Python proxy (which Swift and others use as well) nor the Java proxy properly supports persistent connections today. Handling requests properly on the client side (i.e. closing the response entity stream) turned out to add a lot of latency for exactly those runtimes (10-20ms per request).

I then disabled connection reuse entirely (it shouldn't be working correctly today anyway, because of the missing closes). That brought latency back to normal across all runtimes. A nice side effect is that it removes any need to check connections for staleness, and any breakage caused by the pause/resume lifecycle discussed in this thread. The added latency of always establishing a new connection was not measurable in the latency tests; it is probably <= 1ms.

Weighed against the benefit of always being "correct" and being able to implement the other bugfixes HttpUtils needs (properly closing responses and entities), I think we should disable connection reuse until it turns out to be a bottleneck that warrants the added complexity of handling the connection lifecycle properly (as discussed here).

The new implementation can be found here: https://github.com/apache/incubator-openwhisk/pull/3843

Cheers
-mt
