bingquanzhao opened a new pull request, #63181:
URL: https://github.com/apache/doris/pull/63181

   Replace HttpClient5 async with HttpClient4 sync to fix 
CircularRedirectException (1.2.0 -> 1.2.1)
   
   The logstash-output-doris plugin uses the Apache HttpClient5 async client to PUT 
stream load requests. SelectDB Cloud / BYOC FE answers a stream load with '307 + 
Connection: close', and against such an FE the async client fails with 
CircularRedirectException once concurrency or body size is non-trivial.
   
   Root cause:
     1. HC5 async does not strictly block body transmission while waiting for 
'100 Continue'. When FE returns 307 before issuing 100, the entity producer has 
already started writing; FE closing the connection then yields an IOException 
mid-transfer.
     2. HC5 default exec chain wraps RedirectExec around 
AsyncHttpRequestRetryExec. The recoverable IOException triggers an internal 
retry that re-enters the same FE -> 307 path, but RedirectLocations from the 
first attempt is still populated, so the same BE URL is detected as 'already 
visited' and reported as a circular redirect.
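
   The second point can be illustrated with a toy model. This is a minimal 
sketch, not HttpClient code: RedirectTracker, run_with_retry and the BE address 
are hypothetical names invented for illustration. It only assumes what is 
described above, namely that the set of visited redirect locations survives 
the internal retry.

```ruby
require 'set'

# Toy model (NOT HttpClient code): a redirect tracker whose visited set
# outlives a single attempt, as described for RedirectLocations above.
class CircularRedirectError < StandardError; end

class RedirectTracker
  def initialize
    @visited = Set.new
  end

  # Record a redirect target; raise if it was already recorded.
  def follow(location)
    unless @visited.add?(location)
      raise CircularRedirectError, "circular redirect to #{location}"
    end
    location
  end

  def reset
    @visited.clear
  end
end

# Hypothetical BE address, for illustration only.
FE_REDIRECT_TARGET = 'http://be.example:8040/api/db/tbl/_stream_load'

# Attempt 1 follows FE's 307 and then (in the real failure) dies with an
# IOException mid-body. The retry re-enters FE and gets the same 307 target.
def run_with_retry(tracker, reset_between_attempts:)
  tracker.follow(FE_REDIRECT_TARGET)       # attempt 1: FE -> 307 -> BE
  tracker.reset if reset_between_attempts  # what a fresh redirect context would do
  tracker.follow(FE_REDIRECT_TARGET)       # retry: FE -> same 307 -> BE
  :ok
rescue CircularRedirectError => e
  e.message
end

STALE_RESULT = run_with_retry(RedirectTracker.new, reset_between_attempts: false)
FRESH_RESULT = run_with_retry(RedirectTracker.new, reset_between_attempts: true)
```

   With the stale set, the retry is rejected even though no real redirect loop 
exists. With a cleared set, the same sequence succeeds.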
   
   This is a real HC5-vs-HC4 implementation difference, not a configuration 
issue. The Doris Flink connector also follows FE 307 to BE in its default path 
(autoRedirect=true) and works correctly precisely because it uses HC4 sync: HC4 
honors 'Expect: 100-continue' strictly, so when FE 307s without sending 100, 
the entity is left unconsumed and HC4's RedirectExec follows the redirect 
normally.
   
   This patch aligns the plugin with the Flink connector's HTTP layer:
     - bump gem version 1.2.0 -> 1.2.1
     - httpclient5 5.4.2 (async)  ->  httpclient 4.5.13 (sync)
     - SimpleRequestBuilder       ->  HttpPut + ByteArrayEntity (repeatable)
     - HttpAsyncClients defaults  ->  HttpClients with:
         * setRequestExecutor(HttpRequestExecutor(60s))
         * setRedirectStrategy(DorisRedirectStrategy) (isRedirectable=true,
           strip userinfo, normalize empty query)
         * setRetryHandler(DefaultHttpRequestRetryHandler(0, false))
         * setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
         * RequestConfig.setExpectContinueEnabled(true)
     - Async future plumbing in TableEvents replaced with sync
       response_code / response_body / response_error fields.
     - Stringify both key and value at request.addHeader call site: HC4's
       addHeader(String, String) is strict on types whereas HC5 had a
       permissive (String, Object) overload; user configs commonly carry
       Float / Integer values like 'max_filter_ratio => 1.0'.
     - Drop 's.requirements << jar ...' from gemspec: with JARs vendored under
       lib/, the Maven lookup at install time is unnecessary, and it forced
       users to set JARS_SKIP=true for offline installs.
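
   Two of the items above, header stringification and redirect URL cleanup, 
can be sketched in plain Ruby. This is a hedged illustration and not the 
plugin's code: stringify_headers and normalize_redirect are hypothetical helper 
names, the host and credentials are made up, and "normalize empty query" is 
assumed here to mean dropping a trailing '?' with nothing after it.

```ruby
require 'uri'

# Convert header names and values to Strings, so that Float / Integer /
# Symbol values from a pipeline config can be passed to a strict
# addHeader(String, String) style API.
def stringify_headers(headers)
  headers.each_with_object({}) { |(k, v), out| out[k.to_s] = v.to_s }
end

# Rebuild a redirect Location without userinfo and without an empty query.
# Assumption: an empty query ("...?") is simply dropped.
def normalize_redirect(location)
  uri = URI.parse(location)
  query = uri.query
  query = nil if query.nil? || query.empty?
  # Building a new URI from the kept components leaves userinfo out.
  uri.class.build(
    host: uri.host,
    port: uri.port,
    path: uri.path,
    query: query,
    fragment: uri.fragment
  ).to_s
end

# Example inputs (hypothetical values).
HEADERS = stringify_headers('max_filter_ratio' => 1.0, :format => :json, 'columns' => 3)
CLEANED = normalize_redirect('http://root:secret@be.example:8040/api/db/tbl/_stream_load?')
```

   Every key and value in HEADERS is a String after the call, and CLEANED 
carries neither the credentials nor the trailing '?'.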
   
   Pipeline configuration, the retry queue, save_on_failure, group_commit, label 
generation, and header handling are all unchanged.
   
   Verified on a SelectDB BYOC cluster mirroring the reported production shape 
(16 workers, batch size 10000, 200,000 events in total):
     - Before: 100% of requests fail with CircularRedirectException
     - After:  20/20 stream loads Status=Success, 200,000/200,000 rows 
ingested, 0 HTTP-layer errors.
   
   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   

