This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 2471d75d8f5 [Enhancement](be) Fix small_file_mgr to support HTTPS when FE runs in HTTPS-only mode (#63918)
2471d75d8f5 is described below

commit 2471d75d8f50b8817c6be2a6c13f44de304060ca
Author: nsivarajan <[email protected]>
AuthorDate: Fri Jun 5 21:05:09 2026 +0530

    [Enhancement](be) Fix small_file_mgr to support HTTPS when FE runs in HTTPS-only mode (#63918)
    
    **Problem Summary:**
    
    When Doris FE is configured with `enable_https=true` and `http_port=0`
    (HTTPS-only hardened deployment), the BE's `SmallFileMgr` fails to
    download small files (SSL certificates, UDF jars, Kerberos keytabs) from
    the FE master.
    
    `SmallFileMgr` is the only BE→FE path that uses HTTP rather than
    Thrift/RPC. It downloads files via `/api/get_small_file` using a
    hardcoded `http://` scheme. When the FE disables HTTP (`http_port=0`),
    this connection is refused and the download fails, which breaks
    features that depend on small files, such as Routine Load with Kafka
    SSL certificates.
    
    ### What is changed and how does it work?
    
    `_download_file()` in `small_file_mgr.cpp` now tries HTTP first
    (preserving zero-overhead behavior for existing HTTP deployments), then
    falls back to HTTPS if HTTP fails. The HTTPS attempt uses
    `use_untrusted_ssl()`, which skips TLS certificate chain verification.
    
    This is safe for two reasons:
    1. This is internal cluster traffic on a private network (FE master →
    BE).
    2. Every downloaded file is independently verified via MD5 checksum
    after download, making it computationally infeasible for a tampered file
    to pass undetected.
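    
    The HTTP-first, HTTPS-fallback flow described above can be sketched
    roughly as follows. This is an illustrative sketch, not the code in
    this commit: `FetchFn`, `DownloadResult`, and
    `download_with_https_fallback` are hypothetical names, and the injected
    `fetch` callable is an assumed stand-in for the BE's `HttpClient`.
    
    ```cpp
    #include <cassert>
    #include <functional>
    #include <string>
    
    // Hypothetical stand-in for an HTTP client call (not a Doris API):
    // fetches `url`, appends the response body to `out`, and returns true
    // on success. `skip_tls_verify` models disabling certificate checks.
    using FetchFn = std::function<bool(const std::string& url, bool skip_tls_verify,
                                       std::string& out)>;
    
    struct DownloadResult {
        bool ok;
        std::string scheme_used;  // "http", "https", or empty on failure
        std::string body;
    };
    
    DownloadResult download_with_https_fallback(const std::string& host_port,
                                                const std::string& query,
                                                const FetchFn& fetch) {
        assert(fetch && "fetch must be callable");
        DownloadResult result{false, "", ""};
    
        // First attempt: plain HTTP, so existing HTTP deployments pay no extra cost.
        if (fetch("http://" + host_port + query, false, result.body)) {
            result.ok = true;
            result.scheme_used = "http";
            return result;
        }
    
        // Discard anything written by the failed attempt before retrying,
        // so partial data cannot leak into the final content or its checksum.
        result.body.clear();
    
        // Second attempt: HTTPS without certificate verification. The caller
        // is expected to verify the content independently (e.g. by checksum).
        if (fetch("https://" + host_port + query, true, result.body)) {
            result.ok = true;
            result.scheme_used = "https";
            return result;
        }
    
        result.body.clear();
        return result;
    }
    ```
    
    Injecting the fetch function keeps the sketch free of network
    dependencies. The reset between attempts plays the same role as
    truncating the temp file and re-creating the digest before the retry.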
    
    Note: A companion FE PR is needed for the complete fix. The FE
    `HeartbeatMgr` must send `https_port` (not `http_port`) to BEs when
    `enable_https=true`, so that `master_fe_http_port` contains the correct
    port for both the HTTP and HTTPS attempts. Without the FE change
    (#60921), this BE change is safe (no regression). With both PRs merged,
    the end-to-end fix is complete.
    
    Co-authored-by: Sivarajan Narayanan <[email protected]>
---
 be/src/runtime/small_file_mgr.cpp | 46 +++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/be/src/runtime/small_file_mgr.cpp b/be/src/runtime/small_file_mgr.cpp
index bfcded942f3..f383a89e846 100644
--- a/be/src/runtime/small_file_mgr.cpp
+++ b/be/src/runtime/small_file_mgr.cpp
@@ -25,6 +25,7 @@
 #include <glog/logging.h>
 #include <stdint.h>
 #include <stdio.h>
+#include <unistd.h>
 
 #include <cstring>
 #include <memory>
@@ -166,19 +167,15 @@ Status SmallFileMgr::_download_file(int64_t file_id, const std::string& md5,
         return Status::InternalError("fail to open file");
     }
 
-    HttpClient client;
-
-    std::stringstream url_ss;
     ClusterInfo* cluster_info = _exec_env->cluster_info();
-    url_ss << cluster_info->master_fe_addr.hostname << ":" << cluster_info->master_fe_http_port
-           << "/api/get_small_file?"
-           << "file_id=" << file_id << "&token=" << cluster_info->token;
+    // Small file download is the only BE→FE path that uses HTTP (not Thrift/RPC).
+    // master_fe_http_port is set to https_port when enable_https=true (see HeartbeatMgr).
+    // The ~1ms fallback overhead is acceptable; small file downloads are infrequent.
+    const std::string host_port = cluster_info->master_fe_addr.hostname + ":" +
+                                  std::to_string(cluster_info->master_fe_http_port);
+    const std::string query = "/api/get_small_file?file_id=" + std::to_string(file_id) +
+                              "&token=" + cluster_info->token;
 
-    std::string url = url_ss.str();
-
-    LOG(INFO) << "download file from: " << url;
-
-    RETURN_IF_ERROR(client.init(url));
     Status status;
     Md5Digest digest;
     auto download_cb = [&status, &tmp_file, &fp, &digest](const void* data, size_t length) {
@@ -192,7 +189,32 @@ Status SmallFileMgr::_download_file(int64_t file_id, const std::string& md5,
         }
         return true;
     };
-    RETURN_IF_ERROR(client.execute(download_cb));
+
+    std::string url = "http://" + host_port + query;
+    LOG(INFO) << "download file from: " << url;
+    HttpClient client;
+    RETURN_IF_ERROR(client.init(url));
+    Status execute_status = client.execute(download_cb);
+
+    if (!execute_status.ok()) {
+        rewind(fp.get());
+        if (ftruncate(fileno(fp.get()), 0) != 0) {
+            LOG(WARNING) << "fail to truncate temp file for https retry, errno=" << errno;
+        }
+        status = Status::OK();
+        digest = Md5Digest();
+
+        url = "https://" + host_port + query;
+        LOG(INFO) << "HTTP failed, retrying with HTTPS: " << url;
+        HttpClient https_client;
+        RETURN_IF_ERROR(https_client.init(url));
+        // Skip TLS cert verification: internal cluster traffic only; file integrity
+        // is guaranteed independently by MD5 checksum verification below.
+        https_client.use_untrusted_ssl();
+        execute_status = https_client.execute(download_cb);
+    }
+
+    RETURN_IF_ERROR(execute_status);
     RETURN_IF_ERROR(status);
     digest.digest();
 


