This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 2471d75d8f5 [Enhancement](be) Fix small_file_mgr to support HTTPS when
FE runs in HTTPS-only mode (#63918)
2471d75d8f5 is described below
commit 2471d75d8f50b8817c6be2a6c13f44de304060ca
Author: nsivarajan <[email protected]>
AuthorDate: Fri Jun 5 21:05:09 2026 +0530
[Enhancement](be) Fix small_file_mgr to support HTTPS when FE runs in
HTTPS-only mode (#63918)
**Problem Summary:**
When Doris FE is configured with `enable_https=true` and `http_port=0`
(HTTPS-only hardened deployment), the BE's `SmallFileMgr` fails to
download small files (SSL certificates, UDF jars, Kerberos keytabs) from
the FE master.
`SmallFileMgr` is the only BE→FE path that uses HTTP rather than
Thrift/RPC. It downloads files via `/api/get_small_file` using a
hardcoded `http://` scheme. When the FE disables HTTP (`http_port=0`),
this connection is refused and the download fails, breaking features that
depend on small files, such as Routine Load with Kafka SSL certificates.
### What is changed and how does it work?
`_download_file()` in `small_file_mgr.cpp` now tries HTTP first
(preserving zero-overhead behavior for existing HTTP deployments), then
falls back to HTTPS if HTTP fails. The HTTPS attempt uses
`use_untrusted_ssl()`, which skips TLS certificate chain verification.
This is safe for two reasons:
1. This is internal cluster traffic on a private network (FE master →
BE).
2. Every downloaded file is independently verified via MD5 checksum
after download, making it computationally infeasible for a tampered file
to pass undetected.
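The HTTP-then-HTTPS order described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `FetchResult`, `Fetcher`, and `download_with_https_fallback` are hypothetical names, and the `fetch` callback stands in for the BE's `HttpClient`. The `skip_tls_verify` flag only mirrors the intent of `use_untrusted_ssl()`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical result type for this sketch; the real code uses Doris' Status.
struct FetchResult {
    bool ok = false;
    std::string url;   // URL used by the last attempt
    std::string body;  // bytes accumulated by the last attempt
};

// Hypothetical stand-in for the HTTP client: appends received bytes to `out`
// and returns true on success.
using Fetcher =
        std::function<bool(const std::string& url, bool skip_tls_verify, std::string& out)>;

FetchResult download_with_https_fallback(const std::string& host_port,
                                         const std::string& query, const Fetcher& fetch) {
    assert(!host_port.empty());
    FetchResult result;

    // First attempt: plain HTTP, so existing HTTP deployments are unaffected.
    result.url = "http://" + host_port + query;
    result.ok = fetch(result.url, /*skip_tls_verify=*/false, result.body);

    if (!result.ok) {
        // Drop any partial bytes from the failed attempt before retrying.
        result.body.clear();
        // Second attempt: HTTPS without certificate chain verification.
        result.url = "https://" + host_port + query;
        result.ok = fetch(result.url, /*skip_tls_verify=*/true, result.body);
    }
    return result;
}
```

Clearing the buffer before the retry plays the same role as truncating the temp file and resetting the digest in the actual change: a partially received HTTP response must not leak into the HTTPS result.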
Note: A companion FE PR is needed for the complete fix. The FE
`HeartbeatMgr` must send `https_port` (not `http_port`) to BEs when
`enable_https=true`, so that `master_fe_http_port` contains the correct
port for both the HTTP and HTTPS attempts. Without the FE change (#60921),
this BE change is still safe (no regression). With both PRs merged, the
full end-to-end fix is complete.
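The port selection expected from the companion FE change can be summarized in a few lines. This is only a sketch of the rule stated in the note: the FE itself is not written in C++, and `FeHttpConfig` and `advertised_port` are hypothetical names used for illustration.

```cpp
#include <cassert>

// Hypothetical view of the FE settings named in the note above.
struct FeHttpConfig {
    bool enable_https;
    int http_port;
    int https_port;
};

// Port the FE would advertise to BEs in its heartbeat under this sketch:
// the HTTPS port when HTTPS is enabled, otherwise the HTTP port.
int advertised_port(const FeHttpConfig& cfg) {
    const int port = cfg.enable_https ? cfg.https_port : cfg.http_port;
    assert(port >= 0);
    return port;
}
```

With `enable_https=true` and `http_port=0`, this rule yields the HTTPS port, which is the value the BE then uses for both of its download attempts.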
Co-authored-by: Sivarajan Narayanan <[email protected]>
---
be/src/runtime/small_file_mgr.cpp | 46 +++++++++++++++++++++++++++++----------
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/be/src/runtime/small_file_mgr.cpp b/be/src/runtime/small_file_mgr.cpp
index bfcded942f3..f383a89e846 100644
--- a/be/src/runtime/small_file_mgr.cpp
+++ b/be/src/runtime/small_file_mgr.cpp
@@ -25,6 +25,7 @@
#include <glog/logging.h>
#include <stdint.h>
#include <stdio.h>
+#include <unistd.h>
#include <cstring>
#include <memory>
@@ -166,19 +167,15 @@ Status SmallFileMgr::_download_file(int64_t file_id, const std::string& md5,
return Status::InternalError("fail to open file");
}
- HttpClient client;
-
- std::stringstream url_ss;
ClusterInfo* cluster_info = _exec_env->cluster_info();
- url_ss << cluster_info->master_fe_addr.hostname << ":" << cluster_info->master_fe_http_port
- << "/api/get_small_file?"
- << "file_id=" << file_id << "&token=" << cluster_info->token;
+ // Small file download is the only BE→FE path that uses HTTP (not Thrift/RPC).
+ // master_fe_http_port is set to https_port when enable_https=true (see HeartbeatMgr).
+ // The ~1ms fallback overhead is acceptable; small file downloads are infrequent.
+ const std::string host_port = cluster_info->master_fe_addr.hostname + ":" +
+ std::to_string(cluster_info->master_fe_http_port);
+ const std::string query = "/api/get_small_file?file_id=" + std::to_string(file_id) +
+ "&token=" + cluster_info->token;
- std::string url = url_ss.str();
-
- LOG(INFO) << "download file from: " << url;
-
- RETURN_IF_ERROR(client.init(url));
Status status;
Md5Digest digest;
auto download_cb = [&status, &tmp_file, &fp, &digest](const void* data, size_t length) {
@@ -192,7 +189,32 @@ Status SmallFileMgr::_download_file(int64_t file_id, const std::string& md5,
}
return true;
};
- RETURN_IF_ERROR(client.execute(download_cb));
+
+ std::string url = "http://" + host_port + query;
+ LOG(INFO) << "download file from: " << url;
+ HttpClient client;
+ RETURN_IF_ERROR(client.init(url));
+ Status execute_status = client.execute(download_cb);
+
+ if (!execute_status.ok()) {
+ rewind(fp.get());
+ if (ftruncate(fileno(fp.get()), 0) != 0) {
+ LOG(WARNING) << "fail to truncate temp file for https retry, errno=" << errno;
+ }
+ status = Status::OK();
+ digest = Md5Digest();
+
+ url = "https://" + host_port + query;
+ LOG(INFO) << "HTTP failed, retrying with HTTPS: " << url;
+ HttpClient https_client;
+ RETURN_IF_ERROR(https_client.init(url));
+ // Skip TLS cert verification: internal cluster traffic only; file integrity
+ // is guaranteed independently by MD5 checksum verification below.
+ https_client.use_untrusted_ssl();
+ execute_status = https_client.execute(download_cb);
+ }
+
+ RETURN_IF_ERROR(execute_status);
RETURN_IF_ERROR(status);
digest.digest();