yiguolei commented on code in PR #60367:
URL: https://github.com/apache/doris/pull/60367#discussion_r2772188143
##########
be/src/vec/common/hash_table/join_hash_table.h:
##########
@@ -35,6 +36,29 @@ inline uint32_t hash_join_table_calc_bucket_size(size_t num_elem) {
static_cast<size_t>(std::numeric_limits<int32_t>::max()) + 1);
}
+// Estimate the memory size needed for hash table basic structures (first, next, visited).
+// When include_key_storage is true, also estimates memory for hash map key storage
+// (stored_keys and bucket_nums), which provides a rough approximation without
+// knowing the exact hash map context type.
+inline size_t estimate_hash_table_mem_size(size_t rows, TJoinOp::type join_op,
+ bool include_key_storage = false) {
+ const auto bucket_size = hash_join_table_calc_bucket_size(rows);
+ size_t size = bucket_size * sizeof(uint32_t); // JoinHashTable::first
+ size += rows * sizeof(uint32_t); // JoinHashTable::next
+ if (join_op == TJoinOp::FULL_OUTER_JOIN || join_op == TJoinOp::RIGHT_OUTER_JOIN ||
+ join_op == TJoinOp::RIGHT_ANTI_JOIN || join_op == TJoinOp::RIGHT_SEMI_JOIN) {
+ size += rows * sizeof(uint8_t); // JoinHashTable::visited
+ }
+ if (include_key_storage) {
+ // Approximate estimation for hash map key storage:
+ // - stored_keys: StringRef per row for serialized keys
+ // - bucket_nums: uint32_t per row for hash bucket indices
+ size += sizeof(StringRef) * rows; // stored_keys
Review Comment:
StringRef is only a pointer plus a length; the actual key data is stored in the Arena.
The Arena's memory is not estimated here.
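
One possible way to cover the gap, as a rough sketch only: add a per-row term for the serialized key bytes that live in the Arena. Everything below is hypothetical and not taken from the PR: `StringRefSketch` stands in for the real `StringRef`, and `estimate_key_storage_bytes` and its `avg_serialized_key_bytes` parameter are made-up names. The caller would have to supply the average key size, and the sketch ignores any chunk rounding or slack inside the Arena.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for StringRef: just a pointer plus a length,
// so sizeof() covers the reference only, not the bytes it points to.
struct StringRefSketch {
    const char* data;
    std::size_t size;
};

// Hypothetical helper (not part of the PR): estimates key storage including
// the key payload held in the Arena.
// avg_serialized_key_bytes is an assumed, caller-provided average key size.
inline std::size_t estimate_key_storage_bytes(std::size_t rows,
                                              std::size_t avg_serialized_key_bytes) {
    std::size_t size = sizeof(StringRefSketch) * rows; // stored_keys (references only)
    size += sizeof(std::uint32_t) * rows;              // bucket_nums
    size += avg_serialized_key_bytes * rows;           // key bytes in the Arena (assumed average)
    return size;
}
```

With this shape, the Arena term dominates for wide keys: the two fixed-size terms grow only with the row count, while the last one also scales with the key width.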