github-actions[bot] commented on code in PR #61242:
URL: https://github.com/apache/doris/pull/61242#discussion_r2922826937


##########
be/src/service/http/action/stream_load.cpp:
##########
@@ -366,6 +370,9 @@ Status StreamLoadAction::_on_header(HttpRequest* http_req, 
std::shared_ptr<Strea
         int64_t begin_txn_start_time = MonotonicNanos();
         
RETURN_IF_ERROR(_exec_env->stream_load_executor()->begin_txn(ctx.get()));
         ctx->begin_txn_cost_nanos = MonotonicNanos() - begin_txn_start_time;
+        if (ctx->group_commit) {
+            RETURN_IF_ERROR(_check_wal_space(ctx->group_commit_mode, 
ctx->body_bytes));
+        }

Review Comment:
   **[Low] `_check_wal_space` called with `body_bytes=0`**
   
   At this point in `_on_header`, `ctx->body_bytes` is initialized to 0 (set at 
line 304). The body hasn't been received yet. So 
`_check_wal_space(ctx->group_commit_mode, ctx->body_bytes)` is called with 
`content_length=0`, which always passes the 
`load_size_smaller_than_wal_limit(0)` check.
   
   This means the WAL space check is effectively a no-op for the table-property 
group commit path. The explicit header path has a proper check in 
`_can_group_commit` (line 943) using the Content-Length header value.
   
   Consider using the Content-Length from the HTTP header instead:
   ```cpp
   if (ctx->group_commit) {
       int64_t content_length = 
http_req->header(HttpHeaders::CONTENT_LENGTH).empty()
                                ? 0 : 
std::stoll(http_req->header(HttpHeaders::CONTENT_LENGTH));
       RETURN_IF_ERROR(_check_wal_space(ctx->group_commit_mode, 
content_length));
   }
   ```



##########
be/src/service/http/action/stream_load.cpp:
##########
@@ -933,17 +940,43 @@ Status 
StreamLoadAction::_handle_group_commit(HttpRequest* req,
         if (!config::wait_internal_group_commit_finish && !ctx->label.empty()) 
{
             return Status::InvalidArgument("label and group_commit can't be 
set at the same time");
         }

Review Comment:
   **[High] Label check incorrectly rejects labeled stream loads on tables with 
`group_commit_mode` property**
   
   When the user does NOT set the `group_commit` HTTP header (empty 
`group_commit_header`), but the table has 
`group_commit_mode=async_mode/sync_mode`, this code path is still reached via 
`_handle_group_commit` → `_can_group_commit`. If the user provides a label 
(which is normal for stream loads), this returns `InvalidArgument("label and 
group_commit can't be set at the same time")`.
   
   The problem: the user never explicitly requested group commit — they just 
have a table property. This is a behavioral regression because existing users 
who always set labels would suddenly get errors after someone sets the table 
property.
   
   Suggestion: Either skip this check when `group_commit_header` is empty 
(table-property path), or move the label conflict check to after the FE 
responds confirming the table actually uses group commit. For example:
   ```cpp
   if (!group_commit_header.empty() && 
!config::wait_internal_group_commit_finish && !ctx->label.empty()) {
       return Status::InvalidArgument("label and group_commit can't be set at 
the same time");
   }
   ```
   This ensures only explicit header-based group commit rejects labels, while 
the table-property path allows labels (and falls through to a normal 
non-group-commit load when a label is present).



##########
be/src/load/stream_load/stream_load_executor.cpp:
##########
@@ -217,6 +220,15 @@ Status StreamLoadExecutor::begin_txn(StreamLoadContext* 
ctx) {
         }
         return status;
     }
+    if (ctx->group_commit_mode.empty()) {
+        auto table_group_commit_mode = result.table_group_commit_mode;
+        if (iequal(table_group_commit_mode, "async_mode") ||

Review Comment:
   **[Low] Missing `__isset` check for `result.table_group_commit_mode`**
   
   This reads `result.table_group_commit_mode` without first checking 
`result.__isset.table_group_commit_mode`. While the default empty string would 
cause the `iequal` checks to fail (so this works in practice), it's more 
idiomatic and safer to check `__isset` first, consistent with how 
`result.__isset.db_id` is checked at line 233:
   
   ```cpp
   if (ctx->group_commit_mode.empty() && 
result.__isset.table_group_commit_mode) {
       auto table_group_commit_mode = result.table_group_commit_mode;
   ```



##########
be/src/service/http/action/stream_load.cpp:
##########
@@ -933,17 +940,43 @@ Status 
StreamLoadAction::_handle_group_commit(HttpRequest* req,
         if (!config::wait_internal_group_commit_finish && !ctx->label.empty()) 
{
             return Status::InvalidArgument("label and group_commit can't be 
set at the same time");
         }
-        ctx->group_commit = true;
-        if (iequal(group_commit_mode, "async_mode")) {
-            if (!load_size_smaller_than_wal_limit(content_length)) {
-                std::stringstream ss;
-                ss << "There is no space for group commit stream load async 
WAL. This stream load "
-                      "size is "
-                   << content_length << ". WAL dir info: "
-                   << 
ExecEnv::GetInstance()->wal_mgr()->get_wal_dirs_info_string();
-                LOG(WARNING) << ss.str();
-                return Status::Error<EXCEEDED_LIMIT>(ss.str());
-            }
+        RETURN_IF_ERROR(_check_wal_space(group_commit_header, content_length));
+        can_group_commit = true;
+    }
+    return Status::OK();
+}
+
+Status StreamLoadAction::_handle_group_commit(HttpRequest* req,
+                                              
std::shared_ptr<StreamLoadContext> ctx) {
+    std::string group_commit_header = req->header(HTTP_GROUP_COMMIT);
+    if (!group_commit_header.empty() && !iequal(group_commit_header, 
SYNC_MODE) &&
+        !iequal(group_commit_header, ASYNC_MODE) && 
!iequal(group_commit_header, OFF_MODE)) {
+        return Status::InvalidArgument(
+                "group_commit can only be [async_mode, sync_mode, off_mode]");
+    }
+    if (config::wait_internal_group_commit_finish) {
+        group_commit_header = SYNC_MODE;
+    }
+
+    // if group_commit_header is off_mode, we will not use group commit
+    if (iequal(group_commit_header, OFF_MODE)) {
+        ctx->group_commit_mode = OFF_MODE;
+        ctx->group_commit = false;
+        return Status::OK();
+    }
+    bool can_group_commit = false;
+    RETURN_IF_ERROR(_can_group_commit(req, ctx, group_commit_header, 
can_group_commit));
+    if (!can_group_commit) {
+        ctx->group_commit_mode = OFF_MODE;
+        ctx->group_commit = false;
+    } else {
+        if (!group_commit_header.empty()) {
+            ctx->group_commit_mode = group_commit_header;
+            ctx->group_commit = true;
+        } else {
+            // use table property to decide group commit or not
+            ctx->group_commit_mode = "";

Review Comment:
   **[Medium] Parallel code path: `HttpStreamAction` not updated**
   
   The `HttpStreamAction` (`http_stream.cpp`) has its own 
`_handle_group_commit` method that does NOT support the table-property group 
commit mode. It treats an empty `group_commit` header as "no group commit" and 
returns immediately. Unlike `StreamLoadAction`, it doesn't call `begin_txn` 
(the transaction is opened by FE via `NativeInsertStmt`), so the mechanism used 
here (empty `group_commit_mode` → `begin_txn` queries FE for table property) 
doesn't directly apply.
   
   However, the feature description says _"For stream load, if it not set 
group_commit header, use the table property as the group commit mode"_. This 
should apply to both stream load endpoints. Consider either:
   1. Adding table-property support to `HttpStreamAction` in this PR, or
   2. Documenting this as a known limitation and filing a follow-up issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to