github-actions[bot] commented on code in PR #63533:
URL: https://github.com/apache/doris/pull/63533#discussion_r3301884953


##########
cloud/src/meta-service/meta_service_resource.cpp:
##########
@@ -1491,6 +1641,115 @@ void 
MetaServiceImpl::alter_storage_vault(google::protobuf::RpcController* contr
         code = cast_as<ErrCategory::COMMIT>(err);
         msg = fmt::format("failed to commit kv txn, err={}", err);
         LOG(WARNING) << msg;
+        return;
+    }
+
+    async_notify_refresh_instance(txn_kv_, instance_id, true);
+
+    if (!supports_cascade) {
+        return;
+    }
+
+    if (!instance.has_snapshot_switch_status() ||
+        instance.snapshot_switch_status() == SNAPSHOT_SWITCH_DISABLED) {
+        LOG(INFO) << "snapshot disabled for instance_id=" << instance_id
+                  << ", skip cascade updating derived instances after 
alter_storage_vault";
+        return;
+    }
+
+    std::vector<std::string> cascade_instance_ids;
+    if (find_cascade_instances(txn_kv_.get(), instance_id, 
&cascade_instance_ids) != 0) {
+        LOG(WARNING) << "failed to find derived instances for storage vault 
cascade, instance_id="
+                     << instance_id;
+        return;
+    }
+
+    for (const auto& cascade_id : cascade_instance_ids) {
+        std::unique_ptr<Transaction> cascade_txn;
+        TxnErrorCode cascade_err = txn_kv_->create_txn(&cascade_txn);
+        if (cascade_err != TxnErrorCode::TXN_OK) {
+            code = cast_as<ErrCategory::CREATE>(cascade_err);
+            msg = fmt::format(

Review Comment:
   This reports a derived-instance cascade failure after the root storage-vault 
transaction has already been committed above. A transient error here leaves the 
root, and possibly earlier descendants, updated while the RPC returns failure 
to the caller. Retrying is not safe for operations like renames because the 
root no longer has the old vault name, and the same post-commit error pattern 
is used by the newly added `alter_obj_store_info` cascade below. Please either 
keep post-commit cascade failures best-effort/retriable in the background, or 
make the whole operation transactional/compensated before returning an error.



##########
cloud/src/meta-service/meta_service_resource.cpp:
##########
@@ -1845,9 +2159,11 @@ void 
MetaServiceImpl::update_ak_sk(google::protobuf::RpcController* controller,
         std::unique_ptr<Transaction> cascade_txn;
         TxnErrorCode cascade_err = txn_kv_->create_txn(&cascade_txn);
         if (cascade_err != TxnErrorCode::TXN_OK) {
-            LOG(WARNING) << "failed to create txn for derived instance, 
instance_id=" << cascade_id
-                         << " err=" << cascade_err;
-            continue;
+            code = cast_as<ErrCategory::CREATE>(cascade_err);
+            msg = fmt::format("failed to create txn for derived instance, 
instance_id={}, err={}",

Review Comment:
   This changes the existing AK/SK cascade from best-effort `continue` to 
returning an RPC error after the root instance was already committed. If a 
derived update fails, clients see failure even though the root update 
succeeded; a retry can then hit `ak sk eq original` on the root and still not 
repair the skipped descendants. Please preserve the previous 
success/best-effort semantics or add a retry/reconciliation mechanism instead 
of returning errors after partial commits.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to