[jira] [Created] (KUDU-3666) ReplicatedAlterTableTest.AlterTableAndDropTablet fails from time to time due to absence of only-once semantics for AlterTable

Alexey Serbin (Jira) Wed, 04 Jun 2025 11:33:08 -0700

Alexey Serbin created KUDU-3666:
-----------------------------------

             Summary: ReplicatedAlterTableTest.AlterTableAndDropTablet fails 
from time to time due to absence of only-once semantics for AlterTable
                 Key: KUDU-3666
                 URL: https://issues.apache.org/jira/browse/KUDU-3666
             Project: Kudu
          Issue Type: Bug
          Components: master, test
            Reporter: Alexey Serbin
         Attachments: alter_table-test.00-debug.txt.xz, 
alter_table-test.00-release.txt.xz, alter_table-test.01-release.txt.xz


The ReplicatedAlterTableTest.AlterTableAndDropTablet test fails intermittently.  
The failures manifest as error messages like the following:

{noformat}
src/kudu/integration-tests/alter_table-test.cc:2378: Failure
Failed                                                                          
Bad status: Already present: The column already exists: new_c39 
{noformat}

{noformat}
src/kudu/integration-tests/alter_table-test.cc:2378: Failure
Failed
Bad status: Already present: The column already exists: new_c44  
{noformat}

{noformat}
src/kudu/integration-tests/alter_table-test.cc:2385: Failure
Failed
Bad status: Invalid argument: no range partition to drop: 9 <= VALUES < 10 
{noformat}

The culprit seems to be a retried AlterTable RPC request: the client assumed 
that the request had failed, but in fact it had succeeded on the server side.  
The retry then re-applied a change that was already in effect, hence the 
'column already exists' and 'no range partition to drop' errors above.  To 
address the issue, we need to enable exactly-once RPC semantics (i.e. the 
kudu.rpc.track_rpc_result protobuf option) for the 
AlterTable(AlterTableRequestPB) RPC method of masters as well.  At the time of 
writing, it is enabled only for the Write(WriteRequestPB) RPC method of tablet 
servers.
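
To illustrate the failure mode, here is a minimal, self-contained sketch in 
Python.  It is a toy model, not Kudu code: ToyMaster, send_with_lost_response, 
the (client_id, seq_no) request identifier and the track_results flag are all 
hypothetical names chosen for this example.  It only shows why retrying a 
non-idempotent request produces 'Already present', and how replaying a 
remembered response for a repeated request identifier would avoid that.

```python
class ToyMaster:
    """Toy stand-in for a master that handles 'add column' requests (not Kudu code)."""

    def __init__(self, track_results):
        self.columns = {"key"}
        self.track_results = track_results
        # Hypothetical result cache: (client_id, seq_no) -> response already sent.
        self.completed = {}

    def alter_table_add_column(self, client_id, seq_no, name):
        request_id = (client_id, seq_no)
        if self.track_results and request_id in self.completed:
            # A retry of a request that already ran: replay the stored response
            # instead of applying the change a second time.
            return self.completed[request_id]
        if name in self.columns:
            response = f"Already present: The column already exists: {name}"
        else:
            self.columns.add(name)
            response = "OK"
        if self.track_results:
            self.completed[request_id] = response
        return response


def send_with_lost_response(master, client_id, seq_no, name):
    """Simulate a client that never sees the first response and retries with the same id."""
    master.alter_table_add_column(client_id, seq_no, name)  # applied, response lost
    return master.alter_table_add_column(client_id, seq_no, name)  # retry


without_tracking = send_with_lost_response(
    ToyMaster(track_results=False), "client-a", 1, "new_c39")
with_tracking = send_with_lost_response(
    ToyMaster(track_results=True), "client-a", 1, "new_c39")

print(without_tracking)  # Already present: The column already exists: new_c39
print(with_tracking)     # OK
```

In the sketch, a different seq_no for the same column still yields 'Already 
present', so genuine duplicate alterations are still rejected; only retries of 
the same request are deduplicated.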

Full test logs are attached for convenience.  Each log contains evidence of a 
re-attempted RPC request, e.g.:
{noformat}
W20250531 02:02:56.259193 30345 master_proxy_rpc.cc:203] Re-attempting 
AlterTable request to leader Master (127.22.204.254:43629)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
