[
https://issues.apache.org/jira/browse/KUDU-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin updated KUDU-3666:
--------------------------------
Summary: ReplicatedAlterTableTest.AlterTableAndDropTablet fails from time
to time due to absence of exactly-once semantics for AlterTable (was:
ReplicatedAlterTableTest.AlterTableAndDropTablet fails from time to time due to
absence of only-once semantics for AlterTable)
> ReplicatedAlterTableTest.AlterTableAndDropTablet fails from time to time due
> to absence of exactly-once semantics for AlterTable
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: KUDU-3666
> URL: https://issues.apache.org/jira/browse/KUDU-3666
> Project: Kudu
> Issue Type: Bug
> Components: master, test
> Reporter: Alexey Serbin
> Priority: Major
> Attachments: alter_table-test.00-debug.txt.xz,
> alter_table-test.00-release.txt.xz, alter_table-test.01-release.txt.xz
>
>
> The ReplicatedAlterTableTest.AlterTableAndDropTablet test fails from time to
> time. The failures manifest as error messages like the following:
> {noformat}
> src/kudu/integration-tests/alter_table-test.cc:2378: Failure
> Failed
>
> Bad status: Already present: The column already exists: new_c39
> {noformat}
> {noformat}
> src/kudu/integration-tests/alter_table-test.cc:2378: Failure
> Failed
> Bad status: Already present: The column already exists: new_c44
> {noformat}
> {noformat}
> src/kudu/integration-tests/alter_table-test.cc:2385: Failure
> Failed
> Bad status: Invalid argument: no range partition to drop: 9 <= VALUES < 10
> {noformat}
> The culprit seems to be a retried AlterTable RPC request: the client assumed
> that the request had failed, but in fact it had succeeded on the server
> side, so the retry was applied to an already altered table. To address the
> issue, we need to enable exactly-once RPC semantics (i.e. the
> kudu.rpc.track_rpc_result option in protobuf) for the
> AlterTable(AlterTableRequestPB) RPC method of masters as well. At the time
> of writing, it is enabled only for the Write(WriteRequestPB) RPC method of
> tablet servers.
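
To illustrate the failure mode described above, here is a minimal Python sketch of a retried, non-idempotent request with and without server-side result tracking. It is a toy model only, not Kudu's implementation: the ToyMaster class, the add_column method, the (client_id, seq_no) request key, and the alter_with_lost_response helper are all hypothetical names introduced for this example.

```python
class ToyMaster:
    """Hypothetical stand-in for a master; not Kudu's actual implementation."""

    def __init__(self, track_results):
        self.columns = {"key"}
        self.track_results = track_results
        self.results = {}  # (client_id, seq_no) -> stored response

    def add_column(self, client_id, seq_no, name):
        request_id = (client_id, seq_no)
        if self.track_results and request_id in self.results:
            # A retry of a request that already ran: replay the stored response
            # instead of applying the change a second time.
            return self.results[request_id]
        if name in self.columns:
            response = f"Already present: The column already exists: {name}"
        else:
            self.columns.add(name)
            response = "OK"
        if self.track_results:
            self.results[request_id] = response
        return response


def alter_with_lost_response(master, name):
    """Send the same request twice, as if the first reply never arrived."""
    master.add_column("client-1", 1, name)         # applied; reply is lost
    return master.add_column("client-1", 1, name)  # client re-attempts


print(alter_with_lost_response(ToyMaster(track_results=False), "new_c39"))
print(alter_with_lost_response(ToyMaster(track_results=True), "new_c39"))
```

Without tracking, the re-attempt in this toy model reports that the column already exists, which resembles the test failures quoted above. With tracking, the re-attempt gets the stored "OK" response and the column is added only once.
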
> Full test logs are attached for convenience. Each log contains evidence of
> a re-attempted RPC request, e.g.:
> {noformat}
> W20250531 02:02:56.259193 30345 master_proxy_rpc.cc:203] Re-attempting
> AlterTable request to leader Master (127.22.204.254:43629)
> {noformat}