Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16527 )
Change subject: KUDU-2612: initial implementation of TxnManager ...................................................................... Patch Set 1: (16 comments) I split this patch in two, addressing the feedback in both patches. A follow-up patch is coming. http://gerrit.cloudera.org:8080/#/c/16527/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16527/1//COMMIT_MSG@14 PS1, Line 14: * Add many more tests, especially for BeginTransaction to make : txn_id allocation works as expected: as of now, it seems : we could get surprises until TxnStatusManager::BeginTransaction() : updated to avoid incrementing highest_txn_id_ and responding : back when the txn status tablet replica isn't a leader. > nit: would it make sense to split up the txn ID assignment piece from the X Yeah, it totally makes sense -- I moved that from TxnManagerServiceImpl into TxnManage. So, as of now (PS2), the assignment of txn ID is implemented in BeginTransaction, and in this sense it's decoupled from the initialization logic. http://gerrit.cloudera.org:8080/#/c/16527/1//COMMIT_MSG@19 PS1, Line 19: Change the initialization of the TxnManager in master to : run on first txn-related request: that should fix the issue : of too many retries when masters are up and running, but : no tablet servers are around yet. Also, that should fix : many of currently failing tests scenarios which expect : there isn't any existing tables when they run right after : starting a mini cluster first time. > We discussed this a bit on Slack earlier -- I like the idea of lazily creat I implemented it as a configurable option: see FLAGS_txn_manager_lazily_initialized http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/catalog_manager.cc@3235 PS1, Line 3235: for (const auto& elem : table_ids_map_) { : const auto& table_info = elem.second; : if (table_info->type() != TableTypePB::DEFAULT_TABLE) { : continue; : } : tables->emplace_back(table_info); : } > Making sure I understand why this is here -- will we need this if we go dow Right: this is a piece from the alternative approach, and it should not be needed once switched to the alternative 'the lazy initialization' approach. http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager-test.cc File src/kudu/master/txn_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager-test.cc@108 PS1, Line 108: ASSERT_TRUE(resp.has_txn_id()); > nit: maybe check that these are all unique? It's a good point. Done. I also added verification of the txn_id monotonicity. http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc File src/kudu/master/txn_manager.cc: http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@47 PS1, Line 47: using kudu::rpc::RpcContext; > warning: using decl 'RpcContext' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@48 PS1, Line 48: using kudu::server::ServerBase; > warning: using decl 'ServerBase' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@77 PS1, Line 77: static const auto kMissingTxnId = > warning: 'kMissingTxnId' is a static definition in anonymous namespace; sta Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@121 PS1, Line 121: txn_id_ > nit: maybe call this next_txn_id_ so it's clear that it's, e.g. not the hig I don't think such a renaming would bring a lot of clarity, frankly. Instead, I added a comment into the header file. http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@123 PS1, Line 123: static constexpr auto kMaxIterNum = : std::numeric_limits<decltype(try_txn_id)>::digits - 1; > nit: It isn't clear to me what this is or why we're using it. Could you add That was a remnant from the previous implementation, it's removed now. http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@132 PS1, Line 132: if (txn_id_.compare_exchange_strong(stored_txn_id, try_txn_id + 1) || : stored_txn_id > try_txn_id) > Should we do this exchange even if we didn't get a success? E.g. if one thr I don't think so. As written, another thread and this threads are already trying with higher ids. This is just condition to see that current thread has stored its txn_id which it reserved, and it's the highest seen so far. http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@138 PS1, Line 138: } else if (s.IsInvalidArgument()) { > warning: do not use 'else' after 'break' [readability-else-after-return] Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@195 PS1, Line 195: Status TxnManager::KeepTransactionAlive(int64_t /* txn_id */, > warning: method 'KeepTransactionAlive' can be made static [readability-conv This cannot be static once the TODO is addressed. http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager.cc@198 PS1, Line 198: // TODO: call txn_sys_client_ once the functionality is available there > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager_service.cc File src/kudu/master/txn_manager_service.cc: http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager_service.cc@59 PS1, Line 59: static const auto kMissingTxnId = > warning: 'kMissingTxnId' is a static definition in anonymous namespace; sta Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/master/txn_manager_service.cc@152 PS1, Line 152: // TODO: call txn_sys_client_ once the functionality is available there > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/transactions/txn_system_client.cc File src/kudu/transactions/txn_system_client.cc: http://gerrit.cloudera.org:8080/#/c/16527/1/src/kudu/transactions/txn_system_client.cc@62 PS1, Line 62: if (master_addrs.empty()) { : return Status::InvalidArgument("empty list of master addresses"); : } > Can we get the non-empty list from the Master directly, assuming this is on Not sure I understand what you mean. In this context, the list if master addresses is received from GetMasterHostPorts(&hostports) and then TxnSystemClient::Create() is called, so the list coming from the Master. Maybe, we should to move this verification to be in KuduClientBuilder::Build()? I guess at it should be an improvement that helps to catch various issues earlier, without first trying to connect to a cluster. I'll put together a patch for that. -- To view, visit http://gerrit.cloudera.org:8080/16527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie952977a3ae5f625d1283389f0be8afb79df7d8c Gerrit-Change-Number: 16527 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 12 Oct 2020 20:45:16 +0000 Gerrit-HasComments: Yes