[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17097 to look at the new patch set (#8). Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. KUDU-2612: add PartitionLock and ScopedPartitionLock This patch introduces a coarse grained partition level lock, PartitionLock to prevent dirty writes for multi-rows transaction. A partition lock can only be held by a single transaction (or write op) at a time, the same transaction can acquire the lock for multiple times. To prevent deadlock, 'wait-die' scheme is used, which if the transaction requires a lock held by another transaction, 1) abort the transaction immediately if it has a higher txn ID than the other one. 2) Otherwise, retry the write op (or participant op) of the transaction. A ScopedPartitionLock is also introduced to automatically manage the lifecycle of a PartitionLock. Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 --- M src/kudu/tablet/lock_manager-test.cc M src/kudu/tablet/lock_manager.cc M src/kudu/tablet/lock_manager.h M src/kudu/tserver/tserver.proto 4 files changed, 464 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/17097/8 -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 8 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17097 to look at the new patch set (#7). Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. KUDU-2612: add PartitionLock and ScopedPartitionLock This patch introduces a coarse grained partition level lock, PartitionLock to prevent dirty writes for multi-rows transaction. A partition lock can only be held by a single transaction (or write op) at a time, the same transaction can acquire the lock for multiple times. To prevent deadlock, 'wait-die' scheme is used, which if the transaction requires a lock held by another transaction, 1) abort the transaction immediately if it has a higher txn ID than the other one. 2) Otherwise, retry the write op (or participant op) of the transaction. A ScopedPartitionLock is also introduced to automatically manage the lifecycle of a PartitionLock. Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 --- M src/kudu/tablet/lock_manager-test.cc M src/kudu/tablet/lock_manager.cc M src/kudu/tablet/lock_manager.h M src/kudu/tserver/tserver.proto 4 files changed, 464 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/17097/7 -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc@158 PS6, Line 158: ASSERT_EQ(1, lock_manager_.partition_lock_refs()); > Looking at this as a block box, does it make sense to check that the first Done http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc@252 PS6, Line 252: if (acquired) { > Why it might not be acquired? Is it possible? This might happen if the non-transactional op take the op first. http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc@267 PS6, Line 267: if (acquired) { : CHECK_EQ(TabletServerErrorPB::UNKNOWN_ERROR, code); : CHECK_EQ(1, lock_manager_.partition_lock_refs()); : } > I'm not sure I understand this piece if reading the comment for the thread' Sorry forget to update the comment.. Updated it. http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.h File src/kudu/tablet/lock_manager.h: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.h@193 PS6, Line 193: ScopedPartitionLock > What if an instance of this class is copied? Would it result in incorrect I don't think so, since we only update the reference at the constructor. But I disabled the copy constructor anyway. http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@448 PS6, Line 448: kMaxBackoffExp > nit: this might be a constexpr Ack http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@496 PS6, Line 496: ms > Is it in seconds or milliseconds? Ah, sorry for the confusion. It is in milliseconds, since I think we don't need to wait at the second level, as the caller() should already know the lock can be acquired (fast) with TryAcquirePartitionLock() succeeded previously. WDYT? I moved the log to TryAcquirePartitionLock() to avoid the report being screwed. http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@520 PS6, Line 520: partition_lock_refs_ -= 1; : DCHECK_GE(partition_lock_refs_, 0); : if (partition_lock_refs_ == 0) { : partition_lock_.reset(); : } > nit: this might be shortened to Done -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 06:28:23 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 10: Thanks for the patch Abhishek! -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 10 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 12 Mar 2021 05:29:14 + Gerrit-HasComments: No
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 9 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 12 Mar 2021 05:28:47 + Gerrit-HasComments: No
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. [tool] KUDU-3226 Validate List of Masters kudu ksck This patch checks for duplicate kudu masters provided in all of the kudu command line tools which call ParseMasterAddresses(). Currently ksck and rebalance (as rebalance calls ksck internally) are the only commands which throw the reported stack trace as ksck constructs a map internally for the kudu masters and bombs once it sees duplicate UUIDs. Other commands do not report any issue currently with duplicate masters. The patch only does a string comparison check to detect and report all the duplicate master(/s). If a user provides the IP address and hostname of the same master, it won't be able to catch the duplicates. That seems more of a deliberate attempt rather than a common mistake a user could make and hence not considering that case. Also wrote a simple test case to test this functionality. Decided not to spin up a test cluster for this as this is more of client side check and a master is not actually contacted. This change got conflicted with another test i.e. AdminCliTest.TestAuthzResetCacheIncorrectMasterAddressList. Updated this test as well to by creating a new external mini cluster with three masters and running the command in the test with just one master Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Reviewed-on: http://gerrit.cloudera.org:8080/17141 Reviewed-by: Alexey Serbin Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc 3 files changed, 43 insertions(+), 10 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 10 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@485 PS6, Line 485: SleepFor(MonoDelta::FromMilliseconds(1L << backoff_exp)); > The point here is that the lock based on spinlock is taken at line 452 and Ah I though you are referring to the partition lock. Makes sense, updated it. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 05:03:49 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 9 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 12 Mar 2021 05:03:16 + Gerrit-HasComments: No
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17141 to look at the new patch set (#9). Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. [tool] KUDU-3226 Validate List of Masters kudu ksck This patch checks for duplicate kudu masters provided in all of the kudu command line tools which call ParseMasterAddresses(). Currently ksck and rebalance (as rebalance calls ksck internally) are the only commands which throw the reported stack trace as ksck constructs a map internally for the kudu masters and bombs once it sees duplicate UUIDs. Other commands do not report any issue currently with duplicate masters. The patch only does a string comparison check to detect and report all the duplicate master(/s). If a user provides the IP address and hostname of the same master, it won't be able to catch the duplicates. That seems more of a deliberate attempt rather than a common mistake a user could make and hence not considering that case. Also wrote a simple test case to test this functionality. Decided not to spin up a test cluster for this as this is more of client side check and a master is not actually contacted. This change got conflicted with another test i.e. AdminCliTest.TestAuthzResetCacheIncorrectMasterAddressList. Updated this test as well to by creating a new external mini cluster with three masters and running the command in the test with just one master Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc 3 files changed, 43 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/17141/9 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 9 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@485 PS6, Line 485: SleepFor(MonoDelta::FromMilliseconds(1L << backoff_exp)); > Hmm, we are only sleeping if we cannot hold the lock. But I will try to use The point here is that the lock based on spinlock is taken at line 452 and hold when SleepFor() is executed at line 485. This is a no-no. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 04:30:41 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@485 PS6, Line 485: SleepFor(MonoDelta::FromMilliseconds(1L << backoff_exp)); > BTW, why to hold that spinlock-based lock when sleeping (i.e., why not to r Hmm, we are only sleeping if we cannot hold the lock. But I will try to use a semaphore instead. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 03:56:59 + Gerrit-HasComments: Yes
[kudu-CR] [tool] Add the missing "member type" in help text of list master CLI
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17175 ) Change subject: [tool] Add the missing "member_type" in help text of list master CLI .. Patch Set 1: Code-Review+1 (1 comment) LGTM, just one quick question http://gerrit.cloudera.org:8080/#/c/17175/1/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/17175/1/src/kudu/tools/kudu-tool-test.cc@3347 PS1, Line 3347: VOTER If it were a non-voter, what should be the output? Is this supposed to fail if for some reason NON_VOTER is output for a single master? :) -- To view, visit http://gerrit.cloudera.org:8080/17175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00f349687866269987691f18aa85e5bf02d5df81 Gerrit-Change-Number: 17175 Gerrit-PatchSet: 1 Gerrit-Owner: Bankim Bhavsar Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 12 Mar 2021 03:56:46 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc@158 PS6, Line 158: ASSERT_EQ(1, lock_manager_.partition_lock_refs()); Looking at this as a block box, does it make sense to check that the first lock is still held even if the other went out of the scope? http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc@252 PS6, Line 252: if (acquired) { Why it might not be acquired? Is it possible? http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager-test.cc@267 PS6, Line 267: if (acquired) { : CHECK_EQ(TabletServerErrorPB::UNKNOWN_ERROR, code); : CHECK_EQ(1, lock_manager_.partition_lock_refs()); : } I'm not sure I understand this piece if reading the comment for the thread's functor: it seems the lock should not be ever acquired. If so, why to check for acquired at all? http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.h File src/kudu/tablet/lock_manager.h: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.h@193 PS6, Line 193: ScopedPartitionLock What if an instance of this class is copied? Would it result in incorrect lock reference count for the underlying PartitionLock? http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@448 PS6, Line 448: kMaxBackoffExp nit: this might be a constexpr http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@485 PS6, Line 485: SleepFor(MonoDelta::FromMilliseconds(1L << backoff_exp)); > It's bad practice to sleep in a thread that is holding a lock. Is there any BTW, why to hold that spinlock-based lock when sleeping (i.e., why not to release it before going to sleep) -- it's possible to re-acquire it on the next iteration of the loop. Anywas, I guess using Semaphore::TimedAcquire() might be the best option as Andrew suggested. http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@496 PS6, Line 496: ms Is it in seconds or milliseconds? I guess the report might be skewed because thread might be put to sleep for longer. Would it make sense just to use KLOG_EVERY_N_SECS() here instead, so the wait interval passed to TryAcquirePartitionLock() could be arbitrary? http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@520 PS6, Line 520: partition_lock_refs_ -= 1; : DCHECK_GE(partition_lock_refs_, 0); : if (partition_lock_refs_ == 0) { : partition_lock_.reset(); : } nit: this might be shortened to DCHECK_GT(partition_lock_refs_, 0); if (--partition_lock_refs == 0) { partition_lock_.reset(); } -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 03:51:58 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2671: No-op ClientServerMapping if only one schema exists.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17161 ) Change subject: KUDU-2671: No-op ClientServerMapping if only one schema exists. .. KUDU-2671: No-op ClientServerMapping if only one schema exists. For the new field of PartitionSchema, a client schema is necessary for RowOperationsPBDecoder to populate the new field. Not all call sites for PartitionSchema::FromPB() have access to an explicit client schema. This patch allows the client schema to have column IDs if it's equal to the tablet schema and no-ops the ClientServerMapping in this case. If we don't have access to a client schema, we don't need a mapping since we only have one schema. Change-Id: I836f157201509a9b38a3ad42d653236f240dda5e Reviewed-on: http://gerrit.cloudera.org:8080/17161 Reviewed-by: Andrew Wong Tested-by: Andrew Wong --- M src/kudu/common/row_operations-test.cc M src/kudu/common/row_operations.cc M src/kudu/common/row_operations.h 3 files changed, 43 insertions(+), 14 deletions(-) Approvals: Andrew Wong: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I836f157201509a9b38a3ad42d653236f240dda5e Gerrit-Change-Number: 17161 Gerrit-PatchSet: 7 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy
[kudu-CR] KUDU-2671: No-op ClientServerMapping if only one schema exists.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17161 ) Change subject: KUDU-2671: No-op ClientServerMapping if only one schema exists. .. Patch Set 6: Verified+1 Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836f157201509a9b38a3ad42d653236f240dda5e Gerrit-Change-Number: 17161 Gerrit-PatchSet: 6 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Fri, 12 Mar 2021 01:05:08 + Gerrit-HasComments: No
[kudu-CR] KUDU-2671: No-op ClientServerMapping if only one schema exists.
Andrew Wong has removed a vote on this change. Change subject: KUDU-2671: No-op ClientServerMapping if only one schema exists. .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I836f157201509a9b38a3ad42d653236f240dda5e Gerrit-Change-Number: 17161 Gerrit-PatchSet: 6 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy
[kudu-CR] KUDU-2671: No-op ClientServerMapping if only one schema exists.
Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17161 ) Change subject: KUDU-2671: No-op ClientServerMapping if only one schema exists. .. Patch Set 6: > Patch Set 6: Verified-1 > > Build Failed > > http://jenkins.kudu.apache.org/job/kudu-gerrit/23475/ : FAILURE Looks like the test failure is unrelated. -- To view, visit http://gerrit.cloudera.org:8080/17161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836f157201509a9b38a3ad42d653236f240dda5e Gerrit-Change-Number: 17161 Gerrit-PatchSet: 6 Gerrit-Owner: Mahesh Reddy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Comment-Date: Fri, 12 Mar 2021 01:03:02 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/6/src/kudu/tablet/lock_manager.cc@485 PS6, Line 485: SleepFor(MonoDelta::FromMilliseconds(1L << backoff_exp)); It's bad practice to sleep in a thread that is holding a lock. Is there any way we could use a semaphore instead, like we do for the row locks? Perhaps add it to the LockManager as a member? -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 00:53:39 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 8: Code-Review+2 LGTM. Please address the nits pointed by Bankim. -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 12 Mar 2021 00:30:09 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: PS5: > Could you also add some (maybe multithreaded) tests for the Wait... functio Ah right, missed that. http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc@261 PS5, Line 261: partition_lock.IsAcquired() > Ah, I guess you can use the moved instances, but maybe add specific NOLINT Ack http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.h File src/kudu/tablet/lock_manager.h: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.h@81 PS5, Line 81: // Similar to the above, but wait until the lock is acquired. > nit: could you mention that the caller is expected to ensure there is no de Done http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@367 PS5, Line 367: bool must_acquire > nit: maybe use an enum? Done http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@372 PS5, Line 372: DCHECK_NOTNULL(lock_); > nit: can just used DCHECK(lock_) Done http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@409 PS5, Line 409: manager_ = other->manager_; > nit: add a DCHECK() on other != this, just to catch programming errors. Done http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@437 PS5, Line 437: if (!txn_id.IsValid()) { : id = std::numeric_limits::max(); : } else { : id = txn_id.value(); : } > nit: this might be done using tri-state operator: Done http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@473 PS5, Line 473: PartitionLock* LockManager::WaitUntilAcquiredPartitionLock(const TxnId& txn_id) { > What happens when this is running for too long, going over the overall time Yeah, this is used for case where we expect the lock can be acquired eventually (e.g. for a follower to replicate the participanOp). http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@478 PS5, Line 478: (lock = TryAcquirePartitionLock(txn_id, )) > Would it make sense to extend the TryAcquirePartitionLock() method with a t Makes sense, updated it. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 12 Mar 2021 00:18:21 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17097 to look at the new patch set (#6). Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. KUDU-2612: add PartitionLock and ScopedPartitionLock This patch introduces a coarse grained partition level lock, PartitionLock to prevent dirty writes for multi-rows transaction. A partition lock can only be held by a single transaction (or write op) at a time, the same transaction can acquire the lock for multiple times. To prevent deadlock, 'wait-die' scheme is used, which if the transaction requires a lock held by another transaction, 1) abort the transaction immediately if it has a higher txn ID than the other one. 2) Otherwise, retry the write op (or participant op) of the transaction. A ScopedPartitionLock is also introduced to automatically manage the lifecycle of a PartitionLock. Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 --- M src/kudu/tablet/lock_manager-test.cc M src/kudu/tablet/lock_manager.cc M src/kudu/tablet/lock_manager.h M src/kudu/tserver/tserver.proto 4 files changed, 457 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/97/17097/6 -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [master] Check for 'last known addr' field when adding master
Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17176 Change subject: [master] Check for 'last_known_addr' field when adding master .. [master] Check for 'last_known_addr' field when adding master In a single master configuration, the 'last_known_addr' field may not be populated in the Raft config and Raft implementation will throw an error on adding a new master which is not actionable. Hence adding an explicit check with an actionable error message to fix the problem. Change-Id: Ic991c82acbb2a2149f647b5c1a5b323fecb958cb --- M src/kudu/master/dynamic_multi_master-test.cc M src/kudu/master/master.cc 2 files changed, 21 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/17176/1 -- To view, visit http://gerrit.cloudera.org:8080/17176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic991c82acbb2a2149f647b5c1a5b323fecb958cb Gerrit-Change-Number: 17176 Gerrit-PatchSet: 1 Gerrit-Owner: Bankim Bhavsar
[kudu-CR] KUDU-2612: acquire and release partition lock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@7 PS3, Line 7: acquire and release partition lock Do you plan to add an end-to-end test scenario into client.cc or alike to verify how this works if looking from the client side? http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@13 PS3, Line 13: parition partition http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@16 PS3, Line 16: will be aborted or retried Do you mind extending this a bit to explain what actor is to abort or retry corresponding transaction/write op? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530 PS3, Line 530: //Status::Abort() originated from TabletServerErrorPB::TXN_LOCKED_ABORT : //response are needlessly retried. : > Isn't this handled above? Or do you mean TXN_LOCKED_RETRY? We'd probably be +1 If that's about TXN_LOCKED_RETRY, I agree that it's better to handle TXN_LOCKED_RETRY as a specific error code than generic IsIllegalState() http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc File src/kudu/integration-tests/txn_participant-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813 PS3, Line 813: TEST_F(TxnParticipantITest, TestTxnSystemClientBeginTxnLockAbort) { Do you mind adding a scenario with calling ParticipateInTransaction() for the same txn_id multiple times? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@833 PS3, Line 833: s.IsAborted Hmm, interesting: what actor is to abort the transaction? I'm curious about the bigger picture: I was under impression that it's the TxnManager who takes care of that, right? Could you add a comment clarifying that at this point kSecondTxn isn't aborted yet (well, it's not even started, actually, right)? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930 PS3, Line 930: FLAGS_enable_txn_partition_lock = false; It would be great if you could add a note explaining that this scenario became a fully synthetic one with the introduction of coarse-grain locking. It might be a TODO reminding to remove the override for the flag once fine-grained locking appears. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h File src/kudu/tablet/ops/participant_op.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h@75 PS3, Line 75: AcquirePartitionLock Can this method be invoked as a part of larger operation which might have a deadline? If so, should this method take an optional 'deadline' parameter as well? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc@123 PS3, Line 123: TRACE("Released partition lock"); Probably, we want to distinguish in the trace whether this is was a no-op or actual release of the lock, no? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@165 PS3, Line 165: If 'must_acquire' is true, : // then wait until the lock is acquired Write operations are executed in the context of an RPC which have a timeout. What happens if a lock is still being acquired but the operation has timed out? Could this end up in accumulating too many ops in the queue? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@166 PS3, Line 166: return : // 'TXN_LOCKED_ABORT' or 'TXN_LOCKED_RETRY_OP' error How is it to return those if the return type of this method is Status? -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 22:11:50 + Gerrit-HasComments: Yes
[kudu-CR] [tool] Add the missing "member type" in help text of list master CLI
Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17175 Change subject: [tool] Add the missing "member_type" in help text of list master CLI .. [tool] Add the missing "member_type" in help text of list master CLI When "member_type" was introduced in the kudu master list CLI, missed adding the field to the column description. Verified "kudu master list --help" lists the optional "member_type" under the "columns" parameter. Change-Id: I00f349687866269987691f18aa85e5bf02d5df81 --- M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_master.cc 2 files changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/17175/1 -- To view, visit http://gerrit.cloudera.org:8080/17175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I00f349687866269987691f18aa85e5bf02d5df81 Gerrit-Change-Number: 17175 Gerrit-PatchSet: 1 Gerrit-Owner: Bankim Bhavsar
[kudu-CR] KUDU-2612: acquire and release partition lock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530 PS3, Line 530: //Status::Abort() originated from TabletServerErrorPB::TXN_LOCKED_ABORT : //response are needlessly retried. : Isn't this handled above? Or do you mean TXN_LOCKED_RETRY? We'd probably be better off if we handled TXN_LOCKED_RETRY_OP explicitly and treated it as SERVICE_UNAVAILABLE or somesuch. http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc File src/kudu/integration-tests/txn_write_ops-itest.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc@1046 PS3, Line 1046: } nit: for completeness sake, maybe commit `second_txn` and ensure we no longer have dispatchers? Same below? http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.h File src/kudu/tablet/ops/participant_op.h: http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.h@75 PS2, Line 75: lock_manag nit: the lock wasn't acquired by this op, it was acquired by txn_. http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.cc@108 PS2, Line 108: else { nit: use LOG(FATAL) or LOG(DFATAL). http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc File src/kudu/tablet/ops/write_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@67 PS3, Line 67: DEFINE_bool(enable_write_op_partition_lock, true, : "Whether or not to enable partition lock for write op"); : TAG_FLAG(enable_write_op_partition_lock, advanced); : TAG_FLAG(enable_write_op_partition_lock, experimental); Could we just reuse --enable_txn_partition_lock? When might we want to set them to different values? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@569 PS3, Line 569: DCHECK(false) << "unexpected error code " << code; LOG(DFATAL) or LOG(FATAL)? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h File src/kudu/tablet/txn_participant.h: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h@220 PS3, Line 220: partition_lock_ = std::move(partition_lock); nit: maybe DCHECK or CHECK that we own the lock? http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc File src/kudu/transactions/participant_rpc.cc: http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc@180 PS3, Line 180: case TabletServerErrorPB::TXN_LOCKED_ABORT: Can we also explicitly handle TXN_LOCKED_RETRY_OP? -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 20:58:26 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 8: Code-Review+1 Agree with Bankim's nits, otherwise, it looks good. -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 11 Mar 2021 20:46:06 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: PS5: Could you also add some (maybe multithreaded) tests for the Wait... functionality? http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.h File src/kudu/tablet/lock_manager.h: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.h@81 PS5, Line 81: // Similar to the above, but wait until the lock is acquired. nit: could you mention that the caller is expected to ensure there is no deadlock? E.g. by only running on followers in the prepare phase, or running serially in an order that has already been successful with TryAcquirePartitionLock() calls? http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h File src/kudu/tablet/lock_manager.h: http://gerrit.cloudera.org:8080/#/c/17097/3/src/kudu/tablet/lock_manager.h@157 PS3, Line 157: multi-rows t > I found it is more clear to have a specific entity to represent the lock. E Yeah, if anything maybe we could call this PartitionLockState or something http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@367 PS5, Line 367: bool must_acquire nit: maybe use an enum? enum LockWaitMode { kWaitForLock, kTryLock, }; http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@372 PS5, Line 372: DCHECK_NOTNULL(lock_); nit: can just used DCHECK(lock_) DCHECK_NOTNULL() returns the pointer, so this is creates an unused reference http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@478 PS5, Line 478: (lock = TryAcquirePartitionLock(txn_id, )) Would it make sense to extend the TryAcquirePartitionLock() method with a timeout, similar to the semaphore at L537? I guess we'll see how frequently this codepath gets hit based on how frequently there are wait-die conflicts.. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 20:39:56 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc@261 PS5, Line 261: partition_lock.IsAcquired() > Using an already 'moved' variable doesn't smell good. Is it enough just to Ah, I guess you can use the moved instances, but maybe add specific NOLINT to be explicit about the intention here and suppress TidyBot warnings http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@409 PS5, Line 409: manager_ = other->manager_; nit: add a DCHECK() on other != this, just to catch programming errors. -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 20:22:14 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: add PartitionLock and ScopedPartitionLock
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17097 ) Change subject: KUDU-2612: add PartitionLock and ScopedPartitionLock .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc File src/kudu/tablet/lock_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager-test.cc@261 PS5, Line 261: partition_lock.IsAcquired() Using an already 'moved' variable doesn't smell good. Is it enough just to check for the reference count to make sure the former lock isn't held anymore? http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@437 PS5, Line 437: if (!txn_id.IsValid()) { : id = std::numeric_limits::max(); : } else { : id = txn_id.value(); : } nit: this might be done using tri-state operator: int64_t id = txn_id.IsValid() ? txn_id.value() : ...; http://gerrit.cloudera.org:8080/#/c/17097/5/src/kudu/tablet/lock_manager.cc@473 PS5, Line 473: PartitionLock* LockManager::WaitUntilAcquiredPartitionLock(const TxnId& txn_id) { What happens when this is running for too long, going over the overall timeout for an operation? In other words, should we add a deadline parameter for this method? -- To view, visit http://gerrit.cloudera.org:8080/17097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I158115739ce3e7cfb77bbcb854e834336c1256b1 Gerrit-Change-Number: 17097 Gerrit-PatchSet: 5 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 20:17:28 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc File src/kudu/tablet/lock_manager.cc: http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc@401 PS1, Line 401: } > For consistency, maybe add a check that other != this ? Will update it in the other CR. -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 18:57:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: acquire and release partition lock
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 ) Change subject: KUDU-2612: acquire and release partition lock .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG@13 PS1, Line 13: COMMI > We'll also need to make sure the TxnOpDispatcher handles the error cases we Done http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc File src/kudu/tablet/ops/participant_op.cc: http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@95 PS1, Line 95: l acquired = partition_lock_.IsAcquired(); > nit: based on the current code, this cannot be any other code, right? If so Done http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@199 PS1, Line 199: "ParticipantOp::Prepare > nit: can use 'replica' Done http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@200 PS1, Line 200: TRACE("PREPARE: Starting."); > Shouldn't there be a 'break' before this line? Or are we taking the partiti Ah, right, 'break' is needed. Thanks for catching it! http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@300 PS1, Line 300: return Status::InvalidArgument("unknown op type"); > As for the sequence of releasing txn and its lock and partition lock: shoul Yes, it makes sense. Updated it. http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@303 PS1, Line 303: return Status::OK(); > nit: could store the type of the operation instead since the operation as i Will do. http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/tablet.cc@223 PS1, Line 223: using std::endl; > warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl Done -- To view, visit http://gerrit.cloudera.org:8080/17159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939 Gerrit-Change-Number: 17159 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 11 Mar 2021 18:57:01 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/17141/7/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/17141/7/src/kudu/tools/tool_action_common.cc@665 PS7, Line 665: > Nit: delimiting by a comma will be preferred. +1 http://gerrit.cloudera.org:8080/#/c/17141/7/src/kudu/tools/tool_action_common.cc@665 PS7, Line 665: string dup_masters_str = JoinStrings(duplicate_masters, " "); : return Status::InvalidArgument("Duplicate master addresses specified: " + dup_masters_str); > Nit: Get rid of the local variable and directly use in InvalidArgument() ca +1 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 11 Mar 2021 18:34:22 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has removed a vote on this change. Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Removed Code-Review+2 by Andrew Wong -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 8: Verified+1 Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 11 Mar 2021 18:33:25 + Gerrit-HasComments: No
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Andrew Wong has removed a vote on this change. Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17141 to look at the new patch set (#8). Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. [tool] KUDU-3226 Validate List of Masters kudu ksck This patch checks for duplicate kudu masters provided in all of the kudu command line tools which call ParseMasterAddresses(). Currently ksck and rebalance (as rebalance calls ksck internally) are the only commands which throw the reported stack trace as ksck constructs a map internally for the kudu masters and bombs once it sees duplicate UUIDs. Other commands do not report any issue currently with duplicate masters. The patch only does a string comparison check to detect and report all the duplicate master(/s). If a user provides the IP address and hostname of the same master, it won't be able to catch the duplicates. That seems more of a deliberate attempt rather than a common mistake a user could make and hence not considering that case. Also wrote a simple test case to test this functionality. Decided not to spin up a test cluster for this as this is more of client side check and a master is not actually contacted. This change got conflicted with another test i.e. AdminCliTest.TestAuthzResetCacheIncorrectMasterAddressList. Updated this test as well to by creating a new external mini cluster with three masters and running the command in the test with just one master Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc 3 files changed, 43 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/17141/8 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 8 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 7: Code-Review+2 (2 comments) Please wait for more comments from Andrew, Alexey. http://gerrit.cloudera.org:8080/#/c/17141/7/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/17141/7/src/kudu/tools/tool_action_common.cc@665 PS7, Line 665: " " Nit: delimiting by a comma will be preferred. http://gerrit.cloudera.org:8080/#/c/17141/7/src/kudu/tools/tool_action_common.cc@665 PS7, Line 665: string duplicate_masters_str = JoinStrings(duplicate_masters, " "); : return Status::InvalidArgument("Duplicate master addresses specified: " + duplicate_masters_str); Nit: Get rid of the local variable and directly use in InvalidArgument() call. -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 7 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 11 Mar 2021 16:46:56 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17141 to look at the new patch set (#7). Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. [tool] KUDU-3226 Validate List of Masters kudu ksck This patch checks for duplicate kudu masters provided in all of the kudu command line tools which call ParseMasterAddresses(). Currently ksck and rebalance (as rebalance calls ksck internally) are the only commands which throw the reported stack trace as ksck constructs a map internally for the kudu masters and bombs once it sees duplicate UUIDs. Other commands do not report any issue currently with duplicate masters. The patch only does a string comparison check to detect and report all the duplicate master(/s). If a user provides the IP address and hostname of the same master, it won't be able to catch the duplicates. That seems more of a deliberate attempt rather than a common mistake a user could make and hence not considering that case. Also wrote a simple test case to test this functionality. Decided not to spin up a test cluster for this as this is more of client side check and a master is not actually contacted. This change got conflicted with another test i.e. AdminCliTest.TestAuthzResetCacheIncorrectMasterAddressList. Updated this test as well to by creating a new external mini cluster with three masters and running the command in the test with just one master Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc 3 files changed, 43 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/17141/7 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 7 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Abhishek Chennaka has posted comments on this change. ( http://gerrit.cloudera.org:8080/17141 ) Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. Patch Set 6: (9 comments) http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/kudu-tool-test.cc@6103 PS5, Line 6103: // > nit: here and elsewhere with comments, add a space after the //, as is used Done http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/kudu-tool-test.cc@6107 PS5, Line 6107: ()) > nit: it's probably a good habit to build to start using curly braces for th Done http://gerrit.cloudera.org:8080/#/c/17141/1/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/17141/1/src/kudu/tools/tool_action_common.cc@652 PS1, Line 652: std:: > I don't see this update in PS5? Looks like an error on my side. Fixed it now http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc@653 PS5, Line 653: std:: > nit: don't need the std:: here either. Done http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc@654 PS5, Line 654: / Loop through the master addresses to find the duplicate. If there is no master specified : // > nit: same here with regards to adding a space after //. Done http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc@657 PS5, Line 657: if (master.empty()) continue; : if (Contain > Nit: I'm not sure whether it's part of style guide but for "if" if there is Done http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc@660 PS5, Line 660: > > That said, maybe instead keep an unordered_set of duplicated masters and Done http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc@660 PS5, Line 660: > nit: I don't think it's strictly requested by the GSG, but for the most par Done http://gerrit.cloudera.org:8080/#/c/17141/5/src/kudu/tools/tool_action_common.cc@665 PS5, Line 665: return Status::InvalidArgument("Duplicate master addresses specified: " + JoinStrings(duplicate_masters, " ")); : } : *master_addresses = std::move(master_addresses_local); > Same as above about either using braces {} for "if" if the stmt is on next Done -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Thu, 11 Mar 2021 08:15:48 + Gerrit-HasComments: Yes
[kudu-CR] [tool] KUDU-3226 Validate List of Masters kudu ksck
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17141 to look at the new patch set (#6). Change subject: [tool] KUDU-3226 Validate List of Masters kudu ksck .. [tool] KUDU-3226 Validate List of Masters kudu ksck This patch checks for duplicate kudu masters provided in all of the kudu command line tools which call ParseMasterAddresses(). Currently ksck and rebalance (as rebalance calls ksck internally) are the only commands which throw the reported stack trace as ksck constructs a map internally for the kudu masters and bombs once it sees duplicate UUIDs. Other commands do not report any issue currently with duplicate masters. The patch only does a string comparison check to detect and report all the duplicate master(/s). If a user provides the IP address and hostname of the same master, it won't be able to catch the duplicates. That seems more of a deliberate attempt rather than a common mistake a user could make and hence not considering that case. Also wrote a simple test case to test this functionality. Decided not to spin up a test cluster for this as this is more of client side check and a master is not actually contacted. This change got conflicted with another test i.e. AdminCliTest.TestAuthzResetCacheIncorrectMasterAddressList. Updated this test as well to by creating a new external mini cluster with three masters and running the command in the test with just one master Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc 3 files changed, 42 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/17141/6 -- To view, visit http://gerrit.cloudera.org:8080/17141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f3b2b7dcf2ac78cb95cf43242651e3ce8fddf6f Gerrit-Change-Number: 17141 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Chennaka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)