[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-02-01 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@714
PS9, Line 714:   orig_leader
> nit: I guess this isn't necessary once the ASSERT_EVENTUALLY() below is gon
Done


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@715
PS9, Line 715: TONED, /
> nit: just put 3 here?
Address it in a follow up patch.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: nStatusM
> Because the l.first_failed_status().ok() most likely will return not-OK for
Hmm, makes sense. I will address it in a follow up commit.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 16
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 01 Feb 2021 22:57:56 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-31 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@714
PS9, Line 714:   orig_leader
nit: I guess this isn't necessary once the ASSERT_EVENTUALLY() below is gone.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@715
PS9, Line 715: TONED, /
nit: just put 3 here?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: nStatusM
> I am not sure why we want to do that? Why don't we check for the rest table
Because the l.first_failed_status().ok() most likely will return not-OK for the 
rest of the tables: l.first_failed_status().ok() doesn't depend on the elements 
in the container, but rather on the status of the replica behind this 
TxnStatusManager.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 16
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 17:11:34 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Reviewed-on: http://gerrit.cloudera.org:8080/16648
Tested-by: Hao Hao 
Reviewed-by: Andrew Wong 
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 850 insertions(+), 140 deletions(-)

Approvals:
  Hao Hao: Verified
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 16
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:   FLAGS_txn_status_tablet_inject_load_failure_error = true;
> You mentioned on slack that this won't do io unless we flush. It's probably
Ack



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 06:53:16 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:   FLAGS_txn_status_tablet_inject_load_failure_error = true;
> You can also use --env_inject_eio_globs to target specific directories. IIR
You mentioned on slack that this won't do io unless we flush. It's probably 
more trouble than it's worth to use io errors here.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 06:40:36 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:   FLAGS_txn_status_tablet_inject_load_failure_error = true;
> Try to use --env_inject_eio but that seems to affect the log to be appended
You can also use --env_inject_eio_globs to target specific directories. IIRC, 
each mini server stores all of its data in a single directory, so something like

--env_inject_eio_globs={cluster_->mini_tablet_server(i)->server()->fs_manager()->GetDataRootDirs()[0]}/**



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 01:18:11 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:   FLAGS_txn_status_tablet_inject_load_failure_error = true;
> Do disk failures exercise this codepath? If so, we can just use --env_injec
Try to use --env_inject_eio but that seems to affect the log to be appended. Or 
maybe you have suggestions on how to narrow down to only affect transaction 
status tablet?



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 01:13:00 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:   FLAGS_txn_status_tablet_inject_load_failure_error = true;
Do disk failures exercise this codepath? If so, we can just use 
--env_inject_eio.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 00:58:28 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15: Verified+1

Unrelated org.apache.kudu.client.TestKuduMetrics flaky test failure.


--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 00:31:39 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has removed a vote on this change.

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc
File src/kudu/integration-tests/txn_status_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc@447
PS14, Line 447:   ASSERT_EVENTUALLY([&]{
> nit: Why do we need to sleep for so long if we're already retrying?
Done


http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443
PS14, Line 443:
  :   const int64_t term = consensus.CurrentTerm();
  :   // If the term has changed we assume the new leader is 
about to do the
  :   // necessary work in its
> Why might we get here? I thought we didn't want to crash at all? Can we get
Good point! It is overkilled to crash here if there is any failures preventing 
read from the transaction status tablet. Updated.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 23:51:19 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#15).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 850 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/15
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 14: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443
PS14, Line 443:
  :   // In all other cases non-OK status is considered fatal.
  :   LOG(FATAL) << Substitute("$0 failed: $1", op_description, 
s.ToString());
  :   __builtin_unreachable();
> Why might we get here? I thought we didn't want to crash at all? Can we get
Actually I'd be ok addressing this as a follow-up.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 20:38:09 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc
File src/kudu/integration-tests/txn_status_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc@447
PS14, Line 447:   SleepFor(MonoDelta::FromMilliseconds(5 * 
keepalive_interval_ms));
nit: Why do we need to sleep for so long if we're already retrying?


http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443
PS14, Line 443: 
  :   // In all other cases non-OK status is considered fatal.
  :   LOG(FATAL) << Substitute("$0 failed: $1", op_description, 
s.ToString());
  :   __builtin_unreachable();
Why might we get here? I thought we didn't want to crash at all? Can we get 
here if there's a disk failure or somesuch?



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 19:53:39 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc@818
PS13, Line 818: class TxnStatusTableElectionStormITest : public 
TxnStatusTableITest {
> Is this guaranteed to happen? If we need to change leadership, quiescing mi
Done


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/master/txn_manager-test.cc
File src/kudu/master/txn_manager-test.cc:

PS13:
> Why the changes in this file?
Came across this code and saw some nits can be improved (but I guess also 
introduced some redundant dependency). Will revert it back.


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@214
PS13, Line 214: PREDICT_FALSE(!replica_status_.ok() ||
  :   
FLAGS_txn_status_tablet_inject_uninitialized_leader_status_error)
> nit: wrap in a single PREDICT_FALSE
Done


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@400
PS13, Line 400: PREDICT_FALSE(!s.ok() ||
> nit: maybe wrap the entire thing in PREDICT_FALSE instead of just the flag
Done



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 08:21:54 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-30 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#14).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 833 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/14
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-29 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc@818
PS13, Line 818:   // Make sure a re-election happens before the following read.
Is this guaranteed to happen? If we need to change leadership, quiescing might 
be an approach. Look around for 'mutable_quiescing()' in some existing tests.


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/master/txn_manager-test.cc
File src/kudu/master/txn_manager-test.cc:

PS13:
Why the changes in this file?


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@214
PS13, Line 214: PREDICT_FALSE(!replica_status_.ok()) ||
  : 
PREDICT_FALSE(FLAGS_txn_status_tablet_inject_uninitialized_leader_status_error)
nit: wrap in a single PREDICT_FALSE


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@400
PS13, Line 400: !s.ok() || PREDICT_FALSE
nit: maybe wrap the entire thing in PREDICT_FALSE instead of just the flag



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 29 Jan 2021 08:43:43 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-28 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 13: Verified+1

Unrelated flaky test ExternalMiniClusterTest.TestBasicOperation


--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 29 Jan 2021 06:52:16 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-28 Thread Hao Hao (Code Review)
Hao Hao has removed a vote on this change.

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-28 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#13).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 825 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/13
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-28 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@329
PS11, Line 329: transaction status table and is
> nit: just "transaction status table"
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@332
PS11, Line 332: More concretely, if this returns
  :   // 'true', callers must call 
DecreaseTxnCoordinatorTaskCounter().
  :   //
> nit: more concretely, if this returns 'true', callers must call DecreaseTxn
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@336
PS11, Line 336:
> nit: these methods don't run anything, so how about "if the caller should r
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@340
PS11, Line 340: transaction status table
  :   // and is
> nit: here too
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc@1010
PS11, Line 1010: std::l
> nit: could use DCHECK_GE
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h@100
PS11, Line 100: 'txn_coordinator' must be a 'TxnStatusManager*' because of the
  : // downcast in the initia
> nit: more specifically, 'txn_coordinator' must be a TxnStatusManager* becau
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83:  false,
: "Finalize commit
> Sounds good. Curious if we have test coverage for this.
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@239
PS11, Line 239:
> Do we have tests for this?
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@356
PS11, Line 356:
> nit: why not put this directly in the if() condition? Same with other insta
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@358
PS11, Line 358: t get an unexpected error response and bail instead
> nit: this might not be that useful. It'll be obvious that the tablet is shu
Done



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 12
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 29 Jan 2021 02:32:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-28 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#12).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 825 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/12
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 12
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-20 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 11:

(10 comments)

Overall looks good, just more nits and a couple of test suggestions.

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@329
PS11, Line 329: transaction status manager table
nit: just "transaction status table"


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@332
PS11, Line 332: Thus, the caller needs to call
  :   // 'DecreaseTxnCoordinatorTaskCounter' after the task is 
executed or
  :   // aborted to unblock the tablet shutting down process.
nit: more concretely, if this returns 'true', callers must call 
DecreaseTxnCoordinatorTaskCounter(). Same below.


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@336
PS11, Line 336: successfully
nit: these methods don't run anything, so how about "if the caller should run 
the staleness task"

Same below


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@340
PS11, Line 340: transaction status manager
  :   // table
nit: here too


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc@1010
PS11, Line 1010: DCHECK
nit: could use DCHECK_GE


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h@100
PS11, Line 100: The expect type of 'txn_coordinator' is a subclass of 
TxnCoordinator,
  : // e.g. TxnStatusManager.
nit: more specifically, 'txn_coordinator' must be a TxnStatusManager* because 
of the down_cast, no?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83: ing all operations replicated "
:  "by the previou
> After thinking more, I decide to remove the crash logic and just return, si
Sounds good. Curious if we have test coverage for this.


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@239
PS11, Line 239: TABLET_NOT_RUNNING);
Do we have tests for this?


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@356
PS11, Line 356: status_tablet_.tablet_replica_->IsShuttingDown();
nit: why not put this directly in the if() condition? Same with other instances 
with task_enabled


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@358
PS11, Line 358: "The tablet is already shutting down or shutdown. ";
nit: this might not be that useful. It'll be obvious that the tablet is 
shutting down from surrounding messages. This should probably just say "Not 
loading TxnStatusManager" or something



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 21 Jan 2021 07:24:11 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-19 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 11: Verified+1

TSAN build failure doesn't seem to relate to the patch.


--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 19 Jan 2021 18:06:06 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-19 Thread Hao Hao (Code Review)
Hao Hao has removed a vote on this change.

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-18 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#11).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 766 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/11
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-18 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 9:

(7 comments)

> Patch Set 9:
>
> (7 comments)
>
> IWYU isn't happy yet:
>
> >>> Fixing #includes in 
> >>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc'
> @@ -32,6 +32,7 @@
>  #include "kudu/common/wire_protocol.h"
>  #include "kudu/consensus/metadata.pb.h"
>  #include "kudu/consensus/raft_consensus.h"
> +#include "kudu/gutil/casts.h"
>  #include "kudu/gutil/macros.h"
>  #include "kudu/gutil/map-util.h"
>  #include "kudu/gutil/port.h"
> IWYU would have edited 1 files on your behalf.

Thanks a lot for looking into the failure!

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304
PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished();
> How do we protect against the race when right after this 'txn_coordinator_t
Are you asking after L304 which we waited 'txn_coordinator_task_counter_' to 
become zero. and then it again becomes non-zero and tablet is shutdown state. 
what happens if TxnStatusManager tries to access the tablet? In that case,  the 
task will not be enabled/ran when the tablet is shutting down. So that 
ShouldRunTxnCoordinatorStalenessTask() will return false and no new tasks will 
run.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471
PS9, Line 471:   if (state_ == STOPPING || state_ == STOPPED) {
 : return false;
 :   }
 :   return true;
> This looks a bit strange: it returns 'true' even if 'state_' is RUNNING?
Ah, good catch. Sorry it is the opposite meaning as what the method name 
implies here. Adjust the caller as well.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997
PS9, Line 997: state_
> What about SHUTDOWN state?
SHUTDOWN is only set after the tablet is fully stopped. Which the raft 
consensus has been stopped and the call back should no longer be triggered.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009
PS9, Line 1009: txn_coordinator_task_counter_
> nit: is it expected that this counter goes negative?  If not, maybe add CHE
Done


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101
PS9, Line 101: txn_coordinator
> nit: it would be great to add a comment about the expectations about the ty
Done


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439
PS9, Line 1439: !r->CheckRunning().ok()
> Does it mean that the TxnStalenessTrackerTask() will cease to run at this t
Hmm, yeah, I guess we wouldn't want to stop the TxnStalenessTrackerTask() if 
some txn status tablets is shutdown. We may want to resume on checking the 
other tablets.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: continue
> Maybe, we should jump not just to the next element 'replicas' container, bu
I am not sure why we want to do that? Why don't we check for the rest tablets?



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 19 Jan 2021 01:15:04 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-18 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#10).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 766 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/10
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 10
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 9:

(7 comments)

IWYU isn't happy yet:

>>> Fixing #includes in 
>>> '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc'
@@ -32,6 +32,7 @@
 #include "kudu/common/wire_protocol.h"
 #include "kudu/consensus/metadata.pb.h"
 #include "kudu/consensus/raft_consensus.h"
+#include "kudu/gutil/casts.h"
 #include "kudu/gutil/macros.h"
 #include "kudu/gutil/map-util.h"
 #include "kudu/gutil/port.h"
IWYU would have edited 1 files on your behalf.

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304
PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished();
How do we protect against the race when right after this 
'txn_coordinator_task_counter_' becomes non-zero again and tablet_->Shutdown() 
is called when TxnStatusManager tries to access the tablet?

Should we just nullify txn_coordinator_ in 
WaitUntilTxnCoordinatorTasksFinished() or set another variable that controls 
the return status of ShouldRunTxnCoordinatorStalenessTask() along with 'state_'?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471
PS9, Line 471:   if (state_ == STOPPING || state_ == STOPPED) {
 : return false;
 :   }
 :   return true;
This looks a bit strange: it returns 'true' even if 'state_' is RUNNING?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997
PS9, Line 997: state_
What about SHUTDOWN state?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009
PS9, Line 1009: txn_coordinator_task_counter_
nit: is it expected that this counter goes negative?  If not, maybe add 
CHECK()/DCHECK() to enforce the invariant here?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101
PS9, Line 101: txn_coordinator
nit: it would be great to add a comment about the expectations about the type 
of this parameter.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439
PS9, Line 1439: !r->CheckRunning().ok()
Does it mean that the TxnStalenessTrackerTask() will cease to run at this 
tablet server once one of its txn status tablets is shutdown (e.g., moved by 
other server by the rebalancer tool)?  I'm not sure that's the behavior we want.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: continue
Maybe, we should jump not just to the next element 'replicas' container, but 
onto next iteration of the outer 'while (true)' loop?



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 16 Jan 2021 06:10:15 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc@426
PS6, Line 426: dynamic_cast
> nit: I guess we prefer to use down_cast<> for this type of casts in the cod
Done


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc@1262
PS7, Line 1262: dynamic_cast
> nit: use down_cast<> here as well?
Done


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:   TxnStatusManager* txn_status_manager =
  :   dynamic_cast(coordinator);
  :   TxnStatusManager::ScopedLeaderSharedLock 
l(txn_status_manager);
> Sounds good to me, will update it.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 15 Jan 2021 05:19:06 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#9).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 760 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/9
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:   TxnStatusManager* txn_status_manager =
  :   dynamic_cast(coordinator);
  :   TxnStatusManager::ScopedLeaderSharedLock 
l(txn_status_manager);
> The rationale is to prevent code duplication: as I could see, all call site
Sounds good to me, will update it.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 21:02:44 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:   TxnStatusManager* txn_status_manager =
  :   dynamic_cast(coordinator);
  :   TxnStatusManager::ScopedLeaderSharedLock 
l(txn_status_manager);
> Sounds doable for me, but wondering what is the rationale behind to perform
The rationale is to prevent code duplication: as I could see, all call sites 
which use TxnStatusManager::ScopedLeaderSharedLock contain this dynamic_cast 
piece, so why not to provide a constructor with proper signature.

Feel free to ignore this if you think it's not worth it.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 20:42:00 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725
PS3, Line 725: FLAGS_leader_failure_max_missed_heartbeat_periods = 1;
 : FLAGS_raft_heartbeat_interval_ms = 30;
> The short heartbeat interval combined with just single missing heartbeat pe
Yes, when I looped I was using --stress_cpu_threads=16.


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:   TxnStatusManager* txn_status_manager =
  :   dynamic_cast(coordinator);
  :   TxnStatusManager::ScopedLeaderSharedLock 
l(txn_status_manager);
> Looking at this again and again prompts me to ask the following question: I
Sounds doable for me, but wondering what is the rationale behind to perform the 
downcast under the hood vs. doing it here?



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 19:56:43 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#8).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 760 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/8
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-14 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 7:

(4 comments)

Yep, I agree with Andrew: after quick looks this looks much better to me now!

I hope to get one more pass tomorrow morning.  As for now, just a few nits not 
related to the essence, but rather syntax and code conventions.

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc@426
PS6, Line 426: dynamic_cast
nit: I guess we prefer to use down_cast<> for this type of casts in the code; 
see src/kudu/gutil/casts.h for details


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725
PS3, Line 725: FLAGS_leader_failure_max_missed_heartbeat_periods = 1;
 : FLAGS_raft_heartbeat_interval_ms = 30;
> Yes, ran looped the test in TSAN and didn't see failure caused by this. Is 
The short heartbeat interval combined with just single missing heartbeat period 
prompted me to make sure this test eventually converges when run under 
ASAN/TSAN with --stress_cpu_threads=16 or alike.  But since you verified it 
works as needed, it's great!


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc@1262
PS7, Line 1262: dynamic_cast
nit: use down_cast<> here as well?


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:   TxnStatusManager* txn_status_manager =
  :   dynamic_cast(coordinator);
  :   TxnStatusManager::ScopedLeaderSharedLock 
l(txn_status_manager);
Looking at this again and again prompts me to ask the following question: Is it 
possible to introduce a constructor of ScopedLeaderSharedLock with a parameter 
of TxnCoordinator* type?

The idea is to perform the downcast (using down_cast or dynamic_cast) under the 
hood, and then report an error via fist_failed_status() in debug mode if the 
coordinator could not be downcasted to TxnStatusManager?

What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 08:34:25 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-13 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#7).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 760 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/7
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-13 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 6:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744
PS6, Line 744: ;
> nit: drop extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751
PS6, Line 751: threads.emplace_back([&] {
> What does the multithreading add here? Seems like it complicates things sin
The point is to ensure concurrent txn reads/writes have no failures with 
frequent leader elections. Add the comment to be clear.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758
PS6, Line 758: if (s.ok()) {
> What failures are we expecting?
Add a comment


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748
PS6, Line 748:   std::atomic next_thread_id = 0;
 :   vector txn_id_index = { 1, 6, 11, 16, 21};
 :   for (int t = 0; t < 5; t++) {
 : threads.emplace_back([&] {
 :   int thread_id = next_thread_id++;
 :   CHECK_LT(thread_id, txn_id_index.size());
 :   int start_txn_id = txn_id_index[thread_id];
 :   for (int i = start_txn_id; i < start_txn_id + 5; i++) {
 : TxnStatusEntryPB txn_status;
 : Status s = txn_sys_client_->BeginTransaction(i, kUser);
 : if (s.ok()) {
 :   // Make sure a re-election happens before the 
following read.
 :   SleepFor(MonoDelta::FromMilliseconds(3 * 
FLAGS_raft_heartbeat_interval_ms));
 :   s = txn_sys_client_->GetTransactionStatus(i, kUser, 
&txn_status);
 :   CHECK_OK(s);
 :   CHECK(txn_status.has_user());
 :   CHECK_STREQ(kUser, txn_status.user().c_str());
 :   CHECK(txn_status.has_state());
 :   CHECK_EQ(TxnStatePB::OPEN, txn_status.state());
 : }
 :   }
 : });
 :   }
> nit: would be easier to read as:
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403
PS6, Line 403: <<
> nit: maybe add a message here too indicating what's going on
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472
PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down 
or shutdown. State: $0",
 :TabletStatePB_Name(state_));
> nit: might be cleaner to let the callers of these otherwise simple function
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@979
PS6, Line 979:   if (!txn_coordinator_) {
 : return false;
 :   }
> nit: maybe only call this if txn_coordinator_ is set and change this to a D
I updated the name to reflect the functionality which is 
ShouldRunTxnCoordinatorStalenessTask(). So keep it as it is here.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@997
PS6, Line 997: state_ == STOPPING || state_ == STOPPED) {
> nit: I'm curious, why not reuse the != RUNNING condition from above? A comm
Added a comment.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@109
PS6, Line 109: but some operations
 : // may still be legal.
> I might be missing something but is this fact used anywhere? What operation
Sorry this is not used, removed it.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@106
PS6, Line 106: const Status& replica_status() const { return 
replica_status_; }
 :
 : // Leadership status of the transaction status manager. If 
not OK, the
 : // transaction status manager is not the leader, but some 
operations
 : // may still be legal.
 : const Status& leader_status() const {
 :   return leader_status_;
 : }
> nit: Are these ever used?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@118
PS6, Line 118: if (!replica_status_.ok()) {
 : return replica_status_;
  

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-12 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..


Patch Set 6:

(27 comments)

A bunch of nits but overall looking much better!

http://gerrit.cloudera.org:8080/#/c/16648/5/src/2.patch
File src/2.patch:

PS5:
Remove this.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/3.patch
File src/3.patch:

PS5:
Remove this.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/integration-tests/txn_status_manager-itest.cc
File src/kudu/integration-tests/txn_status_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/integration-tests/txn_status_manager-itest.cc@447
PS5, Line 447:   SleepFor(MonoDelta::FromMilliseconds(5 * 
keepalive_interval_ms));
Do we need this if we're repeating in ASSERT_EVENTUALLY?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744
PS6, Line 744: ;
nit: drop extra semicolon


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751
PS6, Line 751: threads.emplace_back([&] {
What does the multithreading add here? Seems like it complicates things since 
we'd have to worry about monotonically increasing transaction IDs.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758
PS6, Line 758: if (s.ok()) {
What failures are we expecting?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748
PS6, Line 748:   std::atomic next_thread_id = 0;
 :   vector txn_id_index = { 1, 6, 11, 16, 21};
 :   for (int t = 0; t < 5; t++) {
 : threads.emplace_back([&] {
 :   int thread_id = next_thread_id++;
 :   CHECK_LT(thread_id, txn_id_index.size());
 :   int start_txn_id = txn_id_index[thread_id];
 :   for (int i = start_txn_id; i < start_txn_id + 5; i++) {
 : TxnStatusEntryPB txn_status;
 : Status s = txn_sys_client_->BeginTransaction(i, kUser);
 : if (s.ok()) {
 :   // Make sure a re-election happens before the 
following read.
 :   SleepFor(MonoDelta::FromMilliseconds(3 * 
FLAGS_raft_heartbeat_interval_ms));
 :   s = txn_sys_client_->GetTransactionStatus(i, kUser, 
&txn_status);
 :   CHECK_OK(s);
 :   CHECK(txn_status.has_user());
 :   CHECK_STREQ(kUser, txn_status.user().c_str());
 :   CHECK(txn_status.has_state());
 :   CHECK_EQ(TxnStatePB::OPEN, txn_status.state());
 : }
 :   }
 : });
 :   }
nit: would be easier to read as:

const int kTxnsPerThread = 5;
const int kNumThreads = 5;
vector threads;
for (int t = 0; t < kNumThreads; t++) {
  threads.emplace_back([t, &] {
for (int i = 0; i < kNumTxnsPerThreads; i++) {
  // do stuff on txn (t * kNumTxnsPerThreads + i)
}
  }
}


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@395
PS5, Line 395: "Received notification of TxnStatusTablet state change "
 :  << "but the raft consensus is not 
initialized. Tablet ID: "
 :  << tablet_id << ". Reason: " << 
reason;
nit: use Substitute. Same below.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@407
PS5, Line 407:   <<
nit: spacing alignment


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@997
PS5, Line 997: state_ == STOPPING || state_ == STOPPED
I'm curious whether the code would work if we reused the != RUNNING condition. 
If so, we could use the same function at both callsites.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403
PS6, Line 403: <<
nit: maybe add a message here too indicating what's going on


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472
PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down 
or shutdown. State: $0",
 :TabletStatePB_Name(state_));
nit: might be cleaner to let the callers of these otherwise simple functions 
handle logging.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/table

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

2021-01-12 Thread Hao Hao (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#6).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
..

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 769 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/6
--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 6
Gerrit-Owner: Hao Hao 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)