[kudu-CR] log: start using file cache
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. log: start using file cache This patch modifies the log to use the file cache. New WAL segments are opened through the cache from the moment we switch to them, meaning there's a short period of time as they're being preallocated where they're opened outside the cache. Log index chunks are only used through the cache. The bulk of the patch is plumbing to get the file cache down from the server into the various log classes. Most "interesting" log-related tests have been modified to instantiate a cache while other unit tests have not, ensuring a mix of test coverage. One important semantic change to be aware of: it is now unsafe to delete a log's data or bootstrap a new copy of an existing tablet without first closing the old log. Thankfully we only took advantage of this in tests. I added a new gflag to use as a feature flag, in case things go south. Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Reviewed-on: http://gerrit.cloudera.org:8080/15082 Tested-by: Adar Dembo Reviewed-by: Andrew Wong --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/log-test-base.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/log_cache-test.cc M src/kudu/consensus/log_index-test.cc M src/kudu/consensus/log_index.cc M src/kudu/consensus/log_index.h M src/kudu/consensus/log_reader.cc M src/kudu/consensus/log_reader.h M src/kudu/consensus/log_util.cc M src/kudu/consensus/log_util.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_perf.cc M src/kudu/tools/tool_action_wal.cc M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 31 files changed, 259 insertions(+), 112 deletions(-) Approvals: Adar Dembo: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong
[kudu-CR] log: start using file cache
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. Patch Set 3: Code-Review+2 Would be good to see the effects of this change with concurrent read and write workloads, as well as do some validation that we can create a bunch of replicas on a single server without blowing past the fd limit. -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Comment-Date: Wed, 22 Jan 2020 22:07:18 + Gerrit-HasComments: No
[kudu-CR] log: start using file cache
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong
[kudu-CR] log: start using file cache
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. Patch Set 3: Verified+1 Overriding Jenkins, unrelated Java test failure. -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 22 Jan 2020 18:43:58 + Gerrit-HasComments: No
[kudu-CR] log: start using file cache
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15082 to look at the new patch set (#3). Change subject: log: start using file cache .. log: start using file cache This patch modifies the log to use the file cache. New WAL segments are opened through the cache from the moment we switch to them, meaning there's a short period of time as they're being preallocated where they're opened outside the cache. Log index chunks are only used through the cache. The bulk of the patch is plumbing to get the file cache down from the server into the various log classes. Most "interesting" log-related tests have been modified to instantiate a cache while other unit tests have not, ensuring a mix of test coverage. One important semantic change to be aware of: it is now unsafe to delete a log's data or bootstrap a new copy of an existing tablet without first closing the old log. Thankfully we only took advantage of this in tests. I added a new gflag to use as a feature flag, in case things go south. Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/log-test-base.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/log_cache-test.cc M src/kudu/consensus/log_index-test.cc M src/kudu/consensus/log_index.cc M src/kudu/consensus/log_index.h M src/kudu/consensus/log_reader.cc M src/kudu/consensus/log_reader.h M src/kudu/consensus/log_util.cc M src/kudu/consensus/log_util.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_perf.cc M src/kudu/tools/tool_action_wal.cc M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 31 files changed, 259 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/15082/3 -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] log: start using file cache
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG@19 PS1, Line 19: unsafe to delete a : log's data or bootstrap a new copy of an existing tablet without first : closing the old log > Is it possible to track such conditions and add CHECK()/DCHECK() correspond Check out the comments I added to Log::DeleteOnDiskData and Log::RemoveRecoveryDirIfExists. Both are static functions, called from the static TSTabletManager::DeleteTabletData, and therein lies the problem: making that stuff non-static is tough because sometimes it's used with a corresponding Log instance and sometimes not. http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc File src/kudu/consensus/log.cc: http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc@1071 PS1, Line 1071: ops_str = Substitute(" (ops $0-$1)", : segment->footer().m > nit: Is this important? Could we iterate by mutable ref and reset the refer We could but no, it's not important: any deleted file will be actually deleted when the function returns. I just left that comment there as a point of curiosity. http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc@a729 PS1, Line 729: : : > good catch! Ack http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc@1127 PS1, Line 1127: server_->file_cache(), > Using the file cache means there's a performance hit every time we allocate Yeah, that's probably wise. Done. -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 22 Jan 2020 17:02:36 + Gerrit-HasComments: Yes
[kudu-CR] log: start using file cache
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15082 to look at the new patch set (#2). Change subject: log: start using file cache .. log: start using file cache This patch modifies the log to use the file cache. New WAL segments are opened through the cache from the moment we switch to them, meaning there's a short period of time as they're being preallocated where they're opened outside the cache. Log index chunks are only used through the cache. The bulk of the patch is plumbing to get the file cache down from the server into the various log classes. Most "interesting" log-related tests have been modified to instantiate a cache while other unit tests have not, ensuring a mix of test coverage. One important semantic change to be aware of: it is now unsafe to delete a log's data or bootstrap a new copy of an existing tablet without first closing the old log. Thankfully we only took advantage of this in tests. I added a new gflag to use as a feature flag, in case things go south. Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/log-test-base.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/log_cache-test.cc M src/kudu/consensus/log_index-test.cc M src/kudu/consensus/log_index.cc M src/kudu/consensus/log_index.h M src/kudu/consensus/log_reader.cc M src/kudu/consensus/log_reader.h M src/kudu/consensus/log_util.cc M src/kudu/consensus/log_util.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_perf.cc M src/kudu/tools/tool_action_wal.cc M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 31 files changed, 258 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/15082/2 -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] log: start using file cache
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG@19 PS1, Line 19: unsafe to delete a : log's data or bootstrap a new copy of an existing tablet without first : closing the old log Is it possible to track such conditions and add CHECK()/DCHECK() correspondingly? http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc@a729 PS1, Line 729: : : good catch! -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 22 Jan 2020 08:18:45 + Gerrit-HasComments: Yes
[kudu-CR] log: start using file cache
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15082 ) Change subject: log: start using file cache .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc File src/kudu/consensus/log.cc: http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc@1071 PS1, Line 1071: // Note: the segment files will only be deleted from disk when : // segments_to_delete goes out of scope. nit: Is this important? Could we iterate by mutable ref and reset the references in segments_to_delete? http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc@1127 PS1, Line 1127: server_->file_cache(), Using the file cache means there's a performance hit every time we allocate/roll onto new WAL segments, and it adds stress on data block pathways as well. Do you think we ought to add a flag that enables this? -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 21 Jan 2020 22:37:06 + Gerrit-HasComments: Yes
[kudu-CR] log: start using file cache
Hello Alexey Serbin, Andrew Wong, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/15082 to review the following change. Change subject: log: start using file cache .. log: start using file cache This patch modifies the log to use the file cache. New WAL segments are opened through the cache from the moment we switch to them, meaning there's a short period of time as they're being preallocated where they're opened outside the cache. Log index chunks are only used through the cache. The bulk of the patch is plumbing to get the file cache down from the server into the various log classes. Most "interesting" log-related tests have been modified to instantiate a cache while other unit tests have not, ensuring a mix of test coverage. One important semantic change to be aware of: it is now unsafe to delete a log's data or bootstrap a new copy of an existing tablet without first closing the old log. Thankfully we only took advantage of this in tests. Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/log-test-base.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/log_cache-test.cc M src/kudu/consensus/log_index-test.cc M src/kudu/consensus/log_index.cc M src/kudu/consensus/log_index.h M src/kudu/consensus/log_reader.cc M src/kudu/consensus/log_reader.h M src/kudu/consensus/log_util.cc M src/kudu/consensus/log_util.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/timestamp_advancement-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_perf.cc M src/kudu/tools/tool_action_wal.cc M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 31 files changed, 246 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/15082/1 -- To view, visit http://gerrit.cloudera.org:8080/15082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0 Gerrit-Change-Number: 15082 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong