[kudu-CR] log: start using file cache

2020-01-22 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..

log: start using file cache

This patch modifies the log to use the file cache. New WAL segments are
opened through the cache from the moment we switch to them, meaning
there's a short period of time as they're being preallocated where they're
opened outside the cache. Log index chunks are only used through the cache.

The bulk of the patch is plumbing to get the file cache down from the server
into the various log classes. Most "interesting" log-related tests have been
modified to instantiate a cache while other unit tests have not, ensuring a
mix of test coverage.

One important semantic change to be aware of: it is now unsafe to delete a
log's data or bootstrap a new copy of an existing tablet without first
closing the old log. Thankfully we only took advantage of this in tests.

I added a new gflag to use as a feature flag, in case things go south.

Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Reviewed-on: http://gerrit.cloudera.org:8080/15082
Tested-by: Adar Dembo 
Reviewed-by: Andrew Wong 
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/log_index-test.cc
M src/kudu/consensus/log_index.cc
M src/kudu/consensus/log_index.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/consensus/mt-log-test.cc
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/log_verifier.cc
M src/kudu/integration-tests/timestamp_advancement-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_bootstrap.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_perf.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
31 files changed, 259 insertions(+), 112 deletions(-)

Approvals:
  Adar Dembo: Verified
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 


[kudu-CR] log: start using file cache

2020-01-22 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..


Patch Set 3: Code-Review+2

Would be good to see the effects of this change with concurrent read and write 
workloads, as well as do some validation that we can create a bunch of replicas 
on a single server without blowing past the fd limit.


--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Comment-Date: Wed, 22 Jan 2020 22:07:18 +
Gerrit-HasComments: No


[kudu-CR] log: start using file cache

2020-01-22 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 


[kudu-CR] log: start using file cache

2020-01-22 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..


Patch Set 3: Verified+1

Overriding Jenkins, unrelated Java test failure.


--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 22 Jan 2020 18:43:58 +
Gerrit-HasComments: No


[kudu-CR] log: start using file cache

2020-01-22 Thread Adar Dembo (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15082

to look at the new patch set (#3).

Change subject: log: start using file cache
..

log: start using file cache

This patch modifies the log to use the file cache. New WAL segments are
opened through the cache from the moment we switch to them, meaning
there's a short period of time as they're being preallocated where they're
opened outside the cache. Log index chunks are only used through the cache.

The bulk of the patch is plumbing to get the file cache down from the server
into the various log classes. Most "interesting" log-related tests have been
modified to instantiate a cache while other unit tests have not, ensuring a
mix of test coverage.

One important semantic change to be aware of: it is now unsafe to delete a
log's data or bootstrap a new copy of an existing tablet without first
closing the old log. Thankfully we only took advantage of this in tests.

I added a new gflag to use as a feature flag, in case things go south.

Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/log_index-test.cc
M src/kudu/consensus/log_index.cc
M src/kudu/consensus/log_index.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/consensus/mt-log-test.cc
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/log_verifier.cc
M src/kudu/integration-tests/timestamp_advancement-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_bootstrap.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_perf.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
31 files changed, 259 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/15082/3
--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] log: start using file cache

2020-01-22 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG@19
PS1, Line 19: unsafe to delete a
: log's data or bootstrap a new copy of an existing tablet without 
first
: closing the old log
> Is it possible to track such conditions and add CHECK()/DCHECK() correspond
Check out the comments I added to Log::DeleteOnDiskData and 
Log::RemoveRecoveryDirIfExists. Both are static functions, called from the 
static TSTabletManager::DeleteTabletData, and therein lies the problem: making 
that stuff non-static is tough because sometimes it's used with a corresponding 
Log instance and sometimes not.


http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc@1071
PS1, Line 1071: ops_str = Substitute(" (ops $0-$1)",
  :  segment->footer().m
> nit: Is this important? Could we iterate by mutable ref and reset the refer
We could but no, it's not important: any deleted file will be actually deleted 
when the function returns. I just left that comment there as a point of 
curiosity.


http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc@a729
PS1, Line 729:
 :
 :
> good catch!
Ack


http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc@1127
PS1, Line 1127: server_->file_cache(),
> Using the file cache means there's a performance hit every time we allocate
Yeah, that's probably wise. Done.



--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 22 Jan 2020 17:02:36 +
Gerrit-HasComments: Yes


[kudu-CR] log: start using file cache

2020-01-22 Thread Adar Dembo (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15082

to look at the new patch set (#2).

Change subject: log: start using file cache
..

log: start using file cache

This patch modifies the log to use the file cache. New WAL segments are
opened through the cache from the moment we switch to them, meaning
there's a short period of time as they're being preallocated where they're
opened outside the cache. Log index chunks are only used through the cache.

The bulk of the patch is plumbing to get the file cache down from the server
into the various log classes. Most "interesting" log-related tests have been
modified to instantiate a cache while other unit tests have not, ensuring a
mix of test coverage.

One important semantic change to be aware of: it is now unsafe to delete a
log's data or bootstrap a new copy of an existing tablet without first
closing the old log. Thankfully we only took advantage of this in tests.

I added a new gflag to use as a feature flag, in case things go south.

Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/log_index-test.cc
M src/kudu/consensus/log_index.cc
M src/kudu/consensus/log_index.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/consensus/mt-log-test.cc
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/log_verifier.cc
M src/kudu/integration-tests/timestamp_advancement-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_bootstrap.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_perf.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
31 files changed, 258 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/15082/2
--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] log: start using file cache

2020-01-22 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15082/1//COMMIT_MSG@19
PS1, Line 19: unsafe to delete a
: log's data or bootstrap a new copy of an existing tablet without 
first
: closing the old log
Is it possible to track such conditions and add CHECK()/DCHECK() 
correspondingly?


http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tablet/tablet_bootstrap.cc@a729
PS1, Line 729:
 :
 :
good catch!



--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 22 Jan 2020 08:18:45 +
Gerrit-HasComments: Yes


[kudu-CR] log: start using file cache

2020-01-21 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15082 )

Change subject: log: start using file cache
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/consensus/log.cc@1071
PS1, Line 1071: // Note: the segment files will only be deleted from disk when
  : // segments_to_delete goes out of scope.
nit: Is this important? Could we iterate by mutable ref and reset the 
references in segments_to_delete?


http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/15082/1/src/kudu/tserver/ts_tablet_manager.cc@1127
PS1, Line 1127: server_->file_cache(),
Using the file cache means there's a performance hit every time we 
allocate/roll onto new WAL segments, and it adds stress on data block pathways 
as well. Do you think we ought to add a flag that enables this?



--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 21 Jan 2020 22:37:06 +
Gerrit-HasComments: Yes


[kudu-CR] log: start using file cache

2020-01-21 Thread Adar Dembo (Code Review)
Hello Alexey Serbin, Andrew Wong,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/15082

to review the following change.


Change subject: log: start using file cache
..

log: start using file cache

This patch modifies the log to use the file cache. New WAL segments are
opened through the cache from the moment we switch to them, meaning
there's a short period of time as they're being preallocated where they're
opened outside the cache. Log index chunks are only used through the cache.

The bulk of the patch is plumbing to get the file cache down from the server
into the various log classes. Most "interesting" log-related tests have been
modified to instantiate a cache while other unit tests have not, ensuring a
mix of test coverage.

One important semantic change to be aware of: it is now unsafe to delete a
log's data or bootstrap a new copy of an existing tablet without first
closing the old log. Thankfully we only took advantage of this in tests.

Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/log_index-test.cc
M src/kudu/consensus/log_index.cc
M src/kudu/consensus/log_index.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/consensus/mt-log-test.cc
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/log_verifier.cc
M src/kudu/integration-tests/timestamp_advancement-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_bootstrap.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_perf.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
31 files changed, 246 insertions(+), 110 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/15082/1
--
To view, visit http://gerrit.cloudera.org:8080/15082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2c00f6b839a693e059fa2ce79abf347dbf83bdd0
Gerrit-Change-Number: 15082
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong