[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4551/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS2, Line 499: en to by another lbm, we must reinspect th
> Compaction writes new blocks to the container, and old ones are hole punche
Great, thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


KUDU-1657: read-only FsManager::Open on active tablet can crash

This fixes a race condition when opening a read-only FsManager on a data
directory with lbm containers that are actively being appended to. See
the code comment for more details. A best-effort repro test case is
included; it typically takes about 35 seconds to repro without the fix.
This issue was identified because it was causing
alter_table-randomized-test to be significantly flaky.

Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Reviewed-on: http://gerrit.cloudera.org:8080/4551
Reviewed-by: Adar Dembo 
Tested-by: Dan Burkert 
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/open-readonly-fs-itest.cc
3 files changed, 158 insertions(+), 1 deletion(-)

Approvals:
  Dan Burkert: Verified
  Adar Dembo: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 7: Verified+1

Jenkins is completely hosed and I'm tired of waiting.

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4551

to look at the new patch set (#7).

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..

KUDU-1657: read-only FsManager::Open on active tablet can crash

This fixes a race condition when opening a read-only FsManager on a data
directory with lbm containers that are actively being appended to. See
the code comment for more details. A best-effort repro test case is
included; it typically takes about 35 seconds to repro without the fix.
This issue was identified because it was causing
alter_table-randomized-test to be significantly flaky.

Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/open-readonly-fs-itest.cc
3 files changed, 158 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/4551/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4551/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS2, Line 499: en to by another lbm, we must reinspect th
> Sort of; the log block manager is careful to only write to the metadata fil
Hmmm, makes sense, thanks. And how does compaction weigh in here ? i.e if 
compaction kicked in after the check (data_file_sz < record.details) and end up 
reducing the data_file_size even further ? I am not sure if that's possible or 
not or perhaps compaction results into writing to a new data file altogether so 
that situation doesn't arise here ?


http://gerrit.cloudera.org:8080/#/c/4551/5/src/kudu/integration-tests/open-readonly-fs-itest.cc
File src/kudu/integration-tests/open-readonly-fs-itest.cc:

Line 30: using kudu::client::KuduColumnSchema;
Unused ?


Line 36: using kudu::client::KuduWriteOperation;
Unused ? Ignore these comments since I don't trust gerrit searches sometimes


Line 92:   }
80-char wrap up ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4551/5/src/kudu/integration-tests/open-readonly-fs-itest.cc
File src/kudu/integration-tests/open-readonly-fs-itest.cc:

PS5, Line 44: const auto kTimeout = MonoDelta::FromSeconds(60);
: const char kTableName[] = "test-table";
: const int kNumColumns = 50;
> huh?  They were at the top of the TEST_F in rev 3.
Yeah, but not with the kCamelCase variable name convention we use.


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4551/5/src/kudu/integration-tests/open-readonly-fs-itest.cc
File src/kudu/integration-tests/open-readonly-fs-itest.cc:

PS5, Line 44: const char kTableName[] = "test-table";
: const int kNumColumns = 50;
: const auto kTimeout = MonoDelta::FromSeconds(60);
> Sorry, I meant at the top of the TEST_F() method itself, but I guess this i
huh?  They were at the top of the TEST_F in rev 3.


PS5, Line 59: flush blocks
> Nit: "flush blocks" and "compaction blocks" isn't wrong, but it suggests th
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 5: Code-Review+2

+2 from me, once Adar's nits get addressed. Thanks for adding the test.

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4551/5/src/kudu/integration-tests/open-readonly-fs-itest.cc
File src/kudu/integration-tests/open-readonly-fs-itest.cc:

PS5, Line 44: const char kTableName[] = "test-table";
: const int kNumColumns = 50;
: const auto kTimeout = MonoDelta::FromSeconds(60);
Sorry, I meant at the top of the TEST_F() method itself, but I guess this is OK 
too.


PS5, Line 59: flush blocks
Nit: "flush blocks" and "compaction blocks" isn't wrong, but it suggests that 
there is a difference in blocks between flushes and compactions, which isn't 
really the case. How about "we reduce the MRS flush threshold to increase flush 
frequency and increase the number of MM threads to encourage frequent 
compactions. The net effect of both of these changes: more blocks are written 
to disk."


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4551/3/src/kudu/integration-tests/open-readonly-fs-itest.cc
File src/kudu/integration-tests/open-readonly-fs-itest.cc:

PS3, Line 54: // To repro KUDU-1657, many flushes need to be happening, so 
turn knobs in
: // order to write and flush as fast as possible.
> Nit: that explains flags #2 and #3, but not #1. For #1, I think it's becaus
Done


PS3, Line 80:   int num_columns = 50;
:   auto timeout = MonoDelta::FromSeconds(60);
> Nit: maybe declare these at the top of the test with kFoo syntax, so it's c
Done


Line 91:   CHECK_OK(b.Build());
> Nit: could be ASSERT_OK since it's in a test.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4551

to look at the new patch set (#4).

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..

KUDU-1657: read-only FsManager::Open on active tablet can crash

This fixes a race condition when opening a read-only FsManager on a data
directory with lbm containers that are actively being appended to. See
the code comment for more details. A best-effort repro test case is
included; it typically takes about 35 seconds to repro without the fix.
This issue was identified because it was causing
alter_table-randomized-test to be significantly flaky.

Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/open-readonly-fs-itest.cc
3 files changed, 157 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/4551/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4551/3/src/kudu/integration-tests/open-readonly-fs-itest.cc
File src/kudu/integration-tests/open-readonly-fs-itest.cc:

PS3, Line 54: // To repro KUDU-1657, many flushes need to be happening, so 
turn knobs in
: // order to write and flush as fast as possible.
Nit: that explains flags #2 and #3, but not #1. For #1, I think it's because 
you don't want the test to eat gobs of disk space right? Or was it because you 
wanted container file sizes to change as often as possible?


PS3, Line 80:   int num_columns = 50;
:   auto timeout = MonoDelta::FromSeconds(60);
Nit: maybe declare these at the top of the test with kFoo syntax, so it's clear 
they're constants that control the performance characteristics of the test?


Line 91:   CHECK_OK(b.Build());
Nit: could be ASSERT_OK since it's in a test.


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2:

(3 comments)

Test added in slow mode.

http://gerrit.cloudera.org:8080/#/c/4551/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 496: if (data_file_size < record.offset() + record.length()) {
> PREDICT_FALSE to annotate that this is a rare race
Done


PS2, Line 499: in order to account for the latest appends
> Good find Dan, this is not a review but a curiosity Qn: Looking at the desc
Sort of; the log block manager is careful to only write to the metadata file 
after writing the data file.  So if the other thread or process that's reading 
the metadata file in read-only mode sees an entry for a block in the metadata 
file, it should be guaranteed that if it then rereads the data file length, the 
length will be past the end of the block.


PS2, Line 504: local_records.back()
> Nit: replace with record ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-29 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4551

to look at the new patch set (#3).

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..

KUDU-1657: read-only FsManager::Open on active tablet can crash

This fixes a race condition when opening a read-only FsManager on a data
directory with tablets that are actively flushing. See the code comment
for more details.  A best-effort repro test case is included; it
typically takes about 35 seconds to repro without the fix. This issue
was identified because it was causing alter_table-randomized-test to be
significantly flaky.

Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
---
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/open-readonly-fs-itest.cc
3 files changed, 154 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/4551/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2:

> that being said, maybe you could just insert the block that causes
 > the test somewhere else.  How about FullStackInsertScanTest, would
 > it be easier to cause it there?

I too have a (weak) opinion that the test should be merged but I'll defer to 
Dan. That said, we use FullStackInsertScanTest in performance dashboards, so I 
don't think we should add a FsManager.Open() looping thread to it.

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2:

(1 comment)

that test is going to bit rot super quickly and this seems like a thorny enough 
problem that we'd want to run it frequently in case we break it somewhere else.
60 seconds isn't much when running tests in parallel.
that being said, maybe you could just insert the block that causes the test 
somewhere else.  How about FullStackInsertScanTest, would it be easier to cause 
it there?

http://gerrit.cloudera.org:8080/#/c/4551/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 496: if (data_file_size < record.offset() + record.length()) {
PREDICT_FALSE to annotate that this is a rare race


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4551/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS2, Line 499: in order to account for the latest appends
Good find Dan, this is not a review but a curiosity Qn: Looking at the 
description/comments it seems like a racy situation between reader thread and a 
writer thread(I may have understood this wrong). Is there a guarantee here that 
the data_file_->Size is not changed after we grabbed the local_record.back() to 
inspect with CheckBlockRecord?


PS2, Line 504: local_records.back()
Nit: replace with record ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2:

Two reasons:

It's typically about 35 seconds to repro on ve0158; when we fix it we'll have 
to run the test for perhaps 60 seconds to really make sure it doesn't happen.

The issue already has pretty good coverage through alter_table-randomized-itest.

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 2:

why not include the test and only run it in slow mode? jenkins runs tests with 
dist-test so I don't think that 30 secs is going to be a problem there.

-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4551/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS1, Line 498:   // This codepath can be used in read-only mode as part of 
a tool running
 :   // against a live tablet server. In this case, the tablet 
server may be
 :   // actively appending entries to the container, so the 
data_file_size may
 :   // need to be occasionally updated to account for new data 
entries.
 :   // CheckBlockRecord will crash the process when 
data_file_size is less
 :   // than what the metadata record indicates.
> Nit: can you reword this comment to reduce its scope? We're deep inside a b
Done


Line 504: 
> Nit: remove this empty line?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-28 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4551

to look at the new patch set (#2).

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..

KUDU-1657: read-only FsManager::Open on active tablet can crash

See the comment and JIRA for explanation. No test is included, since the
only reproducible case takes approximately 30 seconds to run.
alter_table-randomized-test is made significantly flaky by this issue,
so it has some amount of coverage already.

The repro creates a single tablet with many columns and no replication,
and writes data to it as fast as possible. Another thread creates a file
system manager in a loop, which opens a log block manager on the local
tablet. The repro can be found at:
https://github.com/danburkert/kudu/commit/e919ca410941b268993824da7668ee69c5a7b31c

Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
---
M src/kudu/fs/log_block_manager.cc
1 file changed, 11 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/4551/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4551/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS1, Line 498:   // This codepath can be used in read-only mode as part of 
a tool running
 :   // against a live tablet server. In this case, the tablet 
server may be
 :   // actively appending entries to the container, so the 
data_file_size may
 :   // need to be occasionally updated to account for new data 
entries.
 :   // CheckBlockRecord will crash the process when 
data_file_size is less
 :   // than what the metadata record indicates.
Nit: can you reword this comment to reduce its scope? We're deep inside a block 
manager so we shouldn't need to mention tablet servers, tools, etc. Something 
terse like "It's possible for another process to be writing to the container 
while we're processing its metadata. The filesystem will ensure that we only 
ever see complete records, but the file size could change at any time."

Then, for specific information about the repro case, add a reference to the 
JIRA.


Line 504: 
Nit: remove this empty line?


-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1657: read-only FsManager::Open on active tablet can crash

2016-09-27 Thread Dan Burkert (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/4551

to review the following change.

Change subject: KUDU-1657: read-only FsManager::Open on active tablet can crash
..

KUDU-1657: read-only FsManager::Open on active tablet can crash

See the comment and JIRA for explanation. No test is included, since the
only reproducible case takes approximately 30 seconds to run.
alter_table-randomized-test is made significantly flaky by this issue,
so it has some amount of coverage already in the code base.

The repro can be found at:
https://github.com/danburkert/kudu/commit/e919ca410941b268993824da7668ee69c5a7b31c

Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
---
M src/kudu/fs/log_block_manager.cc
1 file changed, 12 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/4551/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If68b04f1f1b8cd099120a220b1245ecf8f422770
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon