Hello Dan Burkert, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7996

to review the following change.

Change subject: KUDU-2137: protect against concurrent schema version change and tablet drop
......................................................................

KUDU-2137: protect against concurrent schema version change and tablet drop

Try as I might, I could not reproduce the failure in the bug report. I
looped the failed test several thousand times. I also looped the entire
alter_table-test suite a thousand times. Finally, I wrote a unit test that
hammers one table with concurrent add column, add partition, and drop
partition operations. Nothing worked.

So, here's my best guess at what's going on: if a tablet is dropped
while the master is processing its report, it's conceivable that we could
wind up in TabletInfo::set_reported_schema_version() with the table spinlock
held just after TableInfo::AddRemoveTablets() dropped the tablet. This would
cause us to decrement the count for the tablet's "old" schema version in the
table's count map twice: once when dropping the tablet and a second time in
set_reported_schema_version().

The fix is straightforward: after acquiring both spinlocks, double-check
that the tablet is still a member of the table. But the extra locking
needed to do so feels so very wrong.
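
For reviewers who want the shape of the check without opening the diff,
here is a rough, self-contained sketch. It is not the code in
catalog_manager.cc: TableSketch, TabletSketch, and their members are
hypothetical stand-ins, and std::mutex is assumed in place of the real
spinlocks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>

// Hypothetical, simplified stand-ins for the master's table and tablet
// metadata classes. std::mutex replaces the spinlocks.
class TableSketch;

class TabletSketch {
 public:
  TabletSketch(TableSketch* table, int64_t version)
      : table_(table), reported_schema_version_(version) {}

  // Returns true if the table's per-version counts were updated.
  bool set_reported_schema_version(int64_t version);

 private:
  friend class TableSketch;
  TableSketch* const table_;
  std::mutex lock_;
  int64_t reported_schema_version_;
};

class TableSketch {
 public:
  void AddTablet(TabletSketch* t) {
    std::lock_guard<std::mutex> table_l(lock_);
    std::lock_guard<std::mutex> tablet_l(t->lock_);
    tablets_.insert(t);
    ++counts_[t->reported_schema_version_];
  }

  // Dropping a tablet decrements the count for its last reported version.
  void RemoveTablet(TabletSketch* t) {
    std::lock_guard<std::mutex> table_l(lock_);
    std::lock_guard<std::mutex> tablet_l(t->lock_);
    if (tablets_.erase(t) == 1) Decrement(t->reported_schema_version_);
  }

  int CountForVersion(int64_t version) {
    std::lock_guard<std::mutex> table_l(lock_);
    auto it = counts_.find(version);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  friend class TabletSketch;

  // Caller must hold lock_. A second decrement for the same tablet would
  // trip this assertion (or corrupt the counts in a release build).
  void Decrement(int64_t version) {
    auto it = counts_.find(version);
    assert(it != counts_.end() && it->second > 0);
    if (--it->second == 0) counts_.erase(it);
  }

  std::mutex lock_;
  std::set<TabletSketch*> tablets_;
  std::map<int64_t, int> counts_;  // schema version -> number of tablets
};

bool TabletSketch::set_reported_schema_version(int64_t version) {
  // Same lock order as AddTablet/RemoveTablet: table first, then tablet.
  std::lock_guard<std::mutex> table_l(table_->lock_);
  std::lock_guard<std::mutex> tablet_l(lock_);
  if (version <= reported_schema_version_) return false;

  // The double-check: if the tablet was dropped while its report was being
  // processed, its old version was already decremented, so do nothing.
  if (table_->tablets_.count(this) == 0) return false;

  table_->Decrement(reported_schema_version_);
  ++table_->counts_[version];
  reported_schema_version_ = version;
  return true;
}
```

Without the membership check, calling set_reported_schema_version() on an
already-dropped tablet would reach Decrement() a second time for the same
"old" version, which is the suspected double decrement.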

Change-Id: I371fc310a97ae94ec2ebf04405db99c5f2937e1a
---
M src/kudu/master/catalog_manager.cc
1 file changed, 16 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/96/7996/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7996
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I371fc310a97ae94ec2ebf04405db99c5f2937e1a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
