Hello Dan Burkert, Todd Lipcon, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/7996 to review the following change.

Change subject: KUDU-2137: protect against concurrent schema version change and tablet drop
......................................................................

KUDU-2137: protect against concurrent schema version change and tablet drop

Try as I might, I could not reproduce the failure in the bug report. I
looped the failed test several thousand times. I also looped the entire
alter_table-test suite a thousand times. Finally, I wrote a unit test that
hammers one table with concurrent add column, add partition, and drop
partition operations. Nothing worked.

So, here's my best guess at what's going on: if a tablet is dropped while
the master is processing its report, it's conceivable that we could wind up
in TabletInfo::set_reported_schema_version() with the table spinlock held
just after TableInfo::AddRemoveTablets() dropped the tablet. This would
cause us to decrement the tablet's "old" schema version from the table's
count map twice: once when dropping the tablet and a second time in
set_reported_schema_version().

The fix is straight-forward: after acquiring both spinlocks, double check
that the tablet is still a member of the table. But, the extra locking
needed to do so feels so very wrong.

Change-Id: I371fc310a97ae94ec2ebf04405db99c5f2937e1a
---
M src/kudu/master/catalog_manager.cc
1 file changed, 16 insertions(+), 2 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/96/7996/1
--
To view, visit http://gerrit.cloudera.org:8080/7996
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I371fc310a97ae94ec2ebf04405db99c5f2937e1a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
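
For reviewers who want to reason about the guess in the commit message, here is a minimal sketch that replays the suspected interleaving deterministically: the tablet is dropped first, and the schema version report is applied afterwards. Everything in it is an assumption made for illustration. TableModel, TabletModel, AddTablet, DropTablet, SetReportedSchemaVersion, and UnderflowsAfterDropThenReport are hypothetical stand-ins, std::mutex replaces the spinlocks, and none of this is the actual catalog_manager.cc code or the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <string>

// Hypothetical, simplified model; these are NOT the real Kudu classes.
struct TableModel {
  std::mutex lock;                         // stands in for the table spinlock
  std::set<std::string> tablet_ids;        // tablets currently in the table
  std::map<uint32_t, int> version_counts;  // schema version -> member tablets
  int underflows = 0;                      // decrements that found no entry

  // Caller must hold 'lock'.
  void DecrementLocked(uint32_t version) {
    auto it = version_counts.find(version);
    if (it == version_counts.end()) {
      ++underflows;  // this is the suspected double decrement
      return;
    }
    assert(it->second > 0);
    if (--it->second == 0) version_counts.erase(it);
  }
};

struct TabletModel {
  std::mutex lock;  // stands in for the tablet spinlock
  std::string id;
  uint32_t reported_version = 0;
};

void AddTablet(TableModel& table, TabletModel& tablet) {
  std::lock_guard<std::mutex> table_l(table.lock);
  std::lock_guard<std::mutex> tablet_l(tablet.lock);
  table.tablet_ids.insert(tablet.id);
  ++table.version_counts[tablet.reported_version];
}

// First decrement: dropping the tablet removes its version from the counts.
void DropTablet(TableModel& table, TabletModel& tablet) {
  std::lock_guard<std::mutex> table_l(table.lock);
  std::lock_guard<std::mutex> tablet_l(tablet.lock);
  if (table.tablet_ids.erase(tablet.id) > 0) {
    table.DecrementLocked(tablet.reported_version);
  }
}

// With check_membership == false, a report that lands after the drop
// decrements the old version a second time. With it set, the counts are
// only touched if the tablet is still a member of the table.
void SetReportedSchemaVersion(TableModel& table, TabletModel& tablet,
                              uint32_t new_version, bool check_membership) {
  // Same lock order everywhere: table first, then tablet.
  std::lock_guard<std::mutex> table_l(table.lock);
  std::lock_guard<std::mutex> tablet_l(tablet.lock);
  if (new_version <= tablet.reported_version) return;
  if (!check_membership || table.tablet_ids.count(tablet.id) > 0) {
    table.DecrementLocked(tablet.reported_version);
    ++table.version_counts[new_version];
  }
  tablet.reported_version = new_version;
}

// Replays the suspected ordering: the drop lands before the report is applied.
int UnderflowsAfterDropThenReport(bool check_membership) {
  TableModel table;
  TabletModel tablet;
  tablet.id = "t1";
  AddTablet(table, tablet);
  DropTablet(table, tablet);
  SetReportedSchemaVersion(table, tablet, 1, check_membership);
  return table.underflows;
}
```

Calling UnderflowsAfterDropThenReport(false) records one decrement against a version that is no longer in the count map, while UnderflowsAfterDropThenReport(true) records none. The sketch says nothing about whether this ordering can actually occur in the master; it only shows why a membership check under both locks would make it harmless.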