[ https://issues.apache.org/jira/browse/HBASE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liang Xie updated HBASE-10598:
------------------------------

    Assignee: cuijianwei

> Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10598
>                 URL: https://issues.apache.org/jira/browse/HBASE-10598
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.16
>            Reporter: cuijianwei
>            Assignee: cuijianwei
>         Attachments: HBASE-10598-0.94.v1.patch
>
>
> In our test environment, we occasionally find that written data cannot be
> read back. After debugging, we found that the maximumTimestamp of
> MemStore#timeRangeTracker can decrease, and the minimumTimestamp can
> increase, when MemStore#timeRangeTracker is updated concurrently. This can
> cause the MemStore/StoreFile to be filtered out incorrectly when reading
> data. Let's see how concurrent updates of
> timeRangeTracker#maximumTimestamp cause this problem.
> Imagine two threads, T1 and T2, putting two KeyValues, kv1 and kv2. kv1 and
> kv2 belong to the same Store (and therefore to the same region) but have
> different rowkeys. Consequently, kv1 and kv2 can be updated concurrently.
> Following the implementation of HRegionServer#multi, kv1 and kv2 are added
> to the MemStore by HRegion#applyFamilyMapToMemstore in
> HRegion#doMiniBatchMutation. MemStore#internalAdd is then invoked, and
> MemStore#timeRangeTracker is updated by TimeRangeTracker#includeTimestamp
> as follows:
> {code}
>   private void includeTimestamp(final long timestamp) {
>     ...
>     else if (maximumTimestamp < timestamp) {
>       maximumTimestamp = timestamp;
>     }
>     return;
>   }
> {code}
> Suppose the current maximumTimestamp of the TimeRangeTracker is t0 before
> includeTimestamp(...) is invoked, kv1.timestamp=t1 and kv2.timestamp=t2,
> where t1 and t2 are both set by the user (so the user knows the timestamps
> of kv1 and kv2), and t1 > t2 > t0.
> T1 and T2 execute concurrently, so both threads may find that the current
> maximumTimestamp is less than the timestamp of their own kv. Both threads
> will then set maximumTimestamp to the timestamp of their own kv. If T1
> sets maximumTimestamp before T2 does, the final value of maximumTimestamp
> is t2. Then, before any new update with a bigger timestamp has been
> applied to the MemStore, suppose we try to read kv1 with HTable#get and
> set the timestamp of the 'Get' to t1. The StoreScanner decides whether the
> MemStoreScanner (assuming kv1 has not been flushed) should be selected as
> a candidate scanner by calling MemStoreScanner#shouldUseScanner. The
> MemStore won't be selected in MemStoreScanner#shouldUseScanner because
> the maximumTimestamp of the MemStore has been set to t2 (t2 < t1).
> Consequently, the written kv1 can't be read out, and kv1 is lost from the
> user's perspective.
> If the above analysis is right, after the maximumTimestamp of
> MemStore#timeRangeTracker has been set to t2, the user will experience
> data loss in the following situations:
> 1. Before any new write with kv.timestamp > t1 has been added to the
> MemStore, a read request for kv1 with timestamp=t1 cannot read kv1 out.
> 2. Before any new write with kv.timestamp > t1 has been added to the
> MemStore, if a flush happens, the data of the MemStore is flushed to a
> StoreFile with StoreFile#maximumTimestamp set to t2. After that, any read
> request with timestamp=t1 cannot read kv1 before the next compaction
> (actually, kv1.timestamp might not be included in the timeRange of the
> StoreFile even after compaction).
> The second situation is much more serious because the incorrect timeRange
> of the MemStore has been persisted to the file.
> Similarly, the concurrent update of TimeRangeTracker#minimumTimestamp may
> also cause this problem.
> As a simple way to fix the problem, we could add synchronized to
> TimeRangeTracker#includeTimestamp so that this method won't be invoked
> concurrently.
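
The interleaving described above, and the proposed synchronized fix, can be
sketched roughly as follows. This is only an illustration under stated
assumptions: SimpleTracker is a hypothetical, simplified stand-in for
TimeRangeTracker (it is not the HBase class and does not reproduce the elided
part of includeTimestamp), and replayLostUpdate and runConcurrentWriters are
helper names invented for this example. replayLostUpdate replays the
check-then-set steps of T1 and T2 in a single thread so the result is
deterministic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TimeRangeTrackerSketch {

    /** Hypothetical, simplified stand-in for TimeRangeTracker (not the HBase class). */
    public static final class SimpleTracker {
        private long minimumTimestamp = Long.MAX_VALUE;
        private long maximumTimestamp = Long.MIN_VALUE;

        // synchronized makes each check-then-set pair atomic with respect to
        // other callers, which is the fix proposed in the report.
        public synchronized void includeTimestamp(final long timestamp) {
            if (timestamp < minimumTimestamp) {
                minimumTimestamp = timestamp;
            }
            if (timestamp > maximumTimestamp) {
                maximumTimestamp = timestamp;
            }
        }

        public synchronized long min() { return minimumTimestamp; }
        public synchronized long max() { return maximumTimestamp; }
    }

    /**
     * Replays, in one thread, the unsynchronized interleaving from the report:
     * both threads pass the check before either one writes, then T1 writes
     * first and T2 writes last.
     */
    public static long replayLostUpdate(long t0, long t1, long t2) {
        long maximumTimestamp = t0;
        boolean t1Passes = maximumTimestamp < t1; // T1 evaluates the condition
        boolean t2Passes = maximumTimestamp < t2; // T2 evaluates the condition
        if (t1Passes) {
            maximumTimestamp = t1;                // T1 writes first
        }
        if (t2Passes) {
            maximumTimestamp = t2;                // T2 overwrites with the smaller value
        }
        return maximumTimestamp;
    }

    /** Runs several writer threads against the synchronized tracker. */
    public static SimpleTracker runConcurrentWriters(int threads, int perThread) {
        final SimpleTracker tracker = new SimpleTracker();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int base = i * perThread;
            pool.execute(() -> {
                for (int j = 1; j <= perThread; j++) {
                    tracker.includeTimestamp(base + j);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tracker;
    }

    public static void main(String[] args) {
        // t0 = 100, t1 = 300, t2 = 200, so t1 > t2 > t0 as in the report.
        System.out.println("replayed max = " + replayLostUpdate(100L, 300L, 200L));
        SimpleTracker t = runConcurrentWriters(8, 10_000);
        System.out.println("min = " + t.min() + ", max = " + t.max());
    }
}
```

With t0=100, t1=300 and t2=200, the replay ends with maximumTimestamp=200,
which is below kv1's timestamp, matching the scenario in the description. The
synchronized tracker ends with the true minimum and maximum regardless of how
the writer threads are scheduled. Whether a synchronized method is the best
trade-off for the write path is a separate question that this sketch does not
measure.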