[ 
https://issues.apache.org/jira/browse/HBASE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-10598:
-------------------------------

    Description: 
In our test environment, we occasionally found that written data could not be 
read back. After debugging, we discovered that the 
maximumTimestamp/minimumTimestamp of MemStore#timeRangeTracker may 
decrease/increase when MemStore#timeRangeTracker is updated concurrently, 
which can cause the MemStore/StoreFile to be filtered out incorrectly when 
reading data. Let's see how a concurrent update of 
timeRangeTracker#maximumTimestamp causes this problem. 
Imagine two threads T1 and T2 putting two KeyValues kv1 and kv2. kv1 and kv2 
belong to the same Store (and therefore the same region) but contain different 
rowkeys; consequently, kv1 and kv2 can be written concurrently. Following the 
implementation of HRegionServer#multi, kv1 and kv2 are added to the MemStore 
by HRegion#applyFamilyMapToMemstore in HRegion#doMiniBatchMutation. Then 
MemStore#internalAdd is invoked, and MemStore#timeRangeTracker is updated by 
TimeRangeTracker#includeTimestamp as follows:
{code}
  private void includeTimestamp(final long timestamp) {
     ...
    else if (maximumTimestamp < timestamp) {
      maximumTimestamp = timestamp;
    }
    return;
  }
{code}
Suppose the current maximumTimestamp of the TimeRangeTracker is t0 before 
includeTimestamp is invoked, kv1.timestamp = t1, kv2.timestamp = t2, t1 and t2 
are both set by the user (so the user knows the timestamps of kv1 and kv2), 
and t1 > t2 > t0. Because T1 and T2 execute concurrently, both threads may 
find that the current maximumTimestamp is less than the timestamp of their kv, 
and each will then set maximumTimestamp to its kv's timestamp. If T1 sets 
maximumTimestamp before T2 does, maximumTimestamp ends up as t2. Then, before 
any new update with a bigger timestamp is applied to the MemStore, if we try 
to read kv1 back through HTable#get with the timestamp of the 'Get' set to t1, 
StoreScanner uses MemStoreScanner#shouldUseScanner to decide whether the 
MemStoreScanner (assuming kv1 has not been flushed) should be selected as a 
candidate scanner. The MemStore will not be selected, because its 
maximumTimestamp has been set to t2 (t2 < t1). Consequently, the written kv1 
cannot be read out, and kv1 is lost from the user's perspective.
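The lost-update interleaving above can be replayed deterministically in a sketch (plain Java, not HBase code; the class name and timestamp values are illustrative stand-ins for t0, t1, and t2):

```java
// Deterministic replay of the race: both threads evaluate the unsynchronized
// check-then-act of includeTimestamp against the same stale maximum, then the
// thread carrying the smaller timestamp writes last.
public class LostUpdateReplay {
    static long maximumTimestamp = 50;   // t0

    public static void main(String[] args) {
        long t1 = 200, t2 = 100;         // user-set timestamps, t1 > t2 > t0

        // Step 1: T1 and T2 both pass the guard against the stale maximum.
        boolean t1Passes = maximumTimestamp < t1;   // true
        boolean t2Passes = maximumTimestamp < t2;   // true

        // Step 2: T1 writes first, then T2 overwrites with the smaller value.
        if (t1Passes) maximumTimestamp = t1;        // maximum becomes 200
        if (t2Passes) maximumTimestamp = t2;        // maximum regresses to 100

        System.out.println(maximumTimestamp);       // prints 100, not 200
    }
}
```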
If the above analysis is right, after the maximumTimestamp of 
MemStore#timeRangeTracker has been set to t2, the user will experience data 
loss in the following situations:
1. Before any new write with kv.timestamp > t1 has been added to the MemStore, 
a read request for kv1 with timestamp=t1 cannot read kv1 out.
2. Before any new write with kv.timestamp > t1 has been added to the MemStore, 
if a flush happens, the data of the MemStore will be flushed to a StoreFile 
whose StoreFile#maximumTimestamp is set to t2. After that, any read request 
with timestamp=t1 cannot read kv1 until the next compaction (and kv1.timestamp 
might not be included in the timeRange of the StoreFile even after 
compaction).
The second situation is much more serious because the incorrect timeRange of 
the MemStore has been persisted to the file. 
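The selection check that makes kv1 invisible can be sketched as follows (a hypothetical simplification of the TimeRange overlap test behind MemStoreScanner#shouldUseScanner; the method signature and values are illustrative, not the actual HBase API):

```java
// Hypothetical simplification of the overlap test behind
// MemStoreScanner#shouldUseScanner: a scanner is consulted only if the Get's
// time range intersects the tracker's [min, max] range.
public class ScannerSelection {
    static boolean shouldUseScanner(long trackerMin, long trackerMax,
                                    long getMin, long getMax) {
        return trackerMax >= getMin && trackerMin <= getMax;
    }

    public static void main(String[] args) {
        // After the race the tracker's maximum has regressed to t2 = 100,
        // although kv1 with t1 = 200 is actually present in the MemStore.
        long trackerMin = 0, trackerMax = 100;

        // A Get with timestamp t1 = 200 queries the point range [200, 200].
        System.out.println(shouldUseScanner(trackerMin, trackerMax, 200, 200));
        // prints false: the MemStore is skipped and kv1 appears lost
    }
}
```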
Similarly, a concurrent update of TimeRangeTracker#minimumTimestamp can cause 
the same problem in the other direction: the recorded minimum may end up 
larger than the smallest timestamp actually present.
As a simple way to fix the problem, we could make 
TimeRangeTracker#includeTimestamp synchronized so that it cannot be invoked 
concurrently.
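A minimal sketch of the proposed fix (a stand-alone tracker, not the actual HBase class; the branches elided in the snippet above are reconstructed here for illustration):

```java
// Hypothetical minimal tracker showing the proposed fix: synchronized makes
// the read-compare-write in includeTimestamp atomic, so a thread holding a
// smaller timestamp can no longer overwrite a larger maximumTimestamp.
public class SafeTimeRangeTracker {
    private long minimumTimestamp = -1;
    private long maximumTimestamp = -1;

    public synchronized void includeTimestamp(final long timestamp) {
        if (maximumTimestamp == -1) {
            minimumTimestamp = timestamp;
            maximumTimestamp = timestamp;
        } else if (minimumTimestamp > timestamp) {
            minimumTimestamp = timestamp;
        } else if (maximumTimestamp < timestamp) {
            maximumTimestamp = timestamp;
        }
    }

    public synchronized long getMaximumTimestamp() { return maximumTimestamp; }
    public synchronized long getMinimumTimestamp() { return minimumTimestamp; }
}
```

With this version, once t1 has been included, a concurrent includeTimestamp(t2) with t2 < t1 leaves maximumTimestamp at t1, because the comparison and the write happen in one atomic step.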



> Written data can not be read out because MemStore#timeRangeTracker might be 
> updated concurrently
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10598
>                 URL: https://issues.apache.org/jira/browse/HBASE-10598
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.16
>            Reporter: cuijianwei
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
