[ 
https://issues.apache.org/jira/browse/HBASE-25913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356996#comment-17356996
 ] 

Andrew Kyle Purtell edited comment on HBASE-25913 at 6/4/21, 1:41 AM:
----------------------------------------------------------------------

Draft PR: https://github.com/apache/hbase/pull/3352
The essential changes are in three files, BoundedIncrementYieldAdvancingClock, 
BaseEnvironmentEdge, and HRegion.

TODO:

* One reasonable HRegion-based test that ensures timestamp substitutions made 
in a tight loop, where more than one would happen within a single clock tick, 
are all unique.

* We need an advancing time from the clock in the mini batch mutation code 
path. That is where we have rightly factored mutation processing, but it is 
also where the common case involves multiple rows, so we do not manage 
independent advancing time at the row scope; it has to be the region scope. 
(Strictly speaking the interesting scope is the row. What we need to order in 
time are updates to a row.) In this proposal all concurrent updates to the 
region are in scope instead, so the clock type should block as infrequently as 
possible, i.e. BoundedIncrementYieldAdvancingClock. On the upside, we can 
simply manage this clock as part of region state and harmonize all timekeeping 
for region actions through that clock instance, so current time won't ever 
seem to go backwards with respect to advancing time. However, it would be 
worth exploring whether we can use 
EnvironmentEdgeManager.getClock(<region_name>+<row_key>) if and when we can 
determine that only one row is being updated.
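
To make the region-scoped clock idea concrete, here is a minimal sketch of an 
advancing clock that prefers incrementing over blocking. It is not the code in 
the PR. The class name AdvancingClockSketch, the maxAhead bound, and the 
"increment until a bound is reached, then yield" behavior are all assumptions 
inferred from the name BoundedIncrementYieldAdvancingClock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch only; not the BoundedIncrementYieldAdvancingClock from the PR. */
final class AdvancingClockSketch {
  // Last value handed out by currentTimeAdvancing().
  private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
  // Underlying wall clock, e.g. System::currentTimeMillis.
  private final LongSupplier wallClock;
  // Assumed bound on how far the returned time may run ahead of the wall clock.
  private final long maxAhead;

  AdvancingClockSketch(LongSupplier wallClock, long maxAhead) {
    this.wallClock = wallClock;
    this.maxAhead = maxAhead;
  }

  /** Returns a value strictly greater than every value returned before. */
  long currentTimeAdvancing() {
    while (true) {
      long now = wallClock.getAsLong();
      long prev = last.get();
      // Use the wall clock if it has ticked over, otherwise increment by one.
      long next = Math.max(now, prev + 1);
      if (next - now > maxAhead) {
        // Too far ahead of real time: yield and let the wall clock catch up.
        Thread.yield();
        continue;
      }
      if (last.compareAndSet(prev, next)) {
        return next;
      }
      // Lost a race with another thread; retry with fresh values.
    }
  }

  /** Plain read that never appears to go backwards relative to advancing time. */
  long currentTime() {
    return Math.max(wallClock.getAsLong(), last.get());
  }
}
```

A test along the lines of the first TODO item could call 
currentTimeAdvancing() in a tight loop and assert that each returned value is 
strictly greater than the one before it.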



was (Author: apurtell):
Draft PR: https://github.com/apache/hbase/pull/3352
The essential changes are in three files, BoundedIncrementYieldAdvancingClock, 
BaseEnvironmentEdge, and HRegion.

> Introduce EnvironmentEdge.currentTimeAdvancing
> ----------------------------------------------
>
>                 Key: HBASE-25913
>                 URL: https://issues.apache.org/jira/browse/HBASE-25913
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>         Attachments: HBASE-25913_Multithreaded_Benchmarks.pdf, 
> JMH-HBASE-25913.pdf, jmh-HBASE-25913.tar.gz
>
>
> Introduce new {{EnvironmentEdge#currentTimeAdvancing}} which ensures that 
> when the current time is returned, it is the current time in a different 
> clock tick from the last time the {{EnvironmentEdge}} was used to get the 
> current time.
> When processing mutations we substitute the {{Long.MAX_VALUE}} timestamp 
> placeholder with a real timestamp just before committing the mutation. The 
> current code gets the current time for timestamp substitution while under row 
> lock and mvcc. We will simply use {{EnvironmentEdge#currentTimeAdvancing}} 
> instead of {{EnvironmentEdge#currentTime}} at this point in the code to 
> ensure we have seen the clock tick over. When processing a batch of mutations 
> (doMiniBatchMutation etc) we will call {{currentTimeAdvancing}} only once. 
> This means the client cannot bundle cells with wildcard timestamps into a 
> batch where those cells must be committed with different timestamps. Clients 
> must simply not submit mutations that must be committed with guaranteed 
> distinct timestamps in the same batch. Easy to understand, easy to document, 
> and it aligns with our design philosophy that the client knows best.
> It is not required to handle batches as proposed. We could guarantee a 
> distinct timestamp for every mutation in a batch. Count the number of 
> mutations, call this M. Acquire all row locks and get the current time. Then, 
> wait for at least M milliseconds. Then, set the first mutation timestamp with 
> this value and increment by 1 for all remaining. Then, do the rest of 
> mutation processing as normal. I don't think this extra waiting to reserve 
> the range of timestamps is necessary. See reasoning in above paragraph. 
> Mentioned here for sake of discussion.
> It will be fine to continue to use {{EnvironmentEdge#currentTime}} everywhere 
> else. In this way we will only potentially spin wait where it matters, and 
> won't suffer serious overheads during batch processing.
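
For the sake of discussion, the alternative described in the issue (reserving 
M distinct timestamps for a batch of M mutations) could be sketched roughly as 
follows. The class and method names are hypothetical, the clock is passed in 
as a LongSupplier for illustration, and the sketch assumes the caller already 
holds all row locks.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the "reserve M timestamps" alternative; not HBase code. */
final class BatchTimestampReservationSketch {

  /**
   * Returns m distinct, consecutive timestamps starting at the current time.
   * Assumes all row locks for the batch are already held by the caller.
   */
  static long[] reserve(int m, LongSupplier clock) {
    if (m < 0) {
      throw new IllegalArgumentException("m must be non-negative");
    }
    long base = clock.getAsLong();
    // Wait until at least m milliseconds have passed, so that the whole
    // range [base, base + m - 1] lies in the past before it is used.
    while (clock.getAsLong() - base < m) {
      LockSupport.parkNanos(1_000_000L); // pause for roughly 1 ms
    }
    long[] timestamps = new long[m];
    for (int i = 0; i < m; i++) {
      // The first mutation gets base; each later one increments by 1.
      timestamps[i] = base + i;
    }
    return timestamps;
  }
}
```

The wait grows linearly with the batch size, which is the extra cost the issue 
description argues is not necessary.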



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
