[jira] [Comment Edited] (HBASE-25913) Introduce EnvironmentEdge.currentTimeAdvancing

Andrew Kyle Purtell (Jira) Thu, 27 May 2021 19:33:04 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-25913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352925#comment-17352925
 ]


Andrew Kyle Purtell edited comment on HBASE-25913 at 5/28/21, 2:32 AM:
-----------------------------------------------------------------------

I've put together a JMH based harness, attached to this JIRA. JMH is GPLed so 
we cannot include this in the project, but it is fine for standalone work. 

Build with Maven, 'mvn clean install'.

Run like:
{noformat}
java -jar ./target/benchmarks.jar BenchHRegionIncrementAdvancingClock -f 0 -t 1
{noformat}

-f 0 is to avoid forks, for some reason JMH doesn't work for me if forking
-t 1 is single thread

I have a lot of work to do yet characterizing performance in real conditions 
that matter, but the results so far comport with what you'd expect. The current 
System.currentTimeMillis based EnvironmentEdge.currentTime() measures at a mean 
of 61.237 nanoseconds, as baseline. My IncrementAdvancingClock alternative 
measures at a mean of 66.885 nanoseconds. Very little difference is expected 
here because I haven't done experiments with thread contention yet. Coming 
soon. My SpinAdvancingClock takes exactly until the next tick on my MacOS dev 
box, which is 999265.726 nanoseconds, which is just shy of 1 millisecond. My 
YieldAdvancingClock alternative takes 1304904.706  nanoseconds, which is ~1.3 
milliseconds, which is about the tick resolution and also some overhead for the 
small sleep used to yield the thread. This is exactly what you'd expect and not 
terribly interesting. IncrementAdvancingClock is probably the best choice, 
unless there are concerns about the incrementing aspect of it getting too far 
ahead of real time and causing temporal discontinuities, in which case 
SpinAdvancingClock has less overhead than YieldAdvancingClock but will raise 
CPU utilization probably for the spinning. 

Remember that these times are only relevant if you call currentTimeAdvancing(), 
no matter the implementation. That call is done once per mutation RPC handling. 
The clock state is managed per region.

currentTime is unaffected. It has a mean of 60-70 nanoseconds. 




was (Author: apurtell):
I've put together a JMH based harness, attached to this JIRA. JMH is GPLed so 
we cannot include this in the project, but it is fine for standalone work. 

Build with Maven, 'mvn clean install'.

Run like:
{noformat}
java -jar ./target/benchmarks.jar BenchHRegionIncrementAdvancingClock -f 0 -t 1
{noformat}

-f 0 is to avoid forks, for some reason JMH doesn't work for me if forking
-t 1 is single thread

I have a lot of work to do yet characterizing performance in real conditions 
that matter, but the results so far comport with what you'd expect. The current 
System.currentTimeMillis based EnvironmentEdge.currentTime() measures at a mean 
of 61.237 nanoseconds, as baseline. My IncrementAdvancingClock alternative 
measures at a mean of 66.885 nanoseconds. Very little difference is expected 
here because I haven't done experiments with thread contention yet. Coming 
soon. My SpinAdvancingClock takes exactly until the next tick on my MacOS dev 
box, which is 999265.726 nanoseconds, which is just shy of 1 millisecond. My 
YieldAdvancingClock alternative takes 1304904.706  nanoseconds, which is ~1.3 
milliseconds, which is about the tick resolution and also some overhead for the 
small sleep used to yield the thread. This is exactly what you'd expect and not 
terribly interesting. IncrementAdvancingClock is probably the best choice, 
unless there are concerns about the incrementing aspect of it getting too far 
ahead of real time and causing temporal discontinuities, in which case 
SpinAdvancingClock has less overhead than YieldAdvancingClock but will raise 
CPU utilization probably for the spinning. 

Remember that these times are only relevant if you call currentTimeAdvancing(), 
no matter the implementation. 

currentTime is unaffected. It has a mean of 60-70 nanoseconds. 



> Introduce EnvironmentEdge.currentTimeAdvancing
> ----------------------------------------------
>
>                 Key: HBASE-25913
>                 URL: https://issues.apache.org/jira/browse/HBASE-25913
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>         Attachments: 
> 0001-HBASE-25913-Introduce-EnvironmentEdge.currentTimeAdv.patch, 
> jmh-HBASE-25913.tar.gz
>
>
> Introduce new {{EnvironmentEdge#currentTimeAdvancing}} which ensures that 
> when the current time is returned, it is the current time in a different 
> clock tick from the last time the {{EnvironmentEdge}} was used to get the 
> current time.
> When processing mutations we substitute the {{Long.MAX_VALUE}} timestamp 
> placeholder with a real placeholder just before committing the mutation. The 
> current code gets the current time for timestamp substitution while under row 
> lock and mvcc. We will simply use {{EnvironmentEdge#currentTimeAdvancing}} 
> instead of {{EnvironmentEdge#currentTime}} at this point in the code to 
> ensure we have seen the clock tick over. When processing a batch of mutations 
> (doMiniBatchMutation etc) we will call {{currentTimeAdvancing}} only once. 
> This means the client cannot bundle cells with wildcard timestamps into a 
> batch where those cells must be committed with different timestamps. Clients 
> must simply not submit mutations that must be committed with guaranteed 
> distinct timestamps in the same batch. Easy to understand, easy to document, 
> and it aligns with our design philosophy of the client knows best.
> It is not required to handle batches as proposed. We could guarantee a 
> distinct timestamp for every mutation in a batch. Count the number of 
> mutations, call this M. Acquire all row locks and get the current time. Then, 
> wait for at least M milliseconds. Then, set the first mutation timestamp with 
> this value and increment by 1 for all remaining. Then, do the rest of 
> mutation processing as normal. I don't think this extra waiting to reserve 
> the range of timestamps is necessary. See reasoning in above paragraph. 
> Mentioned here for sake of discussion.
> It will be fine to continue to use {{EnvironmentEdge#currentTime}} everywhere 
> else. In this way we will only potentially spin wait where it matters, and 
> won't suffer serious overheads during batch processing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (HBASE-25913) Introduce EnvironmentEdge.currentTimeAdvancing

Reply via email to