Re: [PATCH 0/9] x86: Concurrent TLB flushes and other improvements

Nadav Amit Tue, 25 Jun 2019 18:35:14 -0700

> On Jun 25, 2019, at 3:02 PM, Dave Hansen <[email protected]> wrote:
> 
> On 6/12/19 11:48 PM, Nadav Amit wrote:
>> Running sysbench on dax w/emulated-pmem, write-cache disabled, and
>> various mitigations (PTI, Spectre, MDS) disabled on Haswell:
>> 
>> sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
>>  --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run
>> 
>>                      events (avg/stddev)
>>                      -------------------
>>  5.2-rc3:            1247669.0000/16075.39
>>  +patchset:          1290607.0000/13617.56 (+3.4%)
> 
> Why did you decide on disabling the side-channel mitigations?  While
> they make things slower, they're also going to be with us for a while,
> so they really are part of real-world testing IMNHO.  I'd be curious
> whether this set has more or less of an advantage when all the
> mitigations are on.


It seemed reasonable since I wanted to avoid all kinds of “noise”. I presume
the relative speedup would be smaller with the mitigations enabled, due to
their overhead. Note that in this benchmark every TLB invalidation is of a
single entry. The benefit (in terms of absolute time saved) would have been
greater if each flush covered multiple entries.
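
As a rough illustration of the point above, the sketch below recomputes the
relative gain from the quoted sysbench averages and then shows why a fixed
absolute saving turns into a smaller relative speedup once mitigation overhead
is added to the baseline. The function names and the time figures in the
second part (100, 3.4, 30) are hypothetical and chosen only for illustration;
they are not measurements.

```python
def relative_gain_percent(baseline_events, patched_events):
    """Relative change in completed events, in percent."""
    return (patched_events - baseline_events) / baseline_events * 100.0


def time_saved_percent(base_time, saved_time, mitigation_overhead=0.0):
    """Share of the total run time that a fixed absolute saving represents.

    Assumption: the mitigations add overhead to the baseline but do not
    change the absolute time saved by the patchset.
    """
    return saved_time / (base_time + mitigation_overhead) * 100.0


# Averages quoted in the cover letter (5.2-rc3 vs. +patchset).
gain = relative_gain_percent(1247669.0, 1290607.0)
print(f"events gain: {gain:.2f}%")  # 3.44%

# Hypothetical time units: the same saving, with and without overhead.
without_mitigations = time_saved_percent(100.0, 3.4)
with_mitigations = time_saved_percent(100.0, 3.4, mitigation_overhead=30.0)
print(without_mitigations > with_mitigations)  # True
```

Whether the absolute saving really stays constant with the mitigations on is
exactly what a rerun would have to show.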

> Also, why only 4 threads?  Does this set help most when using a moderate
> number of threads since the local and remote cost are (relatively) close
> vs. a large system where doing lots of remote flushes is *way* more
> time-consuming than a local flush?

Don’t overthink it. My server was busy doing something else, so I was
running the tests on a lame desktop I have. I will rerun it on a bigger
machine.

I presume the performance benefit will be smaller when more cores are
involved, since the TLB shootdown time will be dominated by the inter-core
communication time (IPI+cache coherency) and the tail latency of the IPI
delivery (if interrupts are disabled on the target).
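
A toy model of this expectation, assuming the gain comes from overlapping the
local flush with the remote ones: a serial shootdown costs roughly local +
remote, a concurrent one roughly max(local, remote). All names and numbers
below are hypothetical, and the model ignores everything except those two
costs.

```python
def serial_shootdown(local_flush, remote_flush):
    """Local flush first, then wait for the remote flushes to finish."""
    return local_flush + remote_flush


def concurrent_shootdown(local_flush, remote_flush):
    """Local and remote flushes overlap; the slower one dominates."""
    return max(local_flush, remote_flush)


def benefit_fraction(local_flush, remote_flush):
    """Fraction of the serial shootdown time saved by overlapping."""
    serial = serial_shootdown(local_flush, remote_flush)
    concurrent = concurrent_shootdown(local_flush, remote_flush)
    return (serial - concurrent) / serial


# Hypothetical costs in arbitrary time units. The remote cost stands in
# for IPI delivery, cache-coherency traffic and tail latency, and is
# assumed to grow as more cores are involved.
for remote in (1.0, 2.0, 5.0, 10.0):
    print(remote, round(benefit_fraction(1.0, remote), 3))
```

In this model the saving is bounded by the local flush time, so its share of
the total shrinks as the remote side gets slower.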

I am working on some patches to reduce these overheads as well.
