On 01/27/2018 02:20 PM, Dmitry Dolgov wrote:
> Hi,
>
> From what I see some time ago the write lifetime hints support for NVMe multi
> streaming was merged into Linux kernel [1]. Theoretically it allows data
> written together on media so they can be erased together, which minimizes
> garbage collection, resulting in reduced write amplification as well as
> efficient flash utilization [2]. I couldn't find any discussion about that on
> hackers, so I decided to experiment with this feature a bit. My idea was to
> test quite naive approach when all file descriptors, that are related to
> temporary files, have assigned `RWH_WRITE_LIFE_SHORT`, and rest of them
> `RWH_WRITE_LIFE_EXTREME`. Attached patch is a dead simple POC without any
> infrastructure around to enable/disable hints.
>
> It turns out that it's possible to perform benchmarks on some EC2 instance
> types (e.g. c5) with the corresponding version of the kernel, since they
> expose a volume as nvme device:
>
> ```
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     vol01cdbc7ec86f17346 Amazon Elastic Block Store               1         0.00 B / 8.59 GB           512 B + 0 B      1.0
> ```
>
> To get some baseline results I've run several rounds of pgbench on these quite
> modest instances (dedicated, with optimized EBS) with slightly adjusted
> `max_wal_size` and with default configuration:
>
>     $ pgbench -s 200 -i
>     $ pgbench -T 600 -c 2 -j 2
>
> Analyzing `strace` output I can see that during this test there were some
> significant number of operations with pg_stat_tmp and xlogtemp, so I assume
> write lifetime hints should have some effect.
>
> As a result I've got reduction of latency about 5-8% (but so far these numbers
> are unstable, probably because of virtualization).
>
> ```
> # without patch
> number of transactions actually processed: 491945
> latency average = 2.439 ms
> tps = 819.906323 (including connections establishing)
> tps = 819.908755 (excluding connections establishing)
> ```
>
> ```
> with patch
> number of transactions actually processed: 521805
> latency average = 2.300 ms
> tps = 869.665330 (including connections establishing)
> tps = 869.668026 (excluding connections establishing)
> ```
>
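For reference, the quoted figures can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of the posted patch; the helper names `percent_change` and `implied_tps` are hypothetical. It assumes the usual closed-loop relation for pgbench, where each of the `-c 2` clients runs one transaction at a time, so throughput is roughly clients divided by average latency.

```python
# Figures copied from the quoted pgbench output (runs with -c 2 -j 2).
CLIENTS = 2

RUNS = {
    "without patch": {"latency_ms": 2.439, "tps": 819.906323},
    "with patch": {"latency_ms": 2.300, "tps": 869.665330},
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def implied_tps(clients, latency_ms):
    """Throughput implied by average latency when each client
    runs one transaction at a time (assumed closed-loop model)."""
    return clients / (latency_ms / 1000.0)


base = RUNS["without patch"]
patched = RUNS["with patch"]

# Negative value means latency went down.
latency_change = percent_change(base["latency_ms"], patched["latency_ms"])
tps_change = percent_change(base["tps"], patched["tps"])

print(f"latency change: {latency_change:.2f} %")
print(f"tps change:     {tps_change:.2f} %")

for name, run in RUNS.items():
    estimate = implied_tps(CLIENTS, run["latency_ms"])
    print(f"{name}: reported {run['tps']:.1f} tps, implied {estimate:.1f} tps")
```

With these inputs the latency drops by about 5.7 % and tps rises by about 6.1 %, which falls inside the quoted 5-8 % range. The implied throughput (about 820 and 870 tps) matches the reported tps closely, so with two clients the throughput is bounded by the roughly 2.3-2.4 ms spent per transaction.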
Aren't those numbers far lower than you'd expect from NVMe storage? I do
have an NVMe drive (Intel 750) in my machine, and I can do thousands of
transactions on it with two clients. Seems a bit suspicious.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services