On Tue, Oct 7, 2014 at 11:25 AM, Peter Geoghegan <p...@heroku.com> wrote: > Now, we're not talking about a huge advantage here (I should re-test > that).
I attach raw output when running the bash scripts insert.sh and update.sh. These are benchmarks that concern performance in terms of total system throughput (TPS). The scripts are available from my stress-test suite: https://github.com/petergeoghegan/upsert These scripts were originally designed to compare UPSERT with an unsympathetic "gold-standard" for performance: "equivalent" INSERTs and UPDATEs. I looked at a few runs of 60 seconds, on unlogged tables, making the most direct comparison between UPSERTs and "equivalent" INSERTs and UPDATEs that is possible. To be clear, by "equivalent" I mean UPSERTS where we know we'll only UPDATE (in the case of update.sh), and UPSERTS where we know we'll only INSERT (in the case of insert.sh). Both #1 and #2 do respectably as compared to "equivalent" INSERTs and UPDATEs. There may be even less sympathetic though more representative cases, but certainly for these simple cases, performance is solid across the board. I got these numbers on my laptop, and it may be necessary to devise a more rigorous benchmark later, but performance is quite consistent between runs shown here. Approach #1 wins out with UPDATEs. The heavyweight-lock avoidance stuff is enough to compensate for the fact that we never INSERT (and never need B-Tree leaf page heavyweight locks). Median TPS was 19,310.79 for #1. Whereas for #2, it was 18,872.63 TPS. This is the case even though the "pre-check" for #2 is always appropriate, while we still acquire page-level heavyweight locks sometimes with #1. INSERTs see #2 win, and by a wider margin than #1 beat #2 with UPDATEs. However, insert.sh is by design very unsympathetic towards #1. It uses a serial primary key, so every INSERT uselessly obtains a HW lock on the same leaf page for the duration of heap insertion. Anyway, the median INSERT TPS numbers is 17,759.53 for #1, and 20,441.57 TPS for #2. So you're pretty much seeing the full brunt of page heavyweight locking, and it isn't all that bad. However, Heikki has said something about being more clever with when and how #2 is made to pre-check (which is always wasted here); so it's possible to imagine INSERTs becoming faster for #2, while that probably isn't the case for #1. I think that if I wanted to, I could get #1 to do much better on another case where page heavyweight locking is more varied. My goal here was to do the opposite, though. -- Peter Geoghegan
hwlocking (#1): =============== ******** UPDATES (#1) ******** pg@hamster:~/upsert$ ./update.sh running ./update.sh benchmark 10000000 times Test upsert as a replacement for update...no insert part, for comparison against raw update performance UPSERT with no inserts, only updates, vs. equivalent updates running ./update.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) NOTICE: table "foo" does not exist, skipping DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1158697 latency average: 0.414 ms tps = 19308.753179 (including connections establishing) tps = 19313.317204 (excluding connections establishing) trying 60 second run (traditional updates, equivalent to master) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1552490 latency average: 0.309 ms tps = 25873.503004 (including connections establishing) tps = 25878.810475 (excluding connections establishing) running ./update.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1158654 latency average: 0.414 ms tps = 19310.786710 (including connections establishing) tps = 19314.838618 (excluding connections establishing) trying 60 second run (traditional updates, equivalent to master) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1557929 latency average: 0.308 ms tps = 25964.116323 (including connections establishing) tps = 25969.841506 (excluding connections establishing) running ./update.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1156364 latency average: 0.415 ms tps = 19270.091404 (including connections establishing) tps = 19273.968473 (excluding connections establishing) trying 60 second run (traditional updates, equivalent to master) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1582520 latency average: 0.303 ms tps = 26375.209370 (including connections establishing) tps = 26380.587907 (excluding connections establishing) ******** INSERTS (#1) ******** pg@hamster:~/upsert$ ./insert.sh running ./insert.sh benchmark 10000000 times Test upsert as a replacement for insert...no update part, for comparison against raw insert performance Note that a SERIAL PK is used here, to maximize lock contention running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1115938 latency average: 0.430 ms tps = 18597.840257 (including connections establishing) tps = 18599.343148 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1616974 latency average: 0.297 ms tps = 26905.090310 (including connections establishing) tps = 26910.548614 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1050853 latency average: 0.457 ms tps = 17513.047963 (including connections establishing) tps = 17514.834648 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1611726 latency average: 0.298 ms tps = 26860.299912 (including connections establishing) tps = 26865.700452 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1065632 latency average: 0.450 ms tps = 17759.534360 (including connections establishing) tps = 17761.232238 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1540704 latency average: 0.312 ms tps = 25676.640722 (including connections establishing) tps = 25682.510058 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1044714 latency average: 0.459 ms tps = 17410.476984 (including connections establishing) tps = 17412.336106 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1575568 latency average: 0.305 ms tps = 26256.201708 (including connections establishing) tps = 26261.845552 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1073967 latency average: 0.447 ms tps = 17898.406523 (including connections establishing) tps = 17900.024292 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1555066 latency average: 0.309 ms tps = 25916.316217 (including connections establishing) tps = 25921.476894 (excluding connections establishing) promise tuples (#2): =================== ******** UPDATES (#2) ******** pg@hamster:~/upsert$ ./update.sh running ./update.sh benchmark 10000000 times Test upsert as a replacement for update...no insert part, for comparison against raw update performance UPSERT with no inserts, only updates, vs. equivalent updates running ./update.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) NOTICE: table "foo" does not exist, skipping DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1123128 latency average: 0.427 ms tps = 18717.037479 (including connections establishing) tps = 18720.705456 (excluding connections establishing) trying 60 second run (traditional updates, equivalent to master) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1538212 latency average: 0.312 ms tps = 25634.910723 (including connections establishing) tps = 25639.996456 (excluding connections establishing) running ./update.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1132236 latency average: 0.424 ms tps = 18868.135507 (including connections establishing) tps = 18872.629751 (excluding connections establishing) trying 60 second run (traditional updates, equivalent to master) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1527868 latency average: 0.314 ms tps = 25462.839591 (including connections establishing) tps = 25468.147238 (excluding connections establishing) running ./update.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1134058 latency average: 0.423 ms tps = 18896.552117 (including connections establishing) tps = 18900.606925 (excluding connections establishing) trying 60 second run (traditional updates, equivalent to master) DROP TABLE CREATE TABLE pre-inserting tuples to update CHECKPOINT transaction type: Custom query scaling factor: 1000000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1566905 latency average: 0.306 ms tps = 26114.340380 (including connections establishing) tps = 26119.575030 (excluding connections establishing) ******** INSERTS (#2) ******** pg@hamster:~/upsert$ ./insert.sh running ./insert.sh benchmark 10000000 times Test upsert as a replacement for insert...no update part, for comparison against raw insert performance Note that a SERIAL PK is used here, to maximize lock contention running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1295151 latency average: 0.371 ms tps = 21584.352765 (including connections establishing) tps = 21586.111732 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1595229 latency average: 0.301 ms tps = 26585.172063 (including connections establishing) tps = 26590.385164 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1243224 latency average: 0.386 ms tps = 20720.270153 (including connections establishing) tps = 20722.015799 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1559009 latency average: 0.308 ms tps = 25981.598368 (including connections establishing) tps = 25987.410897 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1226602 latency average: 0.391 ms tps = 20441.565424 (including connections establishing) tps = 20443.592909 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1561024 latency average: 0.307 ms tps = 26015.389541 (including connections establishing) tps = 26020.686144 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1226969 latency average: 0.391 ms tps = 20448.181466 (including connections establishing) tps = 20450.289756 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1496496 latency average: 0.321 ms tps = 24901.224739 (including connections establishing) tps = 24906.522484 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1197352 latency average: 0.401 ms tps = 19955.269006 (including connections establishing) tps = 19957.244382 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1542751 latency average: 0.311 ms tps = 25709.177045 (including connections establishing) tps = 25714.432820 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1181995 latency average: 0.406 ms tps = 19697.758449 (including connections establishing) tps = 19699.876610 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1536436 latency average: 0.312 ms tps = 25603.964182 (including connections establishing) tps = 25609.142963 (excluding connections establishing) running ./insert.sh benchmark trying 60 second run (new infrastructure, extended hwlocking) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 4 duration: 60 s number of transactions actually processed: 1200230 latency average: 0.400 ms tps = 20002.224154 (including connections establishing) tps = 20004.182990 (excluding connections establishing) trying 60 second run (traditional inserts, equivalent to master) DROP TABLE CREATE TABLE transaction type: Custom query scaling factor: 150000 query mode: simple number of clients: 8 number of threads: 1 duration: 60 s number of transactions actually processed: 1539267 latency average: 0.312 ms tps = 25652.835154 (including connections establishing) tps = 25657.859502 (excluding connections establishing)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers