Re: [HACKERS] Re: new patch of MERGE (merge_204) & a question about duplicated ctid

Greg Smith Tue, 04 Jan 2011 01:45:40 -0800

Heikki Linnakangas wrote:

You can of course LOCK TABLE as a work-around, if that's what you want.

What I was trying to suggest upthread is that while there are other possible ways around this problem, the only one that has any hope of shipping with 9.1 is to do just that. So from my perspective, the rest of the discussion about the right way to proceed is moot for now.

For some reason it didn't hit me until you said this that I could do the locking manually in my test case, without even touching the server-side code yet. Attached are a new pair of scripts where each pgbench UPDATE statement executes an explicit LOCK TABLE. Here's the result of a sample run here:


$ pgbench -f update-merge.sql -T 60 -c 16 -j 4 -s 2 pgbench
starting vacuum...end.
transaction type: Custom query
scaling factor: 2
query mode: simple
number of clients: 16
number of threads: 4
duration: 60 s
number of transactions actually processed: 84375
tps = 1405.953672 (including connections establishing)
tps = 1406.137456 (excluding connections establishing)

$ psql -c 'select count(*) as updated FROM pgbench_accounts WHERE NOT abalance=0' -d pgbench

updated
---------
  68897
(1 row)

$ psql -c 'select count(*) as inserted FROM pgbench_accounts WHERE aid > 100000' -d pgbench

inserted
----------
   34497
(1 row)

No assertion crashes, no duplicate key failures. All the weird stuff I was running into is gone, so decent evidence the worst of the problems were all because the heavy lock I expecting just wasn't integrated into the patch. Congratulations to Boxuan: for the first time this is starting to act like a viable feature addition to me, just one with a moderately long list of limitations and performance issues.

1400 TPS worth of UPSERT on my modest 8-core desktop (single drive with cheating fsync) isn't uselessly slow. If I add "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;" just after the BEGIN;, I don't see any serialization errors, and performance is exactly the same.

Run a straight UPDATE over only the existing range of keys, and I get 7000 TPS instead. So the locking etc. is reducing performance to 20% of its normal rate, on this assertion+debug build. I can run this tomorrow (err, later today I guess looking at the time) on a proper system with BBWC and without asseritions to see if the magnitude of the difference changes, but I don't think that's the main issue here.

Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. We'd need a big warning disclaiming that concurrent performance is very limited in this first release of the feature, but I don't know that this is at the unacceptable level of slow for smaller web apps and such.

Until proper fine-grained concurrency is implemented, I think it would be PR suicide to release a version of this without a full table lock happening automatically though. The idea Robert advocated well, that it would be possible for advanced users to use even this rough feature in a smarter way to avoid conflicts and not suffer the full performance penalty, is true. But if you consider the main purpose here to be making it easier to get smaller MySQL apps and the like ported to PostgreSQL (which is what I see as goal #1), putting that burden on the user is just going to reinforce the old "PostgreSQL is so much harder than MySQL" stereotype. I'd much prefer to see everyone have a slow but simple to use UPSERT via MERGE available initially, rather than to worry about optimizing for the advanced user in a way that makes life harder for the newbies. The sort of people who must have optimal performance already have trigger functions available to them, that they can write and tweak for best performance.


--
Greg Smith   2ndQuadrant US    [email protected]   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

test-merge.sh
Description: Bourne shell script

\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000
BEGIN;

-- Optional mode change
-- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

LOCK TABLE pgbench_accounts;
MERGE INTO pgbench_accounts t USING (SELECT :aid,1+(:aid / 1000000)::integer,:delta,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler);
COMMIT;

-- This syntax worked with MERGE v203 patch, but isn't compatible with v204
--MERGE INTO pgbench_accounts t USING (VALUES (:aid,1+(:aid / 1000000)::integer,:delta,'')) AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler);

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) & a question about duplicated ctid

Reply via email to