#77: Bad performance of query_formatted()
-----------------------------+------------------
Reporter: justin | Owner: cito
Type: task | Status: new
Priority: major | Milestone: 5.1
Component: DB API 2 | Version: 5.0
Keywords: pg, performance |
-----------------------------+------------------
This was posted by Justin to the mailing list 2019-04-01:
I was going to do a performance test for prepared queries, specifically
for INSERTs: simple queries, query_formatted, inline=True+False, SQL
EXECUTE, pgdb, dicts and PQexecPrepared.
I didn't get far before needing to step back, as I'm shocked by how slow
parameter formatting is, just to turn %s into $1. It's nearly 10x slower
than query() with $1 params. That's true for pg (which can send params
out of line) and pgdb (which always inlines params).
If I'd noticed 2 years ago, I might have avoided using
DB.query_formatted() just to let developers use %s formats. Perhaps the
only reason why this isn't causing issues for our data loaders is that
each CPU heavy formatted INSERT is partially balanced with I/O on the DB
server. But I can easily imagine that being a deal-breaker. Maybe I'm
wrong, but I'd have expected the additional processing cost to be within
10% or 20% of query, and certainly within a factor of two.
BTW, prepared query completely avoids this by avoiding all the isinstance
which appears to be most of the expense. The high performance cost and
mitigation (prepares) should be documented. Also, I realized that SQL
EXECUTE is a bad way to do it, and maybe shouldn't be documented, since it
requires quoting each parameter (which is slow and thereby defeats
itself).
I tried to use cProfile to find hotspots and guessed at the rest until it
worked. My tweaks make it 3x faster - however, it's still 3x slower than
query()!
My changes are not very "disciplined" so please consider this a proof of
concept written by a non pytho-phone. I didn't touch pgdb and I haven't
tested with dict formats. I imagine you may have a better idea how/what
to cache.
3 runs on a semi-idle server with checkpoint_interval=60sec.
100000rows and 100cols (I believe this scales roughly as rows*cols)
||=method=||=unpatched=||=patched time (sec)=||
||pg simple insert with txn:||(105 98 91)||
||pg query() $1 params:||(169 172 178)||174||
||pg query_formatted %s params:||(745)||314||
||pg insert using inline=True:||(386)||187||
||pg prepare_query + execute_prepared||(n/a)||(90, 88, 94)||
||note, that's not included in this patch||||||
||pgdb insert with/out params||TODO||
||SQL EXECUTE + $1 + inline=True:||TODO||||
||dicts||TODO||||
||psycopg||TODO||||
Find attached my patch and test, which was meant to approximate our loader
processes, which I'd like to convert to use prepared query.
--
Ticket URL: <http://trac.pygresql.org:8000/pgtracker/ticket/77>
PyGreSQL <http://www.pygresql.org/>
PyGreSQL Tracker
_______________________________________________
PyGreSQL mailing list
[email protected]
https://mail.vex.net/mailman/listinfo/pygresql