#77: Bad performance of query_formatted()
-----------------------------+------------------
 Reporter:  justin           |      Owner:  cito
     Type:  task             |     Status:  new
 Priority:  major            |  Milestone:  5.1
Component:  DB API 2         |    Version:  5.0
 Keywords:  pg, performance  |
-----------------------------+------------------
 This was posted by Justin to the mailing list on 2019-04-01:

 I was going to do a performance test for prepared queries, specifically
 for INSERTs: simple queries, query_formatted, inline=True+False, SQL
 EXECUTE, pgdb, dicts and PQexecPrepared.

 I didn't get far before needing to step back, as I'm shocked by how slow
 parameter formatting is, just to turn %s into $1.  It's nearly 10x slower
 than query() with $1 params.  That's true for pg (which can send params
 out of line) and pgdb (which always inlines params).
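
 The gap can be illustrated without a server: most of the cost is the
 per-value Python work needed to inline parameters.  Here is a rough
 sketch of the two paths (hypothetical helper names, not PyGreSQL's
 actual code):

 ```python
 def adapt(value):
     """Naive inline adapter: the per-value isinstance chain
     that dominates the cost of inlined formatting."""
     if value is None:
         return "NULL"
     if isinstance(value, bool):  # must come before int: bool is an int subclass
         return "TRUE" if value else "FALSE"
     if isinstance(value, (int, float)):
         return str(value)
     if isinstance(value, str):
         return "'" + value.replace("'", "''") + "'"
     raise TypeError(f"unsupported type: {type(value)}")

 def inline_format(query, params):
     """Roughly what inlined formatting must do:
     adapt every value and splice it into the SQL text."""
     return query % tuple(adapt(p) for p in params)

 def out_of_line(query, params):
     """Roughly what query() with $1 params does: just renumber
     the placeholders and pass the values through untouched."""
     numbered = query
     for i in range(1, query.count("%s") + 1):
         numbered = numbered.replace("%s", f"${i}", 1)
     return numbered, params
 ```

 The out-of-line path touches only the query string, once; the inline
 path pays the adapter chain for every value of every row.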

 If I'd noticed 2 years ago, I might have avoided using
 DB.query_formatted() just to let developers use %s formats.  Perhaps the
 only reason this isn't causing issues for our data loaders is that each
 CPU-heavy formatted INSERT is partially balanced by I/O on the DB
 server.  But I can easily imagine that being a deal-breaker.  Maybe I'm
 wrong, but I'd have expected the additional processing cost to be within
 10% or 20% of query, and certainly within a factor of two.

 BTW, a prepared query avoids this entirely by skipping all the isinstance
 checks, which appear to be most of the expense.  The high performance
 cost and its mitigation (prepared queries) should be documented.  Also, I
 realized that SQL EXECUTE is a bad way to do it, and maybe shouldn't be
 documented, since it requires quoting each parameter (which is slow and
 thereby defeats the purpose).
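
 The self-defeating part can be seen without a server: to issue a textual
 EXECUTE, every value must still be quoted client-side, which is exactly
 the per-value work EXECUTE was supposed to skip.  A minimal sketch (the
 quoting here is deliberately naive, everything treated as a string):

 ```python
 def sql_execute_call(name, params):
     """Build a textual 'EXECUTE name(...)' statement.  Each value
     still needs client-side quoting, so the expensive adaptation
     step is paid anyway."""
     quoted = ", ".join("'" + str(p).replace("'", "''") + "'" for p in params)
     return f"EXECUTE {name}({quoted})"
 ```

 Sending parameters out of line to a server-side prepared statement
 avoids building this string entirely.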

 I used cProfile to find hotspots and guessed at the rest until it
 worked.  My tweaks make it 3x faster; however, it's still 3x slower than
 query()!

 My changes are not very "disciplined", so please consider this a proof of
 concept written by a non-Pythonista.  I didn't touch pgdb and I haven't
 tested with dict formats.  I imagine you may have a better idea of how
 and what to cache.
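
 One caching approach (a sketch of the idea, not the attached patch):
 resolve the isinstance chain once per *type* rather than once per
 *value*, and memoize the resulting adapter function in a dict keyed by
 type, so repeated values hit a plain dict lookup:

 ```python
 _adapters = {}  # per-type cache: chain walked once, then a dict hit

 def _build_adapter(tp):
     """Walk the type-dispatch chain once for a given type."""
     if issubclass(tp, bool):  # before int: bool is an int subclass
         return lambda v: "TRUE" if v else "FALSE"
     if issubclass(tp, (int, float)):
         return str
     if issubclass(tp, str):
         return lambda v: "'" + v.replace("'", "''") + "'"
     raise TypeError(f"unsupported type: {tp}")

 def adapt_cached(value):
     """Adapt a value using the per-type adapter cache."""
     if value is None:
         return "NULL"
     tp = type(value)
     try:
         fn = _adapters[tp]
     except KeyError:
         fn = _adapters[tp] = _build_adapter(tp)
     return fn(value)
 ```

 For a bulk INSERT of rows*cols values drawn from a handful of Python
 types, this replaces rows*cols isinstance chains with a few chain walks
 plus cheap dict lookups.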

 3 runs on a semi-idle server with checkpoint_interval=60sec, using
 100,000 rows and 100 cols (I believe this scales roughly as rows*cols):

 ||=method=||=unpatched time (sec)=||=patched time (sec)=||
 ||pg simple insert with txn||105, 98, 91|| ||
 ||pg query() $1 params||169, 172, 178||174||
 ||pg query_formatted %s params||745||314||
 ||pg insert using inline=True||386||187||
 ||pg prepare_query + execute_prepared (not included in this patch)||n/a||90, 88, 94||
 ||pgdb insert with/without params||TODO||TODO||
 ||SQL EXECUTE + $1 + inline=True||TODO||TODO||
 ||dicts||TODO||TODO||
 ||psycopg||TODO||TODO||

 Find attached my patch and test; the test was meant to approximate our
 loader processes, which I'd like to convert to use prepared queries.

--
Ticket URL: <http://trac.pygresql.org:8000/pgtracker/ticket/77>
PyGreSQL <http://www.pygresql.org/>
PyGreSQL Tracker