#77: Bad performance of query_formatted()
-----------------------+-----------------------------
Reporter: justin | Owner: cito
Type: task | Status: new
Priority: major | Milestone: 5.1
Component: DB API 2 | Version: 5.0
Resolution: | Keywords: pg, performance
-----------------------+-----------------------------
Comment (by cito):
Thank you, Justin, and sorry for looking into this only now.
With a similar query as you're using (inserting 250 columns of zeroes), I
measured:
{{{
Test simple inline query():
Average time = 4.93 secs
Test query() with params:
Average time = 6.55 secs
Test query_formatted() with inline=False:
Average time = 12.47 secs
Test query_formatted() with types and inline=False:
Average time = 10.54 secs
Test query_formatted() with inline=True:
Average time = 7.34 secs
}}}
So I can basically confirm what you are reporting. As far as I see, the
difference between inline=False vs. True comes from creating two separate
lists (one for formatting the string, the other one for the separately
passed params) vs. only one, and a general performance penalty when using
PQExecParams vs PQEexec.
But I think the huge performance difference in your example is only an
artifact of the special kind of query you were using, with very simple
values (just integer zeroes) and many columns.
When I use a slightly different query, inserting into 250 columns a string
of 26 chars instead of the integer zero, I measure these numbers:
{{{
Test simple inline query():
Average time = 4.82 secs
Test query() with params:
Average time = 3.84 secs
Test query_formatted() with inline=False:
Average time = 4.15 secs
Test query_formatted() with types and inline=False:
Average time = 4.13 secs
Test query_formatted() with inline=True:
Average time = 5.26 secs
}}}
This is because strings need to be escaped when inserted inline, which
turns the tide against inline=True.
So the performance and which kind of parameter passing is better depends
very much on the kind of values. For more complex values and less columns,
I think inline=False is better.
And as you say, it also depends on the type of query and speed of the
database whether the overhead for conversion and adaptation of the values
matters or not.
The problem of inline=True is also that you can't pass types along, so I
think it is not so good as default value. I think you can also handle more
complicated types with inline=False. So I'd rather keep the old default.
I also had a look at the code, checked with a profiler and made some
optimizations in r970, r971 and r972, but did not really find much more we
can do here, at least no low-hanging fruits. The optimizations in your
patch will not work in the general case, when you have mixed parameters
containing Literal values. Also, caching the types of the values works
only if you have very few different values (like in your case only a
zero), but for most real-world examples with different strings or floats
this will not work.
So I think the best we can do is document the issue as you suggested, and
refer to `query` and `query_prepared` for performance-critial queries and
batch runs which I have done in r973.
--
Ticket URL: <http://trac.pygresql.org:8000/pgtracker/ticket/77#comment:2>
PyGreSQL <http://www.pygresql.org/>
PyGreSQL Tracker
_______________________________________________
PyGreSQL mailing list
[email protected]
https://mail.vex.net/mailman/listinfo/pygresql