#77: Bad performance of query_formatted()
-----------------------+-----------------------------
  Reporter:  justin    |      Owner:  cito
      Type:  task      |     Status:  new
  Priority:  major     |  Milestone:  5.1
 Component:  DB API 2  |    Version:  5.0
Resolution:            |   Keywords:  pg, performance
-----------------------+-----------------------------

Comment (by cito):

 Thank you, Justin, and sorry for looking into this only now.

 With a similar query as you're using (inserting 250 columns of zeroes), I
 measured:

 {{{
 Test simple inline query():
   Average time = 4.93 secs
 Test query() with params:
   Average time = 6.55 secs
 Test query_formatted() with inline=False:
   Average time = 12.47 secs
 Test query_formatted() with types and inline=False:
   Average time = 10.54 secs
 Test query_formatted() with inline=True:
   Average time = 7.34 secs
 }}}

 So I can basically confirm what you are reporting. As far as I see, the
 difference between inline=False vs. True comes from creating two separate
 lists (one for formatting the string, the other one for the separately
 passed params) vs. only one, and a general performance penalty when using
 PQExecParams vs PQEexec.

 But I think the huge performance difference in your example is only an
 artifact of the special kind of query you were using, with very simple
 values (just integer zeroes) and many columns.

 When I use a slightly different query, inserting into 250 columns a string
 of 26 chars instead of the integer zero, I measure these numbers:

 {{{
 Test simple inline query():
   Average time = 4.82 secs
 Test query() with params:
   Average time = 3.84 secs
 Test query_formatted() with inline=False:
   Average time = 4.15 secs
 Test query_formatted() with types and inline=False:
   Average time = 4.13 secs
 Test query_formatted() with inline=True:
   Average time = 5.26 secs
 }}}

 This is because strings need to be escaped when inserted inline, which
 turns the tide against inline=True.

 So the performance and which kind of parameter passing is better depends
 very much on the kind of values. For more complex values and less columns,
 I think inline=False is better.

 And as you say, it also depends on the type of query and speed of the
 database whether the overhead for conversion and adaptation of the values
 matters or not.

 The problem of inline=True is also that you can't pass types along, so I
 think it is not so good as default value. I think you can also handle more
 complicated types with inline=False. So I'd rather keep the old default.

 I also had a look at the code, checked with a profiler and made some
 optimizations in r970, r971 and r972, but did not really find much more we
 can do here, at least no low-hanging fruits. The optimizations in your
 patch will not work in the general case, when you have mixed parameters
 containing Literal values. Also, caching the types of the values works
 only if you have very few different values (like in your case only a
 zero), but for most real-world examples with different strings or floats
 this will not work.

 So I think the best we can do is document the issue as you suggested, and
 refer to `query` and `query_prepared` for performance-critial queries and
 batch runs which I have done in r973.

--
Ticket URL: <http://trac.pygresql.org:8000/pgtracker/ticket/77#comment:2>
PyGreSQL <http://www.pygresql.org/>
PyGreSQL Tracker
_______________________________________________
PyGreSQL mailing list
[email protected]
https://mail.vex.net/mailman/listinfo/pygresql

Reply via email to