So,...
On Monday 17 June 2013 16:02:05 Shai Berger wrote:
> On Thursday 16 May 2013 03:51:05 Shai Berger wrote:
> > Ticket #20414[0] and PR #1071[1].
>
> Over a month, and no comment from anyone on these; I take it as sign that
> nobody has time for it. Unless someone wakes up quite quickly to oppose,
> I'm going to commit essentially the PR [...]
Nobody opposed. Tim Graham reviewed the PR and made some helpful comments --
thanks, Tim. But before committing, I wanted to run again some benchmarks, to
see how the recent changes in query processing affect these performance
enhancements.
I found what Anssi had found, way back when[2]: Switching to outputtypehandler
makes almost no difference for ints, makes float columns (FloatFields) about
2.5
times faster, and makes decimal columns (DecimalFields and raw expressions)
about 2 times slower. I found that baffling, and investigated.
Some background: Oracle number columns support 38 significant digits, which is
significantly more than supported by Python floats (=C doubles). cx_Oracle
supports decimal.Decimal as an input type, but not as an output type (as far
as I could see) -- so to get accurate results when selecting such columns, one
must read these numbers as strings and convert them to numbers later. There
are two ways to do this -- either use an outputtypehandler, which lets us
define how we want to process each column; or use numbersAsStrings, which tells
cx_Oracle to return _all_ numbers as strings, to be processed later in Python.
The part where ints do not improve is explained quite easily: cx_Oracle
returns ints as (small, C) ints only if it knows they will fit -- which means,
if they have less than 10 digits. IntegerFields (incl. AutoField,
OneToOneField, and, surprisingly, even SmallIntegerField) are all NUMBER(11),
and INTEGER is actually NUMBER(38); and so they get converted to a python long
int. Python makes this transparent, so nobody notices, but the conversion
through string happens whether numbersAsString is used or not. If we had
chosen NUMBER(9), some C ints would not fit in, but everything would be a lot
faster. But that is mostly water under the bridge now[3].
The part where decimals become slower -- I'm still baffled by that. Anssi
claimed it was due to calling a function for each value separately (with
numbersAsStrings, a function is only called per row). But this makes little
sense -- we see the slow-down in a benchmark with one-column rows.
I tried to get the best of both world by only using numbersAsStrings when
NUMBER() columns are present -- but that doesn't work; the setting needs to be
made before the execute() call in order to take effect, and at that time we
don't know what columns we are expecting.
Using an outputtypehandler opens the door to some user-initiated
optimizations; one thing I noticed is that for large queries (returning
thousands of rows), performance improves when using a larger cursor arraysize
(the default is 100). This is not the typical query for a web application, but
the Django ORM is used in batch processing as well.
As I said, I prepared a new PR, 1279[4]; the options, as I see them, are:
a) Make this change, sacrificing decimals for floats
b) Reject the change, sacrificing floats for decimals
c) Somehow let users choose (either globally or for specific queries) -- but
then we still need to choose the default
Your advice is appreciated,
Shai.
[0] https://code.djangoproject.com/ticket/20414
[1] https://github.com/django/django/pull/1071
[2] https://groups.google.com/d/topic/django-developers/4BNkJyGez9A/discussion
[3] This optimization opportunity is no longer even available with Python 3,
at least with cx_Oracle 5.1.2
[4] https://github.com/django/django/pull/1279
--
You received this message because you are subscribed to the Google Groups
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.