Re: Faster numbers for Oracle

2013-06-17 Thread Shai Berger
So,...

On Monday 17 June 2013 16:02:05 Shai Berger wrote:
> On Thursday 16 May 2013 03:51:05 Shai Berger wrote:
> > Ticket #20414[0] and PR #1071[1].
> 
> Over a month, and no comment from anyone on these; I take it as sign that
> nobody has time for it. Unless someone wakes up quite quickly to oppose,
> I'm going to commit essentially the PR [...]

Nobody opposed. Tim Graham reviewed the PR and made some helpful comments -- 
thanks, Tim. But before committing, I wanted to run again some benchmarks, to 
see how the recent changes in query processing affect these performance 
enhancements.

I found what Anssi had found, way back when[2]: Switching to outputtypehandler 
makes almost no difference for ints, makes float columns (FloatFields) about 
2.5 
times faster, and makes decimal columns (DecimalFields and raw expressions) 
about 2 times slower. I found that baffling, and investigated.

Some background: Oracle number columns support 38 significant digits, which is 
significantly more than supported by Python floats (=C doubles). cx_Oracle 
supports decimal.Decimal as an input type, but not as an output type (as far 
as I could see) -- so to get accurate results when selecting such columns, one 
must read these numbers as strings and convert them to numbers later. There 
are two ways to do this -- either use an outputtypehandler, which lets us 
define how we want to process each column; or use numbersAsStrings, which tells 
cx_Oracle to return _all_ numbers as strings, to be processed later in Python.

The part where ints do not improve is explained quite easily: cx_Oracle 
returns ints as (small, C) ints only if it knows they will fit -- which means, 
if they have less than 10 digits. IntegerFields (incl. AutoField, 
OneToOneField, and, surprisingly, even SmallIntegerField)  are all NUMBER(11), 
and INTEGER is actually NUMBER(38); and so they get converted to a python long 
int. Python makes this transparent, so nobody notices, but the conversion 
through string happens whether numbersAsString is used or not. If we had 
chosen NUMBER(9), some C ints would not fit in, but everything would be a lot 
faster. But that is mostly water under the bridge now[3].

The part where decimals become slower -- I'm still baffled by that. Anssi 
claimed it was due to calling a function for each value separately (with 
numbersAsStrings, a function is only called per row). But this makes little 
sense -- we see the slow-down in a benchmark with one-column rows.

I tried to get the best of both world by only using numbersAsStrings when 
NUMBER() columns are present -- but that doesn't work; the setting needs to be 
made before the execute() call in order to take effect, and at that time we 
don't know what columns we are expecting.

Using an outputtypehandler opens the door to some user-initiated 
optimizations; one thing I noticed is that for large queries (returning 
thousands of rows), performance improves when using a larger cursor arraysize 
(the default is 100). This is not the typical query for a web application, but 
the Django ORM is used in batch processing as well.

As I said, I prepared a new PR, 1279[4]; the options, as I see them, are:

a) Make this change, sacrificing decimals for floats
b) Reject the change, sacrificing floats for decimals
c) Somehow let users choose (either globally or for specific queries) -- but 
then we still need to choose the default

Your advice is appreciated,

Shai.
 
[0] https://code.djangoproject.com/ticket/20414
[1] https://github.com/django/django/pull/1071
[2] https://groups.google.com/d/topic/django-developers/4BNkJyGez9A/discussion
[3] This optimization opportunity is no longer even available with Python 3, 
at least with cx_Oracle 5.1.2
[4] https://github.com/django/django/pull/1279

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Question about password salt and encryption

2013-06-17 Thread Jon Dufresne
On Sat, Jun 15, 2013 at 2:24 PM, Luke Plant  wrote:

> 2) Should Django's security be improved by an additional salt that isn't
> stored in the database?
>
> Regarding number 2, this is not likely to happen quickly, due to
> backwards compatibility issues, and the need to introduce a new setting
> etc. (That may help you to decide question 1).
>
> It's definitely worth considering, of course. We would have to consider
> whether it is worth the work. For many installations, if an attacker has
> the database they are very likely to have the source code too. Of
> course, we should try to layer security so that it isn't all or nothing.
> But given the difficulties of changing things, we'd have to consider
> whether the increase in security, in a typical setup, would justify the
> change.
>

Are you suggesting this should be a change to Django itself? A new password
hasher that uses a per user salt as well as a static salt stored outside
the database? Should I file this as a ticket or is this just hypothetical?

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.




Faster numbers for Oracle

2013-06-17 Thread Shai Berger
On Thursday 16 May 2013 03:51:05 Shai Berger wrote:
> 
> Ticket #20414[0] and PR #1071[1]. 

Over a month, and no comment from anyone on these; I take it as sign that 
nobody has time for it. Unless someone wakes up quite quickly to oppose, I'm 
going to commit essentially the PR (I'm editing it for better commit history 
and documentation). Partly, this is...
> [b]ased on the fact that the code is based
> on Ian's patch, and the work[2] [Anssi] and me have done on this
> together (as PR #393, last september).
> 

Thanks,
Shai.

[0] https://code.djangoproject.com/ticket/20414
[1] https://github.com/django/django/pull/1071
[2] https://groups.google.com/d/topic/django-developers/4BNkJyGez9A/discussion

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.