RE: Django's problem with db-level defaults on Oracle

2014-11-01 Thread Kääriäinen Anssi
Quick question: could Django set the default to to_date('2014-31-01', 
'yyyy-dd-mm')?

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
on behalf of Shai Berger [s...@platonix.com]
Sent: Friday, October 31, 2014 17:34
To: django-developers@googlegroups.com
Subject: Django's problem with db-level defaults on Oracle

Hi Everyone,

I just mentioned in another thread that db-level defaults are particularly
troublesome on Oracle. I didn't want to burden that discussion with the
details, but having been asked about it on IRC (thanks Josh), here they are.

The problem is caused by a combination of factors:

1) Oracle stores database-level defaults as strings, evaluated when needed.

This is not, in itself, completely insensible -- the processing and space
overheads (compared to some more "binary" representation) are negligible, and
it means defaults "4" and "sysdate()" are treated by the system uniformly.

2) Django's Oracle backend sets the date-time format to a constant (close to
ISO format), which is usually not the default.

This has been used to perform some database date-time operations by
manipulating strings -- because that way was easier for the developer
implementing them, or because there wasn't proper support for the feature
otherwise. As a classic example, before 1.7, date-times used to be inserted into
the database as strings, because some special manipulation was required to make
cx_Oracle (the database driver library) support sub-second precision (thanks
jtiai). I'm not completely sure how much date-string manipulation remains in
the Oracle backend today, but it is certainly still used for database
defaults: Oracle doesn't take parameters in DDL statements.
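
To make the combination concrete, here is a rough sketch of the two pieces at
play; the statements are illustrative only (the connection string, table and
column names are made up, and the exact SQL Django issues may differ):

import cx_Oracle

conn = cx_Oracle.connect("app_user/secret@localhost/XE")  # hypothetical DSN
cur = conn.cursor()

# (1) On every connection Django pins the session's date-time format to a
#     fixed, close-to-ISO value, along these lines:
cur.execute(
    "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS' "
    "NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF'"
)

# (2) DDL cannot be parameterized, so a date-time default has to be
#     interpolated into the statement as a string literal. Oracle stores that
#     literal verbatim in the schema and re-parses it with whatever format the
#     *current* session happens to use:
cur.execute(
    "ALTER TABLE myapp_event MODIFY (created DEFAULT '2014-10-31 17:34:00')"
)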

As a result of these two factors, when datetimes were set as default column
values (which happened a lot with South<0.7.3), the value actually stored in
the schema was a string specifying the date-time in a non-default format.
Whenever Django connected to the DB, it set the session's date-time format to
the "right" one, and so no problems were seen.

But when backing up using the Oracle "exp" utility -- which, as far as I'm
aware, is pretty standard, at least for a developer backing up schemas on their
own instance -- it was still these strings that were saved; and when trying to
restore with the converse "imp", whose connection is (of course) not
controlled by Django, the utility tried to set the date-time defaults using a
format that was inappropriate for the values. This usually failed, resulting
in partial restores, which led to a lot of pain.

If you're still here, you probably want to know how we solved the problem: Our
DBA showed us how to install a database-level trigger to change the format
whenever the relevant users logged on. This allowed us to get Oracle's "imp"
to use the right date-time formats. However, this is highly non-obvious: I,
for one, didn't even know such triggers existed.
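
For reference, the trigger was something along these lines (the user name,
connection details and format masks are illustrative, and it has to be created
by a privileged user -- it is not something Django executes):

import cx_Oracle

# A privileged (e.g. SYSTEM) connection; the application user cannot create
# database-level triggers itself.
admin = cx_Oracle.connect("system/secret@localhost/XE")  # hypothetical DSN
admin.cursor().execute("""
    CREATE OR REPLACE TRIGGER set_django_nls_on_logon
        AFTER LOGON ON DATABASE
    BEGIN
        IF USER = 'MYAPP' THEN
            EXECUTE IMMEDIATE
                'ALTER SESSION SET NLS_DATE_FORMAT = ''YYYY-MM-DD HH24:MI:SS''';
            EXECUTE IMMEDIATE
                'ALTER SESSION SET NLS_TIMESTAMP_FORMAT = ''YYYY-MM-DD HH24:MI:SS.FF''';
        END IF;
    END;
""")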

Thanks for your attention,

Shai.


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/7CDBD1EFB267CD41949C704C14E92DBF1A714566%40HELW040.stakes.fi.
For more options, visit https://groups.google.com/d/optout.


RE: Django 1.6RC1 exclude behavior change

2013-11-04 Thread Kääriäinen Anssi
I'll look into this.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of jgas...@gmail.com [jgas...@gmail.com]
Sent: Monday, November 04, 2013 17:16
To: django-developers@googlegroups.com
Subject: Django 1.6RC1 exclude behavior change

I've found what looks like a serious behavior change in the exclude queryset 
method from Django 1.5.5 to Django 1.6 rc1.

It seems that on 1.5.5, exclude() when traversing relationships only excluded 
items if all criteria in the kwargs were matched on the same related item. On 
1.6 rc1 it excludes items even if the criteria in the kwargs are only matched 
across multiple related items. I guess this explanation is not very clear, so 
here is some sample code that shows the behavior change:
http://pastebin.kde.org/pe1vlzd3v
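
In case the paste expires, here is a rough reconstruction of the kind of
difference I mean (the models and data are invented for illustration, not
copied from the paste):

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=50)

class Book(models.Model):
    author = models.ForeignKey(Author)
    title = models.CharField(max_length=50)
    year = models.IntegerField()

# Suppose one author has two books: ("A", 2010) and ("B", 2012).
#
#     Author.objects.exclude(book__title="A", book__year=2012)
#
# On 1.5.5 the author is NOT excluded: no single book matches both
# title="A" and year=2012.
# On 1.6 rc1 (as reported above) the author IS excluded, because one book
# matches title="A" while a different book matches year=2012.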

Since I didn't find anything on the change notes about this, it looks to me 
like a bug. Is it? Or am I missing something?


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/FDD0C28683BA024195713874AF8663B31C6F86DC6E%40EXMAIL.stakes.fi.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Is "transaction.atomic" in 1.6 supposed to work this way?

2013-09-21 Thread Kääriäinen Anssi
For the performance part: a simple model.save() is about 50% more expensive 
with savepoints. The extra time is spent in the database.

In addition, there are 3 network round trips instead of one. This could add latency in 
some use cases.
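
In other words, for the case being discussed (the model and function names are
made up):

from django.db import transaction

def update_item(item):
    with transaction.atomic():                     # outer block: BEGIN ... COMMIT
        with transaction.atomic(savepoint=False):  # inner block: no savepoint
            item.save()                            # single round trip for the UPDATE
        # With the default savepoint=True, the inner block would wrap the save
        # in SAVEPOINT ... RELEASE SAVEPOINT, i.e. two extra round trips.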


 Original message 
Subject: Re: Is "transaction.atomic" in 1.6 supposed to work this way?
From: Aymeric Augustin 
To: "django-developers@googlegroups.com" 
CC:


On 21 Sept 2013, at 15:53, Richard Ward wrote:

You say in your docs patch that savepoints are cheap

Truth be said, I haven't run benchmarks.

So what is transaction.atomic(savepoint=False) for? Is it just for performance, 
or is it more like an assertion that we are definitely in a transaction (or both)?

It's mostly for performance. Ask Anssi for details.

There a second, more practical, reason; read below.

At present the decision to roll back or commit is based on whether there is a 
current exception and whether needs_rollback is True. If instead this were just 
based on whether there is a current exception (getting rid of needs_rollback), 
then exceptions bubbling from inside a transaction.atomic(savepoint=False) 
would still cause a rollback, and catching an exception (hiding it from the 
context manager) would lead to a commit (or at least an attempt to commit). 
This would leave Django+PostgreSQL's behaviour unchanged.

You may be right. I'm not sure. This code is tricky. Such assertions routinely 
take more than 10 hours of work to confirm.

Removing the option for savepoint=False would have the same effect

It would have the drawback of breaking everyone's assertNumQueries because of 
the extra savepoints introduced by Django.

This would be very hostile to people porting large, well-tested code bases.

--
Aymeric (mobile).



-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.


RE: Custom Chainable QuerySets (#20625)

2013-07-24 Thread Kääriäinen Anssi
The same pull request, at https://github.com/django/django/pull/1328. It seems like it 
is still getting some review & update activity. I am planning on doing a final 
review when committing, but judging by the number of reviews done already I think 
this one will be very polished by Friday.
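
For anyone not following the ticket, the feature is roughly this kind of usage
(my own illustration, not code taken from the pull request):

from django.db import models

class PostQuerySet(models.QuerySet):
    def published(self):
        return self.filter(status='published')

    def by_author(self, user):
        return self.filter(author=user)

class Post(models.Model):
    author = models.ForeignKey('auth.User')
    status = models.CharField(max_length=20, default='draft')

    objects = PostQuerySet.as_manager()

# The query methods are defined once and chain in any order:
# Post.objects.published().by_author(user)
# Post.objects.by_author(user).published()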

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Aymeric Augustin [aymeric.augus...@polytechnique.org]
Sent: Wednesday, July 24, 2013 21:39
To: django-developers@googlegroups.com
Subject: Re: Custom Chainable QuerySets (#20625)

On 24 juil. 2013, at 13:53, Anssi Kääriäinen  wrote:

> I will commit the patch on Friday. If somebody wants more time to review the 
> patch, just ask and I will defer the commit to later date.

Where's the version of the patch you're ready to commit?

--
Aymeric.






-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.




RE: Add signals for QuerySet bulk operations such as `delete`, `update, `bulk_create`

2012-03-25 Thread Kääriäinen Anssi
A somewhat different proposal is in ticket #17824: Add generic pre/post_modify 
signal. I think the generic
"object modified" signals would fit your use case very well.

The idea is that there would be just one signal which would be fired for any 
data-modifying ORM operation.
The arguments for it:
  - fired for every operation modifying data
  - one signal to listen to for all data modifications
The likely counter-arguments:
  - duplicates the existing signals
  - the callbacks end up being a big "switch statement", and thus you end up 
separating save, delete etc. anyway
  - the API isn't good enough

From a performance perspective there should be no big problems: the signal is 
given an iterable as the "objs_modified" argument. For .update(), for example, 
where you don't want to fetch all the objects for performance reasons, you could 
just pass qs.filter(update_filters) as the modified objects. This way there 
would be no performance penalty, except if there is actual use of the signal.
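
To illustrate (the post_modify signal and its objs_modified argument are
hypothetical -- this just spells out the proposal above, with an invented Order
model and a cache key scheme of my own):

from django.core.cache import cache

def invalidate_aggregate_cache(sender, objs_modified, **kwargs):
    # objs_modified may be a lazy queryset such as qs.filter(update_filters),
    # so nothing is fetched from the database unless the receiver iterates it.
    cache.delete('aggregates:%s' % sender._meta.db_table)

# post_modify.connect(invalidate_aggregate_cache, sender=Order)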

I would like to see a generic pre/post modify signal, as I think it is much 
easier to use than the pre/post save/delete + m2m_changed signals. However, I do 
not feel strongly at all about this, just something I would find useful. I 
believe having total control of all data-modifying operations using Django 
signals would be a welcome addition for many users.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Byron Ruth [bjr...@gmail.com]
Sent: Sunday, March 25, 2012 17:46
To: django-developers@googlegroups.com
Subject: Add signals for QuerySet bulk operations such as `delete`, `update, 
`bulk_create`

My use case is regenerating an aggregate data cache at the table level. Simply 
firing a single signal after a bulk operation is complete would make it possible 
to invalidate such an aggregate cache. There is no very clean alternative solution 
to this problem, short of using database triggers that call an external script 
to invalidate the cache.


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Complex aggregate and expression composition.

2012-03-20 Thread Kääriäinen Anssi
I took a quick look at your patch. I don't have more time now, so just some 
quick comments:
  - In general, the approach where aggregates are just expressions sounds and 
looks
valid.
  - I would not worry about the extra time used in djangobench. However, 
profiling why
there is extra time used is always recommended.
  - I am a bit scared of the type coercions. The reason is that this could 
prove to be
hopelessly complex to get right in every case. However, I do not have 
concrete
examples where this is in fact a problem. The default should probably not 
be an exception,
but just returning what the database happens to give you back.

I think the approach you have taken is correct in general. I would encourage you 
to check if you can somewhat easily incorporate the conditional aggregate support 
(#11305) into the ExpressionNode-based aggregates. It does not belong in the same 
patch, but it is a good sanity check of whether the approach taken is extensible.


[Following is a bit off-topic]
I wonder if the ExpressionNode itself should be refactored into a public API. 
This way
you could easily write your own SQL snippets injectable into the query. This 
could be
custom aggregates, or this could be just NULLS LAST order by clauses.

The reason I bring this up is that in the long run, adding more and more 
special case
support to the ORM (conditional aggregates, different SQL functions) doesn't 
seem to be
the right way forward. Once you get expression composition in, you only have 
90% of
SQL constructs left...

Spend the time building support for user-writable SQL snippets, so that users 
can use just the SQL they want. In my opinion NULLS LAST/FIRST support is a great 
example: it is common enough that users need it from time to time, but not common 
enough to spend the time supporting this special case. Why not just:
qs.order_by(SQL('%s NULLS LAST', F('pub_date')))
and you now have support for _any_ order by clause the user wishes to use. It 
replaces extra(), but in a cleaner way. The above could support relabel_aliases(). 
Or you could write it just as qs.order_by(SQL('pub_date NULLS LAST')) if you 
don't care about relabel_aliases support.

For the F-expression support in aggregates this would mean you actually get not 
just F-expression support in aggregates, but that any SQL snippet can be injected 
into the aggregates, for example Sum(SQL('case when person.age > friend.age then 
1 else 0 end')).
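
A minimal sketch of what such an SQL() expression could look like (entirely
hypothetical -- nothing like this exists in the ORM; it assumes nested
expressions expose an as_sql() hook as part of the same hypothetical protocol):

class SQL(object):
    def __init__(self, template, *expressions):
        self.template = template
        self.expressions = expressions

    def relabel_aliases(self, change_map):
        # Forward alias relabeling to any nested expressions.
        for expr in self.expressions:
            expr.relabel_aliases(change_map)

    def as_sql(self, qn, connection):
        # Render nested expressions and splice their SQL into the template.
        sqls, params = [], []
        for expr in self.expressions:
            expr_sql, expr_params = expr.as_sql(qn, connection)
            sqls.append(expr_sql)
            params.extend(expr_params)
        return self.template % tuple(sqls), params

# qs.order_by(SQL('%s NULLS LAST', F('pub_date')))
# Sum(SQL('case when person.age > friend.age then 1 else 0 end'))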

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Nate Bragg [jonathan.br...@alum.rpi.edu]
Sent: Wednesday, March 21, 2012 01:27
To: django-developers@googlegroups.com
Subject: Complex aggregate and expression composition.

Hello all,

Since even before I saw Alex Gaynor's presentation "I hate the Django ORM"
(the one with the `Sum(F("end_time") - F("start_time"))` query), the problem
of complex aggregates and expressions has vexed me. So, I figured I would
try to solve it.

Originally, I started this trying to pursue a solution to ticket #14030, but 
after
I took a couple of lousy shots at it, it dawned on me that the ticket would be
better resolved as a result of solving the more general case. I realized that
aggregates were just a special case of expressions, and that the best solution
was going to take a refactoring of Aggregate into ExpressionNode.

I have uploaded my branch; it can be found here:

https://github.com/NateBragg/django/tree/14030

Since this is a non-trivial change, I was hoping to open the topic for debate
here, and get some feedback before proposing my solution for inclusion.

Some particular points of note:
* I tried to preserve as much interface as possible; I didn't know how much
  was considered to be more public, so generally I tried to add rather than
  subtract. However, I did remove a couple things - if you see something
  missing that shouldn't be, let me know.
* Currently, the entire test suite passes on sqlite, oracle, mysql,
  postgres, and postgis. I was unable to test on oracle spatial - any help
  with that would be appreciated.
* When fields are combined, they are coerced to a common type;
  IntegerFields are coerced to FloatFields, which are coerced into
  DecimalFields as needed. Any other kinds of combinations must be of
  the same types. Also, when coerced to a DecimalField, the precision
  is limited by the original DecimalField. If this is not correct, or other
  coercions should be added, I'd like to correct that.
* When joins are required, they tend to be LEFT OUTER; I'd like some
  feedback on this, as I'm not 100% sure it's always the best behavior.
* As the aggregates are a little more complicated, on trivial cases there
  is a minor reduction in performance; using djangobench, I measured
  somewhere between a 3% and 8% increase in runtime.
* I don't have enough tests - mostly for a lack of creativity. What kind of
  composed aggregates and expressions wo

RE: commit_on_success leaves incorrect PostgreSQL isolation mode?

2012-03-19 Thread Kääriäinen Anssi
This issue is handled in ticket #16407 
(https://code.djangoproject.com/ticket/16047), but it is unlikely to get fixed 
in 1.4. Making changes to transaction management code at this late stage of the 
development cycle isn't something I am willing to do.

Your analysis of the cause of the problem is correct. When the psycopg2 backend 
leaves transaction management, it mistakenly uses the current transaction 
management state when setting the isolation level, not the one before the current 
one. So, when leaving transaction management Django sees that the current state is 
managed and thus keeps autocommit off, instead of seeing that the previous 
state was unmanaged and setting autocommit to on. There is also a fix for this 
in the above-mentioned ticket.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Christophe Pettus [x...@thebuild.com]
Sent: Monday, March 19, 2012 08:23
To: django-developers@googlegroups.com
Subject: commit_on_success leaves incorrect PostgreSQL isolation mode?

While exploring the Django transaction stuff (in 1.4rc1), I ran across the 
following behavior.  I use commit_on_success as the example here, but the other 
transaction decorators/context managers have the same issue.

It seems to me to be a bug, but I wanted to confirm this before I opened an 
issue.

The configuration is running Django using the psycopg2 backend, with 'OPTIONS': 
{ 'autocommit': True, }
Consider the following code:

from django.db import transaction, DEFAULT_DB_ALIAS, connections
from myapp.mymodels import X

x = X.objects.get(id=1)

print connections[DEFAULT_DB_ALIAS].isolation_level  # As expected, it's 0 here.

x.myfield = 'Foo'

with transaction.commit_on_success():
    x.save()
    print connections[DEFAULT_DB_ALIAS].isolation_level  # As expected, it's 1 here.

print connections[DEFAULT_DB_ALIAS].isolation_level  # It's now 1 here, but shouldn't it be back to 0?


The bug seems to be that the isolation level does not get reset back to 0, even 
when leaving transaction management. This means that any further operations on 
the database will open a new transaction (since psycopg2 will automatically 
open one), but this transaction won't be managed in any way.

The bug appears to be in 
django.db.backends.BaseDatabaseWrapper.leave_transaction_management; it calls 
the _leave_transaction_management hook first thing, but this means that 
is_managed() will return true (since the decorators call managed(True)), which 
means that _leave_transaction_management in the psycopg2 backend will not reset 
the transaction isolation level; the code in the psycopg2 backend seems to 
assume that it will be run in the new transaction context, not the previous one.

Or am I missing something?

--
-- Christophe Pettus
   x...@thebuild.com


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: DoS using POST via hash algorithm collision

2012-01-20 Thread Kääriäinen Anssi
Paul McMillan had a very good posting about this on the Python issue tracker. 
The problem is that whenever you put user-supplied data into a hashmap, you are 
vulnerable to this attack. This basically includes most Python modules, and I 
would guess a lot of user code, too. So, if you fix JSON and POST, you still 
have about 99% (likely would actually round to 100%) of the attack surface left.

I found these links very informative about this matter: 
http://lwn.net/Articles/474912/ and http://bugs.python.org/issue13703#msg150840 
(McMillan's post mentioned above).

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Luke Plant [l.plant...@cantab.net]
Sent: Friday, January 20, 2012 15:46
To: django-developers@googlegroups.com
Subject: Re: DoS using POST via hash algorithm collision

On 20/01/12 08:47, Aymeric Augustin wrote:
> 2012/1/20 Łukasz Rekucki wrote:
>
> We all know browsers won't crash and they will render the page exactly
> the same. I volunteer to fix any issues in the test suite (considering
> the hash changes also between 32-bit/64-bit Python, i'm not sure there
> are even any or we would get a report on that, wouldn't we ?).
>
> I think it's important for the Django core team to voice their opinion
> on this matter in python-dev.
>
> Hello Łukasz,
>
> I absolutely agree -- code that relies on a deterministic dictionary
> order is broken and should be fixed.

I agree with this completely, and Carl's post:

http://mail.python.org/pipermail/python-dev/2012-January/115700.html

Whether this should be fixed in Python or not is a different question.

Most of the web specific problems can be fixed relatively easily with
HTTP specific solutions and limits. We can easily change how we handle
POST and GET data to a protected solution (by length limitation or a
custom datastructure), and we can protect cookie parsing using simple
length limits (and continue using stdlib SimpleCookie).
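
For illustration, the "length limitation" route could be as simple as this (a
sketch only, not an actual Django implementation; the cap and helper name are
invented, and real parsing would also unquote the keys and values):

from django.core.exceptions import SuspiciousOperation

MAX_FIELDS = 1000  # arbitrary cap for the sketch

def parse_form_body(body, max_fields=MAX_FIELDS):
    pairs = body.split('&')
    if len(pairs) > max_fields:
        raise SuspiciousOperation("Too many form fields")
    data = {}
    for pair in pairs:
        key, _, value = pair.partition('=')
        data[key] = value  # only now does user input go into a hashmap
    return data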

However, JSON parsing, which is a common task for web sites, is much
harder to fix, because almost by definition you've got to return
dictionaries with arbitrary keys and arbitrary size, and because as a
framework we don't control how developers do JSON parsing.

Luke


--
"Cross country skiing is great if you live in a small country."
(Steven Wright)

Luke Plant || http://lukeplant.me.uk/


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Custom managers in reverse relations

2012-01-14 Thread Kääriäinen Anssi
Just a quick thought: you should check out the work done for allowing use of 
manager methods in querysets (the thread "RFC: query methods"). It seems that 
work does have some overlap with this feature. The patch for #3871 implements 
.manager('manager_name') for reverse relation managers, and there was some 
discussion about allowing .use_manager('manager_name') for query methods. 
.use_manager() is not going to be in the query methods patch. I haven't looked 
at #3871 in detail, but maybe the work done for query methods would make the 
#3871 patch easier to implement?

The idea would be to issue: .use_manager(wanted_manager).all() in the 
.manager() method. The first method call would change the base manager to use, 
the second (.all) call would make it return a queryset, so that you would not 
have the .clear and .remove methods available. This might be a stupid idea, but 
maybe worth a try? The .use_manager() call would not need to exist on queryset 
level.
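
Using the Reporter/Article example from Sebastian's message below, the proposed
usage would look roughly like this (manager() / use_manager() are the proposed
spellings from #3871 and the query methods thread, not existing API):

reporter = Reporter.objects.get(pk=1)

# Today: the reverse manager always goes through Article's default manager.
all_articles = reporter.article_set.all()

# Proposed: explicitly pick another manager declared on Article.
published = reporter.article_set.manager('published_articles').all()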

1.4 is feature frozen if I am not mistaken, so this would be 1.5 stuff.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Sebastian Goll [sebastian.g...@gmx.de]
Sent: Saturday, January 14, 2012 21:35
To: django-developers@googlegroups.com
Subject: Re: Custom managers in reverse relations

Hi all,

My latest post to the list seems to have been lost in the pre-Christmas
storm. Sorry for that!


The issue of picking which custom manager is used in resolving reverse
relations still stands. Let me give you an example of why this is useful:

{{{
class Reporter(models.Model):
    ...

class Article(models.Model):
    reporter = models.ForeignKey(Reporter)
    ...
    articles = models.Manager()
    published_articles = PublishedManager()
}}}

We put some thought into designing PublishedManager. Maybe it needs to
do some things in addition to simply checking a flag, who knows. The
thing is: right now, we simply cannot make use of this manager when
looking up a reporter's articles: with `reporter.article_set` we
always get _all_ articles. [1]


Now we have two options: doing the filtering manually, on the returned
queryset, or specify that we want to use PublishedManager, accessible
through the `published_articles` attribute of the Article class.

The latter is implemented by the patches in ticket #3871:

  https://code.djangoproject.com/ticket/3871


Does this seem like a good idea? Should it even be possible to specify
which custom manager is used for reverse relations? Or, am I missing
something and this is already possible in some other way?

Since I'm looking forward to seeing this implementation in Django 1.4,
I ask for your input on the matter.

Thanks!
Sebastian.

[1] In fact, that's not entirely true: we get whatever is returned by
the _default_ manager of the Article class. This seems like an
arbitrary choice: it's not a "plain" manager that always returns all
related instances, it's whatever we picked as the default manager.


On Fri, 23 Dec 2011 21:56:24 +0100
Sebastian Goll  wrote:

> Hi all,
>
> I'd like to draw your attention to long-open ticket #3871 [1].
>
> The idea is to let ORM users choose which custom manager to use for reverse 
> "many" relations, i.e. reverse foreign key (…_set) as well as forward and 
> reverse many-to-many relations.
>
> There are several proposed patches to this ticket, the latest was added by me 
> a week ago. The current implementation adds a "manager()" method to the 
> reverse manager which allows you to pick a manager different from the default 
> one on the related model. All changes are entirely backwards-compatible – if 
> you don't call the "manager()" method, everything is as before, i.e. the 
> default manager is used to look up related model instances.
>
>
> During my review of the previous patch I found that it doesn't apply cleanly 
> to trunk, and I had some concerns with regard to the general approach of the 
> implementation.
>
> Therefore, I wrote an alternative patch which is currently awaiting review. 
> Since I wrote that patch, I cannot review it myself. If you can spare some 
> time, maybe you can take a look at it and if you feel the current approach is 
> okay, bump the ticket to "ready for check-in".
>
> Of course feel free to raise any concerns you might have.
>
> Regards,
> Sebastian.
>
> PS: Merry X-Mas and whatnot! :D
>
> [1] https://code.djangoproject.com/ticket/3871

--
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.


RE: Deprecate change pk + save behavior for 1.4

2011-12-02 Thread Kääriäinen Anssi
"""
On 12/02/2011 06:54 PM, Kääriäinen Anssi wrote:
> I think I will pursuit the immutable PK approach, and see how it
> works (back to square one). BTW are there -1 calls on this approach,
> or the pk change tracking in general?

I haven't been fully following this thread, but I will say that I'm not
yet convinced that the ORM behavior should be changed such that saving
an instance with a modified PK updates the row rather than saving a new
instance.
"""

At this point that is not the idea. The idea is just to disallow this (assuming
a multicolumn PK of (firstname, lastname)):

user = User(firstname = 'Jack', lastname = 'Smith')
user.save()
user.firstname = 'John'
user.save()

Current behavior will leave Jack Smith in the DB, and save John Smith as a
new object. In my opinion it is too easy to clone the object accidentally.

The idea would be to raise an exception from the second save (a deprecation
warning at this point). A way to get the current behavior is needed too, but
the user should explicitly request it.

Later on, maybe there should be some way to actually update the PK. But
that is not the current plan.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Deprecate change pk + save behavior for 1.4

2011-12-02 Thread Kääriäinen Anssi
"""
That's really too bad; I was hoping that that approach would work. (Also, I 
really hope nobody is using a FileField for a primary key ;) )

Is the problem here that we can't reliably tell whether the data that is going 
out to the DB layer has changed? I would think that no matter how the data is 
modified (in-place vs. setattr), that the one thing we could rely on, and the 
one thing that actually matters in this situation, is the serialised 
representation of the data. For a FileField, that would be the filesystem path 
(editing the file in place without changing the path wouldn't give you the 
duplication problems that you are having); for an IntegerField, it's just the 
number itself.

It should be the case that, no matter what sort of python magic a particular 
developer has added, it is equivalence at the SQL level that is causing 
problems. Maybe it's because I haven't tried to hack at this myself, but I 
can't see why storing a copy of the PK fields DB-representation on load, and 
checking them on save, isn't sufficient. There is a memory cost, but it should 
be small, unless you have very large fields for primary keys in your database, 
in which case you are already suffering from them, certainly :)
"""

Good idea. Maybe the best possibility for change tracking is offered by 
Field.value_to_string (it returns a string suitable for serialization). Using 
value_to_string the following would work:
  - add a flag "is_immutable" to fields. If set, just put the DB value directly 
into _state.old_pk upon model initialization. If not set, call value_to_string, 
and store that in old_pk. I don't think there will be many PK fields which are 
mutable, and even if there are, the value_to_string trick should work.
  - upon save, do the same again: if is_immutable is set, track changes by the 
actual attribute value; if not set, check value_to_string.

Getting the raw SQL string representation will be hard, for example PostgreSQL 
ListField would get the value from psycopg as a list, and would send it back to 
psycopg as a list. Maybe copy.copy() for the DB value would work.

I don't know how likely it is that people use FileFields, ListFields or other 
problematic cases as PK values. The easy way out would be to define that PK 
fields must be immutable, or the field must support change tracking itself by 
providing a suitable descriptor (Django could provide a base class). After that 
everything should be relatively easy. Come to think of it, mutable PK fields 
are probably pretty rare currently, as saving an object back to the DB after PK 
change might have some problems...

I think I will pursue the immutable PK approach, and see how it works (back to 
square one). BTW are there -1 calls on this approach, or the pk change tracking 
in general?

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Deprecate change pk + save behavior for 1.4

2011-12-02 Thread Kääriäinen Anssi
"""
Now for the funny part of this. I suspected that __setattr__ would
make init slower. But using a little trick, __init__ is now actually
almost 30% faster for a 10 field model. The trick is checking if
setattr needs to be called. It needs to be called if the user has
defined __setattr__ for some subclass of Model, otherwise we can call
directly object.__setattr__. For some reason, this is considerably
faster than calling setattr(self, f.attname, val). I hope the approach
is acceptable. The same trick could be employed directly to trunk
version of Django, resulting in the same speedup.
"""

Now I can't reproduce this speedup. I get pretty much the same speed on
master vs the __setattr__ trick. I don't know what I was doing wrong before,
I tested this literally tens of times trying to understand what is happening.
It is not that surprising that this speedup isn't real, as the speedup seemed
too good to be true.

So, forget about the above optimization for current Django trunk. However
it is still needed if the __setattr__ way of tracking attribute changes is going
to be used, as otherwise model __init__ will be much slower than currently.

I do understand if this is not wanted, as this adds some complexity and if
the only benefit is preventing accidental duplicates due to PK change,
it is questionable if it is worth it. However saving only changed attrs (and
skipping the save completely if there are no changes) could be nice in some
situations.

Maybe I should sleep a little before hacking more... :) Anyways it is probably
good to let this all wait until 1.5. There are pretty big design decisions here.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Deprecate change pk + save behavior for 1.4

2011-11-30 Thread Kääriäinen Anssi
"""
/me runs off to go correct Wikipedia ;)

I checked the Wikipedia article on Primary Key first, and didn't see that, but 
I did note this:

A table can have at most one primary key, but more than one unique key. A 
primary key is a combination of columns which uniquely specify a row. It is a 
special case of unique keys. One difference is that primary keys have an 
implicit NOT NULL constraint while unique keys do not.
"""

I was confused by this sentence in the Wikipedia article:
Note that some DBMS require explicitly marking primary-key columns as NOT NULL.

"""
I'm not sure that I agree -- I don't know if there needs to be a fundamental 
distinction between a new model instance and one that was retrieved from the 
database. I do agree that there should be a way to specify "change the primary 
key on this object" vs "save a new object with this primary key".
"""

The problem, as I see it, is that it is all too easy to do .save() and end up 
with duplicates in the DB while the user expects an update of the PK. Django's 
admin currently has exactly this problem. Currently this is not that big of a 
problem, as natural primary keys aren't common. You can update the PK with some 
trickery, but that is not what I am trying to solve. I am just trying to forbid 
the "whoops, created a duplicate by accident" problem. This needs the information 
about the "old_pk".

One more nice little problem: multi-table inheritance allows objects with 
multiple primary keys...

class A(models.Model):
    f1 = models.IntegerField(primary_key=True)

class B(A):
    f2 = models.IntegerField(primary_key=True)

# Now, B's primary key is f2, but when saving B, the underlying A instance
# needs to be saved too, and its primary key is f1. So, in save B has
# effectively 2 primary keys.

b = B(f1=1, f2=1)
b.save()
B.objects.all()
[B obj: f1 = 1, f2 = 1]

b.f2 = 2
b.save()
IntegrityError (tries to save new B: f1=1, f2=2, but f1 needs to be unique)
b.f2 = 1
b.f1 = 2
b.save()
B.objects.all()
[B obj: f1 = 2, f2 = 1]
A.objects.all()
[A obj: f1 = 1, A obj: f1 = 2] # We got a new A obj, but no new B obj.

Add in multi-table multiple inheritance... Making this work reliably in all 
situations seems complex. So, no simple solution in sight, and that's the final 
nail in the coffin for 1.4 inclusion.

 - Anssi

# sidenote:
# try b.save() using postgresql again after the integrity error in the above 
example:
# DatabaseError: current transaction is aborted, commands ignored until end of 
transaction block
# connection.rollback()
# TransactionManagementError: This code isn't under transaction management
# Luckily the transaction is still rolled back :)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Deprecate change pk + save behavior for 1.4

2011-11-30 Thread Kääriäinen Anssi

"""
Is this referring exclusively to natural, or user-specified primary key 
columns? Despite Luke's reference to nullable primary keys (are these even 
allowed by SQL?), a common idiom for copying objects is this:
"""

Not allowed by the SQL specification, but many databases do allow them (source: 
Wikipedia).

"""
obj.pk = None
obj.save()

I have used this pattern in more instances than I can remember; whether for 
duplicating objects, or for making new variants of existing objects. I would 
hate to see the behaviour deprecated, or worse, for the old object to simply 
get reassigned a new (or null) id.
"""

If nullable primary keys are going to be allowed, then the above cannot work. 
You would need to use NoneMarker in there, or .save() would need a kwarg for a 
backwards-compatibility mode. obj.clone() is yet another possibility. Maybe 
nullable primary keys should be forbidden?

"""
For changing natural primary key fields, I would prefer to see a pattern like 
this:

class User:
   firstname = models.CharField
   lastname = models.CharField
   pk = (firstname, lastname)

u = User.objects.get(firstname='Anssi', lastname='Kääriäinen')
u.firstname='Matti'
u.save(force_update=True)
"""

That is a possibility, although currently that has a well-defined meaning: try to 
update the object with pk ('Matti', 'Kääriäinen'), and raise an error if it does 
not exist in the DB.

"""
(specifically, with the force_update parameter being required for a PK change). 
Then, as long as we store the original PK values, the object can be updated in 
place. A bare save() would work just as currently changing the id field does -- 
create a new row if possible, otherwise, update the row whose PK matches the 
new values.
"""

IMHO forbidding creation of a new object while leaving the old object in place 
when calling save() is needed. Current behavior is unintuitive. One clear 
indication of this being unintuitive is that even Django's admin does not get 
it right. If bare save() will be deprecated, then an upgrade path for current 
uses is needed. A new kwarg for save (old_pk=None would be a possibility) or 
obj.clone() would be needed.

Solving all these problems before 1.4 seems hard.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Deprecate change pk + save behavior for 1.4

2011-11-30 Thread Kääriäinen Anssi
"""
> The reason for doing the deprecation now is that it would be nice that
> this behavior is already removed when multicolumn primary keys are
> introduced into Django.
>
> There is a ticket related to this: #2259.

Here is another that could be helped by this change, depending on
implementation - #14615
The decisions on that ticket basically boils down to the question of how
we detect a new object (which is waiting for PK from the DB). The
current solution of comparing with None (used in various places) fails
for nullable primary keys.
"""

I can think of two basic approaches to this. The first is to define a
__setattr__ for Models and check whether the pk has been set after the fetch
from the DB. This has at least three problems:
 1. It is likely that users have custom __setattr__ methods that do not use
super().__setattr__ but change the dict directly.
 2. This way it is somewhat hard to detect if the PK has actually changed
or not. You can (and many users likely currently do) set the value to the same
value it already has.
 3. This will make model __init__ slower (although there are tricks to mitigate
this effect).

The other way is to store old_pk in model._state and compare the PK to that
when saving. If it has changed, raise an error. This would work best if there
were a NoneMarker object for the cases where there is no PK from the DB, so you
could solve #14615 easily, too.
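
Roughly like this (NoneMarker, _state.old_pk and the abstract base class are
just the idea sketched out, not existing Django attributes; where old_pk would
be filled in when loading from the DB is hand-waved here):

from django.db import models

class NoneMarker(object):
    """Marks 'no PK loaded from the DB', so pk=None stays a legal value."""

class PKGuardedModel(models.Model):
    class Meta:
        abstract = True

    def __init__(self, *args, **kwargs):
        super(PKGuardedModel, self).__init__(*args, **kwargs)
        # Instances constructed in Python have no DB-loaded PK yet.  The ORM's
        # row-hydration path (not shown) would instead store the loaded PK here.
        self._state.old_pk = NoneMarker

    def save(self, *args, **kwargs):
        if (self._state.old_pk is not NoneMarker
                and self.pk != self._state.old_pk):
            raise ValueError("Primary key changed -- refusing to clone the row")
        super(PKGuardedModel, self).save(*args, **kwargs)
        self._state.old_pk = self.pk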

This could result in somewhat larger memory usage. Although normally you
could store the same string (or other object) in db_pk as you store in the
__dict__ of the model. This would mean minimal memory overhead unless
you change a lot of PKs in one go. Are there problematic (mutable object
based) model fields, where you would need to store a copy of the field's
value? We could possibly have an attribute "mutable object based field" for
the problematic fields...

One way to mitigate the speed effect is use of AST for model init. I have done
some experiments about this, see: https://github.com/akaariai/ast_model. That
does come with its own problems, but if templates are going to be using AST,
then we could use it in other places needing speedups, too.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Allowing models to influence QuerySet.update

2011-11-29 Thread Kääriäinen Anssi
"""
Hmm, that is not ideal behavior. You mean QuerySet.delete() calls
the signal for each deleted object, rather than doing a delete at the
database level?
"""

This might not be exactly accurate, but I think it goes something like this:
  - Fetch all the to-be deleted objects (one query)
  - Check if there are cascades for those objects, fetch the cascades (one 
query per cascaded Model class?)
  - Send pre_delete signals for all deleted instances
  - Do the delete as one query for the to-be deleted objects, and then one 
query(?) per cascade Model class
  - Send post_delete signals

Now, this is not that inefficient - but it would be a good optimization to NOT 
fetch the instances if there are no listeners for pre/post delete signals and 
there are no cascades (or all cascades are DO_NOTHING). Even if there are 
cascades, you could fetch just the PKs of the to-be-deleted models (even that is 
not actually needed, as you can use joins).

Again: I am not 100% sure how this behaves...

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Allowing models to influence QuerySet.update

2011-11-29 Thread Kääriäinen Anssi
"""
I also noticed in db optimization docs that we have explicitly
documented update() and delete() as bypassing signals, and I think we
should honour that.

https://docs.djangoproject.com/en/dev/topics/db/optimization/#use-queryset-update-and-delete
"""

Is this correct for delete? A quick test (A1 is a model which I have hanging 
around - details about it aren't important):

from datetime import datetime
from django.db.models.signals import post_delete

def foo(*args, **kwargs):
    print args, kwargs

post_delete.connect(foo, sender=A1)

A1(dt=datetime.now()).save()
A1.objects.all().delete()

Result:
() {'instance': <A1 instance>, 'signal': <post_delete signal>, 'sender': <class A1>, 'using': 'default'}

Search for post_delete in django/db/models/deletion.py. Signals seem to be sent, 
even for cascaded deletion.

Personally I don't think post/pre instance-changed signals are the way to go if 
you want to do auditing. DB triggers are much more reliable. Some problems with 
the Django signals:
  - Not all operations send signals (bulk_create could easily send signals, as 
the instances are available directly in that case; even bulk update could send 
signals per instance - first check if there is a listener; if there is, fetch 
all the updated instances and send signals; if there isn't, don't fetch 
the instances. You would only pay the price when needed. Not saying this is a 
great idea, but maybe worth a thought).
  - Proxy (including deferred models) and multitable-inherited models do not 
send signals as you would expect. I have groundwork for how to implement fast 
inherited signals in ticket #16679. The patch in that ticket also makes model 
__init__ much faster in certain common-enough cases. On the other hand, yet 
another cache.
  - If you do anything outside Django Models (raw SQL, dbshell, another 
application accessing the DB) your auditing will not work.
  - The DB triggers approach is faster.

The downside is that you will be programming DB-specific triggers, in a 
DB-specific language. Schema upgrades are a nightmare.
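
As a concrete point of comparison, auditing a single table with a trigger is
roughly this much work on PostgreSQL (table and column names are invented; you
would run this once, e.g. from psql or a migration script):

from django.db import connection

AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS myapp_item_audit (
    audit_id   serial PRIMARY KEY,
    item_id    integer,
    operation  text,
    changed_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION myapp_item_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO myapp_item_audit (item_id, operation) VALUES (OLD.id, TG_OP);
        RETURN OLD;
    END IF;
    INSERT INTO myapp_item_audit (item_id, operation) VALUES (NEW.id, TG_OP);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS myapp_item_audit_trg ON myapp_item;
CREATE TRIGGER myapp_item_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON myapp_item
    FOR EACH ROW EXECUTE PROCEDURE myapp_item_audit_fn();
"""

cursor = connection.cursor()
cursor.execute(AUDIT_DDL)  # fires for changes from Django, raw SQL or other apps alike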

 - Anssi Kääriäinen

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: queryset caching note in docs

2011-11-02 Thread Kääriäinen Anssi
"""
so, summarizing again:
  - mysql supports chunked fetch but will lock the table while fetching is in 
progress (likely causing deadlocks)
  - postgresql does not seem to suffer this issue and chunked fetch seems 
doable (not trivial) using named cursor
  - oracle does chunked fetch already (someone confirm this, please)
  - sqlite3 COULD do chunked fetch by using one connection per cursor 
(otherwise cursors will not be isolated)
"""

I did a little testing. It seems you can get the behavior you want if you just 
do this in PostgreSQL:

for obj in Model.objects.all().iterator():  # Note the extra .iterator()
    pass  # handle object here

What is happening? Django correctly uses cursor.fetchmany(chunk_size) in 
models/sql/compiler.py. The chunk_size is hardcoded to 100. The problem is in 
db/models/query.py, in its __iter__ method: __iter__ fills 
self._result_cache, and that is where the memory is consumed. Changing that is 
not wise, as in many cases you do want to keep the results around. The 
.iterator() call skips __iter__ and directly accesses the underlying 
generator.

You can also do objects.all()[0:10].iterator() and objects are correctly 
fetched without caching.

Here is a printout from my tests. The memory report is the total process memory 
use:

Code:
i = 0
for obj in User.objects.all()[0:10000]:
    i += 1
    if i % 1000 == 0:
        print memory()

25780.0kB
26304.0kB
26836.0kB
27380.0kB
27932.0kB
28468.0kB
29036.0kB
29580.0kB
29836.0kB
30388.0kB

And then:
i = 0
for obj in User.objects.all()[0:10000].iterator():
    i += 1
    if i % 1000 == 0:
        print memory()

25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB


This would be worth documenting, with maybe a better-named method wrapping 
.iterator(). I have no ideas for a better name, though.

I would sure like verification of this test; I am tired and this seems like 
too easy a fix. Or am I missing the problem?

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Yet another __ne not equal discussion

2011-10-27 Thread Kääriäinen Anssi
"""
The exclude() option in its current form is unworkable on multi-valued
relations. I'd like to repeat that for emphasis: exclude() can *never*
obsolete direct negative lookups for multi-value relations.
"""

I do see a problem here: the equality ~Q(a=1) <-> Q(a__lt=1)|Q(a__gt=1) is
not correct in m2m situations: the first is asking for rows where a must not
be 1; in particular, if there is no a, it is a match. The other is asking for rows
where there must be an 'a' value and it must be < 1 or > 1; in particular, if
there is no value at all, it is NOT a match. So:
filter(~Q(a=1), a__isnull=False) <-> Q(a__lt=1)|Q(a__gt=1).
The ORM is not able to handle the first version correctly. The interpretation
would be that there is at least one 'a' row, and its value is not 1.

I am strongly against the idea that Q(a__neq=1) would have a different
interpretation from ~Q(a__eq=1). If they had different interpretations, then
there would be a basis for negative lookups. Although AFAICS you could still get
the same results using ~Q(a__eq=1, a__isnull=False), so the API would still work
without negative lookups.

I am basing the following discussion on the assumption that a__neq and ~a__eq
should be the same thing.

From an ORM API standpoint, the claim that .exclude() can never obsolete
direct negative lookups is wrong as far as I understand the problem.
Reason:

.filter(Q(__neq=val)) <-> .filter(~Q(__exact=val)) <-> 
.exclude(Q(__exact=val))

Another way to see this is that Django should return the same results for
the queries:
filter(~Q(employment__school__site_name='RAE'), employment__end_date=None)
and
filter(employment__school__site_name__neq='RAE', employment__end_date=None)

However, I do not think your issue is due to the above equivalence between
the two different ways of writing ~__eq; it is due to a bug in the ORM
implementation. The second filter condition is not pushed down to the
subquery generated by the negated Q condition, and thus it generates
another join and potentially targets different rows. I think this is the main
problem in your situation. This is reinforced by this snippet from your query:

WHERE (
    NOT `data_staff`.`id` IN (subquery data_employment U1)
    -- a different data_employment reference from the one in the subquery
    AND `data_employment`.`end_date` IS NULL
)

That is, you have the data_employment table twice in the query, and
thus the filters are targeting potentially different rows. Note that this is a
bug. The conditions are in the same .filter() call, and thus they should target
the same row!

IMHO there are two underlying problems in the ORM related to this matter.
One is detecting when to use a subquery for the filter condition; the logic
for that is easily fooled. Another problem is that if you do use a subquery,
other conditions that should go into the subquery's WHERE are sometimes
not correctly pushed down to it. This is similar to the HAVING clause
pushdown problem.

I must say the m2m handling is very complicated; it took some time to see
the ~__eq=1 <-> __lt=1|__gt=1 difference, for example... Thus, it is likely
that I am missing something else, too.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Yet another __ne not equal discussion

2011-10-27 Thread Kääriäinen Anssi
Quote:
"""
It's also worth noting that Q() objects permit the unary negation
operator, but this also yields the undesired results of the exclude()
call:
Blog.objects.filter(~Q(entry__author_count=2),
entry__tag__name='django')
"""

As far as I understand, this is exactly the query you want. The filters
are treated as a single call; that is, they should target the same row,
not possibly different rows of the multijoin. It is another matter whether it
actually works in the current ORM implementation. IIRC something like
filter(~Q(pk=1)) and .exclude(Q(pk=1)) can produce different results.
But they _should_ produce the same result, and if they do not,
introducing negated lookups isn't the way to fix this - the correct
thing to do is to fix the ORM.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: The state of per-site/per-view middleware caching in Django

2011-10-21 Thread Kääriäinen Anssi
I do not know nearly enough about caching to participate fully in this
discussion. But it strikes me that trying to cache a CSRF-protected anonymous
page is not that smart. If you have a form that anonymous users can submit,
why bother with CSRF protection? I mean, what is it protecting against?
Making complex arrangements in the caching layer for this use case seems like
wasted effort. Or am I missing something obvious?

The following is from the stupid ideas department: maybe there could be a
"reverse cache" template tag, such that you mark the places where you want
changing content as non-cacheable. You would need two views for this: one
which constructs the "base content", and another which constructs the
dynamic parts. Something like:

page_cached.html:
... expensive to generate content ...
{% block "login_logout" non_cacheable %}
{% endblock %}
... expensive to generate content ...

You would generate the base page with a cached render view:

def page_view_cached(request, id):
    # cached() and cached_render() are hypothetical helpers for the cache
    # lookup and the cache-filling render.
    if cached(id):
        return cached_content
    else:
        ... expensive queries ...
        return cached_render("page_cached.html", context, ...)

The above view would not be directly usable at all; you would need to use a
wrapper view which renders the non-cacheable parts:

def page_view(request, id):
    # The call below would return quickly from the cache most of the time.
    cached_portions = page_view_cached(request, id)
    return render_to_response("page.html",
                              {'cached': cached_portions,
                               'user': request.user})

where page.html would be:
{% extends cached %}
{% block login_logout %}
{% if user.is_authenticated %}
Hello, user!
{% else %}
login
{% endif %}
{% endblock %}

That seems to be what is really wanted in this situation. The idea is quite
simply to extend the block syntax to caching. A whole other issue is how to
make this easy enough to be actually usable, and fast enough to be actually
worth it.

 - Anssi


From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Jim Dalton [jim.dal...@gmail.com]
Sent: Friday, October 21, 2011 16:02
To: django-developers@googlegroups.com
Subject: Re: The state of per-site/per-view middleware caching in Django

On Oct 20, 2011, at 6:02 PM, Carl Meyer wrote:

>
> Hi Jim,
>
> This is a really useful summary of the current state of things, thanks
> for putting it together.
>
> Re the anonymous/authenticated issue, CSRF token, and Google Analytics
> cookies, it all boils down to the same root issue. And Niran is right,
> what we currently do re setting Vary: Cookie is what we have to do in
> order to be correct with respect to HTTP and upstream caches. For
> instance, we can't just remove Vary: Cookie from unauthenticated
> responses, because then upstream caches could serve that unauthenticated
> response to anyone, even if they are actually authenticated.
>
> Currently the Django page caching middleware behaves pretty much just
> like an upstream cache in terms of the Vary header. Apart from the
> CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting, it just looks at the response,
> it doesn't make use of any additional "inside information" about what
> your Django site did to generate that response in order to decide what
> to cache and how to cache it.
>
> This approach is pretty attractive, because it's conceptually simple,
> consistent with upstream HTTP caching, and conservative (quite unlikely
> to serve the wrong cached content).
>
> It might be possible to make it "smarter" in certain cases, and allow it
> to cache more aggressively than an upstream cache can. #9249 is one
> proposal to do this for cookies that aren't used on the server, either
> via explicit setting or (in a recently-added proposal) via tracking
> which cookie values are accessed. If we did that, plus special-cased the
> session cookie if the user is unauthenticated and the session isn't used
> outside of contrib.auth, I think that could possibly solve the
> unauthenticated-users and GA issues.
>
> However, this (especially the latter) would come with the cost of making
> the cache middleware implementation more fragile and coupled to other
> parts of the framework. And it still doesn't help with CSRF, which is a
> much tougher nut to crack, because every response for pages using CSRF
> come with a Set-Cookie header and probably with a CSRF token embedded in
> the response content; and those both mean that response really can't be
> re-used for anyone else. (Getting rid of the token embedded in the HTML
> means forms couldn't ever POST without JS help, which is not an option
> as the documented default approach). You can mark some form-using views
> that are available to anonymous users as csrf-exempt, which exposes you
> potentially to CSRF-based spam, but isn't a security issue if you aren't
> treating authenticated submissions any differently from

RE: Removing pickle from cookie-based session storage

2011-10-02 Thread Kääriäinen Anssi
Ok, sorry for the uninformed rambling... Will check the code before posting 
next time :)

 - Anssi




RE: Removing pickle from cookie-based session storage

2011-10-02 Thread Kääriäinen Anssi
Forget about it; the exact same problem exists for every session backend. This
btw means that having write access to the django_session table amounts to an
exploit of all Django instances using that DB, right?

"""
Isn't there also the possibility that an attacker could somehow get arbitrary
data signed into the session cookie without knowing SECRET_KEY? This could be
due to a bug in the session framework, or to the developer doing something
really stupid. If that were the case, the bug would result in a remote code
execution exploit instead of the user merely being able to manipulate their
own session. Which sounds kinda scary.

If this is not changed to use JSON, there must be a warning that if an
attacker can somehow change the contents of the cookie while keeping it
signed, the result is a remote exploit. One such way is knowing the
SECRET_KEY.

My feeling is that this should be changed.

 - Anssi
"""




RE: Removing pickle from cookie-based session storage

2011-10-02 Thread Kääriäinen Anssi
"""
As I said in the first message, to the best of my knowledge, there's
nothing insecure about the implementation now. The usage of signing to
validate pickles received directly by end users expands our reliance
on SECRET_KEY pretty heavily. This concerns me, which is why I brought
it up here.
"""

Isn't there also the possibility that an attacker could somehow get arbitrary
data signed into the session cookie without knowing SECRET_KEY? This could be
due to a bug in the session framework, or to the developer doing something
really stupid. If that were the case, the bug would result in a remote code
execution exploit instead of the user merely being able to manipulate their
own session. Which sounds kinda scary.

If this is not changed to use JSON, there must be a warning that if an
attacker can somehow change the contents of the cookie while keeping it
signed, the result is a remote exploit. One such way is knowing the
SECRET_KEY.
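
For what it's worth, here is a minimal, self-contained sketch (plain Python,
not Django code) of why unpickling attacker-controlled bytes is so much worse
than parsing attacker-controlled JSON: unpickling can invoke arbitrary
callables.

import os
import pickle

class Evil(object):
    def __reduce__(self):
        # Tells pickle to reconstruct this object by calling
        # os.system('echo pwned'), i.e. by running a command of the
        # attacker's choice.
        return (os.system, ('echo pwned',))

payload = pickle.dumps(Evil())
pickle.loads(payload)  # runs the command; json.loads could never do this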

My feeling is that this should be changed.

 - Anssi




RE: Removing pickle from cookie-based session storage

2011-10-01 Thread Kääriäinen Anssi


"""
We recently committed changes to 1.4 that added signed cookie based
session storage. Session data is pickled, signed, and sent to the
client as a cookie. On receipt of the cookie, we check the signature,
unpickle, and use the data. We could use JSON instead of pickle, at
the expense of longer cookies.

I believe that our signing implementation is secure and correct.

However, I know that users of Django screw up from time to time. It's
not uncommon to see SECRET_KEY in a git repository, and that value is
often used in production. If SECRET_KEY is compromised, an attacker
can sign arbitrary cookie data. The use of pickle changes an attack
from "screw up the data in this application" to "arbitrary remote code
execution".

In light of this, we should be conservative and use JSON by
default instead of pickle.
"""

If the size of the cookie turns out to be a problem, using compressed JSON
instead of plain JSON is a very simple change. I tested on my crummy old
laptop, and using zlib one can compress + decompress roughly 5000 short
strings per second. On reasonable hardware I would guess the figure is a few
times that per thread. In the limit, when the compressed size is around 4 KB,
one can compress about 500 strings per second (or 1000-3000 on reasonable
hardware). So this could cause some performance concerns in extreme cases,
but probably not enough to worry about.

The test program is simple:

from datetime import datetime
import json
import random
import zlib

# 1000 random small integers, JSON-encoded, as a stand-in for session data.
nums = [random.randint(0, 10) for _ in range(0, 1000)]
var = json.dumps({'nums': nums})

start = datetime.now()
for i in range(0, 1000):
    compressed = zlib.compress(var)
    uncompressed = zlib.decompress(compressed)
print datetime.now() - start
print len(var)
print len(compressed)

Note that when compressing random integers, one still gets over 50%
compression. On more realistic data, the compression ratio should be even
better.
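
For reference, a minimal sketch of the compressed-JSON idea expressed with
Django's signing helpers; this assumes the dumps()/loads() functions in
django.core.signing and their compress flag, which (as I understand it)
zlib-compresses the JSON payload before signing whenever that makes the
result smaller:

from django.core import signing

# Serialize to JSON, optionally compress, base64-encode and sign.
token = signing.dumps({'nums': range(50)}, compress=True)
# Verify the signature, decompress and deserialize.
data = signing.loads(token)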

 - Anssi




RE: [NoSQL] Sub-object queries / refactoring JOIN syntax into model fields

2011-09-28 Thread Kääriäinen Anssi
I think that the lookup separator syntax is definitely the right approach here.

The implementation should modify setup_joins. I think the cleanest approach
would be to detect that the current lookup part leads to a field with
subfields, either in this part of the code:

    else:
        # Non-relation fields.
        target = field
        break

or directly after the for loop in setup_joins, around this line:

    if pos != len(names) - 1:

In either case you would want to check whether the current field supports
subfields, and then continue with the dirty details of actually descending
into the subfields and returning the results to add_filter. I wonder if you
would need an additional flag to setup_joins to indicate whether subfield
queries are allowed, in case something other than add_filter is the caller.
That would be only the tenth parameter...
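
To make that concrete, here is a very rough standalone sketch (not actual
Django internals) of the kind of check described above; supports_subfields
and resolve_subfield_lookup are hypothetical names for whatever API such a
field would expose:

def descend_lookup(opts, names):
    # Walk the lookup parts; as soon as a field advertises subfield support,
    # hand the remaining parts to it to interpret.
    for pos, name in enumerate(names):
        field = opts.get_field(name)
        if getattr(field, 'supports_subfields', False):
            # e.g. names == ['address', 'city'] on a hypothetical DictField:
            # the field itself decides what 'city' means.
            return field.resolve_subfield_lookup(names[pos + 1:])
        # otherwise the normal join-following logic would continue here
    return field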

There is still the question of whether this should be included in core. I am
in no position to answer that. All I can say is that a ListField (or
ArrayField) would be useful in SQL land, too.

 - Anssi
From: django-developers@googlegroups.com [django-developers@googlegroups.com] 
On Behalf Of Jonas H. [jo...@lophus.org]
Sent: Wednesday, September 28, 2011 01:52
To: django-developers
Subject: [NoSQL] Sub-object queries / refactoring JOIN syntax into model fields

Hi there,

some non-relational databases (e.g. MongoDB) have support for
arbitrarily nested objects. To make queries that "reach" into these
sub-objects, the Django-nonrel developers find it appealing to use JOIN
syntax. For instance, if you had this person in your database

   {'name': 'Bob', 'address': {'city': 'NY', 'street': 'Wall Street 42'}}

you could find Bob using these queries:

   Person.objects.filter(name='Bob')
   Person.objects.filter(address__city='NY')
   Person.objects.filter(address__street__startswith='Wall')
   ...

Similarly, sub-objects may be stored in a list, like so:

   {
     'votes': [
       {'voter': 'Bob', 'vote': 42},
       {'voter': 'Ann', 'vote': 3.14}
     ]
   }

   Vote.objects.filter(votes__vote__gt=2)
   ...


These sub-object queries are essential for non-relational databases to
be really useful so this is an important feature.

What's the core team's opinion on this topic -- is there any chance to
get something like that into Django at all? (Maybe you think two
meanings for one syntax cause too much confusion)

Secondly, how could this be implemented? I thought about refactoring
JOIN syntax handling into the model fields (as little logic as required;
refactoring the actual hardcore JOIN generation code seems like an
impossible task for anyone but the original author)... any other ideas?

So far,
Jonas
