RE: Django's problem with db-level defaults on Oracle
Quick question: could Django set the default to to_date('2014-31-01', 'yyyy-dd-mm')?

From: django-developers@googlegroups.com [django-developers@googlegroups.com] on behalf of Shai Berger [s...@platonix.com]
Sent: Friday, October 31, 2014 17:34
To: django-developers@googlegroups.com
Subject: Django's problem with db-level defaults on Oracle

Hi Everyone,

I just mentioned in another thread that db-level defaults are particularly troublesome on Oracle. I didn't want to burden that discussion with the details, but having been asked about it on IRC (thanks Josh), here they are. The problem is caused by a combination of factors:

1) Oracle stores database-level defaults as strings, evaluated when needed. This is not, in itself, completely insensible -- the processing and space overheads (compared to some more "binary" representation) are negligible, and it means the defaults "4" and "sysdate()" are treated by the system uniformly.

2) Django's Oracle backend sets the date-time format to a constant (close to ISO format), which is usually not the default. This has been used to perform some database date-time operations by manipulating strings -- because that way was easier for the developer implementing them, or there wasn't proper support for the feature otherwise. As a classic example, before 1.7, date-times used to be inserted into the database as strings, because some special manipulation was required to make cx_Oracle (the database driver library) support sub-second precision (thanks jtiai). I'm not completely sure how much date-string manipulation remains in the Oracle backend today, but it is certainly still used for database defaults: Oracle doesn't take parameters in DDL statements.

As a result of these two factors, when datetimes were set as default column values (which happened a lot with South<0.7.3), the value actually stored in the schema was a string specifying the date-time in a non-default format.
Whenever Django connected to the DB, it set the session's date-time format to the "right" one, and so no problems were seen. But when backing up using the Oracle "exp" utility -- which, as far as I'm aware, is pretty standard, at least for a developer backing up schemas on their own instance -- it was still these strings that were saved; and when trying to restore with the converse "imp" utility, whose connection is (of course) not controlled by Django, the utility tried to set the date-time defaults by a format that was inappropriate for the values. This usually failed, resulting in partial restores, which led to a lot of pain.

If you're still here, you probably want to know how we solved the problem: our DBA showed us how to install a database-level trigger to change the format whenever the relevant users logged on. This allowed us to get Oracle's "imp" to use the right date-time formats. However, this is highly non-obvious: I, for one, didn't even know such triggers existed.

Thanks for your attention,
Shai.

-- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/201410311734.08971.shai%40platonix.com. For more options, visit https://groups.google.com/d/optout.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7CDBD1EFB267CD41949C704C14E92DBF1A714566%40HELW040.stakes.fi.
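To make Shai's failure mode concrete, here is a minimal, Oracle-free sketch in plain Python. The strptime/strftime format strings below merely stand in for Oracle session settings (the 'DD-MON-RR' stand-in for Oracle's out-of-the-box default is an assumption for illustration): a default stored as a string under one session format cannot be parsed back by a session using another.

```python
from datetime import datetime

# Stand-ins for NLS date formats: the ISO-like format Django's Oracle
# backend configures for its own sessions, and an assumed non-ISO default.
DJANGO_SESSION_FORMAT = "%Y-%m-%d %H:%M:%S"   # ~ 'YYYY-MM-DD HH24:MI:SS'
ORACLE_DEFAULT_FORMAT = "%d-%b-%y"            # ~ 'DD-MON-RR'

# The default is stored in the schema as a *string*, rendered under the
# session format that was active when the DDL ran (a Django session).
stored_default = datetime(2014, 10, 31, 17, 34).strftime(DJANGO_SESSION_FORMAT)

# A Django-controlled session parses it back fine.
print(datetime.strptime(stored_default, DJANGO_SESSION_FORMAT))

# An "imp" session, whose format Django does not control, chokes on it.
try:
    datetime.strptime(stored_default, ORACLE_DEFAULT_FORMAT)
except ValueError as exc:
    print("restore fails:", exc)
```

The logon trigger Shai mentions fixes this by forcing the restoring session's format to match the stored strings before "imp" runs.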
RE: Django 1.6RC1 exclude behavior change
I'll look into this.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of jgas...@gmail.com [jgas...@gmail.com]
Sent: Monday, November 04, 2013 17:16
To: django-developers@googlegroups.com
Subject: Django 1.6RC1 exclude behavior change

I've found what looks like a serious behavior change in the exclude() queryset method from Django 1.5.5 to Django 1.6 RC1. It seems that in 1.5.5, exclude() when traversing relationships only excluded items if all criteria in the kwargs were matched on the same related item. In 1.6 RC1 it excludes items even if the criteria in the kwargs are only matched across multiple related items. I guess this explanation is not very clear, so here is some sample code that shows the behavior change: http://pastebin.kde.org/pe1vlzd3v

Since I didn't find anything in the change notes about this, it looks to me like a bug. Is it? Or am I missing something?

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/8267eeb8-f1a7-46db-969e-79d819c8f797%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/FDD0C28683BA024195713874AF8663B31C6F86DC6E%40EXMAIL.stakes.fi.
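A Django-free sketch of the two readings the poster describes (the model shapes and names are invented for illustration; the linked pastebin is not reproduced here). Under the 1.5.5 reading, a parent is excluded only if a *single* related item matches all the conditions together; under the 1.6 RC1 reading, matching the conditions across *different* related items is enough:

```python
# Each blog has a list of (headline, year) entries.
blogs = {
    "blog_a": [("pony", 2008), ("cat", 2010)],   # conditions met only across two entries
    "blog_b": [("pony", 2010)],                  # conditions met by a single entry
}

def exclude_same_item(blogs, headline, year):
    """1.5.5-style: exclude blogs where ONE entry matches both conditions."""
    return {b for b, entries in blogs.items()
            if not any(h == headline and y == year for h, y in entries)}

def exclude_across_items(blogs, headline, year):
    """1.6 RC1-style: exclude blogs where the conditions match,
    possibly on different entries."""
    return {b for b, entries in blogs.items()
            if not (any(h == headline for h, _ in entries)
                    and any(y == year for _, y in entries))}

print(exclude_same_item(blogs, "pony", 2010))     # {'blog_a'} survives
print(exclude_across_items(blogs, "pony", 2010))  # set() -- both excluded
```

The real query compiler works through subqueries rather than per-row scans, but the observable difference between the two versions is the one shown.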
Re: Is "transaction.atomic" in 1.6 supposed to work this way?
For the performance part: a simple model.save() is about 50% more expensive with savepoints. This time is spent in the database. In addition there are 3 network round trips instead of one. This could add latency in some use cases.

-------- Original message --------
Subject: Re: Is "transaction.atomic" in 1.6 supposed to work this way?
From: Aymeric Augustin
To: "django-developers@googlegroups.com"
CC:

On 21 Sept 2013, at 15:53, Richard Ward <daedalusf...@gmail.com> wrote:

> You say in your docs patch that savepoints are cheap

Truth be told, I haven't run benchmarks.

> so what is transaction.atomic(savepoint=False) for? Is it just for performance, or is it more like an assertion that we are definitely in a transaction (or both)?

It's mostly for performance. Ask Anssi for details. There's a second, more practical, reason; read below.

> At present the decision to roll back or commit is based on whether there is a current exception and whether needs_rollback is True. If instead this were just based on whether there is a current exception (getting rid of needs_rollback), then exceptions bubbling up from inside a transaction.atomic(savepoint=False) would still cause a rollback, and catching an exception (hiding it from the context manager) would lead to a commit (or at least an attempt to commit). This would leave Django+PostgreSQL's behaviour unchanged.

You may be right. I'm not sure. This code is tricky. Such assertions routinely take more than 10 hours of work to confirm.

> Removing the option for savepoint=False would have the same effect

It would have the drawback of breaking everyone's assertNumQueries because of the extra savepoints introduced by Django. This would be very hostile to people porting large, well-tested code bases.

-- Aymeric (mobile).
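The needs_rollback mechanics under discussion can be sketched with a toy context manager. This mimics only the shape of the decision logic, not Django's actual implementation (in particular, it swallows exceptions unconditionally so the demo stays short): an inner atomic(savepoint=False) that sees an exception has no savepoint to roll back to, so it taints the surrounding transaction instead.

```python
class ToyAtomic:
    """Toy model of the commit/rollback decision; not Django's real code."""
    def __init__(self, connection, savepoint=True):
        self.connection = connection
        self.savepoint = savepoint

    def __enter__(self):
        self.connection.setdefault("needs_rollback", False)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            if self.savepoint:
                self.connection["actions"].append("rollback savepoint")
            else:
                # No savepoint to roll back to: taint the surrounding
                # transaction so it rolls back when it exits.
                self.connection["needs_rollback"] = True
        elif self.connection["needs_rollback"]:
            self.connection["actions"].append("rollback")
            self.connection["needs_rollback"] = False
        else:
            self.connection["actions"].append("commit")
        return True  # swallow the exception, as a catching caller would

conn = {"actions": [], "needs_rollback": False}
with ToyAtomic(conn):                       # outer block
    with ToyAtomic(conn, savepoint=False):  # inner block, no savepoint
        raise ValueError("boom")
print(conn["actions"])  # ['rollback']
```

Note the outer block rolls back even though no exception reaches it, which is exactly why Richard's "decide by current exception alone" variant would change behavior here.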
RE: Custom Chainable QuerySets (#20625)
The pull request is at https://github.com/django/django/pull/1328. Seems like it is still getting some review & update activity. I am planning on doing a final review on commit, but judging by the amount of review done already I think this one will be very polished by Friday.

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Aymeric Augustin [aymeric.augus...@polytechnique.org]
Sent: Wednesday, July 24, 2013 21:39
To: django-developers@googlegroups.com
Subject: Re: Custom Chainable QuerySets (#20625)

On 24 July 2013, at 13:53, Anssi Kääriäinen wrote:

> I will commit the patch on Friday. If somebody wants more time to review the
> patch, just ask and I will defer the commit to a later date.

Where's the version of the patch you're ready to commit?

-- Aymeric.
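The idea behind #20625 — custom QuerySet methods that keep chaining — can be sketched without Django at all. The classes below are framework-free stand-ins (the shipped feature is spelled differently, via `QuerySet.as_manager()` / `Manager.from_queryset()`); the key trick is that filtering methods return `type(self)(...)`, so custom methods stay available after every step:

```python
class BaseQuerySet:
    """Minimal stand-in for django.db.models.QuerySet."""
    def __init__(self, items):
        self.items = list(items)

    def filter(self, pred):
        # Returning type(self) (not BaseQuerySet) is what makes
        # subclass methods chainable.
        return type(self)(i for i in self.items if pred(i))

class ArticleQuerySet(BaseQuerySet):
    # Custom methods defined once, usable at any point in a chain.
    def published(self):
        return self.filter(lambda a: a["published"])

    def by(self, author):
        return self.filter(lambda a: a["author"] == author)

articles = ArticleQuerySet([
    {"author": "jane", "published": True},
    {"author": "jane", "published": False},
    {"author": "bob", "published": True},
])
print(len(articles.published().by("jane").items))  # 1
```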
RE: Add signals for QuerySet bulk operations such as `delete`, `update`, `bulk_create`
A somewhat different proposal is in ticket #17824: add a generic pre/post_modify signal. I think the generic "object modified" signals would fit your use case very well. The idea is that there would be just one signal which would be fired for any data-modifying ORM operation.

The arguments for it:
- fired for every operation modifying data
- one signal to listen to for all data modifications

The likely counter-arguments:
- duplicates the existing signals
- the callbacks end up being a big "switch statement", and thus you end up separating save, delete etc. anyways
- the API isn't good enough

From a performance perspective there should be no big problems: the signal is given an iterable as the "objs_modified" argument. For .update(), for example, where you don't want to fetch all the objects for performance reasons, you could just pass qs.filter(update_filters) as the modified objects. This way there would be no performance penalty, except if there is actual use of the signal.

I would like to see a generic pre/post modify signal, as I think it is much easier to use than the pre/post save/delete + m2m_changed signals. However, I do not feel strongly at all about this, just something I would find useful. I believe having total control of all data-modifying operations using Django signals would be a welcome addition for many users.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Byron Ruth [bjr...@gmail.com]
Sent: Sunday, March 25, 2012 17:46
To: django-developers@googlegroups.com
Subject: Add signals for QuerySet bulk operations such as `delete`, `update`, `bulk_create`

My use case is regenerating an aggregate data cache at the table level. Simply firing a single signal after a bulk operation is complete would enable invalidating such an aggregate cache. There is no very clean alternative solution to this problem, short of using database triggers which call an external script that invalidates the cache.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/DAaTRIau8h8J.
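The proposed single "object modified" signal can be sketched with a toy dispatcher. The names here (post_modify, objs_modified, action) follow the wording of the proposal, not any real Django API; a real receiver would invalidate the table-level aggregate cache Byron describes:

```python
from collections import defaultdict

# Toy signal dispatcher, standing in for django.dispatch.Signal.
_receivers = defaultdict(list)

def connect(signal, receiver):
    _receivers[signal].append(receiver)

def send(signal, **kwargs):
    for receiver in _receivers[signal]:
        receiver(**kwargs)

seen = []
def invalidate_cache(action, objs_modified, **kwargs):
    # One callback observes every data-modifying operation; this is
    # where a table-level aggregate cache would be invalidated.
    seen.append((action, list(objs_modified)))

connect("post_modify", invalidate_cache)

# Fired by (hypothetical) ORM internals after each bulk operation.
# For .update(), objs_modified could be a lazy queryset so nothing is
# fetched unless a receiver iterates it.
send("post_modify", action="update", objs_modified=iter([1, 2, 3]))
send("post_modify", action="bulk_create", objs_modified=[4, 5])
print(seen)
```

The "big switch statement" counter-argument is visible here too: a receiver that cares only about deletes still has to inspect `action` on every call.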
RE: Complex aggregate and expression composition.
I took a quick look at your patch. I don't have more time now, so just some quick comments:

- In general, the approach where aggregates are just expressions sounds and looks valid.
- I would not worry about the extra time used in djangobench. However, profiling why there is extra time used is always recommended.
- I am a bit scared of the type coercions. The reason is that this could prove hopelessly complex to get right in every case. However, I do not have concrete examples where this is in fact a problem. The default should probably not be an exception, but just returning what the database happens to give you back.

I think the approach you have taken is correct in general. I would encourage you to check whether you can somewhat easily incorporate the conditional aggregate support (#11305) into the ExpressionNode-based aggregates. It does not belong in the same patch, but it is a good sanity check of whether the approach taken is extensible.

[The following is a bit off-topic]

I wonder if ExpressionNode itself should be refactored into a public API. This way you could easily write your own SQL snippets injectable into the query. This could be custom aggregates, or it could be just NULLS LAST order by clauses. The reason I bring this up is that in the long run, adding more and more special-case support to the ORM (conditional aggregates, different SQL functions) doesn't seem to be the right way forward. Once you get expression composition in, you only have 90% of SQL constructs left... Spend the time building support for user-writable SQL snippets, so that users can use just the SQL they want. In my opinion NULLS LAST/FIRST support is a great example: it is common enough that users need it from time to time, but it is not common enough to spend the time to support this special case. Why not just:

    qs.order_by(SQL('%s NULLS LAST', F('pub_date')))

and you've now got support for _any_ order by clause the user wishes to use. Replaces extra(), but in a cleaner way.
The above could support relabel_aliases(). Or you could write it just as qs.order_by(SQL('pub_date NULLS LAST')) if you don't care for relabel_aliases() support. For the F-expression support in aggregates this would mean you get not just F-expression support in aggregates, but that any SQL snippet can be injected into the aggregates, for example Sum(SQL('case when person.age > friend.age then 1 else 0 end')).

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Nate Bragg [jonathan.br...@alum.rpi.edu]
Sent: Wednesday, March 21, 2012 01:27
To: django-developers@googlegroups.com
Subject: Complex aggregate and expression composition.

Hello all,

Since even before I saw Alex Gaynor's presentation "I hate the Django ORM" (the one with the `Sum(F("end_time") - F("start_time"))` query), the problem of complex aggregates and expressions has vexed me. So, I figured I would try to solve it. Originally, I started this trying to pursue a solution to ticket #14030, but after I took a couple of lousy shots at it, it dawned on me that the ticket would be better resolved as a result of solving the more general case. I realized that aggregates were just a special case of expressions, and that the best solution was going to take a refactoring of Aggregate into ExpressionNode. I have uploaded my branch; it can be found here: https://github.com/NateBragg/django/tree/14030

Since this is a non-trivial change, I was hoping to open the topic for debate here, and get some feedback before proposing my solution for inclusion. Some particular points of note:

* I tried to preserve as much interface as possible; I didn't know how much was considered to be more public, so generally I tried to add rather than subtract. However, I did remove a couple of things - if you see something missing that shouldn't be, let me know.
* Currently, I'm getting the entire test suite passing on sqlite, oracle, mysql, postgres, and postgis.
I was unable to test on oracle spatial - any help with that would be appreciated.
* When fields are combined, they are coerced to a common type; IntegerFields are coerced to FloatFields, which are coerced into DecimalFields as needed. Any other kinds of combinations must be of the same types. Also, when coerced to a DecimalField, the precision is limited by the original DecimalField. If this is not correct, or other coercions should be added, I'd like to correct that.
* When joins are required, they tend to be LEFT OUTER; I'd like some feedback on this, as I'm not 100% sure it's always the best behavior.
* As the aggregates are a little more complicated, on trivial cases there is a minor reduction in performance; using djangobench, I measured somewhere between a 3% and 8% increase in runtime.
* I don't have enough tests - mostly for a lack of creativity. What kind of composed aggregates and expressions wo
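Anssi's `SQL('%s NULLS LAST', F('pub_date'))` idea can be sketched as a pair of toy expression nodes. These are illustrative stand-ins, not the API of the patch under review: a snippet is a template with `%s` slots filled by the rendered SQL of child expressions, so snippets and column references compose the same way aggregates would:

```python
class F:
    """Toy column reference."""
    def __init__(self, name):
        self.name = name

    def as_sql(self):
        return self.name, []   # (sql fragment, query parameters)

class SQL:
    """Toy raw-SQL snippet: a template whose %s slots are filled by
    the rendered SQL of child expressions."""
    def __init__(self, template, *args):
        self.template = template
        self.args = args

    def as_sql(self):
        parts, params = [], []
        for arg in self.args:
            sql, p = arg.as_sql()
            parts.append(sql)
            params.extend(p)
        return self.template % tuple(parts), params

clause, params = SQL('%s NULLS LAST', F('pub_date')).as_sql()
print(clause)  # pub_date NULLS LAST
```

Because F knows its column name, a variant that supports relabel_aliases() only has to rewrite the F nodes; the raw-string form `SQL('pub_date NULLS LAST')` composes identically but opts out of that.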
RE: commit_on_success leaves incorrect PostgreSQL isolation mode?
This issue is handled in ticket #16047 (https://code.djangoproject.com/ticket/16047), but it is unlikely to get fixed in 1.4. Making changes to transaction management code at this late stage of the development cycle isn't something I am willing to do.

Your analysis of the cause of the problem is correct. When the psycopg2 backend leaves transaction management, it mistakenly uses the current transaction management state when setting the isolation level, not the one before the current one. So, when leaving transaction management Django sees that the current state is managed and thus keeps autocommit off, instead of seeing that the previous state was unmanaged and setting autocommit to on. There is also a fix for this in the above-mentioned ticket.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Christophe Pettus [x...@thebuild.com]
Sent: Monday, March 19, 2012 08:23
To: django-developers@googlegroups.com
Subject: commit_on_success leaves incorrect PostgreSQL isolation mode?

While exploring the Django transaction stuff (in 1.4rc1), I ran across the following behavior. I use commit_on_success as the example here, but the other transaction decorators/context managers have the same issue. It seems to me to be a bug, but I wanted to confirm this before I opened an issue.

The configuration is running Django using the psycopg2 backend, with:

    'OPTIONS': {
        'autocommit': True,
    }

Consider the following code:

    from django.db import transaction, DEFAULT_DB_ALIAS, connections
    from myapp.mymodels import X

    x = X.objects.get(id=1)
    print connections[DEFAULT_DB_ALIAS].isolation_level
    # As expected, it's 0 here.
    x.myfield = 'Foo'
    with commit_on_success():
        x.save()
        print connections[DEFAULT_DB_ALIAS].isolation_level
        # As expected, it's 1 here.
    print connections[DEFAULT_DB_ALIAS].isolation_level
    # It's now 1 here, but shouldn't it be back to 0?

The bug seems to be that the isolation level does not get reset back to 0, even when leaving transaction management.
This means that any further operations on the database will open a new transaction (since psycopg2 will automatically open one), but this transaction won't be managed in any way.

The bug appears to be in django.db.backends.BaseDatabaseWrapper.leave_transaction_management; it calls the _leave_transaction_management hook first thing, but this means that is_managed() will return True (since the decorators call managed(True)), which means that _leave_transaction_management in the psycopg2 backend will not reset the transaction isolation level; the code in the psycopg2 backend seems to assume that it will be run in the new transaction context, not the previous one. Or am I missing something?

--
-- Christophe Pettus
x...@thebuild.com
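The ordering bug Christophe and Anssi describe reduces to a stack discipline, sketched here with a toy wrapper (not Django's code): the backend hook must observe the *previous* state, so the current state has to be popped before deciding about autocommit.

```python
class ToyWrapper:
    """Toy of the transaction-state handling; not Django's implementation."""
    def __init__(self):
        self.state_stack = []    # stack of managed flags
        self.autocommit = True

    def is_managed(self):
        return bool(self.state_stack) and self.state_stack[-1]

    def enter_transaction_management(self, managed=True):
        self.state_stack.append(managed)
        if managed:
            self.autocommit = False

    def leave_transaction_management(self):
        # The crucial ordering: pop the current state *first*, so the
        # autocommit decision below sees the previous state.  Checking
        # is_managed() before the pop reproduces the reported bug.
        self.state_stack.pop()
        if not self.is_managed():
            self.autocommit = True

conn = ToyWrapper()
conn.enter_transaction_management()   # e.g. entering commit_on_success()
print(conn.autocommit)                # False inside the block
conn.leave_transaction_management()
print(conn.autocommit)                # True -- restored on exit
```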
RE: DoS using POST via hash algorithm collision
Paul McMillan had a very good post about this on the Python issue tracker. The problem is that whenever you put user-supplied data into a hashmap, you are vulnerable to this attack. This basically includes most Python modules, and I would guess a lot of user code, too. So, if you fix JSON and POST, you still have about 99% (likely it would actually round to 100%) of the attack surface left.

I found these links very informative on this matter: http://lwn.net/Articles/474912/ and http://bugs.python.org/issue13703#msg150840 (McMillan's post mentioned above).

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Luke Plant [l.plant...@cantab.net]
Sent: Friday, January 20, 2012 15:46
To: django-developers@googlegroups.com
Subject: Re: DoS using POST via hash algorithm collision

On 20/01/12 08:47, Aymeric Augustin wrote:
> 2012/1/20 Łukasz Rekucki <lreku...@gmail.com>
>
> > We all know browsers won't crash and they will render the page exactly
> > the same. I volunteer to fix any issues in the test suite (considering
> > the hash changes also between 32-bit/64-bit Python, I'm not sure there
> > are even any, or we would get a report on that, wouldn't we?).
> >
> > I think it's important for the Django core team to voice their opinion
> > on this matter in python-dev.
>
> Hello Łukasz,
>
> I absolutely agree -- code that relies on a deterministic dictionary
> order is broken and should be fixed.

I agree with this completely, and with Carl's post: http://mail.python.org/pipermail/python-dev/2012-January/115700.html

Whether this should be fixed in Python or not is a different question. Most of the web-specific problems can be fixed relatively easily with HTTP-specific solutions and limits. We can easily change how we handle POST and GET data to a protected solution (by length limitation or a custom data structure), and we can protect cookie parsing using simple length limits (and continue using stdlib SimpleCookie).
However, JSON parsing, which is a common task for web sites, is much harder to fix, because almost by definition you've got to return dictionaries with arbitrary keys and arbitrary size, and because as a framework we don't control how developers do JSON parsing.

Luke

--
"Cross country skiing is great if you live in a small country."
(Steven Wright)

Luke Plant || http://lukeplant.me.uk/
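Luke's "length limitation" mitigation for POST data can be sketched in a few lines: cap the number of form fields *before* any of them enters a dict, so an attacker cannot feed hundreds of thousands of colliding keys into the hashmap. The function and limit below are illustrative (Django much later gained a setting in this spirit, DATA_UPLOAD_MAX_NUMBER_FIELDS), not the code under discussion:

```python
MAX_FIELDS = 1000  # illustrative cap, not a real Django constant of 2012

def parse_qs_limited(body, max_fields=MAX_FIELDS):
    """Parse 'a=1&b=2' form bodies, refusing oversized field counts
    before anything is inserted into a dict."""
    pairs = body.split("&") if body else []
    if len(pairs) > max_fields:
        raise ValueError("too many form fields")
    data = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        data.setdefault(key, []).append(value)
    return data

print(parse_qs_limited("a=1&b=2"))
try:
    parse_qs_limited("&".join("k%d=v" % i for i in range(2000)))
except ValueError as exc:
    print("rejected:", exc)
```

As the thread notes, this protects POST/GET handling only; JSON bodies parsed by application code bypass any such framework-level cap.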
RE: Custom managers in reverse relations
Just a quick thought: you should check out the work done on allowing the use of manager methods in querysets (thread "RFC: query methods"). It seems that work has some overlap with this feature. The patch for #3871 implements .manager('manager_name') for reverse relation managers, and there was some discussion of allowing .use_manager('manager_name') for query methods. .use_manager() is not going to be in the query methods patch.

I haven't looked at #3871 in detail, but maybe the work done for query methods would make the #3871 patch easier to implement? The idea would be to issue:

    .use_manager(wanted_manager).all()

in the .manager() method. The first method call would change the base manager to use; the second (.all) call would make it return a queryset, so that you would not have the .clear and .remove methods available. This might be a stupid idea, but maybe worth a try? The .use_manager() call would not need to exist at the queryset level.

1.4 is feature-frozen if I am not mistaken, so this would be 1.5 stuff.

 - Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Sebastian Goll [sebastian.g...@gmx.de]
Sent: Saturday, January 14, 2012 21:35
To: django-developers@googlegroups.com
Subject: Re: Custom managers in reverse relations

Hi all,

My latest post to the list seems to have been lost in the pre-Christmas storm. Sorry for that! The issue of picking which custom manager is used in resolving reverse relations still stands. Let me give you an example of why this is useful:

{{{
class Reporter(models.Model):
    ...

class Article(models.Model):
    reporter = models.ForeignKey(Reporter)
    ...
    articles = models.Manager()
    published_articles = PublishedManager()
}}}

We put some thought into designing PublishedManager. Maybe it needs to do some things in addition to simply checking a flag, who knows.
The thing is: right now, we simply cannot make use of this manager when looking up a reporter's articles: with `reporter.article_set` we always get _all_ articles. [1] Now we have two options: doing the filtering manually, on the returned queryset, or specifying that we want to use PublishedManager, accessible through the `published_articles` attribute of the Article class. The latter is implemented by the patches in ticket #3871: https://code.djangoproject.com/ticket/3871

Does this seem like a good idea? Should it even be possible to specify which custom manager is used for reverse relations? Or am I missing something, and this is already possible in some other way? Since I'm looking forward to seeing this implemented in Django 1.4, I ask for your input on the matter. Thanks!

Sebastian.

[1] In fact, that's not entirely true: we get whatever is returned by the _default_ manager of the Article class. This seems like an arbitrary choice: it's not a "plain" manager that always returns all related instances, it's whatever we picked as the default manager.

On Fri, 23 Dec 2011 21:56:24 +0100 Sebastian Goll wrote:
> Hi all,
>
> I'd like to draw your attention to long-open ticket #3871 [1].
>
> The idea is to let ORM users choose which custom manager to use for reverse
> "many" relations, i.e. reverse foreign key (…_set) as well as forward and
> reverse many-to-many relations.
>
> There are several proposed patches on this ticket, the latest was added by me
> a week ago. The current implementation adds a "manager()" method to the
> reverse manager which allows you to pick a manager different from the default
> one on the related model. All changes are entirely backwards-compatible – if
> you don't call the "manager()" method, everything is as before, i.e. the
> default manager is used to look up related model instances.
>
> During my review of the previous patch I found that it doesn't apply cleanly
> to trunk, and I have some concerns with regard to the general approach of the
> implementation.
>
> Therefore, I wrote an alternative patch which is currently awaiting review.
> Since I wrote that patch, I cannot review it myself. If you can spare some
> time, maybe you can take a look at it, and if you feel the current approach is
> okay, bump the ticket to "ready for check-in".
>
> Of course feel free to raise any concerns you might have.
>
> Regards,
> Sebastian.
>
> PS: Merry X-Mas and whatnot! :D
>
> [1] https://code.djangoproject.com/ticket/3871
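The shape of the `.manager('name')` API from #3871 can be sketched without Django; the classes below are stand-ins for the real related-manager descriptors, invented for illustration. The related set uses the default manager unless a different one is requested by name:

```python
ARTICLES = [
    {"reporter": "jane", "published": True},
    {"reporter": "jane", "published": False},
]

class Manager:
    """Plain manager: returns everything."""
    def get_queryset(self, rows):
        return list(rows)

class PublishedManager(Manager):
    def get_queryset(self, rows):
        return [r for r in rows if r["published"]]

class RelatedSet:
    """Plays the role of reporter.article_set."""
    managers = {"articles": Manager(),
                "published_articles": PublishedManager()}

    def __init__(self, reporter):
        self.rows = [a for a in ARTICLES if a["reporter"] == reporter]

    def all(self):
        # Today's behavior: whatever the default manager returns.
        return self.managers["articles"].get_queryset(self.rows)

    def manager(self, name):
        # The proposed opt-in: pick a named manager from the related model.
        return self.managers[name].get_queryset(self.rows)

article_set = RelatedSet("jane")
print(len(article_set.all()))                          # 2
print(len(article_set.manager("published_articles")))  # 1
```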
RE: Deprecate change pk + save behavior for 1.4
""" On 12/02/2011 06:54 PM, Kääriäinen Anssi wrote: > I think I will pursuit the immutable PK approach, and see how it > works (back to square one). BTW are there -1 calls on this approach, > or the pk change tracking in general? I haven't been fully following this thread, but I will say that I'm not yet convinced that the ORM behavior should be changed such that saving an instance with a modified PK updates the row rather than saving a new instance. """ At this point this is not the idea. The idea is to just disallow this (assuming multicolumn PK firstname, lastname): user = User(firstname = 'Jack', lastname = 'Smith') user.save() user.firstname = 'John' user.save() Current behavior will leave Jack Smith in the DB, and save John Smith as new object. I my opinion it is too easy to clone the object accidentally. The idea would be to raise exception from the second save (deprecation warning at this point). A way to get the current behavior is needed too, but the user should explicitly request that. Later on, maybe there should be some way to actually update the PK. But that is not the current plan. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
RE: Deprecate change pk + save behavior for 1.4
""" That's really too bad; I was hoping that that approach would work. (Also, I really hope nobody is using a FileField for a primary key ;) ) Is the problem here that we can't reliably tell whether the data that is going out to the DB layer has changed? I would think that no matter how the data is modified (in-place vs. setattr), that the one thing we could rely on, and the one thing that actually matters in this situation, is the serialised representation of the data. For a FileField, that would be the filesystem path (editing the file in place without changing the path wouldn't give you the duplication problems that you are having); for an IntegerField, it's just the number itself. It should be the case that, no matter what sort of python magic a particular developer has added, it is equivalence at the SQL level that is causing problems. Maybe it's because I haven't tried to hack at this myself, but I can't see why storing a copy of the PK fields DB-representation on load, and checking them on save, isn't sufficient. There is a memory cost, but it should be small, unless you have very large fields for primary keys in your database, in which case you are already suffering from them, certainly :) """ Good idea. Maybe the best possibility for change tracking is offered by Field.value_to_string (returns string suitable for serialization). Using value_to_string the following would work: - add a flag "is_immutable" to fields. If set, just put the DB value directly to _state.old_pk upon model initialization. If not set, call value_to_string, and store that in old_pk. I don't think there will be many PK fields which are mutable, and even if there are, the value_to_string trick should work. - upon save, do the same again, if is_immutable is set, track changes by the actual attribute value, if not set, check value_to_string. 
Getting the raw SQL string representation will be hard; for example, a PostgreSQL ListField would get the value from psycopg as a list, and would send it back to psycopg as a list. Maybe copy.copy() of the DB value would work. I don't know how likely it is that people use FileFields, ListFields or other problematic cases as PK values. The easy way out would be to define that PK fields must be immutable, or else the field must support change tracking itself by providing a suitable descriptor (Django could provide a base class). After that everything should be relatively easy. Come to think of it, mutable PK fields are probably pretty rare currently, as saving an object back to the DB after a PK change might have some problems...

I think I will pursue the immutable PK approach, and see how it works (back to square one). BTW are there -1 calls on this approach, or the pk change tracking in general?

- Anssi
RE: Deprecate change pk + save behavior for 1.4
""" Now for the funny part of this. I suspected that __setattr__ would make init slower. But using a little trick, __init__ is now actually almost 30% faster for a 10 field model. The trick is checking if setattr needs to be called. It needs to be called if the user has defined __setattr__ for some subclass of Model, otherwise we can call directly object.__setattr__. For some reason, this is considerably faster than calling setattr(self, f.attname, val). I hope the approach is acceptable. The same trick could be employed directly to trunk version of Django, resulting in the same speedup. """ Now I can't reproduce this speedup. I get pretty much the same speed on master vs the __setattr__ trick. I don't know what I was doing wrong before, I tested this literally tens of times trying to understand what is happening. It is not that surprising that this speedup isn't real, as the speedup seemed too good to be true. So, forget about the above optimization for current Django trunk. However it is still needed if the __setattr__ way of tracking attribute changes is going to be used, as otherwise model __init__ will be much slower than currently. I do understand if this is not wanted, as this adds some complexity and if the only benefit is preventing accidental duplicates due to PK change, it is questionable if it is worth it. However saving only changed attrs (and skipping the save completely if there are no changes) could be nice in some situations. Maybe I should sleep a little before hacking more... :) Anyways it is probably good to let this all wait until 1.5. There are pretty big design decisions here. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
RE: Deprecate change pk + save behavior for 1.4
""" /me runs off to go correct Wikipedia ;) I checked the Wikipedia article on Primary Key first, and didn't see that, but I did note this: A table can have at most one primary key, but more than one unique key. A primary key is a combination of columns which uniquely specify a row. It is a special case of unique keys. One difference is that primary keys have an implicit NOT NULL constraint while unique keys do not. """ I was confused by this sentance in the wikipedia article: Note that some DBMS require explicitly marking primary-key columns as NOT NULL. """ I'm not sure that I agree -- I don't know if there needs to be a fundamental distinction between a new model instance and one that was retrieved from the database. I do agree that there should be a way to specify "change the primary key on this object" vs "save a new object with this primary key". """ The problem, as I see it, is that it is all too easy to do .save() and end up duplicates in the DB while the user expects an update of the PK. Django admin has currently exactly this problem. Currently this is not that big of an problem, as natural primary keys aren't common. You can update the PK with some trickery, but that is not what I try to solve. I try to just forbid the "whoops, created an duplicate by accident" problem. This needs the information about the "old_pk". One nice little problem more: Multitable inheritance allows object with multiple primary keys... class A(models.Model): f1 = models.IntegerField(primary_key=True) class B(A): f2 = models.IntegerField(primary_key=True) # Now, B's primary key is f2, but when saving B, the underlying A instance needs to be saved too, and its primary key is f1. So, in save B has effectively 2 primary keys. 
b = B(f1=1, f2=1)
b.save()
B.objects.all()
[B obj: f1 = 1, f2 = 1]
b.f2 = 2
b.save()
# IntegrityError (tries to save a new B: f1=1, f2=2, but f1 needs to be unique)
b.f2 = 1
b.f1 = 2
b.save()
B.objects.all()
[B obj: f1 = 2, f2 = 1]
A.objects.all()
[A obj: f1 = 1, A obj: f1 = 2]
# We got a new A obj, but no new B obj.

Add in multitable multi-inheritance... Making this work reliably in all situations seems complex. So, no simple solution in sight, and a final nail for 1.4 inclusion.

- Anssi

# sidenote: try b.save() using PostgreSQL again after the integrity error in the above example:
# DatabaseError: current transaction is aborted, commands ignored until end of transaction block
# connection.rollback()
# TransactionManagementError: This code isn't under transaction management
# Luckily the transaction is still rolled back :)
RE: Deprecate change pk + save behavior for 1.4
""" Is this referring exclusively to natural, or user-specified primary key columns? Despite Luke's reference to nullable primary keys (are these even allowed by SQL?), a common idiom for copying objects is this: """ Not allowed by SQL specification, but many databases do allow them (source wikipedia). """ obj.pk = None obj.save() I have used use this pattern in more instances than I can remember; whether for duplicating objects, or for making new variants of existing objects. I would hate to see the behaviour deprecated, or worse, for the old object to simply get reassigned a new (or null) id. """ If nullable primary keys are going to be allowed, then the above can not work. You would need to use NoneMarker in there, or .save() would need a kwarg for backwards compatibility mode. obj.clone() is still another possibility. Maybe nullable primary keys should be forbidden? """ For changing natural primary key fields, I would prefer to see a pattern like this: class User: firstname = models.CharField lastname = models.CharField pk = (firstname, lastname) u = User.objects.get(firstname='Anssi', lastname='Kääriäinen') u.firstname='Matti' u.save(force_update=True) """ That is a possibility, although currently that has well defined meaning: try to update the object with pk ('Matti', 'Kääriäinen'), error if it does not exist in the DB. """ (specifically, with the force_update parameter being required for a PK change). Then, as long as we store the original PK values, the object can be updated in place. A bare save() would work just as currently changing the id field does -- create a new row if possible, otherwise, update the row whose PK matches the new values. """ IMHO forbidding creation of a new object while leaving the old object in place when calling save() is needed. Current behavior is unintuitive. One clear indication of this being unintuitive is that even Django's admin does not get it right. 
If bare save() is deprecated, then an upgrade path for current uses is needed. A new kwarg for save (old_pk=None would be a possibility) or obj.clone() would be needed. Solving all these problems before 1.4 seems hard.

- Anssi
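To make the upgrade path concrete, a hypothetical clone() could look roughly like this toy sketch (Model, clone(), and the dict-as-table are all illustrative stand-ins, not Django API): an explicit copy replaces the implicit obj.pk = None; obj.save() idiom.

```python
import copy

class Model:
    # Toy stand-in for a model; clone() is a hypothetical API from this
    # discussion, not something Django provides.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def save(self, db):
        # db is a dict keyed by pk, standing in for a database table.
        db[self.pk] = self

    def clone(self, **overrides):
        # Explicit "make me a copy" with a fresh identity, instead of
        # implicitly creating a duplicate via a bare save().
        new = copy.copy(self)
        for name, value in overrides.items():
            setattr(new, name, value)
        return new
```

The point of the explicit method is that duplicating becomes an intentional act, so a bare save() after a PK change can safely raise instead of silently inserting.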
RE: Deprecate change pk + save behavior for 1.4
""" > The reason for doing the deprecation now is that it would be nice that > this behavior is already removed when multicolumn primary keys are > introduced into Django. > > There is a ticket related to this: #2259. Here is another that could be helped by this change, depending on implementation - #14615 The decisions on that ticket basically boils down to the question of how we detect a new object (which is waiting for PK from the DB). The current solution of comparing with None (used in various places) fails for nullable primary keys. """ I can think of two basic approaches to this: define a __setattr__ for Models, and check if the pk is set after fetch from DB. This has at least three problems: 1. It is likely that users have custom __setattr__ methods that do not use super().__setattr__ but change the dict directly. 2. This way it is somewhat hard to detect if the PK has actually changed or not. You can (and many users likely currently do) set the value to the same value it is already. 3. This will make model __init__ slower (although there are tricks to mitigate this effect). The other way is storing old_pk in model._state, and compare the PK to that when saving. If changed, error. This would work best if there was a NoneMarker object for the cases where there is no PK from DB, so you could solve #14615 easily, too. This could result in somewhat larger memory usage. Although normally you could store the same string (or other object) in db_pk as you store in the __dict__ of the model. This would mean minimal memory overhead unless you change a lot of PKs in one go. Are there problematic (mutable object based) model fields, where you would need to store a copy of the field's value? We could possibly have an attribute "mutable object based field" for the problematic fields... One way to mitigate the speed effect is use of AST for model init. I have done some experiments about this, see: https://github.com/akaariai/ast_model. 
That does come with its own problems, but if templates are going to be using an AST, then we could use it in other places needing speedups, too.

- Anssi
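The sentinel idea from the second approach is tiny but worth spelling out (NoneMarker is the illustrative name used in this discussion, not an actual Django object): an identity check against a dedicated sentinel lets a nullable PK whose value really is None still count as "loaded from the DB".

```python
# Sentinel meaning "no PK has been loaded from the DB yet"; deliberately
# distinct from None, which may be a legitimate (nullable) PK value.
NoneMarker = object()

class ModelState:
    # Toy stand-in for model._state in this sketch.
    def __init__(self):
        self.old_pk = NoneMarker

def is_new(state):
    # Identity check against the sentinel, never an equality check
    # against None.
    return state.old_pk is NoneMarker
```

This is exactly the distinction the None-comparison in various places cannot make, which is what breaks for nullable primary keys (#14615).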
RE: Allowing models to influence QuerySet.update
""" H, that is not ideal behavior. You mean QuerySet.delete() calls the signal for each deleted object, rather than doing a delete at the database level? """ This might not be exactly accurate, but I think it goes something like this: - Fetch all the to-be deleted objects (one query) - Check if there are cascades for those objects, fetch the cascades (one query per cascaded Model class?) - Send pre_delete signals for all deleted instances - Do the delete as one query for the to-be deleted objects, and then one query(?) per cascade Model class - Send post_delete signals Now, this is not that inefficient - but it would be a good optimization to NOT fetch the instances if there are no listeners for pre/post delete signals and there are no cascades (or all cascades are DO_NOTHING). Even if there are cascades, you could fetch just PKs of the to-be deleted models (even that is not actually needed, as you can use joins) Again: I am not 100% sure how this behaves... - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
RE: Allowing models to influence QuerySet.update
""" I also noticed in db optimization docs that we have explicitly documented update() and delete() as bypassing signals, and I think we should honour that. https://docs.djangoproject.com/en/dev/topics/db/optimization/#use-queryset-update-and-delete """ Is this correct for delete? A quick test (A1 is a model which I have hanging around - details about it aren't important): from django.db.models.signals import post_delete def foo(*args, **kwargs): print args, kwargs post_delete.connect(foo, sender=A1) A1(dt=datetime.now()).save() A1.objects.all().delete() Result: () {'instance': , 'signal': , 'sender': , 'using': 'default'} Search post_delete in django/db/models/deletion.py. Signals seem to be sent, even for cascaded deletion. Personally I don't think post/pre instance changed signals are the way to go if you want to do auditing. DB triggers are much more reliable. Some problems with the Django signals: - All operations do not send signals (bulk_create could easily send signals, the instances are available directly in that case, even bulk update could send signals per instance - first check if there is a listener, if there is, fetch all the updated instances and send signals, if there isn't, then don't fetch the instances. You will only pay the price when needed. Not saying this is a great idea, but maybe worth a thought). - Proxy (including deferred models) and multitable-inherited models do not send signals as you would expect. I have groundwork for how to implement fast inherited signals in ticket #16679. The patch in that ticket also makes model __init__ much faster in certain common-enough cases. On the other hand, yet another cache. - If you do anything outside Django Models (raw SQL, dbshell, another application accessing the DB) your auditing will not work. - The DB triggers approach is faster. The downside is that you will be programming DB specific triggers, in DB-specific language. Schema upgrades are a nightmare. 
- Anssi Kääriäinen
RE: queryset caching note in docs
""" so, summarizing again: - mysql supports chunked fetch but will lock the table while fetching is in progress (likely causing deadlocks) - postgresql does not seem to suffer this issue and chunked fetch seems doable (not trivial) using named cursor - oracle does chunked fetch already (someone confirm this, please) - sqlite3 COULD do chunked fetch by using one connection per cursor (otherwise cursors will not be isolated) """ I did a little testing. It seems you can get the behavior you want if you just do this in PostgreSQL: for obj in Model.objects.all().iterator(): # Note the extra .iterator() # handle object here. What is happening? Django correctly uses cursor.fetchmany(chunk_size) in models/sql/compiler.py. The chunk_size is hardcoded to 100. The problem is in db/models/query.py, and its __iter__ method. __iter__ will keep self._results_cache, and that is where the memory is consumed. Changing that is not wise, as in many cases you do want to keep the results around. The .iterator() call will skip the __iter__ and directly access the underlying generator. You can also do objects.all()[0:10].iterator() and objects are correctly fetched without caching. Here is a printout from my tests. The memory report is the total process memory use: Code: i = 0 for obj in User.objects.all()[0:10]: i += 1 if i % 1000 == 0: print memory() 25780.0kB 26304.0kB 26836.0kB 27380.0kB 27932.0kB 28468.0kB 29036.0kB 29580.0kB 29836.0kB 30388.0kB And then: i = 0 for obj in User.objects.all()[0:10].iterator(): i += 1 if i % 1000 == 0: print memory() 25216.0kB 25216.0kB 25216.0kB 25216.0kB 25216.0kB 25216.0kB 25216.0kB 25216.0kB 25216.0kB 25216.0kB This would be worth documenting, with maybe a better named method wrapping the .iterator(). I have no ideas for a better name, though. I would sure like a verification to this test, I am tired and this seems like too easy of an fix. Or am I missing the problem? 
- Anssi
RE: Yet another __ne not equal discussion
""" The exclude() option in its current form is unworkable on multi-valued relations. I'd like to repeat that for emphasis: exclude() can *never* obsolete direct negative lookups for multi-value relations. """ I do see a problem here: the equality ~Q(a=1) <-> Q(a__lt=1)|Q(a__gt=1) is not correct in m2m situations: the first is asking for rows where a must not be 1, most of all, if there is no a, it is a match. The other is asking for rows where there must be an A value, and it must be < 1 or > 1, most of all, if there is no value at all, it is NOT a match. So: filter(~Q(a=1), a__isnull=False) <-> Q(a__lt=1)|Q(a__gt=1). The ORM is not able to handle the first version correctly. The interpretation would be that there is at least one 'a' row, and its value is not 1. I am strongly against the idea that Q(a__neq=1) would have different interpretation of ~Q(a__eq=1). If they would have different interpretation, then there would be basis for negative lookups. Although AFAICS you could still get the same results using ~Q(a__eq=1, a__isnull=False) so the API would still work without negative lookups. I am basing the following discussion on the assumption that a__neq and ~a__eq should be the same thing. >From ORM API standpoint, the claim that .exclude() can never obsolete direct negative lookups is wrong as far as I understand the problem. Reason: .filter(Q(__neq=val)) <-> .filter(~Q(__exact=val)) <-> .exclude(Q(__exact=val)) Another way to see this is that Django should return same results for the queries: filter(~Q(employment__school__site_name='RAE'), employment__end_date=None) and filter(employment__school__site_name__neq='RAE', employment__end_date=None) However, I do not think your issue is due to the above equality between the two different ways of writing ~__eq problem, it is due to a bug in ORM implementation. 
The second filter condition is not pushed down to the subquery generated by the negated Q condition, and thus it generates another join and potentially targets different rows. I think this is the main problem in your situation. This is reinforced by this snippet from your query:

WHERE (
    NOT `data_staff`.`id` IN (subquery on data_employment U1)
    -- different data_employment reference from the subquery
    AND `data_employment`.`end_date` IS NULL
)

That is, you have the data_employment table twice in the query, and thus the filters are targeting potentially different rows. Note that this is a bug: the conditions are in the same .filter() call, and thus they should target the same row!

IMHO there are two underlying problems in the ORM related to this matter. One is detecting when to use a subquery for the filter condition; the logic for that is easily fooled. The other is that when a subquery is used, other conditions that should go into the subquery's WHERE are sometimes not correctly pushed down to the subquery clause. This is similar to the HAVING clause pushdown problem. I must say the m2m handling is very complicated; it took some time to see the ~__eq=1 <-> __lt=1|__gt=1 difference, for example... Thus, it is likely that I am missing something else, too.

- Anssi
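The ~Q(a=1) vs Q(a__lt=1)|Q(a__gt=1) distinction can be checked outside the ORM with a small pure-Python model of a multi-valued relation (the row data here is made up for illustration): a row with no related values at all satisfies the negated equality, but not the lt/gt disjunction.

```python
# Each "row" carries zero or more related 'a' values; an empty list models
# a row with no related rows at all.
rows = {
    "no_related": [],
    "only_one":   [1],
    "other":      [2],
}

def not_eq_1(values):
    # ~Q(a=1): no related value may equal 1 (vacuously true when empty).
    return all(v != 1 for v in values)

def lt_or_gt_1(values):
    # Q(a__lt=1) | Q(a__gt=1): some related value must exist and differ from 1.
    return any(v < 1 or v > 1 for v in values)

matches_neg = {name for name, vals in rows.items() if not_eq_1(vals)}
matches_range = {name for name, vals in rows.items() if lt_or_gt_1(vals)}
```

Here matches_neg contains the row with no related values while matches_range does not, which is exactly why the two forms are only equivalent once an a__isnull=False condition is added to the negated one.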
RE: Yet another __ne not equal discussion
Quote: """ It's also worth noting that Q() objects permit the unary negation operator, but this also yields the undesired results of the exclude() call: Blog.objects.filter(~Q(entry__author_count=2), entry__tag__name='django') """ As far as I understand, this is exactly the query you want. The filters are treated as single call, that is, they should target the same row, not possibly different rows of the multijoin. It is another matter if it actually works in current ORM implementation. IIRC something like filter(~Q(pk=1)) and .exclude(Q(pk=1)) can produce different results. But they _should_ produce the same result, and if they do not, introducing negated lookups isn't the way to fix this - the correct thing to do is fixing the ORM. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
RE: The state of per-site/per-view middleware caching in Django
I do not know nearly enough about caching to participate fully in this discussion. But it strikes me that the attempt to cache a CSRF-protected anonymous page is not that smart. If you have a form submittable by anonymous users, why bother with CSRF protection? I mean, what is it protecting against? Making complex arrangements in the caching layer for this use case seems like wasted effort. Or am I missing something obvious?

The following is from the stupid ideas department: maybe there could be a "reverse cache" template tag, such that you would mark the places where you want changing content as non-cacheable. You would need two views for this: one which would construct the "base content", and another which would construct the dynamic parts. Something like:

page_cached.html:
... expensive to generate content ...
{% block "login_logout" non_cacheable %}
{% endblock %}
... expensive to generate content ...

You would generate the base page with a cached render view:

def page_view_cached(request, id):
    if cached(id):
        return cached_content
    else:
        ... expensive queries ...
        return cached_render("page_cached.html", context, ...)

The above view would not be directly usable at all; you would need to use a wrapper view which would render the non-cacheable parts:

def page_view(request, id):
    # Below would return quickly from cache most of the time
    cached_portions = page_view_cached(request, id)
    return render_to_response("page.html",
                              context={cached: cached_portions, user: request.user})

where page.html would be:

{% extends cached %}
{% block login_logout %}
{% if user.is_authenticated %}
Hello, user!
{% else %}
login
{% endif %}
{% endblock %}

That seems to be what is really wanted in this situation. The idea is quite simply to extend the block syntax to caching. A whole other issue is how to make this easy enough to be actually usable, and fast enough to be actually worth it.
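The two-view idea can be sketched in plain Python (cache, render_base, and the {login_logout} placeholder are all made up for illustration; no Django template machinery is involved): the expensive base render is cached once with a placeholder, and each request only fills in the dynamic block.

```python
cache = {}

def render_base(page_id):
    # Expensive render, done once per page and then served from cache;
    # the {login_logout} placeholder marks the non-cacheable block.
    if page_id not in cache:
        cache[page_id] = (
            "<header>expensive content %d</header>"
            "{login_logout}"
            "<footer>more expensive content</footer>" % page_id
        )
    return cache[page_id]

def render_page(page_id, username):
    # Cheap per-request step: fill in only the dynamic block.
    if username:
        block = "Hello, %s!" % username
    else:
        block = '<a href="/login/">login</a>'
    return render_base(page_id).replace("{login_logout}", block)
```

The expensive part runs once per page regardless of who is logged in; only the placeholder substitution happens per request, which is the "reverse cache" split described above.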
- Anssi

From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Jim Dalton [jim.dal...@gmail.com]
Sent: Friday, October 21, 2011 16:02
To: django-developers@googlegroups.com
Subject: Re: The state of per-site/per-view middleware caching in Django

On Oct 20, 2011, at 6:02 PM, Carl Meyer wrote:

> Hi Jim,
>
> This is a really useful summary of the current state of things, thanks for putting it together.
>
> Re the anonymous/authenticated issue, CSRF token, and Google Analytics cookies, it all boils down to the same root issue. And Niran is right, what we currently do re setting Vary: Cookie is what we have to do in order to be correct with respect to HTTP and upstream caches. For instance, we can't just remove Vary: Cookie from unauthenticated responses, because then upstream caches could serve that unauthenticated response to anyone, even if they are actually authenticated.
>
> Currently the Django page caching middleware behaves pretty much just like an upstream cache in terms of the Vary header. Apart from the CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting, it just looks at the response; it doesn't make use of any additional "inside information" about what your Django site did to generate that response in order to decide what to cache and how to cache it.
>
> This approach is pretty attractive, because it's conceptually simple, consistent with upstream HTTP caching, and conservative (quite unlikely to serve the wrong cached content).
>
> It might be possible to make it "smarter" in certain cases, and allow it to cache more aggressively than an upstream cache can. #9249 is one proposal to do this for cookies that aren't used on the server, either via explicit setting or (in a recently-added proposal) via tracking which cookie values are accessed. If we did that, plus special-cased the session cookie if the user is unauthenticated and the session isn't used outside of contrib.auth, I think that could possibly solve the unauthenticated-users and GA issues.
>
> However, this (especially the latter) would come with the cost of making the cache middleware implementation more fragile and coupled to other parts of the framework. And it still doesn't help with CSRF, which is a much tougher nut to crack, because every response for pages using CSRF comes with a Set-Cookie header and probably with a CSRF token embedded in the response content; and those both mean that the response really can't be re-used for anyone else. (Getting rid of the token embedded in the HTML means forms couldn't ever POST without JS help, which is not an option as the documented default approach.) You can mark some form-using views that are available to anonymous users as csrf-exempt, which exposes you potentially to CSRF-based spam, but isn't a security issue if you aren't treating authenticated submissions any differently from
RE: Removing pickle from cookie-based session storage
Ok, sorry for the uninformed rambling... Will check the code before posting next time :)

- Anssi
RE: Removing pickle from cookie-based session storage
Forget about it; the exact same problem exists for every session backend. This, btw, means that having write access to the django_session table means an exploit of all Django instances using that DB, right?

"""
Isn't there also the possibility that the attacker can somehow get arbitrary data signed into the session cookie without knowing SECRET_KEY? This could be due to a bug in the session framework, or the developer doing something really stupid. If this were the case, then the bug would result in a remote code execution exploit instead of the user merely being able to manipulate his session. Which sounds kinda scary.

If this is not changed to use JSON, there must be a warning that if the attacker can somehow change the contents of the cookie while keeping it signed, this results in a remote exploit. One such way is knowing the SECRET_KEY. My feeling is that this should be changed.

- Anssi
"""
RE: Removing pickle from cookie-based session storage
""" As I said in the first message, to the best of my knowledge, there's nothing insecure about the implementation now. The usage of signing to validate pickles received directly by end users expands our reliance on SECRET_KEY pretty heavily. This concerns me, which is why I brought it up here. """ Isn't there also the possibility that the attacker can somehow get arbitrary data signed into the session cookie without knowing SECRET_KEY? This could be due to a bug in the session framework or the developer does something really stupid. If this would be the case, then the bug would result in remote code execution exploit instead of the user being able to manipulate his session. Which sounds kinda scary. If this is not changed to use JSON, there must be a warning that if the attacker can somehow change the contents of the cookie while keeping it signed, this results in remote exploit. One such way is knowing the SECRET_KEY. My feeling is that this should be changed. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
RE: Removing pickle from cookie-based session storage
"""
We recently committed changes to 1.4 that added signed-cookie-based session storage. Session data is pickled, signed, and sent to the client as a cookie. On receipt of the cookie, we check the signature, unpickle, and use the data. We could use JSON instead of pickle, at the expense of longer cookies.

I believe that our signing implementation is secure and correct. However, I know that users of Django screw up from time to time. It's not uncommon to see SECRET_KEY in a git repository, and that value is often used in production. If SECRET_KEY is compromised, an attacker can sign arbitrary cookie data. The use of pickle changes an attack from "screw up the data in this application" to "arbitrary remote code execution". In light of this, we should be conservative and use JSON by default instead of pickle.
"""

If the size of the cookie turns out to be a problem, using compressed JSON instead of plain JSON is a very simple change. I tested on my crummy old laptop, and using zlib one can compress + decompress roughly 5000 short strings in a second. On reasonable hardware I guess that figure will be 1-3x that per thread. In the limit, when the compressed size is around 4Kb, one can compress about 500 strings a second (or 1000-3000 on reasonable hardware). So, this could cause some performance concerns in extreme cases, but probably not enough to worry about.

The test program is simple:

import bz2
from datetime import datetime
import json
import random
import zlib

nums = [random.randint(0, 10) for _ in range(0, 1000)]
var = json.dumps({'nums': nums})
start = datetime.now()
for i in range(0, 1000):
    compressed = zlib.compress(var)
    uncompressed = zlib.decompress(compressed)
print datetime.now() - start
print len(var)
print len(compressed)

Note that even when compressing random integers, one still gets over 50% compression. On more realistic data, the compression should be better.

- Anssi
To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
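To make the signed-JSON idea concrete, here is a toy sketch. This is not Django's actual implementation (which lives in its signing framework and differs in detail); the helper names, the dot-separated cookie layout, and the demo key are all assumptions. It signs a JSON session payload with HMAC-SHA1 so the client cannot tamper with it, and optionally zlib-compresses the payload first:

```python
import base64
import hashlib
import hmac
import json
import zlib

SECRET_KEY = b'not-so-secret-demo-key'  # hypothetical key, for the demo only


def b64(data):
    # URL-safe base64 without padding, so the value is cookie-friendly.
    return base64.urlsafe_b64encode(data).rstrip(b'=')


def unb64(data):
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(data + b'=' * (-len(data) % 4))


def dumps(session_dict, compress=True):
    # Serialize to JSON (no arbitrary code on load, unlike pickle),
    # optionally compress, then append an HMAC signature.
    payload = json.dumps(session_dict).encode('utf-8')
    if compress:
        payload = zlib.compress(payload)
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha1).digest()
    return b64(payload) + b'.' + b64(sig)


def loads(cookie, compress=True):
    # Verify the signature *before* touching the payload.
    payload_b64, sig_b64 = cookie.rsplit(b'.', 1)
    payload = unb64(payload_b64)
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha1).digest()
    if not hmac.compare_digest(expected, unb64(sig_b64)):
        raise ValueError('bad signature')
    if compress:
        payload = zlib.decompress(payload)
    return json.loads(payload.decode('utf-8'))
```

A production version would typically also salt the key per use and embed a timestamp for expiry; the point here is only that with JSON a forged SECRET_KEY lets an attacker forge session *data*, not execute code on unpickling.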
RE: [NoSQL] Sub-object queries / refactoring JOIN syntax into model fields
I think that lookup separator syntax is definitely the right approach here. The implementation should modify setup_joins. I think the cleanest approach would be to detect that the current lookup part leads to a field with subfields, either in this part of the code:

    else:
        # Non-relation fields.
        target = field
        break

or directly after the for loop in setup_joins, around this line:

    if pos != len(names) - 1:

In either case you would want to check whether the current field supports subfields, and then continue to do the dirty details of actually descending into the subfields and returning the results to add_filter. I wonder if you would need an additional flag to setup_joins to indicate whether subfield queries are allowed, in case somebody other than add_filter is the caller. That would be only the tenth parameter...

There is still the question of whether this should be included in core. I am in no position to answer that. All I can say is that a ListField (or ArrayField) would be useful in SQL land, too.

- Anssi

________________________________________
From: django-developers@googlegroups.com [django-developers@googlegroups.com] On Behalf Of Jonas H. [jo...@lophus.org]
Sent: Wednesday, September 28, 2011 01:52
To: django-developers
Subject: [NoSQL] Sub-object queries / refactoring JOIN syntax into model fields

Hello,

some non-relational databases (e.g. MongoDB) have support for arbitrarily nested objects. To make queries that "reach" into these sub-objects, the Django-nonrel developers find it appealing to use JOIN syntax. For instance, if you had this person in your database

    {'name': 'Bob', 'address': {'city': 'NY', 'street': 'Wall Street 42'}}

you could find Bob using these queries:

    Person.objects.filter(name='Bob')
    Person.objects.filter(address__city='NY')
    Person.objects.filter(address__street__startswith='Wall')
    ...

Similarly, sub-objects may be stored in a list, like so:

    {
        'votes': [
            {'voter': 'Bob', 'vote': 42},
            {'voter': 'Ann', 'vote': 3.14},
        ]
    }

    Vote.objects.filter(votes__vote__gt=2)
    ...
These sub-object queries are essential for non-relational databases to be really useful, so this is an important feature. What's the core team's opinion on this topic -- is there any chance of getting something like that into Django at all? (Maybe you think two meanings for one syntax cause too much confusion.)

Secondly, how could this be implemented? I thought about refactoring JOIN syntax handling into the model fields (with as little logic as required; refactoring the actual hardcore JOIN generation code seems like an impossible task for anyone but the original author)... any other ideas?

So far,
Jonas
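To make the proposed syntax concrete, the double-underscore lookups in the examples above map naturally onto MongoDB's dot-notation queries. The following is a toy translation helper, not django-nonrel's actual compiler (which is considerably more involved); the function name and the small operator table are assumptions for illustration:

```python
# Map the trailing Django lookup suffix to the corresponding MongoDB operator.
OPERATORS = {'gt': '$gt', 'gte': '$gte', 'lt': '$lt', 'lte': '$lte', 'in': '$in'}


def lookup_to_mongo(**filters):
    """Turn Django-style filter kwargs into a MongoDB filter document.

    address__city='NY'      -> {'address.city': 'NY'}
    votes__vote__gt=2       -> {'votes.vote': {'$gt': 2}}
    """
    spec = {}
    for lookup, value in filters.items():
        parts = lookup.split('__')
        if parts[-1] in OPERATORS:
            # Pop the operator suffix and wrap the value, e.g. {'$gt': 2}.
            op = OPERATORS[parts.pop()]
            value = {op: value}
        elif parts[-1] == 'startswith':
            # Prefix match via an anchored regex.
            parts.pop()
            value = {'$regex': '^' + value}
        # Remaining parts are the path into nested sub-objects.
        spec['.'.join(parts)] = value
    return spec
```

One ambiguity this sketch glosses over is exactly the one setup_joins would have to resolve: a field literally named "gt" or "startswith" would collide with the operator suffix, which is why the real implementation needs to consult the field to decide whether a part is a subfield or a lookup.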