Re: Django ORM performance patch. Fixes #5420, #5768
Per my patches and what I believe Malcom has done with qsrf, the syntax for related fields was field__related__name, and yes, you would repeat this over and over, but it keeps it consistant. I would like a values-like method to return partial objects, and then requesting any field thats not "filled" would hit the db in some fashion, but that's another story. On Feb 15, 3:42 am, Dima Dogadaylo <[EMAIL PROTECTED]> wrote: > On 15 Feb, 12:30, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote: > > > When Adrian > > proposed that API, he realised that almost always you're going to be > > pulling back all of the fields or almost all of them. Once a database > > has read a row to access some of the data, accessing all of the data in > > the row is close to zero overhead. It's already had to read the block > > off disk, for example, which is many kilobytes in size. Only for either > > extremely large fields or for fields that need a large amount of > > post-processing to be useful (such as polygon fields) is it really a win > > to micro-manage what is pulled that. > > > For most non-trivial SQL queries with modern databases, constructing > > (extracting) the output columns, unless they are complex computations, > > is a very small fraction of the timeslice devote to the query execution. > > It's true but not all true. > > Often it's have sense to exclude even small non-used fields, because > when you do joins with GROUP BY or similar for final 100 rows you may > have temporary in-memory or even disc-table from 100*100 rows and size > of each rows really matters. > > Then not only db-stage consume resources, but also network bandwidth > is consumed, python database adapter consumes a lot on converting raw > char stream to never used python objects, memory is wasted. It's > extremely important on limited VPS-hostings. > > However, IMHO such technical details isn't important because ORM > should not assume that things are "good enough". With ideal ORM I > should be able to do _all_ that I can do with raw SQL. Everyone needs > are special, same ORM will be good for small project but unacceptable > for big project if ORM assumes that "this" and "those" limitations are > acceptable for all. > > >> That's why the API is to specify what to exclude, rather than what to > > include. I tend to agree with the proposal, too. From a performance > > management perspective, it's going to be easier to use that way. > > I agree that in many cases exclude() may be useful and probably it's > worth to add it also, but as I already wrote in my blog post, this > patch come from real experience when you don't have "tutorial-like" > models and when there is conscious data redundancy in many tables. > It's typical task to show object_list and object_detail for any model > and for each object_list I know exactly what fields I need and usually > it's 3-5 fields from model itself and 2-3 fields from related models. > I strongly believe that API should give me ability to receive only > those fields even if all database severs will work with speed of > light. > > > This isn't "standard Django lookup syntax", though. It's some entirely > > new lookup syntax you created. There's nowhere in the current lookup > > hierarchy that uses this method for referring to fields in another > > model. Reusing the existing syntax makes more sense. It's less for > > people to learn and it's less restrictive: with the double-underscore > > syntax, I can list the fields in any order, since they're all positional > > arguments. Your syntax has normal fields as positional arguments (so > > they must come first) and related fields as keyword arguments that take > > a sequence. > > I sure you know Django lookup syntax much better than me, but IMHO > your proposition violates DRY, because if I need to select several > fields from related model I need to repeat and repeat > "user__address__fieldname". Your proposition is close to original SQL > syntax of SELECT statement and due to this issue your proposition > should be accepted probably even if it violates DRY but keeps > conceptual integrity with underlaying SQL level. > > I think it's only several lines task to change syntax from > fields(*fields, **related_fields) to fields(*args). So if sometimes > core developers change own mind about fields() method I can add to my > patch plain lookup syntax support. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django ORM performance patch. Fixes #5420, #5768
On 15 Feb, 12:30, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote: > When Adrian > proposed that API, he realised that almost always you're going to be > pulling back all of the fields or almost all of them. Once a database > has read a row to access some of the data, accessing all of the data in > the row is close to zero overhead. It's already had to read the block > off disk, for example, which is many kilobytes in size. Only for either > extremely large fields or for fields that need a large amount of > post-processing to be useful (such as polygon fields) is it really a win > to micro-manage what is pulled that. > > For most non-trivial SQL queries with modern databases, constructing > (extracting) the output columns, unless they are complex computations, > is a very small fraction of the timeslice devote to the query execution. It's true but not all true. Often it's have sense to exclude even small non-used fields, because when you do joins with GROUP BY or similar for final 100 rows you may have temporary in-memory or even disc-table from 100*100 rows and size of each rows really matters. Then not only db-stage consume resources, but also network bandwidth is consumed, python database adapter consumes a lot on converting raw char stream to never used python objects, memory is wasted. It's extremely important on limited VPS-hostings. However, IMHO such technical details isn't important because ORM should not assume that things are "good enough". With ideal ORM I should be able to do _all_ that I can do with raw SQL. Everyone needs are special, same ORM will be good for small project but unacceptable for big project if ORM assumes that "this" and "those" limitations are acceptable for all. >> That's why the API is to specify what to exclude, rather than what to > include. I tend to agree with the proposal, too. From a performance > management perspective, it's going to be easier to use that way. I agree that in many cases exclude() may be useful and probably it's worth to add it also, but as I already wrote in my blog post, this patch come from real experience when you don't have "tutorial-like" models and when there is conscious data redundancy in many tables. It's typical task to show object_list and object_detail for any model and for each object_list I know exactly what fields I need and usually it's 3-5 fields from model itself and 2-3 fields from related models. I strongly believe that API should give me ability to receive only those fields even if all database severs will work with speed of light. > This isn't "standard Django lookup syntax", though. It's some entirely > new lookup syntax you created. There's nowhere in the current lookup > hierarchy that uses this method for referring to fields in another > model. Reusing the existing syntax makes more sense. It's less for > people to learn and it's less restrictive: with the double-underscore > syntax, I can list the fields in any order, since they're all positional > arguments. Your syntax has normal fields as positional arguments (so > they must come first) and related fields as keyword arguments that take > a sequence. I sure you know Django lookup syntax much better than me, but IMHO your proposition violates DRY, because if I need to select several fields from related model I need to repeat and repeat "user__address__fieldname". Your proposition is close to original SQL syntax of SELECT statement and due to this issue your proposition should be accepted probably even if it violates DRY but keeps conceptual integrity with underlaying SQL level. I think it's only several lines task to change syntax from fields(*fields, **related_fields) to fields(*args). So if sometimes core developers change own mind about fields() method I can add to my patch plain lookup syntax support. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django ORM performance patch. Fixes #5420, #5768
On Fri, 15 Feb 2008 21:30:23 +1100 Malcolm Tredinnick <[EMAIL PROTECTED]> wrote: > There's also the fairly pragmatic side-effect that if you are only > pulling back a very few fields, there isn't really a lot your model > methods are going to be able to do without loading more data, unless > they're very simple. You might as well just write those methods as > functions which take a dictionary and use values() for the cases when > only two or three items are needed. I agree completely, as it is nearly the same I said in my last comment on this thread. > > > That QuerySet.values can not select values from related fields, > > > annoys myself, too. But your implementation is a bit restricted. > > > Here the example from your blog: > > > > > > entries = Entry.objects.values('headline', > > > user=('username',))[:10] > > > > > > A syntax as following would be more straight forward: > > > > > > entries = Entry.objects.values('headline', 'user__username')[:10] > > > > I used standard Django lookup syntax, but because in trunk Djnago > > don't support selecting of related fields, I thought it's better to > > respect DRY principe and write: > > entries = Entry.objects.values('headline', user__address=('country', > > 'city', 'zip')) > > This isn't "standard Django lookup syntax", though. It's some entirely > new lookup syntax you created. There's nowhere in the current lookup > hierarchy that uses this method for referring to fields in another > model. Reusing the existing syntax makes more sense. It's less for > people to learn and it's less restrictive: with the double-underscore > syntax, I can list the fields in any order, since they're all > positional arguments. Your syntax has normal fields as positional > arguments (so they must come first) and related fields as keyword > arguments that take a sequence. > > > Also please note, currently lookups with 2 underscores are used only > > in **kwargs, because it don't have many sense as quoted string (even > > current QuerySet.order_by() uses '.' instead of '__' as separator). > > Well, that information's a little old (if you're going to be writing > code in the query construction are of Django, you also need to keep up > with the branch work that's rewriting large portions of that area; > that was already mentioned in the related-values ticket before you > started working on it). Let's not use the current cross-model > ordering syntax as an example. It's a bit of a wart in the design > (Database table name + model field attribute name... it's mixing > apples and oranges). In queryset-refactor, we've changed that > (order-by) use double-underscores and only field names, just as in > filters. One of the reasons for this was consistency. > > I think, as Sebastian suggested (and I mentioned in the comment to the > ticket on related values lookup a few months bcak), we should also use > double underscores for selecting related values. That makes everything > nice and consistent across the board. In my previous message, I said that I agree with Dima's comment, but that was just because I trusted that it is true that this syntax is already used in django, without looking at the source. I don't care much, how the syntax should look like, but it have to enable selecting related fields from related fields and it should be the same syntax used in the rest of django's API. Regards Sebastian Noack --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django ORM performance patch. Fixes #5420, #5768
On Fri, 15 Feb 2008 01:54:24 -0800 (PST) Dima Dogadaylo <[EMAIL PROTECTED]> wrote: > > Of course, this decreases database load, but IMHO it is better to > > use QuerySet.values if you want select just certain values from a > > model. > > Often I need methods defined in models and isn't available at > dictionaries. > For example get_avatar_url(). These methods might change and don't will work anymore, if the now required fields are not selected by QuerySet.fields. So you have to look for all occurrences in your code and change the parameter list of QuerySet.fields, after figuring out which fields are required. This is why I think it would be more straight forward to write your functions independent from the model and pass the required values (as retrieved by QuerySet.values) as separate arguments if you are going to avoid selecting more data than required. But on the other hand it is ugly to extract logic which relates to a model from the model. But an ORM always has performance drawbacks in favor of strict design. So I would rathe omit this performance feature in favor of strict desgings, since incomplete models as created by QuerySet.fields might introduce as many problems as the performance increase. > I used standard Django lookup syntax, but because in trunk Djnago > don't support selecting of related fields, I thought it's better to > respect DRY principe and write: > entries = Entry.objects.values('headline', user__address=('country', > 'city', 'zip')) > > than > entries = Entry.objects.values('headline', 'user__address__country', > 'user__address__city', 'user__address__zip'). > > Also please note, currently lookups with 2 underscores are used only > in **kwargs, because it don't have many sense as quoted string (even > current QuerySet.order_by() uses '.' instead of '__' as separator). So > from design and usability point of view my form of selecting related > fields is better, I think. Ok, I agree. I would like to getting this feature merged in trunk, too. Regards Sebastian Noack --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django ORM performance patch. Fixes #5420, #5768
On Fri, 2008-02-15 at 01:54 -0800, Dima Dogadaylo wrote: > On 14 Feb, 17:26, Sebastian Noack <[EMAIL PROTECTED]> > wrote: > > > > Of course, this decreases database load, but IMHO it is better to use > > QuerySet.values if you want select just certain values from a model. > > Often I need methods defined in models and isn't available at > dictionaries. > For example get_avatar_url(). This is all very reasonable, but one of the problems with your patch is that you've approached it from the reverse end of what was suggested in #5420 and #5420 is written the way it is for a reason. When Adrian proposed that API, he realised that almost always you're going to be pulling back all of the fields or almost all of them. Once a database has read a row to access some of the data, accessing all of the data in the row is close to zero overhead. It's already had to read the block off disk, for example, which is many kilobytes in size. Only for either extremely large fields or for fields that need a large amount of post-processing to be useful (such as polygon fields) is it really a win to micro-manage what is pulled that. For most non-trivial SQL queries with modern databases, constructing (extracting) the output columns, unless they are complex computations, is a very small fraction of the timeslice devote to the query execution. That's why the API is to specify what to exclude, rather than what to include. I tend to agree with the proposal, too. From a performance management perspective, it's going to be easier to use that way. There's also the fairly pragmatic side-effect that if you are only pulling back a very few fields, there isn't really a lot your model methods are going to be able to do without loading more data, unless they're very simple. You might as well just write those methods as functions which take a dictionary and use values() for the cases when only two or three items are needed. If you have a case that doesn't fit this, the standard data modelling technique is you move your very heavyweight fields into another model/table, so the lightweight stuff doesn't get dragged down by also pulling back the heavyweight stuff. That's always going to be more effective than trying to only select a small fraction of the available columns in a row. > > That QuerySet.values can not select values from related fields, annoys > > myself, too. But your implementation is a bit restricted. Here > > the example from your blog: > > > > entries = Entry.objects.values('headline', user=('username',))[:10] > > > > A syntax as following would be more straight forward: > > > > entries = Entry.objects.values('headline', 'user__username')[:10] > > I used standard Django lookup syntax, but because in trunk Djnago > don't support selecting of related fields, I thought it's better to > respect DRY principe and write: > entries = Entry.objects.values('headline', user__address=('country', > 'city', 'zip')) This isn't "standard Django lookup syntax", though. It's some entirely new lookup syntax you created. There's nowhere in the current lookup hierarchy that uses this method for referring to fields in another model. Reusing the existing syntax makes more sense. It's less for people to learn and it's less restrictive: with the double-underscore syntax, I can list the fields in any order, since they're all positional arguments. Your syntax has normal fields as positional arguments (so they must come first) and related fields as keyword arguments that take a sequence. > Also please note, currently lookups with 2 underscores are used only > in **kwargs, because it don't have many sense as quoted string (even > current QuerySet.order_by() uses '.' instead of '__' as separator). Well, that information's a little old (if you're going to be writing code in the query construction are of Django, you also need to keep up with the branch work that's rewriting large portions of that area; that was already mentioned in the related-values ticket before you started working on it). Let's not use the current cross-model ordering syntax as an example. It's a bit of a wart in the design (Database table name + model field attribute name... it's mixing apples and oranges). In queryset-refactor, we've changed that (order-by) use double-underscores and only field names, just as in filters. One of the reasons for this was consistency. I think, as Sebastian suggested (and I mentioned in the comment to the ticket on related values lookup a few months bcak), we should also use double underscores for selecting related values. That makes everything nice and consistent across the board. Regards, Malcolm -- Many are called, few volunteer. http://www.pointy-stick.com/blog/ --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL
Re: Django ORM performance patch. Fixes #5420, #5768
On 14 Feb, 17:26, Sebastian Noack <[EMAIL PROTECTED]> wrote: > > Of course, this decreases database load, but IMHO it is better to use > QuerySet.values if you want select just certain values from a model. Often I need methods defined in models and isn't available at dictionaries. For example get_avatar_url(). > That QuerySet.values can not select values from related fields, annoys > myself, too. But your implementation is a bit restricted. Here > the example from your blog: > > entries = Entry.objects.values('headline', user=('username',))[:10] > > A syntax as following would be more straight forward: > > entries = Entry.objects.values('headline', 'user__username')[:10] I used standard Django lookup syntax, but because in trunk Djnago don't support selecting of related fields, I thought it's better to respect DRY principe and write: entries = Entry.objects.values('headline', user__address=('country', 'city', 'zip')) than entries = Entry.objects.values('headline', 'user__address__country', 'user__address__city', 'user__address__zip'). Also please note, currently lookups with 2 underscores are used only in **kwargs, because it don't have many sense as quoted string (even current QuerySet.order_by() uses '.' instead of '__' as separator). So from design and usability point of view my form of selecting related fields is better, I think. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django ORM performance patch. Fixes #5420, #5768
> I made a patch for Django to add QuerySet.fields(*fields, > **related_fields) and make possible to load only some from master and > related models fields. It allows to tune various object list queries > when we need only limited subset of all fields, improve general > performance and decrease database load. Of course, this decreases database load, but IMHO it is better to use QuerySet.values if you want select just certain values from a model. > As side effect of this patch > support of selecting fields from related models in QuerySet.values() > is implemented too. It was changed signature of this method from > values(*fields) to values(*fields, **related_fields) but the change is > backward compatible. That QuerySet.values can not select values from related fields, annoys myself, too. But your implementation is a bit restricted. Here the example from your blog: entries = Entry.objects.values('headline', user=('username',))[:10] A syntax as following would be more straight forward: entries = Entry.objects.values('headline', 'user__username')[:10] At first this is the way other methods of the QuerySet as filter for example selects related fields and furthermore this would enable to select the related field of a related field, as QuerySet.filter can do. Regards Sebastian Noack --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django ORM performance patch. Fixes #5420, #5768
On Thu, 2008-02-14 at 03:50 -0800, Entropy Hacker wrote: > I made a patch for Django to add QuerySet.fields(*fields, > **related_fields) and make possible to load only some from master and > related models fields. It allows to tune various object list queries > when we need only limited subset of all fields, improve general > performance and decrease database load. As side effect of this patch > support of selecting fields from related models in QuerySet.values() > is implemented too. It was changed signature of this method from > values(*fields) to values(*fields, **related_fields) but the change is > backward compatible. Please don't post to both django-users and django-developers (and Trac x 2). We notice when patches are put on tickets. Otherwise it gets very noisy for those of us monitoring all the sources. Regards, Malcolm -- Honk if you love peace and quiet. http://www.pointy-stick.com/blog/ --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---