Re: Automatic prefetching in querysets

2017-08-16 Thread Josh Smeaton
I think there's a lot right with your suggestions here Shai. 

It delivers better default behaviour for new projects, does not affect 
existing deployments, and seems pretty easy to enable/disable selectively 
at any level of the stack.

My only concern would be libraries leaning on this behaviour by enabling it 
locally and users being unable to change it. That's only a small concern 
though, and wouldn't prevent me from recommending the proposal.

On Thursday, 17 August 2017 09:34:03 UTC+10, Shai Berger wrote:
>
> First of all, I think making the auto-prefetch feature available in some 
> form 
> is a great idea. As an opt-in, I would have made use of it on several 
> occasions, and cut days of optimization work to minutes. It's true that 
> this 
> wouldn't achieve the best possible optimization in many cases, but it 
> would be 
> close enough, for a small fraction of the effort. 
>
> But of the choices that have been suggested here, I find none quite 
> satisfactory. I agree, basically, with almost all of the objections raised 
> on 
> all sides. And I think I have a better suggestion. 
>
> Rather than a setting (global and essentially constant) or all-local 
> opt-ins, 
> suppose the feature was controlled -- for all querysets -- by a context 
> manager. 
>
> So, rather than 
>
> > "qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():", 
>
> suppose we had 
>
> with QuerySet.auto_prefetch(active=True): 
>
> and under that, all querysets would auto-prefetch. Then: 
>
> - We could put that in a middleware; using that middleware would then, 
> effectively, be a global opt in. We could add the middleware -- perhaps 
> commented-out -- to the new project template, for beginner's sake. Call it 
> PerformanceTrainingWheelsMiddleware, to make people aware that it's 
> something 
> they should grow out of, but make it easily accessible. 
>
> - We could easily build a decorator for views 
>
> - We could have local opt-out, at the correct level -- not the objects, 
> but 
> the control flow. 
>
> For this to work, the flag controlling the feature would need to be 
> thread- 
> local and managed as a stack. But these aren't problems. 
>
> If (as likely) this suggestion still does not generate a concensus, I 
> would 
> prefer that we follow the path Anssi suggested -- 
>
> > Note that it should be possible to implement this fully as 3rd party 
> > project if we add an easy way to customise how related object fetching 
> is 
> > done. I'm not sure if we can add an easy way for that. 
>
> Except I don't think "easy" is a requirement here. If we can add a 
> sensible 
> way, that would be enough -- I don't expect many Django users to develop 
> implementations of the feature, only to use them. 
>
> My 2 cents, 
> Shai. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/bf3e22af-9a41-4106-b986-fc136d54c4c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Shai Berger
First of all, I think making the auto-prefetch feature available in some form 
is a great idea. As an opt-in, I would have made use of it on several 
occasions, and cut days of optimization work to minutes. It's true that this 
wouldn't achieve the best possible optimization in many cases, but it would be 
close enough, for a small fraction of the effort.

But of the choices that have been suggested here, I find none quite 
satisfactory. I agree, basically, with almost all of the objections raised on 
all sides. And I think I have a better suggestion.

Rather than a setting (global and essentially constant) or all-local opt-ins, 
suppose the feature was controlled -- for all querysets -- by a context 
manager.

So, rather than

> "qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():", 

suppose we had

with QuerySet.auto_prefetch(active=True):

and under that, all querysets would auto-prefetch. Then:

- We could put that in a middleware; using that middleware would then, 
effectively, be a global opt in. We could add the middleware -- perhaps 
commented-out -- to the new project template, for beginner's sake. Call it 
PerformanceTrainingWheelsMiddleware, to make people aware that it's something 
they should grow out of, but make it easily accessible.

- We could easily build a decorator for views

- We could have local opt-out, at the correct level -- not the objects, but 
the control flow.

For this to work, the flag controlling the feature would need to be thread-
local and managed as a stack. But these aren't problems.

If (as likely) this suggestion still does not generate a concensus, I would 
prefer that we follow the path Anssi suggested -- 

> Note that it should be possible to implement this fully as 3rd party
> project if we add an easy way to customise how related object fetching is
> done. I'm not sure if we can add an easy way for that.

Except I don't think "easy" is a requirement here. If we can add a sensible 
way, that would be enough -- I don't expect many Django users to develop 
implementations of the feature, only to use them.

My 2 cents,
Shai.


Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley
Regarding 2, it does work for reverse one-to-one relations.

On Wed, Aug 16, 2017 at 9:17 PM, Aymeric Augustin <
aymeric.augus...@polytechnique.org> wrote:

> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>
> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having
> to request it.
>
>
>
> Hello,
>
> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
> the admin or elsewhere gets old.
>
>
> In addition to everything that was said already, I'd like to point out
> that Django already has a very similar "magic auto prefetching" behavior in
> some cases :-)
>
> I'm referring to the admin which calls select_related() on non-nullable
> foreign keys in the changelist view. The "non-nullable" condition makes
> that behavior hard to predict — I'd go as far as to call it non
> deterministic. For details, see slide 54 of https://myks.org/data/
> 20161103-Django_Under_the_Hood-Debugging_Performance.pdf and the audio
> commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>
>
> The feature proposed here is most useful if it's opt-out because it
> targets people who aren't aware that the problem even exists — at best they
> notice that Django is slow and that reminds them vaguely of a rant that
> explains why ORMs are the worst thing since object oriented programming.
>
> It should kick in only when no select_related or prefetch_related is in
> effect, to avoid interfering with pre-existing optimizations. It's still
> easy to construct an example where it would degrade performance but I don't
> think such situations will be common in practice. Still, there should be a
> per-queryset opt-out for these cases.
>
> We may want to introduce it with a deprecation path, that is, make it
> opt-in at first and log a deprecation warning where the behavior would
> kick-in, so developers who want to disable it can add the per-queryset
> opt-out.
>
>
> At this point, my main concerns are:
>
> 1) The difficulty of identifying where the queryset originates, given that
> querysets are lazy. Passing objects around is common; sometimes it can be
> hard to figure out where an object comes from. It isn't visible in the
> stack trace. In my opinion this is the strongest argument against the
> feature.
>
> 2) The lack of this feature for reverse one-to-one relations; it's only
> implemented for foreign keys. It's hard to tell them apart in Python code.
> The subtle differences, like return None vs. raise ObjectDoesNotExist when
> there's no related object, degrade the developer experience.
>
> 3) The strong opinions expressed against the feature. I'm not sure that
> consensus is within reach. If we can't agree that this is an adequate
> amount of magic, we're likely to stick with the status quo. I'd rather not
> have this question decided by a vote of the technical board.
>
>
> In the grand scheme of things, going from "prefetching a related instance
> for an object" to "prefetching related instances for all objects in the
> queryset" isn't that much of a stretch... But I admit it's rather scary to
> make this change for all existing Django projects!
>
>
> Best regards,
>
> --
> Aymeric.
>
>
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/
> topic/django-developers/EplZGj-ejvg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-
> 08A694DF9074%40polytechnique.org
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD-wiX0ScioW28wNYmY76Ma03kbSN9dWjRvKWs60%2B0f1VwvkFQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Aymeric Augustin
> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
> 
> I'd like to discuss automatic prefetching in querysets. Specifically 
> automatically doing prefetch_related where needed without the user having to 
> request it.


Hello,

I'm rather sympathetic to this proposal. Figuring out N + 1 problems in the 
admin or elsewhere gets old.


In addition to everything that was said already, I'd like to point out that 
Django already has a very similar "magic auto prefetching" behavior in some 
cases :-)

I'm referring to the admin which calls select_related() on non-nullable foreign 
keys in the changelist view. The "non-nullable" condition makes that behavior 
hard to predict — I'd go as far as to call it non deterministic. For details, 
see slide 54 of 
https://myks.org/data/20161103-Django_Under_the_Hood-Debugging_Performance.pdf 

 and the audio commentary at https://youtu.be/5fheDDj3oHY?t=2024 
.


The feature proposed here is most useful if it's opt-out because it targets 
people who aren't aware that the problem even exists — at best they notice that 
Django is slow and that reminds them vaguely of a rant that explains why ORMs 
are the worst thing since object oriented programming.

It should kick in only when no select_related or prefetch_related is in effect, 
to avoid interfering with pre-existing optimizations. It's still easy to 
construct an example where it would degrade performance but I don't think such 
situations will be common in practice. Still, there should be a per-queryset 
opt-out for these cases.

We may want to introduce it with a deprecation path, that is, make it opt-in at 
first and log a deprecation warning where the behavior would kick-in, so 
developers who want to disable it can add the per-queryset opt-out.


At this point, my main concerns are:

1) The difficulty of identifying where the queryset originates, given that 
querysets are lazy. Passing objects around is common; sometimes it can be hard 
to figure out where an object comes from. It isn't visible in the stack trace. 
In my opinion this is the strongest argument against the feature.

2) The lack of this feature for reverse one-to-one relations; it's only 
implemented for foreign keys. It's hard to tell them apart in Python code. The 
subtle differences, like return None vs. raise ObjectDoesNotExist when there's 
no related object, degrade the developer experience.

3) The strong opinions expressed against the feature. I'm not sure that 
consensus is within reach. If we can't agree that this is an adequate amount of 
magic, we're likely to stick with the status quo. I'd rather not have this 
question decided by a vote of the technical board.


In the grand scheme of things, going from "prefetching a related instance for 
an object" to "prefetching related instances for all objects in the queryset" 
isn't that much of a stretch... But I admit it's rather scary to make this 
change for all existing Django projects!


Best regards,

-- 
Aymeric.




-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-08A694DF9074%40polytechnique.org.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread 'Tom Evans' via Django developers (Contributions to Django itself)
Is this opt-{in,out} considered to be a global flag, meant to be
toggled on or off depending on whether it is an "expert" working on
the project or not?

I don't think that would be a good idea, almost all of our projects
have a mix of skill levels, and people move from team to team on a
regular basis. I don't think we have a project that has only been
worked on by senior developers or only worked on by junior developers.

What I think would be exceptionally useful would be to emit loud
warnings when developing (DEBUG=True or runserver) whenever this code
can detect that a prefetch MAY be applicable. A lot of our junior
developers are not as aware about DB structures and SQL performance
(particularly now that they don't design schemas and write DB
queries), so warnings as part of their normal development help trigger
teaching moments. Magically (sometimes) making things faster doesn't
teach them anything.

Turning this feature on/off per app is scary. If we're dealing with
models from AppA, we will get auto pre-fetching, but if we work with
models from AppB, we do not? If we're dealing with those models in the
same module, we will have different behaviour depending on which model
is being used? Please no.

I also think that it might be handy to specify
"qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():", which
would then trigger the newly proposed behaviour. It's like "I want you
to prefetch all the things I use, but I don't know what just yet".
Having said that, I'd also like that to spit out a warning (again, dev
only) that specified what was actually prefetched, because why waste a
DB query in production to determine what we know whilst developing?

Cheers

Tom



On Wed, Aug 16, 2017 at 5:28 PM, Brice Parent  wrote:
> Le 16/08/17 à 14:07, Adam Johnson a écrit :
>>
>> I wouldn't want is to give free optimisations to non-ORM pros
>
>
> Something like 99% of django projects are by non-ORM pros, it can be easy to
> forget that on a mailing list of experts.
>
> I don't count myself as an ORM or SQL expert at all, but I've worked on at
> least 2 projects for which they had contracted such experts to optimise
> their projects once they reached some limits. Those experts only worked for
> them for a limited period of time, and if we changed this behaviour by
> default, I'm 100% sure they wouldn't understanding why their website was
> getting slower with the new release. And they would probably only see the
> performance issue on staging servers if they have some, or in production
> with a full database.
> I admit this can be a great thing for new comers and even small sites. But
> remember that, with or without this functionality, their website will need
> to be optimised in many ways when they start to have more users, and
> removing this 1+N common pitfall (or hiding the problem, as it is not
> completely solved as it's not the more efficient fix) will not be the only
> point they should look at.
> I think the solution for this would simply be to have it as opt-out, not to
> harm any already-working and optimised websites and allow to maintain a
> stable behaviour for 3rd party apps, but make it opt-in to new users just by
> pre-setting some setting in the settings.py file generated by "django-admin
> startproject". It could be a simple boolean setting or a list of apps for
> which we want the auto-fetch feature to be activated on. With this, new
> projects would be optimised unless they want to optimise manually, existing
> projects would still work without decreasing performance or creating memory
> problems in any case. Best of both worlds.
> And for existing non-optimised but existing websites, I prefer the
> status-quo than risking to create problems that might occur on production
> website.
> We could also, as proposed elsewhere in this thread, warn the developers (so
> only when settings.debug is set) when this kind of 1+N queries are executed
> that it could be way more efficient, either by activating the auto-fetch
> functionality or by manually prefetch related data.
>
> - Brice
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/dc483976-f7d0-3aee-4f9e-76f515e5c761%40brice.xyz.
>
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 

Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley
> It's also likely to require a significant patch

The guts of it are actually surprisingly clean and small. That's probably a
testament to some of the good decisions in the ORM architecture.
Covering one2one in both directions, FK's in the forward direction, ignoring
tests and various optin/out options, it's less than 20 lines, split
reasonably evenly between the three *Descriptor.__get__ functions, and
QuerySet._fetch_all.
Of course lines of code is not a measure of complexity or risk :)


On Wed, Aug 16, 2017 at 2:03 PM, Marc Tamlyn  wrote:

> As an opt in feature, it does sound quite interesting. If over the course
> of a few releases, it proves to be reliable and we don't get tons of
> support requests of it either not working, or causing the opposite problem,
> we can consider moving it to an opt out feature. This makes it easy for
> people to "turn on the magic", but keeps the default safer. It would be
> very easy to write a one line monkeypatch to make it default behaviour.
>
> Django is conservative when it comes to changes to default behaviour, and
> doubly so when the ORM is considered. This will need to have been used
> widely in the wild before it would be considered safe as a default
> behaviour for all projects. This whole process would only take a couple of
> years, which isn't much time in Django release land! I don't mean to
> undermine the belief of the people who think it's definitely an improvement
> in most cases, I just want us to be cautious in how we explore that
> hypothesis.
>
> It's also likely to require a significant patch, and need careful analysis
> before merging. I'd suggest it's a good candidate for a DEP to discuss the
> motivation for the project, and to find a shepherd from the core team
> (Josh? Adam?) to help it land.
>
> On 16 August 2017 at 13:48, Luke Plant  wrote:
>
>> Hi Josh,
>>
>> On 16/08/17 02:26, Josh Smeaton wrote:
>>
>> I believe we should be optimising for the **common** use case, not
>> expecting everyone to be experts with the ORM.
>>
>> > If I can come up with a single example where it would significantly
>> decrease performance (either memory usage or speed) compared to the default
>> (and I'm sure I can), then I would be strongly opposed to it ever being
>> default behaviour.
>>
>> The status quo is already one where thousands of users and sites are
>> doing the non-optimal thing because we're choosing to be conservative and
>> have users opt-in to the optimal behaviour.
>>
>>
>> I wouldn't say it is "optimal behaviour". It is behaviour that is an
>> optimization for some use cases - common ones, I'd agree - but not all. In
>> fact it is not even the best optimization - it performs less well in many
>> cases than a manual `select_related`.
>>
>> We don't know how many sites would be affected by the opposite default,
>> because we aren't putting people in that situation. Generating potentially
>> large queries (i.e. ones that return lots of results) is going to make
>> someone's life a lot harder, and can even crash processes (out of memory
>> errors), or cause massive slowdown due to insufficient memory and swapping.
>> I have had these kind of issues in production systems, and sometimes the
>> answer is to prefetch *less* - which is why things like `iterator()` exist,
>> because sometimes you *don't* want to load it all into memory upfront.
>>
>> A massive complaint against Django is how easy it is for users to build
>> in 1+N behaviour. Django is supposed to abstract the database away (so much
>> so that we avoid SQL related terms in our queryset methods), yet one of the
>> more fundamental concepts such as joins we expect users to know about and
>> optimise for.
>>
>>
>> This is far from the only place where we expect users to be conscious of
>> what has to go on at the database level. For example, we expect users to
>> use `.count()` and `.exists()` where appropriate, and not use them where
>> not appropriate - see https://docs.djangoproject.com
>> /en/1.11/topics/db/optimization/#don-t-overuse-count-and-exists
>>
>> This is an example of doing it right. We could have 'intelligently' made
>> `__len__()` do `count()`, but this introduces queries that the user did not
>> ask for, and that's hard for developers to predict. That whole section of
>> the docs on DB optimisation depends on the possibility of understanding
>> things at a DB level, and understanding how QuerySets behave, and that they
>> only do the queries that you ask them to do. The more magic we build in,
>> the harder we make life for people trying to optimize database access.
>>
>> If we have an `auto_prefetch` method that has to be called explicitly,
>> then we are allowing users who know less about databases to get something
>> that works OK for many situations. But having it on by default makes
>> optimization harder and adds unwelcome surprises.
>>
>>
>> I'd be more in favour of throwing an error on
>> 

Re: Automatic prefetching in querysets

2017-08-16 Thread Tom Forbes
Some quicker changes that have been brought up here could be implemented in
the interim, like adding intelligent warnings.

Someone brought up a great point about how the ORM hides most SQL
terminology from the user, but still requires the knowledge of the
difference between prefetch and select related.

Perhaps a good idea would be to create a generic 'related' method on the
queryset that handles both cases? Hiding the differences between how m2m
and foreign keys are fetched could make them seem less complicated and more
approachable to new users?

On 16 Aug 2017 14:04, "Marc Tamlyn"  wrote:

As an opt in feature, it does sound quite interesting. If over the course
of a few releases, it proves to be reliable and we don't get tons of
support requests of it either not working, or causing the opposite problem,
we can consider moving it to an opt out feature. This makes it easy for
people to "turn on the magic", but keeps the default safer. It would be
very easy to write a one line monkeypatch to make it default behaviour.

Django is conservative when it comes to changes to default behaviour, and
doubly so when the ORM is considered. This will need to have been used
widely in the wild before it would be considered safe as a default
behaviour for all projects. This whole process would only take a couple of
years, which isn't much time in Django release land! I don't mean to
undermine the belief of the people who think it's definitely an improvement
in most cases, I just want us to be cautious in how we explore that
hypothesis.

It's also likely to require a significant patch, and need careful analysis
before merging. I'd suggest it's a good candidate for a DEP to discuss the
motivation for the project, and to find a shepherd from the core team
(Josh? Adam?) to help it land.

On 16 August 2017 at 13:48, Luke Plant  wrote:

> Hi Josh,
>
> On 16/08/17 02:26, Josh Smeaton wrote:
>
> I believe we should be optimising for the **common** use case, not
> expecting everyone to be experts with the ORM.
>
> > If I can come up with a single example where it would significantly
> decrease performance (either memory usage or speed) compared to the default
> (and I'm sure I can), then I would be strongly opposed to it ever being
> default behaviour.
>
> The status quo is already one where thousands of users and sites are doing
> the non-optimal thing because we're choosing to be conservative and have
> users opt-in to the optimal behaviour.
>
>
> I wouldn't say it is "optimal behaviour". It is behaviour that is an
> optimization for some use cases - common ones, I'd agree - but not all. In
> fact it is not even the best optimization - it performs less well in many
> cases than a manual `select_related`.
>
> We don't know how many sites would be affected by the opposite default,
> because we aren't putting people in that situation. Generating potentially
> large queries (i.e. ones that return lots of results) is going to make
> someone's life a lot harder, and can even crash processes (out of memory
> errors), or cause massive slowdown due to insufficient memory and swapping.
> I have had these kind of issues in production systems, and sometimes the
> answer is to prefetch *less* - which is why things like `iterator()` exist,
> because sometimes you *don't* want to load it all into memory upfront.
>
> A massive complaint against Django is how easy it is for users to build in
> 1+N behaviour. Django is supposed to abstract the database away (so much so
> that we avoid SQL related terms in our queryset methods), yet one of the
> more fundamental concepts such as joins we expect users to know about and
> optimise for.
>
>
> This is far from the only place where we expect users to be conscious of
> what has to go on at the database level. For example, we expect users to
> use `.count()` and `.exists()` where appropriate, and not use them where
> not appropriate - see https://docs.djangoproject.com
> /en/1.11/topics/db/optimization/#don-t-overuse-count-and-exists
>
> This is an example of doing it right. We could have 'intelligently' made
> `__len__()` do `count()`, but this introduces queries that the user did not
> ask for, and that's hard for developers to predict. That whole section of
> the docs on DB optimisation depends on the possibility of understanding
> things at a DB level, and understanding how QuerySets behave, and that they
> only do the queries that you ask them to do. The more magic we build in,
> the harder we make life for people trying to optimize database access.
>
> If we have an `auto_prefetch` method that has to be called explicitly,
> then we are allowing users who know less about databases to get something
> that works OK for many situations. But having it on by default makes
> optimization harder and adds unwelcome surprises.
>
>
> I'd be more in favour of throwing an error on
> non-select-related-foreign-key-access than what we're currently doing
> which is 

Re: Automatic prefetching in querysets

2017-08-16 Thread Marc Tamlyn
As an opt in feature, it does sound quite interesting. If over the course
of a few releases, it proves to be reliable and we don't get tons of
support requests of it either not working, or causing the opposite problem,
we can consider moving it to an opt out feature. This makes it easy for
people to "turn on the magic", but keeps the default safer. It would be
very easy to write a one line monkeypatch to make it default behaviour.

Django is conservative when it comes to changes to default behaviour, and
doubly so when the ORM is considered. This will need to have been used
widely in the wild before it would be considered safe as a default
behaviour for all projects. This whole process would only take a couple of
years, which isn't much time in Django release land! I don't mean to
undermine the belief of the people who think it's definitely an improvement
in most cases, I just want us to be cautious in how we explore that
hypothesis.

It's also likely to require a significant patch, and need careful analysis
before merging. I'd suggest it's a good candidate for a DEP to discuss the
motivation for the project, and to find a shepherd from the core team
(Josh? Adam?) to help it land.

On 16 August 2017 at 13:48, Luke Plant  wrote:

> Hi Josh,
>
> On 16/08/17 02:26, Josh Smeaton wrote:
>
> I believe we should be optimising for the **common** use case, not
> expecting everyone to be experts with the ORM.
>
> > If I can come up with a single example where it would significantly
> decrease performance (either memory usage or speed) compared to the default
> (and I'm sure I can), then I would be strongly opposed to it ever being
> default behaviour.
>
> The status quo is already one where thousands of users and sites are doing
> the non-optimal thing because we're choosing to be conservative and have
> users opt-in to the optimal behaviour.
>
>
> I wouldn't say it is "optimal behaviour". It is behaviour that is an
> optimization for some use cases - common ones, I'd agree - but not all. In
> fact it is not even the best optimization - it performs less well in many
> cases than a manual `select_related`.
>
> We don't know how many sites would be affected by the opposite default,
> because we aren't putting people in that situation. Generating potentially
> large queries (i.e. ones that return lots of results) is going to make
> someone's life a lot harder, and can even crash processes (out of memory
> errors), or cause massive slowdown due to insufficient memory and swapping.
> I have had these kind of issues in production systems, and sometimes the
> answer is to prefetch *less* - which is why things like `iterator()` exist,
> because sometimes you *don't* want to load it all into memory upfront.
>
> A massive complaint against Django is how easy it is for users to build in
> 1+N behaviour. Django is supposed to abstract the database away (so much so
> that we avoid SQL related terms in our queryset methods), yet one of the
> more fundamental concepts such as joins we expect users to know about and
> optimise for.
>
>
> This is far from the only place where we expect users to be conscious of
> what has to go on at the database level. For example, we expect users to
> use `.count()` and `.exists()` where appropriate, and not use them where
> not appropriate - see https://docs.djangoproject.com/en/1.11/topics/db/
> optimization/#don-t-overuse-count-and-exists
>
> This is an example of doing it right. We could have 'intelligently' made
> `__len__()` do `count()`, but this introduces queries that the user did not
> ask for, and that's hard for developers to predict. That whole section of
> the docs on DB optimisation depends on the possibility of understanding
> things at a DB level, and understanding how QuerySets behave, and that they
> only do the queries that you ask them to do. The more magic we build in,
> the harder we make life for people trying to optimize database access.
>
> If we have an `auto_prefetch` method that has to be called explicitly,
> then we are allowing users who know less about databases to get something
> that works OK for many situations. But having it on by default makes
> optimization harder and adds unwelcome surprises.
>
>
> I'd be more in favour of throwing an error on 
> non-select-related-foreign-key-access
> than what we're currently doing which is a query for each access.
>
>
> I have some sympathy with this, but I doubt it is somewhere we can go from
> the current ORM and Django ecosystem as a default - but see below.
>
>
> The only options users currently have of monitoring poor behaviour is:
>
> 1. Add logging to django.db.models
> 2. Add django-debug-toolbar
> 3. Investigate page slow downs
>
>
> Here's a bunch of ways that previously tuned queries can "go bad":
>
> 1. A models `__str__` method is updated to include a related field
> 2. A template uses a previously unused related field
> 3. A report uses a previously unused related field
> 4. A 

Re: Should django File wrapper support .next()?

2017-08-16 Thread Cristiano Coelho
I forgot about django dropping support for Python2 so I guess you are right 
this doesn't make much sense if next() is removed, but for Python2 it would 
make sense for django's File object to have the same api as the object 
returned by open().


El martes, 15 de agosto de 2017, 19:21:08 (UTC-3), Adam Johnson escribió:
>
> The next() method is gone from the return value of open() in Python 3 and 
> I don't think it was every necessary. As per the iterator protcool ( 
> https://docs.python.org/2/library/stdtypes.html#iterator-types ), __iter__ 
> is meant to return an iterator over the object, and *that* is what the 
> next() method - __next__ on Python 3 - should be attached to. Django's 
> File.__iter__ is a generator function, thus it returns a generator, with 
> comes with a next()/__next__().
>
> I think you should just try iterating your files by calling iter() on 
> them first, it's Python 3 compatible too.
>
> On 27 July 2017 at 17:24, Cristiano Coelho  > wrote:
>
>> Hello,
>>
>> I have recently found an interesting issue, using a project that relies 
>> on different storage backends, when switching from a custom one to django's 
>> file system storage, I found that existing code that would iterate files 
>> with the next() call would start to fail, since although django's File is 
>> iterable, it doesn't define the next method.
>>
>> Now I'm wondering if this is on purpose, or a bug. It's odd that every 
>> stream (or almost) from the python library such as everything from the .io 
>> module or simply the object returned by open() (python 2) supports the next 
>> call but the django File wrapper doesn't.
>>
>> This happened with django 1.10 and I believe it wasn't changed with 
>> Django 1.11
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to django-develop...@googlegroups.com .
>> To post to this group, send email to django-d...@googlegroups.com 
>> .
>> Visit this group at https://groups.google.com/group/django-developers.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-developers/09ffad76-e473-4593-ac84-4bca7f76e92c%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Adam
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/cb7c4945-5932-44a0-8eb8-4571813a0ba3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Luke Plant

Hi Josh,


On 16/08/17 02:26, Josh Smeaton wrote:
I believe we should be optimising for the **common** use case, not 
expecting everyone to be experts with the ORM.


> If I can come up with a single example where it would significantly 
decrease performance (either memory usage or speed) compared to the 
default (and I'm sure I can), then I would be strongly opposed to it 
ever being default behaviour.


The status quo is already one where thousands of users and sites are 
doing the non-optimal thing because we're choosing to be conservative 
and have users opt-in to the optimal behaviour.


I wouldn't say it is "optimal behaviour". It is behaviour that is an 
optimization for some use cases - common ones, I'd agree - but not all. 
In fact it is not even the best optimization - it performs less well in 
many cases than a manual `select_related`.


We don't know how many sites would be affected by the opposite default, 
because we aren't putting people in that situation. Generating 
potentially large queries (i.e. ones that return lots of results) is 
going to make someone's life a lot harder, and can even crash processes 
(out of memory errors), or cause massive slowdown due to insufficient 
memory and swapping. I have had these kind of issues in production 
systems, and sometimes the answer is to prefetch *less* - which is why 
things like `iterator()` exist, because sometimes you *don't* want to 
load it all into memory upfront.


A massive complaint against Django is how easy it is for users to 
build in 1+N behaviour. Django is supposed to abstract the database 
away (so much so that we avoid SQL related terms in our queryset 
methods), yet one of the more fundamental concepts such as joins we 
expect users to know about and optimise for.


This is far from the only place where we expect users to be conscious of 
what has to go on at the database level. For example, we expect users to 
use `.count()` and `.exists()` where appropriate, and not use them where 
not appropriate - see 
https://docs.djangoproject.com/en/1.11/topics/db/optimization/#don-t-overuse-count-and-exists


This is an example of doing it right. We could have 'intelligently' made 
`__len__()` do `count()`, but this introduces queries that the user did 
not ask for, and that's hard for developers to predict. That whole 
section of the docs on DB optimisation depends on the possibility of 
understanding things at a DB level, and understanding how QuerySets 
behave, and that they only do the queries that you ask them to do. The 
more magic we build in, the harder we make life for people trying to 
optimize database access.


If we have an `auto_prefetch` method that has to be called explicitly, 
then we are allowing users who know less about databases to get 
something that works OK for many situations. But having it on by default 
makes optimization harder and adds unwelcome surprises.




I'd be more in favour of throwing an error on 
non-select-related-foreign-key-access than what we're currently doing 
which is a query for each access.


I have some sympathy with this, but I doubt it is somewhere we can go 
from the current ORM and Django ecosystem as a default - but see below.




The only options users currently have of monitoring poor behaviour is:

1. Add logging to django.db.models
2. Add django-debug-toolbar
3. Investigate page slow downs

Here's a bunch of ways that previously tuned queries can "go bad":

1. A models `__str__` method is updated to include a related field
2. A template uses a previously unused related field
3. A report uses a previously unused related field
4. A ModelAdmin adds a previously unused related field


I completely agree that visibility of this problem is a major issue, and 
would really welcome work on improving this, especially in DEBUG mode. 
One option might be a method that replaces lazy loads with exceptions. 
This would force you to add the required `prefetch_related` or 
`select_related` calls. You could have:


.lazy_load(None)   # exception on accessing any non-loaded FK objects

.lazy_load('my_fk1', 'my_fk2')  # exceptions on accessing any unloaded 
FK objects apart from the named ones


.lazy_load('__any__')   # cancel the above, restore default behaviour

This (or something like it) would be a different way to tackle the 
problem - just throwing it out. You could have a Meta option to do 
`.lazy_load(None)` by default for a Model.  I've no idea how practical 
it would be to wire this all up so that is works correctly, and with 
nested objects etc.


This would be a bit of a departure from the current ORM, especially if 
we wanted to migrate to it being the default behaviour. If we are 
thinking long term, we might have to move to something like this if the 
future is async. Lazy-loading ORMs are not very friendly to async code, 
but an ORM where you must specify a set of queries to execute up-front 
might be more of a possibility.


I think a better question to ask is:

- How many 

Re: Automatic prefetching in querysets

2017-08-16 Thread Anssi Kääriäinen
On Wednesday, August 16, 2017 at 2:27:11 PM UTC+3, Josh Smeaton wrote:
>
> It won't affect experienced users. They'll read the release notes, see 
>> that this change has been implemented, and either go and delete a bunch of 
>> prefetch_related() calls, grumble a bit and turn auto-prefetch off globally 
>> or just file it away as another fact they know about the Django ORM.
>>
>
> This is, for me, the biggest point in favour of having it opt out. It'll 
> barely affect anyone who is really experienced, but can provide such a 
> large upside for others.
>

There is huge potential for this feature going very bad for performance. 
For example, this common anti-pattern would immediately cause interesting 
issues:

if qs:
qs[0].do_something()

then if do_something() does access a couple of relations, you'll load data 
for the whole queryset where you only need it for the first one. So, at 
least make this opt-in instead of opt-out.

Doing this for "select_related" cases on opt-in basis seems like a 
plausible idea. Doing this for reverse foreign key or m2m cases where you 
need to use prefetch_related seems complex to implement, and just for that 
reason I'd avoid the prefetch related case at least initially.

Note that it should be possible to implement this fully as 3rd party 
project if we add an easy way to customise how related object fetching is 
done. I'm not sure if we can add an easy way for that.

By the way, a very efficient test for N+1 queries issue is to do something 
like this:

def test_nplusone_queries(self):
self.create_object()
with self.count_queries() as cnt:
self.client.get('object_listing/')
self.create_object()
with self.assertNumQueries(cnt):
self.client.get('object_listing/')

that is, you test the same list view with one and two objects. If there was 
an easy way to write tests like this, that would make n+1 queries problems 
a lot easier to detect. Maybe something to make this kind of test trivial 
to write could be in Django? Something like 
self.assertListViewScales(object_creator, 'view_name')?

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1dcd8559-5f99-4635-b1d1-959bea066192%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Adam Johnson
>
> I wouldn't want is to give free optimisations to non-ORM pros
>

Something like 99% of django projects are by non-ORM pros, it can be easy
to forget that on a mailing list of experts.

On 16 August 2017 at 12:27, Josh Smeaton  wrote:

> It won't affect experienced users. They'll read the release notes, see
>> that this change has been implemented, and either go and delete a bunch of
>> prefetch_related() calls, grumble a bit and turn auto-prefetch off globally
>> or just file it away as another fact they know about the Django ORM.
>>
>
> This is, for me, the biggest point in favour of having it opt out. It'll
> barely affect anyone who is really experienced, but can provide such a
> large upside for others.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/3bf2f56e-8534-404b-8a70-
> 12a657be0514%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM1FyuGNEkWQrrC6Owo1V5c2YDmBaK5NFRkRgdMJtzQjfg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Josh Smeaton

>
> It won't affect experienced users. They'll read the release notes, see 
> that this change has been implemented, and either go and delete a bunch of 
> prefetch_related() calls, grumble a bit and turn auto-prefetch off globally 
> or just file it away as another fact they know about the Django ORM.
>

This is, for me, the biggest point in favour of having it opt out. It'll 
barely affect anyone who is really experienced, but can provide such a 
large upside for others.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/3bf2f56e-8534-404b-8a70-12a657be0514%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.