Re: Faster Migrations! But at what cost?

2019-05-20 Thread Raphael Gaschignard

Hi Markus, Simon,

 Both of you, thank you for the detailed replies and status report on 
this sort of stuff.



Did you look into squashing these 500 migrations by any chance?


Yeah so I'll go in and squash things, though (partly because effective 
squashing requires moving models around, though we're still at Django 
1.11 so it might become easier, partly because of cross-app references) 
it's a decent amount of work.


I do like the idea of reusing a "effectively production DB" for things, 
not least because it's actually acurate.


OK so the Operation API is effectively documented. I am tempted to try 
modifying the `database_forwards` and `database_backwards` to not 
require `from_state` (probably through adding a hook that gets called 
_before_ `state_forwards` to allow capturing the small details of the 
old state). It might actually be possible to make this backwards 
compatible by porting Django migrations to use an `apps`-free workflow, 
but falling back to the old "re-render the universe" mechanism for 
operations that don't apply it.


This is very "that xkcd comic about time spent versus time saved"

> will cause trouble with RunSQL and other operations that use related 
field or model attributes


So one thing I felt like was an invariant in this code was that field 
sharing was expected? From the docstring of ModelState:


    Note that while you are allowed to mutate .fields, you are not allowed
    to mutate the Field instances inside there themselves - you must 
instead

    assign new ones, as these are not detached during a clone.

Also maybe you meant to refer to RunPython instead of RunSQL. But I get 
your point in general here. Related models can be a problem


One throwaway idea would be to not allow related model/related field 
access in these models? There's already a lot of documentation related 
to not allowing general model methods (effectively establishing that 
"migration models are _not_ normal models"), so there's a bit of 
precedent. But beyond the general backwards incompatability, it might 
not actually even be obvious how one would implement this. And you kinda 
need this info for foreign keys and the like anyways.



Working directly off of `ModelState` is interesting, and I think there 
might be a backwards-compatible way forward there, where we still allow 
for rendering on certain operations but hold off on it on the basic 
ones. Even in our large project, most of our migration operatiosn are 
dirt-simple, so if the core django migrations could work off of 
`ModelState` then we could get a fast path through there.


Thanks for your input, both of you. I have a couple ideas now that I'm 
pretty tempted to try out, mainly around the "fast path and slow path" 
strategies that should offer backwards compatibility.


 Raphael

Markus Holtermann wrote on 2019/05/21 2:26:

Thanks Raphael for bringing this topic up and Simon for your input already.

I just left a note on your PR: 
https://github.com/django/django/pull/11388#issuecomment-494076750 . I'll quote 
it here for ease of readability:

As far as I can see right now, a similar caching happened as a first approach 
to the Django 1.8 release but cause significant problems, specifically with 
regards to relational fields. Relational fields (ForeignKey, OneToOneField, 
ManyToManyField) keep an instance reference to the related model in 
`.related_model` or the related fields in `.related_fields`. The problem now 
is, if you reuse a field (and you do because you're only calling `list()` on 
`self.fields` to copy but not deepcopy the list), you're screwing up references 
between the models that _will_ cause trouble with `RunSQL` and other operations 
that use related field or model attributes.

https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499

 From my work on migrations, I see essentially 2 approaches that are viable and 
would lead to significant performance improvements:

## 1. Make the schema editor work with model states

There's a _very old_ branch on my fork that tries to implement this approach: 
https://github.com/MarkusH/django/commits/schemaeditor-modelstate

I lost interest and eventually also didn't have the time to pursue this 
approach further. I think that's the better of the two approaches, as it gets 
rid of the piece of code that actually _causes_ the slow behavior: turning a 
model state into a model class to be used in the schema editor.

However, making the schema editor backward compatible has been proven difficult 
and to be a huge pain (just check out the commits :wink: )

## 2. Don't resolve models/fields to instances but rely on string references.

This approach is IMO merely a workaround as it would allow us to cache the fields and models as you're 
doing in your PR. `model._meta.get_field("author").related_model` would not return `` but `"myapp.Author"`. However, as far as I can tell, that's highly 
backward incom

Re: Faster Migrations! But at what cost?

2019-05-20 Thread Markus Holtermann
Thanks Raphael for bringing this topic up and Simon for your input already.

I just left a note on your PR: 
https://github.com/django/django/pull/11388#issuecomment-494076750 . I'll quote 
it here for ease of readability:

As far as I can see right now, a similar caching happened as a first approach 
to the Django 1.8 release but cause significant problems, specifically with 
regards to relational fields. Relational fields (ForeignKey, OneToOneField, 
ManyToManyField) keep an instance reference to the related model in 
`.related_model` or the related fields in `.related_fields`. The problem now 
is, if you reuse a field (and you do because you're only calling `list()` on 
`self.fields` to copy but not deepcopy the list), you're screwing up references 
between the models that _will_ cause trouble with `RunSQL` and other operations 
that use related field or model attributes.

https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499

>From my work on migrations, I see essentially 2 approaches that are viable and 
>would lead to significant performance improvements:

## 1. Make the schema editor work with model states

There's a _very old_ branch on my fork that tries to implement this approach: 
https://github.com/MarkusH/django/commits/schemaeditor-modelstate

I lost interest and eventually also didn't have the time to pursue this 
approach further. I think that's the better of the two approaches, as it gets 
rid of the piece of code that actually _causes_ the slow behavior: turning a 
model state into a model class to be used in the schema editor.

However, making the schema editor backward compatible has been proven difficult 
and to be a huge pain (just check out the commits :wink: )

## 2. Don't resolve models/fields to instances but rely on string references.

This approach is IMO merely a workaround as it would allow us to cache the 
fields and models as you're doing in your PR. 
`model._meta.get_field("author").related_model` would not return `` but `"myapp.Author"`. However, as far as I can tell, 
that's highly backward incompatible. And at this point, I also don't see a way 
to make this backward compatible.

Cheers,

Markus

On Mon, May 20, 2019, at 5:26 PM, charettes wrote:
> Hello Raphael,
> 
> > Have there ever been any surveys about how the size of Django projects? I 
> > don't know the value of investigating this further except for our own usage.
> 
> I'm not aware of any similar surveys in the recent years but I would 
> say *240 models across 90 apps, with roughly 500 migrations* would be 
> considered a really large project in my experience. Did you look into 
> squashing these 500 migrations by any chance? Something we did at 
> $DAYJOB to speed up the test bootstraping process is to prebuild 
> containers with migrations already applied in production so CI running 
> PRs only applies new migrations on top of them.
> 
> > Does the caching of ModelState.render as done in this PR (still need to 
> > work through a couple failing tests) sound reasonable? Or is this veering 
> > too far in the performance/safety guarantee tradeoff?
> 
> While the layer you added seems to yield significant benefits I would 
> argue that it complicates an already too complex apps rendering caching 
> layer. As you'll probably come to discover while trying to resolve the 
> currently failing tests model.Fields equality is not implemented how 
> you'd expect it to be[0] and thus require costly deconstruction to be 
> used as a cache staleness predicate[1].
> 
> > Is the migration operation infrastructure considered a public API? As in, 
> > would changing the Operation model API (potentially breaking subclasses) be 
> > considered a major undertaking? Or would it be an acceptable cost to pay 
> > for some performance improvements?
> 
> Given the large adoption of migrations and the fact the Operation API 
> is publicly documented[2] I would say the performance benefits would 
> need to be quite substantial to break backward compatibility. In my 
> opinion, and I think that's something Markus Holtermann who also worked 
> a lot on speeding up migrations would agree on, we should focus our 
> efforts on avoiding model rendering at all cost. We've already made all 
> state mutation (Operation.state_forwards) avoid all accesses to .apps 
> and I think the next step would be to make `database_forwards` and 
> `database_backwards` do the same. This is something Markus worked on a 
> few years ago[3].
> 
> Cheers,
> Simon
> 
> [0] 
> https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499
> [1] 
> https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/migrations/autodetector.py#L49-L87
> [2] 
> https://docs.djangoproject.com/en/2.2/ref/migration-operations/#writing-your-own
> [3] 
> https://github.com/django/django/compare/master...MarkusH:schemaedito

Re: Faster Migrations! But at what cost?

2019-05-20 Thread charettes
Hello Raphael,

> Have there ever been any surveys about how the size of Django projects? I 
don't know the value of investigating this further except for our own usage.

I'm not aware of any similar surveys in the recent years but I would say *240 
models across 90 apps, with roughly 500 migrations* would be considered a 
really large project in my experience. Did you look into squashing these 
500 migrations by any chance? Something we did at $DAYJOB to speed up the 
test bootstraping process is to prebuild containers with migrations already 
applied in production so CI running PRs only applies new migrations on top 
of them.

> Does the caching of ModelState.render as done in this PR (still need to 
work through a couple failing tests) sound reasonable? Or is this veering 
too far in the performance/safety guarantee tradeoff?

While the layer you added seems to yield significant benefits I would argue 
that it complicates an already too complex apps rendering caching layer. As 
you'll probably come to discover while trying to resolve the currently 
failing tests model.Fields equality is not implemented how you'd expect it 
to be[0] and thus require costly deconstruction to be used as a cache 
staleness predicate[1].

> Is the migration operation infrastructure considered a public API? As in, 
would changing the Operation model API (potentially breaking subclasses) be 
considered a major undertaking? Or would it be an acceptable cost to pay 
for some performance improvements?

Given the large adoption of migrations and the fact the Operation API is 
publicly documented[2] I would say the performance benefits would need to 
be quite substantial to break backward compatibility. In my opinion, and I 
think that's something Markus Holtermann who also worked a lot on speeding 
up migrations would agree on, we should focus our efforts on avoiding model 
rendering at all cost. We've already made all state mutation 
(Operation.state_forwards) avoid all accesses to .apps and I think the next 
step would be to make `database_forwards` and `database_backwards` do the 
same. This is something Markus worked on a few years ago[3].

Cheers,
Simon

[0] 
https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499
[1] 
https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/migrations/autodetector.py#L49-L87
[2] 
https://docs.djangoproject.com/en/2.2/ref/migration-operations/#writing-your-own
[3] 
https://github.com/django/django/compare/master...MarkusH:schemaeditor-modelstate

Le dimanche 19 mai 2019 22:13:03 UTC-4, Raphael Gaschignard a écrit :
>
> Hi Developers,
>
>   We have a decently-sized "large project", around 240 models across 90 
> apps, with roughly 500 migrations to work off of. We do periodically squash 
> migrations to keep the migration count under control, but because of all 
> this migrations in our testing server take 3-5 minutes to run to 
> completion. 
>
> I am not sure about what the size of a typical Django project is (or 
> rather, a typical "large project") so it's hard for me to quantify how big 
> of an issue this is.
>
> Looking through the migration code and some profiling I found a place 
> where caching was possible (on the ModelState -> Model rendering, based 
> on some of the invariants stated in ModelState code), which would cut 
> *our* full migration from 230 seconds to 50 seconds (on my machine at 
> least). On the specific caching I did, I was hitting a 90% cache hit rate 
> on our full migration run.
>
> Caching is always a bit scary, though, and there are a *lot* of places in 
> the apps registry code/model registration code in particular where caches 
> are constantly being wiped. So this stuff scares me quite a bit. In my 
> personal ideal, I would love to be able to check in my caching thing but 
> have it be behind some MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE flag. I am not 
> recommending this for Django because it's not how the project tends to do 
> things, this is just my personal feeling. After all, you're rarely running 
> all  your migrations in production, so this is a testing problem more than 
> anything.
>
> I do think there would be an alternative way to move forward though. 
> Currently the migrations Operation class relies on having the from_state 
> and to_state for DB operations in particular. But I think that we could 
> change up this API based on how these properties are used in 
> Django-provided Operation classes to avoid having to copy the state to 
> provide from_state and to_state. I haven't gone through with this 
> investigation too much yet but I think this would improve things a bit.
>
> So this is a multi-pronged question:
>
> - Have there ever been any surveys about how the size of Django projects? 
> I don't know the value of investigating this further except for our own 
> usage.
>
> - Does the caching of ModelState.render as done in this PR 
> 

Re: Fellow Reports - May 2019

2019-05-20 Thread Mariusz Felisiak
Week ending May 19, 2019.

*Triaged:*
https://code.djangoproject.com/ticket/30473 - len() and count() yield 
different results on randomly ordered QuerySet with annotation. (duplicate)
https://code.djangoproject.com/ticket/30474 - Minor change to documentation 
on many-to-many relationships. (invalid)
https://code.djangoproject.com/ticket/30477 - Filtering on reverse 
ForeignKey relation uses get_db_prep_value from a wrong field. (accepted)
https://code.djangoproject.com/ticket/30476 - Add introspection of "json" 
(like "jsonb") data type as JSONField on PostgreSQL. (wontfix)
https://code.djangoproject.com/ticket/30482 - prefetch_related_objects 
evaluates queryset when checking for invalid prefetch ordering. (accepted)
https://code.djangoproject.com/ticket/30480 - Discrepancy in `DateTime` 
field value If the django orm union() is used with the empty array in the 
filter. (worksforme)
https://code.djangoproject.com/ticket/30479 - Autoreloader with 
StatReloader doesn't track changes in manage.py. (accepted)
https://code.djangoproject.com/ticket/30487 - Using more than 245 functions 
in CombinedExpressions causes RecursionError. (wontfix)
https://code.djangoproject.com/ticket/30488 - SearchVector lookup is 
generating redundent Coalesce wrapper. (accepted)

*Reviewed/committed:*
https://github.com/django/django/pull/11356 - Fixed #30436 -- Added check 
that on_delete is callable in ForeignKey and OneToOneField.
https://github.com/django/django/pull/11330 - Fixed #30437 -- Clarified 
that urlpatterns can be a sequence.
https://github.com/django/django/pull/11305 - Refs #27685 -- Logged 
unexpected Watchman autoreloader errors.
https://github.com/django/django/pull/11338 - Fixed #30316 -- Added source 
code link to the default logging configuration in logging docs.
https://github.com/django/django/pull/11369 - Fixed #30482 -- Prevented 
unnecessary evaluation of lookup.queryset in prefetch_related_objects().
https://github.com/django/django/pull/11353 - Fixed #30459 -- Delegated 
hide/show JS toggle to parent div. 
https://github.com/django/django/pull/11033 - Fixed #30220 -- Added support 
for headless mode in selenium tests.
https://github.com/django/django/pull/11334 - Fixed #30453 -- Fixed crash 
of simple_tag() and inclusion_tag() when function is wrapped.
https://github.com/django/django/pull/11293 - Fixed #30395 -- Doc'd a 
limitation of ModelForm.Meta.widgets.
https://github.com/django/django/pull/11377 - Fixed #30463 -- Fixed crash 
of deprecation message when Meta.ordering contains expressions.

I was off for almost entire week.

Best regards,
Mariusz

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/36897e76-bb56-4274-9854-f963b65e514d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: STMP ON DJANGO

2019-05-20 Thread Adam Johnson
Hi!

I think you've found the wrong mailing list for this post. This mailing
list is for the development of Django itself, not for support using Django.
This means the discussions of bugs and features in Django itself, rather
than in your code using it. People on this list are unlikely to answer your
support query with their limited time and energy. Read more on the mailing
lists at https://www.djangoproject.com/community/

For support, please use the django-users mailing list, or IRC #django on
Freenode, or a site like Stack Overflow. There are people out there willing
to help on those channels, but they might not respond if you don't ask your
question well. Stack Overflow's question guide can help you frame it well:
https://stackoverflow.com/help/how-to-ask .

Also if you haven't read it, please take a look at Django's Code of
Conduct: https://www.djangoproject.com/conduct/ . These are our "ground
rules" for working well as a community, and will help you get the most out
of Django and our fantastic community.

Thanks for your understanding,

Adam

On Mon, 20 May 2019 at 09:48, ramadhan ngallen  wrote:

> Hello Guys am new to Django
>
> I want to have a model which will have all smtp setup for email.
>
> class EmailSetup(models.Model):
> # model for smtp email EmailSetup
> host = models.CharField(defaut='', blank=False, null=False)
> host_user = models.EmailField(defaut='', blank=False, null=False)
> password = models.CharField(max_length=32, widget=models.PasswordInput)
> port = model.IntegerField()
> email_use_tls = models.BooleanField()
>
>
> This model will be registered to the admin site.
> And admin will be able to save setup and whenever he/she make any
> modification it should updated to the database
> All email will be sent from django using the setup above
>
> I would like to know how to implement something like this and on django
> documentation models do not have PasswordFiled()
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/52eba26e-7319-4b4f-b784-c742b6eba021%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM0ebn5BT6BWWDOi%2BjVrz%2BvuYix3KCv1qj8Q7QMjVXo4dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


STMP ON DJANGO

2019-05-20 Thread ramadhan ngallen
Hello Guys am new to Django

I want to have a model which will have all smtp setup for email.

class EmailSetup(models.Model):
# model for smtp email EmailSetup
host = models.CharField(defaut='', blank=False, null=False)
host_user = models.EmailField(defaut='', blank=False, null=False)
password = models.CharField(max_length=32, widget=models.PasswordInput)
port = model.IntegerField()
email_use_tls = models.BooleanField()


This model will be registered to the admin site.
And admin will be able to save setup and whenever he/she make any 
modification it should updated to the database
All email will be sent from django using the setup above

I would like to know how to implement something like this and on django 
documentation models do not have PasswordFiled()

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/52eba26e-7319-4b4f-b784-c742b6eba021%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.