Django, initial data and custom SQL
Hello, I would like to suggest (patch will follow if someone concurs) that custom sql is executed after Django has created indexes. Django is (in my opinion) a bit optimistic regarding index creation, and by looking at pg_stat_* output I see that at least a couple of indexes on busy tables hasn't been used - ever. The obvious thing to do is to drop these. Next step would be to sync this with my test and staging environment so I can run tests against my data/application and verify that something hasn't broken - and here's basically where it gets interesting. Since Django executes my custom SQL before creating indexes, it's impossible to achieve something that hooks into initdb/syncdb. I know that it is "good custom" to create indexes after inserting data – but fixtures in Django is already executed after creating indexes, so that can't be the reason.. So, without further ado, what I would like to know is if there's a reason to why custom SQL is executed before index creation. Kind regards, Johan bergström --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
On Feb 10, 1:13 pm, Johan Bergström wrote: > Since Django executes my custom SQL before creating indexes, it's > impossible to achieve something that hooks into initdb/syncdb. I know > that it is "good custom" to create indexes after inserting data – but > fixtures in Django is already executed after creating indexes, so that > can't be the reason.. So, without further ado, what I would like to > know is if there's a reason to why custom SQL is executed before index > creation. Isn't this doable with initial SQL? http://docs.djangoproject.com/en/dev/howto/initial-data/#initial-sql Testing here with SQLite, it'd seem it runs the custom SQL at the very last point, so you could actually add some ALTER TABLE statements, I guess. Again, this is testing with SQLite, and SQLite doesn't do indexing. Maybe I misunderstood? - Ludvig --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
Hey, On Feb 10, 4:51 pm, "ludvig.ericson" wrote: > On Feb 10, 1:13 pm, Johan Bergström wrote: > > > Since Django executes my custom SQL before creating indexes, it's > > impossible to achieve something that hooks into initdb/syncdb. I know > > that it is "good custom" to create indexes after inserting data – but > > fixtures in Django is already executed after creating indexes, so that > > can't be the reason.. So, without further ado, what I would like to > > know is if there's a reason to why custom SQL is executed before index > > creation. > > Isn't this doable with initial > SQL?http://docs.djangoproject.com/en/dev/howto/initial-data/#initial-sql > > Testing here with SQLite, it'd seem it runs the custom SQL at the very > last point, so you could actually add some ALTER TABLE statements, I > guess. Again, this is testing with SQLite, and SQLite doesn't do > indexing. Actually it doesn't. I think you just did a reset/sqlall instead of sync/initdb: # cat settings.py | grep DATABASE_E DATABASE_ENGINE = "sqlite3" # python manage.py syncdb Creating table testapp_message Creating table testapp_avatar Installing custom SQL for testapp.Message model Failed to install custom SQL for testapp.Message model: no such index: testapp_message_avatar_id Installing index testapp.Message model Installing json fixture 'initial_data' from '/fixtures'. As you most likely can tell from above, sql/message.sql contains a "drop index ..." operation. (nitpick: SQlite has indexes - you could of course argue their effectiveness :-) > > Maybe I misunderstood? > Perhaps I should've been more verbose :-) Thanks for your input though. > - Ludvig Regards, Johan Bergström --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
On Feb 10, 5:07 pm, Johan Bergström wrote: > Hey, > > On Feb 10, 4:51 pm, "ludvig.ericson" wrote: > > > > > > > On Feb 10, 1:13 pm, Johan Bergström wrote: > > > > Since Django executes my custom SQL before creating indexes, it's > > > impossible to achieve something that hooks into initdb/syncdb. I know > > > that it is "good custom" to create indexes after inserting data – but > > > fixtures in Django is already executed after creating indexes, so that > > > can't be the reason.. So, without further ado, what I would like to > > > know is if there's a reason to why custom SQL is executed before index > > > creation. > > > Isn't this doable with initial > > SQL?http://docs.djangoproject.com/en/dev/howto/initial-data/#initial-sql > > > Testing here with SQLite, it'd seem it runs the custom SQL at the very > > last point, so you could actually add some ALTER TABLE statements, I > > guess. Again, this is testing with SQLite, and SQLite doesn't do > > indexing. > > Actually it doesn't. I think you just did a reset/sqlall instead of > sync/initdb: > > # cat settings.py | grep DATABASE_E > DATABASE_ENGINE = "sqlite3" > > # python manage.py syncdb > > Creating table testapp_message > Creating table testapp_avatar > Installing custom SQL for testapp.Message model > Failed to install custom SQL for testapp.Message model: no such index: > testapp_message_avatar_id > Installing index testapp.Message model > > Installing json fixture 'initial_data' from '/fixtures'. > > As you most likely can tell from above, sql/message.sql contains a > "drop index ..." operation. > > (nitpick: SQlite has indexes - you could of course argue their > effectiveness :-) > > > > > Maybe I misunderstood? > > Perhaps I should've been more verbose :-) Thanks for your input > though. > > > - Ludvig > > Regards, > Johan Bergström I took the liberty of creating a ticket with attached patch at: http://code.djangoproject.com/ticket/10236 Thanks, Johan Bergström --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
Feb 11, Johan Bergström: > I took the liberty of creating a ticket with attached patch at: > http://code.djangoproject.com/ticket/10236 I fail to see how "it has consequences for existing code", as Russell put it. I did discuss this with Bergström, and we came to the conclusion that it won't actually break much code, if any. The only case which it could break, AFAICT, is if custom SQL manages to depend on the absence of indexes. I guess that could break code that violates indexing constraints, which are applied later, maybe? I don't know. - Ludvig --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
On Thu, Feb 12, 2009 at 2:54 PM, Ludvig Ericson wrote: > > Feb 11, Johan Bergström: >> I took the liberty of creating a ticket with attached patch at: >> http://code.djangoproject.com/ticket/10236 > > I fail to see how "it has consequences for existing code", as Russell > put it. It has consequences because you are proposing to change the order in which indexes and custom SQL are applied. Any code that depends on the existing order will be affected. > I did discuss this with Bergström, and we came to the conclusion that > it won't actually break much code, if any. If it has the potential to break _any_ code, it is an unacceptable change. The Django core developers have stated that we will maintain backwards compatibility of the v1.0 interface, so any change with even the _potential_ to affect backwards compatibility will need to be checked very carefully. However, the larger point is that you don't get to make that decision about whether a change is acceptable or not. You can discuss a change and make a recommendation, but ultimately the decision needs to be made by a core developer. The Design Decision Required triage state exists for exactly this reason. > The only case which it could break, AFAICT, is if custom SQL manages > to depend on the absence of indexes. I guess that could break code > that violates indexing constraints, which are applied later, maybe? I > don't know. This is what needs to be confirmed, and given that it is a non-trivial change, your decision needs to be confirmed by a core developer. I am willing to be convinced, but you will need to prove to me that there is no backwards compatibility problem here. The discussion in this thread so far hasn't done that. To answer the original question (why is it done in this order) - mostly historical reasons. Originally, Django didn't have fixtures, so initial data were all loaded from initial SQL. It's generally faster to insert data and then add indicies, hence the order. When Django added fixtures, the order wasn't changed. Yours, Russ Magee %-) --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
On Feb 12, 2009, at 07:48, Russell Keith-Magee wrote: > On Thu, Feb 12, 2009 at 2:54 PM, Ludvig Ericson > wrote: >> >> I fail to see how "it has consequences for existing code", as Russell >> put it. > > It has consequences because you are proposing to change the order in > which indexes and custom SQL are applied. Any code that depends on the > existing order will be affected. Just want to note that *I'm* not proposing any changes. >> I did discuss this with Bergström, and we came to the conclusion that >> it won't actually break much code, if any. > > If it has the potential to break _any_ code, it is an unacceptable > change. The Django core developers have stated that we will maintain > backwards compatibility of the v1.0 interface, so any change with even > the _potential_ to affect backwards compatibility will need to be > checked very carefully. > > However, the larger point is that you don't get to make that decision > about whether a change is acceptable or not. You can discuss a change > and make a recommendation, but ultimately the decision needs to be > made by a core developer. The Design Decision Required triage state > exists for exactly this reason. I don't think it's "the larger point." I misunderstood the triaging system, and incorrectly changed it. Bennett reverted. I apologized. End of that. What I meant by "ready for checking" was that the patch applies cleanly, runs, and does what it is intended to do. And obviously that was mistake, and again: I realize this now. >> The only case which it could break, AFAICT, is if custom SQL manages >> to depend on the absence of indexes. I guess that could break code >> that violates indexing constraints, which are applied later, maybe? I >> don't know. > > This is what needs to be confirmed, and given that it is a non-trivial > change, your decision needs to be confirmed by a core developer. I am > willing to be convinced, but you will need to prove to me that there > is no backwards compatibility problem here. The discussion in this > thread so far hasn't done that. I realize that my opinions and decisions play a very small role in the development of Django, but I try to express them so that core developers may read them, and either think "what a retard", or "fair point." I guess the question really is: is this custom SQL execution order change really the right fix for the issue? I can't say I'm convinced that it is, given your next paragraph. I would suggest the need for signaling that a custom SQL file should execute post-indices. One solution, which is entirely backwards-compatible, would be to say that "if you want your custom SQL to run after indices creation, name files '*.post.sql,'" or something like that. That has its own issues, but you get the general idea. > To answer the original question (why is it done in this order) - > mostly historical reasons. Originally, Django didn't have fixtures, so > initial data were all loaded from initial SQL. It's generally faster > to insert data and then add indicies, hence the order. When Django > added fixtures, the order wasn't changed. I see, so the custom SQL machinery grew out of being a crude version of fixtures then. I think the cause of confusion here is that Bergström and I both were expecting it to be for more than loading data -- I want to create a view in it (hence my interest), and he wants to drop some excessive indices. After pondering some more, I realized the docs actually say "initial SQL data", and so I'm not sure if this change is actually a good idea or not. - Ludvig --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
Hello folks, On Feb 12, 8:49 am, Ludvig Ericson wrote: > On Feb 12, 2009, at 07:48, Russell Keith-Magee wrote: > > > On Thu, Feb 12, 2009 at 2:54 PM, Ludvig Ericson > > wrote: > > >> I fail to see how "it has consequences for existing code", as Russell > >> put it. > > > It has consequences because you are proposing to change the order in > > which indexes and custom SQL are applied. Any code that depends on the > > existing order will be affected. > > Just want to note that *I'm* not proposing any changes. > > >> I did discuss this with Bergström, and we came to the conclusion that > >> it won't actually break much code, if any. > > > If it has the potential to break _any_ code, it is an unacceptable > > change. The Django core developers have stated that we will maintain > > backwards compatibility of the v1.0 interface, so any change with even > > the _potential_ to affect backwards compatibility will need to be > > checked very carefully. > > > However, the larger point is that you don't get to make that decision > > about whether a change is acceptable or not. You can discuss a change > > and make a recommendation, but ultimately the decision needs to be > > made by a core developer. The Design Decision Required triage state > > exists for exactly this reason. > > I don't think it's "the larger point." I misunderstood the triaging > system, and incorrectly changed it. Bennett reverted. I apologized. > End of that. > > What I meant by "ready for checking" was that the patch applies > cleanly, runs, and does what it is intended to do. And obviously that > was mistake, and again: I realize this now. > > >> The only case which it could break, AFAICT, is if custom SQL manages > >> to depend on the absence of indexes. I guess that could break code > >> that violates indexing constraints, which are applied later, maybe? I > >> don't know. > > > This is what needs to be confirmed, and given that it is a non-trivial > > change, your decision needs to be confirmed by a core developer. I am > > willing to be convinced, but you will need to prove to me that there > > is no backwards compatibility problem here. The discussion in this > > thread so far hasn't done that. > > I realize that my opinions and decisions play a very small role in the > development of Django, but I try to express them so that core > developers may read them, and either think "what a retard", or "fair > point." > > I guess the question really is: is this custom SQL execution order > change really the right fix for the issue? I can't say I'm convinced > that it is, given your next paragraph. I would suggest the need for > signaling that a custom SQL file should execute post-indices. > > One solution, which is entirely backwards-compatible, would be to say > that "if you want your custom SQL to run after indices creation, name > files '*.post.sql,'" or something like that. That has its own issues, > but you get the general idea. > > > To answer the original question (why is it done in this order) - > > mostly historical reasons. Originally, Django didn't have fixtures, so > > initial data were all loaded from initial SQL. It's generally faster > > to insert data and then add indicies, hence the order. When Django > > added fixtures, the order wasn't changed. > > I see, so the custom SQL machinery grew out of being a crude version > of fixtures then. > > I think the cause of confusion here is that Bergström and I both were > expecting it to be for more than loading data -- I want to create a > view in it (hence my interest), and he wants to drop some excessive > indices. > > After pondering some more, I realized the docs actually say "initial > SQL data", and so I'm not sure if this change is actually a good idea > or not. I also realize that the docs says "initial data" but since fixtures nowadays fills this gap, I think the documentation example (insert into...) is misleading in that it recommends yet another way to insert into your database. It is also misleading in that it says that "there's a hook" when there's actually two (see http://code.djangoproject.com/browser/django/trunk/django/core/management/commands/syncdb.py#L98 and forward) As I stated in the ticket (and Russell above), databases generally want to throw indexes at the bottom of the equation, which now both custom SQL and fixtures is at. Changing this most likely will break some poor guy's code (which _I'm_ perfectly OK with, since they in that case most likely should update it anyway) but understand the Way of Django and fully respect that. This patch is the result. Just for reference, this is what I think should be done for an optimal workflow: * create table * create m2m * run custom data * run fixtures * run indexes * emit "database done" signal This way, you can still insert your favorite SQL with custom data, use fixtures for data, then for advanced users - remove indexes with their hooks. I know this (kind of) retrac
Re: Django, initial data and custom SQL
On Thu, Feb 12, 2009 at 4:49 PM, Ludvig Ericson wrote: > > On Feb 12, 2009, at 07:48, Russell Keith-Magee wrote: >> On Thu, Feb 12, 2009 at 2:54 PM, Ludvig Ericson >> wrote: >>> >>> I fail to see how "it has consequences for existing code", as Russell >>> put it. >> >> It has consequences because you are proposing to change the order in >> which indexes and custom SQL are applied. Any code that depends on the >> existing order will be affected. > > Just want to note that *I'm* not proposing any changes. So noted :-) > I don't think it's "the larger point." I misunderstood the triaging > system, and incorrectly changed it. Bennett reverted. I apologized. > End of that. No problems. > I realize that my opinions and decisions play a very small role in the > development of Django, but I try to express them so that core > developers may read them, and either think "what a retard", or "fair > point." This is exactly what we want (and need) to happen. We (the core devs) rely upon community members such as yourself to have these discussions, taking some of the load off of us so we can concentrate on having smaller discussions where many of the details have already been sorted out, making decisions and committing code. In this case, there was a misunderstanding about the best way to communicate the result of your discussions, but we've resolved that problem. > One solution, which is entirely backwards-compatible, would be to say > that "if you want your custom SQL to run after indices creation, name > files '*.post.sql,'" or something like that. That has its own issues, > but you get the general idea. This was my immediate thought, and yes, it would be backwards compatible. My immediate counter-reaction is that you would require several configuration points (post table creation, post index creation, post data insertion), which means a lot of difficult-to-explain configuration points. >> To answer the original question (why is it done in this order) - >> mostly historical reasons. Originally, Django didn't have fixtures, so >> initial data were all loaded from initial SQL. It's generally faster >> to insert data and then add indicies, hence the order. When Django >> added fixtures, the order wasn't changed. > > I see, so the custom SQL machinery grew out of being a crude version > of fixtures then. Correct. > I think the cause of confusion here is that Bergström and I both were > expecting it to be for more than loading data -- I want to create a > view in it (hence my interest), and he wants to drop some excessive > indices. To be clear - I fully acknowledge your use case. There are many occasions where you need to invoke custom SQL, and the decision to execute that SQL pre or post index installation is significant. I would very much like to be able to support this use case - we just need to find the right way to do it. > After pondering some more, I realized the docs actually say "initial > SQL data", and so I'm not sure if this change is actually a good idea > or not. There shouldn't be anywhere talking about 'initial SQL data' - I thought I purged all of those references. If you find any such references, they should be referring to "custom SQL", or similar phrasing. Yours, Russ Magee %-) --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
On Thu, Feb 12, 2009 at 5:21 PM, Johan Bergström wrote: > >> After pondering some more, I realized the docs actually say "initial >> SQL data", and so I'm not sure if this change is actually a good idea >> or not. > > I also realize that the docs says "initial data" but since fixtures > nowadays fills this gap, I think the documentation example (insert > into...) is misleading in that it recommends yet another way to insert > into your database. It is also misleading in that it says that > "there's a hook" when there's actually two (see > http://code.djangoproject.com/browser/django/trunk/django/core/management/commands/syncdb.py#L98 > and forward) As I noted in my message to Ludvig, any reference to initial SQL should be removed (and I thought they already had been). SQL hooks _can_ be used for inserting data, but that's not the preferred option. > As I stated in the ticket (and Russell above), databases generally > want to throw indexes at the bottom of the equation, which now both > custom SQL and fixtures is at. Changing this most likely will break > some poor guy's code (which _I'm_ perfectly OK with, since they in > that case most likely should update it anyway) but understand the Way > of Django and fully respect that. This patch is the result. > Just for reference, this is what I think should be done for an optimal > workflow: > * create table > * create m2m > * run custom data > * run fixtures > * run indexes > * emit "database done" signal I agree that if we were starting with a clean slate, this would be close to the right way to do things (and, if isn't that far from the way things are done right now). My only issue would be with moving the signal to the end of the process. A "no, really, I'm finished" signal was requested some time ago: http://code.djangoproject.com/ticket/7561 Interestingly, removing superfluous indicies was one of the use cases provided in support of this change. At the time it was rejected because the specific proposal was backwards compatible (and was made during the v1.0 push). The post_syncdb hook is commonly used for data insertion, and the current location is the right place for that, but having a second signal is a definite possibility. Moving the fixture loading to pre-indexing is also a possibility - I can't think of a reason this would cause backwards-incompatibility problems (and for large fixtures, it should be marginally faster). Yours, Russ Magee %-) --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Django, initial data and custom SQL
On Feb 12, 2009, at 11:53, Russell Keith-Magee wrote: > On Thu, Feb 12, 2009 at 4:49 PM, Ludvig Ericson >> One solution, which is entirely backwards-compatible, would be to say >> that "if you want your custom SQL to run after indices creation, name >> files '*.post.sql,'" or something like that. That has its own issues, >> but you get the general idea. > > This was my immediate thought, and yes, it would be backwards > compatible. My immediate counter-reaction is that you would require > several configuration points (post table creation, post index > creation, post data insertion), which means a lot of > difficult-to-explain configuration points. Hm, yes... It's true it would get pretty convoluted quickly. The best approach I can think of is making it flexible, something like (just throwing this out there): class M(models.Model): ... class Meta: custom_sql = {"drop_indices": ("post-indices", "pre- inserts")} Or something like that, and you could build a dependency tree out of that, running the SQL at certain points in the code. It has other issues (like being overkill), and it's not really a proposal I like. Instead, I think what you say in your next e-mail is a better idea: provide the developers with the tools (like signals) to make these complicated usecases doable, with a dash of Python code. Then Django could provide a simple function for saying "execute SQL in file X." def drop_indices(sender=None, **kwargs): load_sql(sender, "drop_indices.sql") post_something.connect(drop_indices) (Though this usecase would probably do better with inlining the SQL as I doubt it would span that many lines.) However, as noted, dropping indices isn't *that* uncommon. I think it'd be a good idea to support dropping indices with the ease of adding a file - or even specifying that in the model. (Like `ForeignKey(no_index=True)`.) >> I think the cause of confusion here is that Bergström and I both were >> expecting it to be for more than loading data -- I want to create a >> view in it (hence my interest), and he wants to drop some excessive >> indices. > > To be clear - I fully acknowledge your use case. There are many > occasions where you need to invoke custom SQL, and the decision to > execute that SQL pre or post index installation is significant. I > would very much like to be able to support this use case - we just > need to find the right way to do it. > >> After pondering some more, I realized the docs actually say "initial >> SQL data", and so I'm not sure if this change is actually a good idea >> or not. > > There shouldn't be anywhere talking about 'initial SQL data' - I > thought I purged all of those references. If you find any such > references, they should be referring to "custom SQL", or similar > phrasing. As linked earlier: http://docs.djangoproject.com/en/dev/howto/initial-data/#initial-sql This whole subsection (and the section intro) speaks of it in terms of initial SQL, and is within that context (initial data). A quick googling reveals two files which still have this reference: * http://docs.djangoproject.com/en/dev/howto/initial-data/ * http://docs.djangoproject.com/en/dev/ref/settings/ - Ludvig --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---