Re: dumpdata and loaddata as simple DB migration tool?
On Thu, Jun 4, 2009 at 11:19 AM, Brian May wrote:
>
> On Thu, Jun 04, 2009 at 10:27:35AM +0800, Russell Keith-Magee wrote:
>> If you're going to start throwing around claims that loaddata/dumpdata
>> doesn't work, you _really_ need to back them up with a demonstrated
>> example that proves your claim. We provide Trac for precisely this
>> reason, and you will find that claims of data corruption are taken
>> very seriously by the core developers.
>
> Like I said this was a while ago, I think a test case that shows corruption
> with Django 0.96 would be pointless.

It isn't pointless in the least. The problem _might_ still exist - we have no way of knowing, because this is the first time you've reported it, so we've never made an explicit attempt to fix it. As it currently stands, you have no way of knowing whether your problem has been fixed (inadvertently or directly).

> I don't expect anybody to take my claims too seriously,

You are correct that a single comment probably doesn't matter that much. However, it matters a great deal in aggregate. You probably won't be quoted directly, but one person saying "I've had data corruption problems" becomes a few people saying "I've heard Django has data corruption problems", which becomes "Everyone knows Django has data corruption problems" - eventually the meme gains traction, even if it is completely untrue. Case in point: the "Rails doesn't scale" meme from a few years ago.

> the problems I
> encountered were probably fixed ages ago. I don't like to assume something is
> fixed though unless I have some confirmation.

Instead of hoping that your problem was "probably fixed ages ago", how about helping us confirm that it _has_ been fixed - or, if it hasn't, helping us fix it. A test case is the first step.

Yours,
Russ Magee %-)

You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
Re: dumpdata and loaddata as simple DB migration tool?
On Thu, Jun 04, 2009 at 10:27:35AM +0800, Russell Keith-Magee wrote:
> If you're going to start throwing around claims that loaddata/dumpdata
> doesn't work, you _really_ need to back them up with a demonstrated
> example that proves your claim. We provide Trac for precisely this
> reason, and you will find that claims of data corruption are taken
> very seriously by the core developers.

Like I said, this was a while ago; I think a test case that shows corruption with Django 0.96 would be pointless. Yes, I definitely did have corruption - the data in the database changed after doing a loaddata/dumpdata, and I could reproduce this on demand. I don't expect anybody to take my claims too seriously; the problems I encountered were probably fixed ages ago. I don't like to assume something is fixed, though, unless I have some confirmation.
--
Brian May
Re: dumpdata and loaddata as simple DB migration tool?
On Thu, Jun 4, 2009 at 10:00 AM, Brian May wrote:
>
> On Tue, Jun 02, 2009 at 09:06:38PM -0700, Kegan wrote:
>> 1. Use Django's management command "dumpdata" to get the JSON
>> representation of an app. Save the JSON into a file (oldmodel.json).
>> 2. Git pull the latest code, and do a reset on the app, so the
>> database will have the new model schema.
>> 3. Use the Python shell to migrate the JSON file to match the new
>> model representation. Save the new JSON into a file (newmodel.json).
>> 4. Use the management command "loaddata" to populate the new model
>> with newmodel.json.
>
> I tried this at one stage, and it seemed fine, until I realized non-ascii
> (UTF8, I think) characters were being silently corrupted.

This is exactly the kind of comment I referred to in my original reply. It's not fair to the Django developers or the Django community to start spreading rumours that Django corrupts data if you're not willing to follow up and actually document the problem you are having.

I am not aware of any corruption that happens to non-ascii characters as a result of using loaddata/dumpdata. There are test cases that specifically validate the serialization of non-ascii characters across all the serializers.

I don't doubt that you experienced a problem. I would be the last person to claim that loaddata/dumpdata is infallible. However, we can't fix problems we don't know about. If you're going to start throwing around claims that loaddata/dumpdata doesn't work, you _really_ need to back them up with a demonstrated example that proves your claim. We provide Trac for precisely this reason, and you will find that claims of data corruption are taken very seriously by the core developers.

Yours,
Russ Magee %-)
Re: dumpdata and loaddata as simple DB migration tool?
On Tue, Jun 02, 2009 at 09:06:38PM -0700, Kegan wrote:
> 1. Use Django's management command "dumpdata" to get the JSON
> representation of an app. Save the JSON into a file (oldmodel.json).
> 2. Git pull the latest code, and do a reset on the app, so the
> database will have the new model schema.
> 3. Use the Python shell to migrate the JSON file to match the new
> model representation. Save the new JSON into a file (newmodel.json).
> 4. Use the management command "loaddata" to populate the new model
> with newmodel.json.

I tried this at one stage, and it seemed fine, until I realized non-ascii (UTF8, I think) characters were being silently corrupted. Of course, Django has changed a lot since then, so this may no longer be an issue.

I also tried an SQL dump; however, at the time I couldn't work out how to get sqlite to dump the data with column names. So if the change meant the columns were in a different order (or new columns appeared between old columns), the import would be wrong. As a result I migrated to MySQL, which is more flexible.
--
Brian May
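As a quick sanity check on the corruption claim, a Django-free probe can show whether non-ascii text survives a plain JSON round trip. If it does (as below), corruption seen with dumpdata/loaddata would more likely come from the database layer or a file-encoding mismatch than from JSON serialization itself. The sample strings are made up for illustration:

```python
import json

# Round-trip a structure containing non-ascii text through JSON and
# verify it comes back unchanged.
original = {"fields": {"name": "naïve café 日本語"}}
dumped = json.dumps(original, ensure_ascii=False)
restored = json.loads(dumped)
assert restored == original  # text is preserved exactly
```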
Re: dumpdata and loaddata as simple DB migration tool?
On Wed, Jun 3, 2009 at 11:38 PM, Kegan Gan wrote:
>
> Hi Russell,
>
> On the first issue: Good point. I have not had the opportunity to work
> with such a huge database.
>
> On the second issue: Yes, what I am doing now is really about writing
> conversion code to fit the old JSON to the new schema. I find this to
> be quite straightforward for my use cases - it's only modifying a list
> of hash objects. But I probably haven't encountered the more
> complicated use cases out there.
>
> I feel the fact that the Django ORM works, dumpdata+loaddata works, ...
> makes it compelling as the foundation for a migration tool. Developers
> can work with 100% Python code, without needing to learn a migration
> tool's specific APIs, and it may be easier to provide some logic in the
> migration process (for example, calculating default values for new
> fields based on some existing database data).
>
> Also, independent Django app (as in INSTALLED_APPS) developers can all
> rely on the serialized JSON format, and provide their own migration
> code when they release a new version of their app. (How is this
> currently done anyway? Let's say django-tagging changes a model class
> in the next release.)

They Don't (tm) :-)

Or, to be more precise - they try everything possible to avoid changing the model, and if they do need to, they publish the series of SQL ALTER commands that are needed to update the tables in-situ. Another approach is to change the name of the table, and provide commands to read from the old table and insert into the new one. The changes made to Django's comments app during the 0.96-1.0 transition give you one example of how this can be done [1].

[1] http://docs.djangoproject.com/en/dev/ref/contrib/comments/upgrade/

> PS: As for these three points ...
>
>> * If you add a new non-null field, the fixture won't load because it
>> won't provide data for that field.
>> * If you change the name of a field, the fixture will contain data for
>> the old name, but no data for the new name. Any existing data in that
>> field will be lost.
>> * If you change the type of a field, there is no guarantee that the
>> old data will be format-compatible with the new field.
>
> ... the old fixtures are loaded with the old model class. The
> serialized JSON is modified to fit the new model class, and then loaded
> using the new model class with "manage loaddata". This is done on a
> per-app basis (only for the apps that need the migration).

As I said in my last email, this can work. However, at the very least, you double your disk space requirements. Depending on the nature of the conversion, there may also be considerable processing time and temporary memory requirements.

Ultimately, I suppose my point is this: SQL databases provide an extensive and reliable infrastructure for managing the conversion of schemas. While you _can_ avoid that infrastructure, there comes a point at which you are reinventing wheels just to avoid using a particular brand of tyre.

Yours,
Russ Magee %-)
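The in-situ ALTER approach described above can be sketched with sqlite3 from the standard library. The table and column names here are hypothetical, purely for illustration; the point is that existing rows stay in place and pick up the new column's default, with no fixture round trip:

```python
import sqlite3

# Simulate an existing installation of a hypothetical app.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_article (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO app_article (body) VALUES ('hello')")

# The upgrade step an app author would publish alongside a new release:
# existing rows are backfilled with the column default automatically.
conn.execute("ALTER TABLE app_article ADD COLUMN summary TEXT DEFAULT ''")

rows = conn.execute("SELECT id, body, summary FROM app_article").fetchall()
# rows is [(1, 'hello', '')]
```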
Re: dumpdata and loaddata as simple DB migration tool?
Hi Russell,

On the first issue: Good point. I have not had the opportunity to work with such a huge database.

On the second issue: Yes, what I am doing now is really about writing conversion code to fit the old JSON to the new schema. I find this to be quite straightforward for my use cases - it's only modifying a list of hash objects. But I probably haven't encountered the more complicated use cases out there.

I feel the fact that the Django ORM works, dumpdata+loaddata works, ... makes it compelling as the foundation for a migration tool. Developers can work with 100% Python code, without needing to learn a migration tool's specific APIs, and it may be easier to provide some logic in the migration process (for example, calculating default values for new fields based on some existing database data).

Also, independent Django app (as in INSTALLED_APPS) developers can all rely on the serialized JSON format, and provide their own migration code when they release a new version of their app. (How is this currently done anyway? Let's say django-tagging changes a model class in the next release.)

Thank you for the reply.

PS: As for these three points ...

>>> * If you add a new non-null field, the fixture won't load because it
>>> won't provide data for that field.
>>> * If you change the name of a field, the fixture will contain data for
>>> the old name, but no data for the new name. Any existing data in that
>>> field will be lost.
>>> * If you change the type of a field, there is no guarantee that the
>>> old data will be format-compatible with the new field.

... the old fixtures are loaded with the old model class. The serialized JSON is modified to fit the new model class, and then loaded using the new model class with "manage loaddata". This is done on a per-app basis (only for the apps that need the migration).

On Jun 3, 2:57 pm, Russell Keith-Magee wrote:
> On Wed, Jun 3, 2009 at 12:06 PM, Kegan wrote:
>
> > Hi,
> >
> > About Django database migration: I know there are a couple of tools
> > available (South, evolution, dmigration, etc), but I am pondering an
> > alternative here. Hope a good discussion ensues.
> >
> > This is what I am doing manually now for my database migration needs.
> >
> > 1. Use Django's management command "dumpdata" to get the JSON
> > representation of an app. Save the JSON into a file (oldmodel.json).
> > 2. Git pull the latest code, and do a reset on the app, so the
> > database will have the new model schema.
> > 3. Use the Python shell to migrate the JSON file to match the new
> > model representation. Save the new JSON into a file (newmodel.json).
> > 4. Use the management command "loaddata" to populate the new model
> > with newmodel.json.
> >
> > This has worked well so far for my use with PostgreSQL, albeit very
> > manual. It's all Python code.
> >
> > So I guess what I want to ask the Django experts/developers here is:
> >
> > 1. Is there any reason this shouldn't be done?
> > 2. Are there any technical challenges that prevent this method from
> > being realized as a more automated tool?
>
> The answer to these two questions is essentially the same - there are
> two technical challenges that provide reasons why you might not want
> to commit to this path.
>
> Firstly, there is the simple matter of time and space. If you have a
> small database, deserializing then reserializing won't take too long,
> and won't result in huge files. However, I routinely work with a
> database that contains 5GB of data (and growing) - and that's when
> it's neatly packed into a database binary format. Spooled out into a
> relatively space-inefficient serialized form, it will consume much
> more disk space. On top of that, the time required to produce and
> reload a fixture in this format would be prohibitive.
>
> Secondly, there are the limitations of the approach itself. The
> fixture-based migration tool will really only help with simple
> migrations where the schema isn't really changing that much - for
> example, increasing the size of a CharField. However, for any
> non-trivial change, you will hit problems. For example:
>
> * If you add a new non-null field, the fixture won't load because it
> won't provide data for that field.
>
> * If you change the name of a field, the fixture will contain data
> for the old name, but no data for the new name. Any existing data in
> that field will be lost.
>
> * If you change the type of a field, there is no guarantee that the
> old data will be format-compatible with the new field.
>
> So while the dump+load approach will work for simple cases, for
> anything non-trivial it doesn't work at all. You might be able to work
> around the second problem by writing a conversion tool for fixtures
> that modifies the old fixture to suit the new format, but you will
> still be left with the first problem, now amplified by the need to
> create and store an updated fixture.
>
> > (I read somewhere that dumpdata
> > doesn't always work?)
>
> I've heard many make this claim, but very few have been able to back it up.
Re: dumpdata and loaddata as simple DB migration tool?
On Wed, Jun 3, 2009 at 12:06 PM, Kegan wrote:
>
> Hi,
>
> About Django database migration: I know there are a couple of tools
> available (South, evolution, dmigration, etc), but I am pondering an
> alternative here. Hope a good discussion ensues.
>
> This is what I am doing manually now for my database migration needs.
>
> 1. Use Django's management command "dumpdata" to get the JSON
> representation of an app. Save the JSON into a file (oldmodel.json).
> 2. Git pull the latest code, and do a reset on the app, so the
> database will have the new model schema.
> 3. Use the Python shell to migrate the JSON file to match the new
> model representation. Save the new JSON into a file (newmodel.json).
> 4. Use the management command "loaddata" to populate the new model
> with newmodel.json.
>
> This has worked well so far for my use with PostgreSQL, albeit very
> manual. It's all Python code.
>
> So I guess what I want to ask the Django experts/developers here is:
>
> 1. Is there any reason this shouldn't be done?
> 2. Are there any technical challenges that prevent this method from
> being realized as a more automated tool?

The answer to these two questions is essentially the same - there are two technical challenges that provide reasons why you might not want to commit to this path.

Firstly, there is the simple matter of time and space. If you have a small database, deserializing then reserializing won't take too long, and won't result in huge files. However, I routinely work with a database that contains 5GB of data (and growing) - and that's when it's neatly packed into a database binary format. Spooled out into a relatively space-inefficient serialized form, it will consume much more disk space. On top of that, the time required to produce and reload a fixture in this format would be prohibitive.

Secondly, there are the limitations of the approach itself. The fixture-based migration tool will really only help with simple migrations where the schema isn't really changing that much - for example, increasing the size of a CharField. However, for any non-trivial change, you will hit problems. For example:

* If you add a new non-null field, the fixture won't load because it won't provide data for that field.

* If you change the name of a field, the fixture will contain data for the old name, but no data for the new name. Any existing data in that field will be lost.

* If you change the type of a field, there is no guarantee that the old data will be format-compatible with the new field.

So while the dump+load approach will work for simple cases, for anything non-trivial it doesn't work at all. You might be able to work around the second problem by writing a conversion tool for fixtures that modifies the old fixture to suit the new format, but you will still be left with the first problem, now amplified by the need to create and store an updated fixture.

> (I read somewhere that dumpdata
> doesn't always work?)

I've heard many make this claim, but very few have been able to back it up. I would be the last person to claim that the serializers are bug-free, but I would claim that they are very stable, and there is an extensive test suite to back up my claim. There are a small number of known problems, but they tend to be in fairly esoteric areas of serialization, and shouldn't be encountered by most users. If anyone finds a bug in the serializers, I strongly encourage them to report the problem. Problems only get fixed when they are reported.

> 3. What do you think about having migration code be more a part of the
> deployment tool, rather than coupled into the source code?

I would heartily agree. However, I would also claim that the tools you mention (Evolution, South, etc) are deployment tools, and are in no way coupled to source code. Migrations don't interact with the day-to-day usage of your code - they are only ever invoked by the deployment tool as part of a syncdb/migrate command.

Yours,
Russ Magee %-)
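The first problem above (a new non-null field) is exactly where the "conversion tool" workaround bites: the converter has to compute a value for every existing object before loaddata will accept the fixture. A minimal sketch, with hypothetical model and field names ("app.article", "summary"), not taken from the thread:

```python
# dumpdata fixtures are a plain list of {model, pk, fields} dicts, so a
# conversion tool is ordinary list/dict manipulation.
def add_default_field(objects, model, field, compute_default):
    for obj in objects:
        if obj["model"] == model and field not in obj["fields"]:
            # Fill the new non-null field from the object's existing data.
            obj["fields"][field] = compute_default(obj["fields"])
    return objects

old_fixture = [{"model": "app.article", "pk": 1,
                "fields": {"body": "Lorem ipsum dolor"}}]
new_fixture = add_default_field(old_fixture, "app.article", "summary",
                                lambda fields: fields["body"][:10])
```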
dumpdata and loaddata as simple DB migration tool?
Hi,

About Django database migration: I know there are a couple of tools available (South, evolution, dmigration, etc), but I am pondering an alternative here. Hope a good discussion ensues.

This is what I am doing manually now for my database migration needs.

1. Use Django's management command "dumpdata" to get the JSON representation of an app. Save the JSON into a file (oldmodel.json).
2. Git pull the latest code, and do a reset on the app, so the database will have the new model schema.
3. Use the Python shell to migrate the JSON file to match the new model representation. Save the new JSON into a file (newmodel.json).
4. Use the management command "loaddata" to populate the new model with newmodel.json.

This has worked well so far for my use with PostgreSQL, albeit very manual. It's all Python code.

So I guess what I want to ask the Django experts/developers here is:

1. Is there any reason this shouldn't be done?
2. Are there any technical challenges that prevent this method from being realized as a more automated tool? (I read somewhere that dumpdata doesn't always work?)
3. What do you think about having migration code be more a part of the deployment tool, rather than coupled into the source code?

Thanks.
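Step 3 of the workflow above can be sketched as a small fixture-rewriting script. The specific transformation (renaming a "name" field to "title") and the file names are hypothetical, for illustration only; dumpdata output is just a JSON list of {model, pk, fields} objects:

```python
import json

def migrate_fixture(old_path, new_path):
    with open(old_path, encoding="utf-8") as f:
        objects = json.load(f)  # list of {model, pk, fields} dicts
    for obj in objects:
        fields = obj["fields"]
        if "name" in fields:
            fields["title"] = fields.pop("name")  # field was renamed
    with open(new_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ascii text readable in the file
        json.dump(objects, f, ensure_ascii=False, indent=2)
```

Usage would be `migrate_fixture("oldmodel.json", "newmodel.json")`, then loading newmodel.json with the loaddata management command.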