Re: dumpdata and loaddata as simple DB migration tool?

2009-06-03 Thread Russell Keith-Magee

On Thu, Jun 4, 2009 at 11:19 AM, Brian May wrote:
>
> On Thu, Jun 04, 2009 at 10:27:35AM +0800, Russell Keith-Magee wrote:
>> If you're going to start throwing around claims that loaddata/dumpdata
>> doesn't work, you _really_ need to back them up with a demonstrated
>> example that proves your claim. We provide Trac for precisely this
>> reason, and you will find that claims of data corruption are taken
>> very seriously by the core developers.
>
> Like I said, this was a while ago; I think a test case that shows corruption
> with Django 0.96 would be pointless.

It isn't pointless in the least. The problem _might_ still exist - we
have no way of knowing, because this is the first time you've reported
it, so we've never made an explicit attempt to fix it. As it currently
stands, you have no way of knowing if your problem has been fixed
(inadvertently or directly).

> I don't expect anybody to take my claims too seriously,

You are correct that a single comment probably doesn't matter that
much. However, it matters a great deal in aggregate. You probably
won't be quoted directly, but one person saying "I've had data
corruption problems" becomes a few people saying "I've heard Django
has data corruption problems", which becomes "Everyone knows Django
has data corruption problems" - eventually the meme gains traction,
even if it is completely untrue. Case in point - the "Rails doesn't
scale" meme from a few years ago.

> the problems I
> encountered were probably fixed ages ago. I don't like to assume something is
> fixed though unless I have some confirmation.

Instead of hoping that your problem was "probably fixed ages ago", how
about helping us confirm that it _has_ been fixed, or, if it hasn't,
helping us fix it? A test case is the first step.

Yours,
Russ Magee %-)




Re: dumpdata and loaddata as simple DB migration tool?

2009-06-03 Thread Brian May

On Thu, Jun 04, 2009 at 10:27:35AM +0800, Russell Keith-Magee wrote:
> If you're going to start throwing around claims that loaddata/dumpdata
> doesn't work, you _really_ need to back them up with a demonstrated
> example that proves your claim. We provide Trac for precisely this
> reason, and you will find that claims of data corruption are taken
> very seriously by the core developers.

Like I said, this was a while ago; I think a test case that shows corruption
with Django 0.96 would be pointless.

Yes, I definitely did have corruption, as the data in the database changed
after doing a loaddata/dumpdata. I could reproduce this on demand.

I don't expect anybody to take my claims too seriously; the problems I
encountered were probably fixed ages ago. I don't like to assume something is
fixed, though, unless I have some confirmation.
-- 
Brian May 




Re: dumpdata and loaddata as simple DB migration tool?

2009-06-03 Thread Russell Keith-Magee

On Thu, Jun 4, 2009 at 10:00 AM, Brian May wrote:
>
> On Tue, Jun 02, 2009 at 09:06:38PM -0700, Kegan wrote:
>> 1. Use Django's management command "dumpdata" to get the JSON
>> representation of an app. Save the JSON into a file (oldmodel.json).
>> 2. Git pull the latest code and do a reset of the app, so the
>> database will have the new model schema.
>> 3. Use the Python shell to migrate the JSON file to match the new
>> model representation. Save the new JSON into a file (newmodel.json).
>> 4. Use the management command "loaddata" to populate the new model with
>> newmodel.json.
>
> I tried this at one stage, and it seemed fine, until I realized non-ASCII
> (UTF-8, I think) characters were being silently corrupted.

This is exactly the kind of comment I referred to in my original reply.

It's not fair on the Django developers or the Django community to
start spreading rumours that Django corrupts data if you're not
willing to follow up and actually document the problem you are having.

I am not aware of any corruption that happens to non-ASCII characters
as a result of using loaddata/dumpdata. There are test cases that
specifically validate the serialization of non-ASCII characters across
all the serializers. I don't doubt that you experienced a problem. I
would be the last person to claim that loaddata/dumpdata is
infallible. However, we can't fix problems we don't know about.
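
If you want to verify this for yourself, a minimal round-trip check looks
something like the following (the model and field names here are
hypothetical; any model holding non-ASCII text in a CharField will do):

    from django.core import serializers
    from myapp.models import Article  # hypothetical model

    before = list(Article.objects.values_list('title', flat=True))
    data = serializers.serialize('json', Article.objects.all())
    restored = [d.object for d in serializers.deserialize('json', data)]
    after = [obj.title for obj in restored]
    assert before == after, "non-ASCII titles did not survive the round trip"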

If you're going to start throwing around claims that loaddata/dumpdata
doesn't work, you _really_ need to back them up with a demonstrated
example that proves your claim. We provide Trac for precisely this
reason, and you will find that claims of data corruption are taken
very seriously by the core developers.

Yours,
Russ Magee %-)




Re: dumpdata and loaddata as simple DB migration tool?

2009-06-03 Thread Brian May

On Tue, Jun 02, 2009 at 09:06:38PM -0700, Kegan wrote:
> 1. Use Django's management command "dumpdata" to get the JSON
> representation of an app. Save the JSON into a file (oldmodel.json).
> 2. Git pull the latest code and do a reset of the app, so the
> database will have the new model schema.
> 3. Use the Python shell to migrate the JSON file to match the new
> model representation. Save the new JSON into a file (newmodel.json).
> 4. Use the management command "loaddata" to populate the new model with
> newmodel.json.

I tried this at one stage, and it seemed fine, until I realized non-ASCII
(UTF-8, I think) characters were being silently corrupted.

Of course, Django has changed a lot since then, so this may no longer be an issue.

I also tried a plain SQL dump; however, at the time I couldn't work out how to
get SQLite to dump the data with column names. So if the change meant the
columns were in a different order (or new columns appeared between old ones),
the import would be wrong. As a result I migrated to MySQL, which is more
flexible.
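
In hindsight, one way to emit INSERT statements with explicit column names is
to go through Python's sqlite3 module rather than the sqlite shell. A rough
sketch (the database path and table name are hypothetical, and repr() is not
proper SQL quoting):

    import sqlite3

    conn = sqlite3.connect('project.db')  # hypothetical database file
    cursor = conn.execute('SELECT * FROM myapp_article')  # hypothetical table
    columns = ', '.join(d[0] for d in cursor.description)
    for row in cursor:
        values = ', '.join(repr(v) for v in row)
        print('INSERT INTO myapp_article (%s) VALUES (%s);' % (columns, values))
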
-- 
Brian May 




Re: dumpdata and loaddata as simple DB migration tool?

2009-06-03 Thread Russell Keith-Magee

On Wed, Jun 3, 2009 at 11:38 PM, Kegan Gan  wrote:
>
> Hi Russell,
>
> On the first issue: Good point. I have not had the opportunity to work
> with such a huge database.
>
> On the second issue: Yes, what I am doing now is really about writing
> conversion code to fit the old JSON to the new schema. I find
> this to be quite straightforward for my use cases; it's only
> modifying a list of hash objects. But I probably haven't encountered the
> more complicated use cases out there.
>
> I feel the fact that the Django ORM works and dumpdata+loaddata works
> makes it compelling as the foundation for a migration tool. Developers
> can work with 100% Python code, without needing to learn a migration
> tool's specific APIs, and it may be easier to provide some logic in the
> migration process (for example, calculating default values for new
> fields based on some existing database data).
>
> Also, independent Django app (as in INSTALLED_APPS) developers can all
> rely on the serialized JSON format and provide their own migration
> code when they release a new version of their app. (How is this
> currently done anyway? Let's say django-tagging changes a model
> class in the next release.)

They Don't (tm) :-)

Or, to be more precise - they try everything possible to avoid
changing the model, and if they do need to, they publish the series of
SQL ALTER commands that are needed to update the tables in situ.
Another approach is to change the name of the table, and provide
commands to read from the old table and insert into the new one. The
changes made to Django's comments app during the 0.96-1.0 transition
give you one example of how this can be done [1].

[1] http://docs.djangoproject.com/en/dev/ref/contrib/comments/upgrade/
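
If you would rather drive such a change from Python than from the database
shell, a minimal sketch using Django's raw connection follows (the table and
column names are hypothetical, and the ALTER syntax varies by backend):

    from django.db import connection

    cursor = connection.cursor()
    # add the new column, then backfill it from existing data, in situ
    cursor.execute("ALTER TABLE myapp_article ADD COLUMN slug varchar(50)")
    cursor.execute("UPDATE myapp_article SET slug = lower(title)")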

> PS: As for these three points ...
>
>>>> * If you add a new non-null field, the fixture won't load because it won't
>>>> provide data for that field.
>>>> * If you change the name of a field, the fixture will contain data for the
>>>> old name, but no data for the new name. Any existing data in that field
>>>> will be lost.
>>>> * If you change the type of a field, there is no guarantee that the old
>>>> data will be format-compatible with the new field.
>
> ... the old fixtures are loaded with the old model class. The serialized
> JSON is modified to fit the new model class, and then loaded using the
> new model class with "manage.py loaddata". This is done on a per-app
> basis (only for the apps that need the migration).

As I said in my last email, this can work. However, at the very least,
you double your disk space requirements. Depending on the nature of
the conversion, there may also be considerable processing time and
temporary memory requirements.

Ultimately, I suppose my point is this. SQL databases provide an
extensive and reliable infrastructure for managing schema conversions.
While you _can_ avoid that infrastructure, there comes a point
at which you are reinventing wheels just to avoid using a particular
brand of tyre.

Yours
Russ Magee %-)




Re: dumpdata and loaddata as simple DB migration tool?

2009-06-03 Thread Kegan Gan

Hi Russell,

On the first issue: Good point. I have not had the opportunity to work
with such a huge database.

On the second issue: Yes, what I am doing now is really about writing
conversion code to fit the old JSON to the new schema. I find
this to be quite straightforward for my use cases; it's only
modifying a list of hash objects. But I probably haven't encountered the
more complicated use cases out there.

I feel the fact that the Django ORM works and dumpdata+loaddata works
makes it compelling as the foundation for a migration tool. Developers
can work with 100% Python code, without needing to learn a migration
tool's specific APIs, and it may be easier to provide some logic in the
migration process (for example, calculating default values for new
fields based on some existing database data).
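
As a sketch of the kind of logic I mean (the field names are hypothetical -
deriving a new "slug" field from an existing "title"):

    import json

    records = json.load(open('oldmodel.json'))
    for record in records:
        fields = record['fields']
        # compute a value for the new field from existing data
        fields.setdefault('slug',
                          fields.get('title', '').lower().replace(' ', '-'))
    json.dump(records, open('newmodel.json', 'w'), indent=2)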

Also, independent Django app (as in INSTALLED_APPS) developers can all
rely on the serialized JSON format and provide their own migration
code when they release a new version of their app. (How is this
currently done anyway? Let's say django-tagging changes a model
class in the next release.)


Thank you for the reply.


PS: As for these three points ...

>>> * If you add a new non-null field, the fixture won't load because it won't 
>>> provide data for that field.
>>> * If you change the name of a field, the fixture will contain data for the 
>>> old name, but no data for the new name. Any existing data in that field 
>>> will be lost.
>>> * If you change the type of a field, there is no guarantee that the old 
>>> data will be format-compatible with the new field.

... the old fixtures are loaded with the old model class. The serialized
JSON is modified to fit the new model class, and then loaded using the
new model class with "manage.py loaddata". This is done on a per-app
basis (only for the apps that need the migration).



On Jun 3, 2:57 pm, Russell Keith-Magee  wrote:
> On Wed, Jun 3, 2009 at 12:06 PM, Kegan  wrote:
>
> > Hi,
>
> > About Django database migration. I know there are a couple of tools
> > available (South, evolution, dmigrations, etc.), but I am pondering an
> > alternative here. Hope a good discussion ensues.
>
> > This is what I am doing manually now for my database migration needs.
>
> > 1. Use Django's management command "dumpdata" to get the JSON
> > representation of an app. Save the JSON into a file (oldmodel.json).
> > 2. Git pull the latest code and do a reset of the app, so the
> > database will have the new model schema.
> > 3. Use the Python shell to migrate the JSON file to match the new
> > model representation. Save the new JSON into a file (newmodel.json).
> > 4. Use the management command "loaddata" to populate the new model with
> > newmodel.json.
>
> > This has worked well so far for my use with PostgreSQL, albeit very
> > manual. It's all Python code.
>
> > So I guess what I want to ask the Django experts/developers here is:
>
> > 1. Is there any reason this shouldn't be done?
> > 2. Are there any technical challenges that prevent this method from being
> > turned into a more automated tool?
>
The answer to these two questions is essentially the same - there are
two technical challenges that provide reasons why you might not want to
commit to this path.
>
Firstly, there is a simple matter of time and space. If you have a
small database, deserializing then reserializing won't take too long,
and won't result in huge files. However, I routinely work with a
database that contains 5GB of data (and growing) - and that's when
it's neatly packed into a database binary format. Spooled out into a
relatively space-inefficient serialized form, it will consume much more
disk space. On top of that, the time required to produce and reload a
fixture in this format would be prohibitive.
>
Secondly, there are the limitations of the approach itself. A
fixture-based migration tool will really only help with simple
migrations where the schema isn't really changing that much - for
example, increasing the size of a CharField. However, for any
non-trivial change, you will hit problems. For example:
>
>  * If you add a new non-null field, the fixture won't load because it
> won't provide data for that field.
>
>  * If you change the name of a field, the fixture will contain data
> for the old name, but no data for the new name. Any existing data in
> that field will be lost.
>
>  * If you change the type of a field, there is no guarantee that the
> old data will be format-compatible with the new field.
>
> So while the dump+load approach will work for simple cases, in
> practice it breaks down. You might be able to work around the
> second problem by writing a conversion tool for fixtures that modifies
> the old fixture to suit the new format, but you will still be left
> with the first problem, now amplified by the need to create and store
> an updated fixture.
>
> > (I read somewhere that dumpdata
> > doesn't always work?)
>
> I've heard many make this claim, but very few have been able to back
> it up.


Re: dumpdata and loaddata as simple DB migration tool?

2009-06-02 Thread Russell Keith-Magee

On Wed, Jun 3, 2009 at 12:06 PM, Kegan  wrote:
>
> Hi,
>
> About Django database migration. I know there are a couple of tools
> available (South, evolution, dmigrations, etc.), but I am pondering an
> alternative here. Hope a good discussion ensues.
>
> This is what I am doing manually now for my database migration needs.
>
> 1. Use Django's management command "dumpdata" to get the JSON
> representation of an app. Save the JSON into a file (oldmodel.json).
> 2. Git pull the latest code and do a reset of the app, so the
> database will have the new model schema.
> 3. Use the Python shell to migrate the JSON file to match the new
> model representation. Save the new JSON into a file (newmodel.json).
> 4. Use the management command "loaddata" to populate the new model with
> newmodel.json.
>
> This has worked well so far for my use with PostgreSQL, albeit very
> manual. It's all Python code.
>
> So I guess what I want to ask the Django experts/developers here is:
>
> 1. Is there any reason this shouldn't be done?
> 2. Are there any technical challenges that prevent this method from being
> turned into a more automated tool?

The answer to these two questions is essentially the same - there are
two technical challenges that provide reasons why you might not want to
commit to this path.

Firstly, there is a simple matter of time and space. If you have a
small database, deserializing then reserializing won't take too long,
and won't result in huge files. However, I routinely work with a
database that contains 5GB of data (and growing) - and that's when
it's neatly packed into a database binary format. Spooled out into a
relatively space-inefficient serialized form, it will consume much more
disk space. On top of that, the time required to produce and reload a
fixture in this format would be prohibitive.

Secondly, there are the limitations of the approach itself. A
fixture-based migration tool will really only help with simple
migrations where the schema isn't really changing that much - for
example, increasing the size of a CharField. However, for any
non-trivial change, you will hit problems. For example:

 * If you add a new non-null field, the fixture won't load because it
won't provide data for that field.

 * If you change the name of a field, the fixture will contain data
for the old name, but no data for the new name. Any existing data in
that field will be lost.

 * If you change the type of a field, there is no guarantee that the
old data will be format-compatible with the new field.

So while the dump+load approach will work for simple cases, in
practice it breaks down. You might be able to work around the
second problem by writing a conversion tool for fixtures that modifies
the old fixture to suit the new format, but you will still be left
with the first problem, now amplified by the need to create and store
an updated fixture.
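
To make concrete what such a conversion tool involves, here is a minimal
sketch (the field names are hypothetical - say an old "name" field renamed
to "title"):

    import json

    records = json.load(open('oldmodel.json'))
    for record in records:
        fields = record['fields']
        if 'name' in fields:                      # hypothetical old field
            fields['title'] = fields.pop('name')  # hypothetical new field
    json.dump(records, open('newmodel.json', 'w'), indent=2)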

> (I read somewhere that dumpdata
> doesn't always work?)

I've heard many make this claim, but very few have been able to back
it up. I would be the last person to claim that the serializers are
bug-free, but I would claim that they are very stable, and there is an
extensive test suite to back up my claim. There are a small number of
known problems, but they tend to be in fairly esoteric areas of
serialization, and shouldn't be encountered by most users.

If anyone finds a bug in the serializers, I strongly encourage them to
report the problem. Problems only get fixed when they are reported.

> 3. What do you think about having migration code be more a part of the
> deployment tool, rather than coupled into the source code?

I would heartily agree. However, I would also claim that the tools you
mention (Evolution, South, etc.) are deployment tools and are in no way
coupled to source code. Migrations don't interact with the day-to-day
usage of your code - they are only ever invoked by the deployment tool
as part of a syncdb/migrate command.

Yours
Russ Magee %-)




dumpdata and loaddata as simple DB migration tool?

2009-06-02 Thread Kegan

Hi,

About Django database migration. I know there are a couple of tools
available (South, evolution, dmigrations, etc.), but I am pondering an
alternative here. Hope a good discussion ensues.

This is what I am doing manually now for my database migration needs.

1. Use Django's management command "dumpdata" to get the JSON
representation of an app. Save the JSON into a file (oldmodel.json).
2. Git pull the latest code and do a reset of the app, so the
database will have the new model schema.
3. Use the Python shell to migrate the JSON file to match the new
model representation. Save the new JSON into a file (newmodel.json).
4. Use the management command "loaddata" to populate the new model with
newmodel.json.

This has worked well so far for my use with PostgreSQL, albeit very
manual. It's all Python code.
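
To make the four steps concrete, the cycle looks roughly like this (the
app, file, and field names are hypothetical):

    # 1. ./manage.py dumpdata myapp > oldmodel.json
    # 2. git pull && ./manage.py reset myapp
    # 3. run a script like this to rewrite the JSON for the new schema:
    import json

    records = json.load(open('oldmodel.json'))
    for record in records:
        # adjust record['fields'] here to match the new model
        pass
    json.dump(records, open('newmodel.json', 'w'), indent=2)
    # 4. ./manage.py loaddata newmodel.json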

So I guess what I want to ask the Django experts/developers here is:

1. Is there any reason this shouldn't be done?
2. Are there any technical challenges that prevent this method from being
turned into a more automated tool? (I read somewhere that dumpdata
doesn't always work?)
3. What do you think about having migration code be more a part of the
deployment tool, rather than coupled into the source code?

Thanks.
