> On 25/03/10 17:47, subs...@gmail.com wrote:
>
> >> The last bit sounds a bit nebulous. You could optimise it by not
> >> including any empty files, or be a bit more specific about what the
> >> empty files are meant to represent. :)
>
> > startapp, startproject, et al.
>
> I see where you're coming from here; in the final proposal, though,
> you'd want to follow Malcolm's advice and have an actual use case for
> each file. I'd say you probably only want one, or even none - if your
> approach is so complicated that even simple use cases need lots of
> files, you're unlikely to get much traction.

The way I see it, you need 1) a staging directory, holding 2)
conversions.py for your conversion classes, 3) legacy_models.py, and
4) legacy_routers.py (potentially).
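
That last file would be the usual multi-db glue pointing the legacy
models at the old database. Something like this, assuming the legacy
models live in an app labelled 'legacy' and the old database is
configured under the alias 'legacy' in DATABASES:

class LegacyRouter(object):
    """Route reads and writes for legacy models to the 'legacy' database."""

    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'legacy':
            return 'legacy'
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'legacy':
            return 'legacy'
        return None

    def allow_syncdb(self, db, model):
        # Keep syncdb from creating legacy tables in the default database,
        # and new tables in the legacy one.
        if db == 'legacy' or model._meta.app_label == 'legacy':
            return db == 'legacy' and model._meta.app_label == 'legacy'
        return None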

> South does have an answer to this - you create the columns as nullable,
> add in the data, and then alter them back to non-nullable. That's the
> only way a database is going to let you add a column; they need either a
> global default or to allow NULL (there are some cases where you can do
> otherwise, but they're really not appropriate for most people).
>
> In fact, I'd say this fits perfectly into the South model; one migration
> to make the two/three new tables, one that moves all the data around
> using the ORM (something Django developers know, and mostly love), and
> one to delete the old table. If you only use --auto then yes, it's only
> good at tracking small changes, but the rest of the power is right
> there, you just have to actually write code.
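
(For concreteness, the middle step you describe would be roughly this
in a South DataMigration; the model names here are made up:)

from south.v2 import DataMigration

class Migration(DataMigration):

    def forwards(self, orm):
        # Shuffle rows from the old table into the new one via the
        # frozen ORM.
        for old in orm['app.OldThing'].objects.all():
            orm['app.NewThing'].objects.create(
                name=old.name,
                created=old.created,
            )

    def backwards(self, orm):
        orm['app.NewThing'].objects.all().delete()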

Sorry, I find this a bit awkward. I suspect that if it were all as
kosher as you suggest, you'd be doing it yourself instead of resorting
to pure scripting. Moreover, it presumes a schema similarity and a
cleanliness of data that legacy data, by and large, can't be trusted
to have. This whole approach is basically: accept the legacy schema as
it is and work it incrementally into a shape that looks like your
modern schema. Hopefully you arrive! I'm skeptical.

> Actually, most of my data import scripts are while loops over
> cursor.fetchall(), which just use the ORM to put in new data. With
> MultiDB, I probably won't even need the cursor part, I can just loop
> over the legacy model and insert into the new one.

Well, exactly. That's how all of these types of scripts work. I
suppose after I'd done this three times I found the whole business a
bit repetitious; not to mention that what I ended up with was a
500-line series of loops, probably split across multiple files so I
could run them independently. And, at the end of the day, 100% trash
code. What I hoped for was a tool that helped me logically partition
this work and freed me to marshal it with minimum hassle, so that I
could run and re-run it until it came out cleanly.
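
The shape of those scripts, for anyone who hasn't written one (the
model names and helper functions here are made up):

def convert_customers():
    # One of dozens of near-identical loops: read a legacy row, clean
    # up a couple of fields, save a shiny new model instance.
    for old in LegacyCustomer.objects.using('legacy').all():
        Customer.objects.create(
            name=old.name.strip(),
            postcode=fix_postcode(old.postcode),
            joined=funky_datetime_formatter(old.joined),
        )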

> While it might be nice to make this more generic, each of those while
> loops has slightly different bodies - correcting coordinates here,
> fixing postcodes there - and the generic bits only take up one or two
> lines each time.

Sure, a datetime formatter is potentially as little as 3 or 4 lines,
but it's reusable across dozens of fields, to say nothing of
reusability between conversion projects.
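
Something like this, say (the format string is only an example):

from datetime import datetime

def funky_datetime_formatter(value, fmt='%m/%d/%Y %H:%M'):
    # Normalise a legacy timestamp string to a datetime; blanks become None.
    value = (value or '').strip()
    return datetime.strptime(value, fmt) if value else None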

> I'd really like to see a more compact or succinct way of doing this, but
> I'm dubious as to how flexible it would be. I'm more than happy to be
> proven wrong, however.

class Conversion(object):
    # Declarative field mapping: pull the legacy column and run it
    # through the formatter.
    date_added = legacy_date_added(format=funky_datetime_formatter)
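
where legacy_date_added would be a small factory along these lines
(the name and machinery are entirely hypothetical):

def legacy_date_added(format):
    # Return a converter that reads date_added off a legacy row and
    # runs it through the supplied formatter.
    def convert(legacy_row):
        return format(legacy_row.date_added)
    return convert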

> I read your initial proposal here as "code things in a sensible way",
> not "actively monitor performance and correct on the fly". Using
> pagination and making sure there's no memory leaks in the code's loops
> is a great idea, attempting to self-optimise at runtime probably isn't.

Yeah, I don't know where this active-monitoring stuff came from. But
yes, pagination, among other things. There are other ideas too, such
as tying in contenttypes and letting users see exactly how their
legacy data converted, which is immensely helpful in detecting
problems.
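
By pagination I mean keeping memory flat on big tables, e.g. a
chunked iterator along these lines (a sketch, nothing that exists
today):

def in_chunks(queryset, size=500):
    # Walk a large queryset in pk-ordered chunks so a million-row
    # legacy table never sits in memory all at once.
    last_pk = 0
    while True:
        chunk = list(queryset.filter(pk__gt=last_pk).order_by('pk')[:size])
        if not chunk:
            break
        for obj in chunk:
            yield obj
        last_pk = chunk[-1].pk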

I would love to see a "perfectionists with deadlines" approach catch
on in this area. I hear from potential clients how much other
companies quote them for data conversion. It's always something like
$100/hr at 20 hours. I'm glad when I can offer the same service at 4
hours. That's on top of all the other borderline magic of Django.

-Steve
