On Mar 25, 12:34 pm, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> Hi,
>
> I see a few problems here. The gist of what follows is that it seems a
> bit abstract and as one tries to nail down the specifics it either
> devolves to a more-or-less already solved problem that doesn't require
> Django core changes, or a problem that is so unconstrained as to not be
> solvable by a framework (requiring, instead the full power of Python,
> which is already in the developer's hands).
In situations where inspectdb doesn't work, I've found it's much less
work to get the data into a shape where it _will_ work than to script
the entire import by hand.

> On Thu, 2010-03-25 at 09:07 -0700, subs...@gmail.com wrote:
> > mix). With talk of model-level validation, the first approach is
> > becoming increasingly invalid,
>
> That's not a correct statement, since Django models can often be used to
> proscribe conditions on new data that is created via the web app, yet
> those conditions might not be required for the universal set of data
> that already exists. For example, webapp-generated data might always
> require a particular field, such as the creating user, to be filled in,
> whilst machine-generated data would not require that. Don't equate
> validation conditions at the model level with constraints on the data at
> the storage level.

I'm hard-pressed to imagine a situation where I want validation to apply
only to incoming data. Out of laziness I might choose not to apply to
existing data some condition I did apply to a form, knowing that the
next time a user touches the record the validation will kick in. This
whole point seems to rest on a "might" that I can't recall ever
encountering. Regardless, I tend to regard legacy data with the same
caution as incoming user data, yet it is currently worlds more difficult
to validate the former than the latter.

> The last bit sounds a bit nebulous. You could optimise it by not
> including any empty files, or be a bit more specific about what the
> empty files are meant to represent. :)

startapp, startproject, et al.

> It seems that you are talking about the cases where, by default, a
> different schema is required. The first approach is to make the models
> match the existing schema, on the grounds that the existing schema is a
> reasonable representation of the data.
> In the case where that isn't
> true, a migration is required, but the possibilities for such migrations
> are endless unless the original data can already be put into natural
> Django models. If inspectdb can already be run on the existing data, why
> not use that as the starting point and then the dev can use something
> like South to migrate to their schema of choice? It seems that we
> already have all the tools in that case.

South's target use case is relatively simple schema changes, plus
helping teams keep their databases in sync. It isn't long, though,
before you're really pushing South's limits. Take, for example, one
legacy model that needs to be split into two or three current models:
South has no answer for this, because you may only supply static
defaults when creating fields (in this case foreign keys, or potentially
OneToOnes, and then what? What if my defaults depend on other values in
the same record? Already, I'm on my own). By and large, South is an
immaculate tool for tracking changes during development.

> If inspectdb cannot generate a useful schema that can be modelled by
> Django, the user is going to have write a generic Python script in any
> case and the possibilities there are boundless and best left to the best
> tool available for the job at hand: Python itself.

In theory, yes. But in practice I've found the shortest way around the
mountain is to get the data into a "SQL-enough" format manually. As for
Python: sure, but the more of these monolithic scripts you write, the
more you realize you're doing a lot of repetitive work, the mechanics of
which are generally reusable even though each implementation is nuanced
entirely to the task at hand. If data importation is something you do a
lot, you've probably got a file somewhere holding piecemeal bits that
are hopefully vaguely useful on the next project, all the while unable
to shake the feeling that this general task mirrors much of what goes on
in forms with clean_somefield() and clean().
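To make that forms parallel concrete, here's a rough, framework-free
sketch of the kind of importer I mean: per-field clean_<field>() hooks
plus a whole-row pass, mirroring django.forms. The names (RowImporter,
PersonImporter) are hypothetical, not a detailed API proposal:

```python
class RowImporter(object):
    """Base importer: run clean_<field>() for each declared field,
    then a whole-row clean_row(), in the style of django.forms."""
    fields = ()

    def clean(self, row):
        for name in self.fields:
            hook = getattr(self, "clean_%s" % name, None)
            if hook is not None:
                row[name] = hook(row.get(name))
        return self.clean_row(row)

    def clean_row(self, row):
        # Hook for cross-field fixups; default is a no-op.
        return row


class PersonImporter(RowImporter):
    """Example: normalise a name and coerce a stringly-typed age."""
    fields = ("name", "age")

    def clean_name(self, value):
        return (value or "").strip().title()

    def clean_age(self, value):
        return int(value)
```

The point is only that the per-field hook pattern, which forms already
give us for user input, is exactly what one keeps reinventing ad hoc for
legacy imports.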
> Adding system administration functionality to Django, which is what this
> monitoring is, feels like the wrong approach. It's not intended to
> replace everything else in your computing life. What is appropriate load
> usage for one case will be highly inappropriate elsewhere. How will you
> detect what you are labelling as "thrashing"?

I'm puzzled by this conclusion. The "system administration
functionality" here is no different from what you'd find in all kinds of
projects, South included. I'm not sure what to make of the "everything
else in your computing life" remark, but I'll assume good faith and take
it that you're not alleging I'm presenting this as some kind of crutch
for more correct methods. As for detecting thrashing: there are only so
many tasks involved in moving large gobs of data, and some of them
(often) drive CPU to 100% or (hopefully never) drive free memory to 0%.
Things that do one or the other are treated as bugs, or avoided.

-Steve

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
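P.S. For concreteness, the kind of load guard I have in mind is nothing
more exotic than the following sketch. The threshold is illustrative,
os.getloadavg() is just one possible signal (Unix-only), and getload is
a parameter only so the guard can be exercised without a loaded box:

```python
import os
import time


def throttled(rows, max_load=4.0, pause=1.0, getload=os.getloadavg):
    """Yield rows one at a time, sleeping while the 1-minute load
    average sits above max_load. A crude brake against the 'import
    pins the CPU' failure mode; thresholds are illustrative only."""
    for row in rows:
        while getload()[0] > max_load:
            time.sleep(pause)
        yield row
```

Nothing here needs to live in Django core to be useful; the argument is
only that this class of check keeps getting rewritten per import script.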