Re: GSoC: Effortless Model Testing

Jason Ledbetter Thu, 03 Apr 2008 10:18:38 -0700

> The main part of your application is something I'll think about some
> more, but I'm not really sure what development effort is involved there.


To address this particular point a little more:

My original proposal was lean on details, which is entirely my fault.
To be honest, I've never participated in something like this and I
didn't want to write a 30-page document for anyone to comb through if
the reaction to the great idea is "we don't want that". As my previous
reply indicates, I have no problem with writing at length. ;) I just
didn't know what balance to strike for this particular situation.

I erred a little too much on the side of summary and ended up
emphasizing the wrong aspects of my proposal. My apologies. That's why
dialogue like this is so useful.

As for development time, it's easy to underestimate the problem space
here because when a given programmer writes a script to generate
sample data for one project, it's tedious but straightforward. A more
generic tool actually becomes rather complex to implement well.

Let's take one example: a pseudo-random name generator. One
implementation is as simple as a dictionary of first names and last
names and a call to random.randrange. I included such an example in my
proposal to demonstrate the intended ease of plugging custom
generators into the batch system. However, the generators I'd include
are on a different level.
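For comparison, the simple approach looks roughly like this. It is a minimal sketch, assuming two small hard-coded lists (the names and the `random_name` helper are placeholders, not part of any existing tool):

```python
import random

# Hypothetical sample lists; a real generator would load much larger dictionaries.
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dara", "Elena"]
LAST_NAMES = ["Garcia", "Ivanov", "Kim", "Okafor", "Smith"]


def random_name(rng=random):
    """Return a 'First Last' string, picking each part with randrange."""
    first = FIRST_NAMES[rng.randrange(len(FIRST_NAMES))]
    last = LAST_NAMES[rng.randrange(len(LAST_NAMES))]
    return f"{first} {last}"


if __name__ == "__main__":
    for _ in range(3):
        print(random_name())
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which matters once the data is used for benchmarks.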

With the intended name generator, for example:

- Do we want middle names?
- Names for what culture?
- In Unicode? Which scripts?
- What maximum length?
- Do we want suffixes (Jr., Sr., etc.)?
- Which ones?
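Each of those questions turns into a generator option. A rough sketch of how that might look, assuming a hypothetical `NameOptions` record and tiny per-culture pools (the culture keys, pools, and truncation policy are illustrative choices, not a settled design):

```python
import random
from dataclasses import dataclass

# Hypothetical per-culture name pools; keys and contents are illustrative only.
# The es_ES entries include accented characters to exercise Unicode handling.
NAME_POOLS = {
    "en_US": {"first": ["John", "Mary", "Grace"], "last": ["Smith", "Jones", "Miller"]},
    "es_ES": {"first": ["José", "María", "Lucía"], "last": ["García", "Martínez", "López"]},
}


@dataclass
class NameOptions:
    culture: str = "en_US"
    middle_name: bool = False
    max_length: int = 40
    suffixes: tuple = ()  # e.g. ("Jr.", "Sr."); empty means never add one


def generate_name(opts: NameOptions, rng: random.Random) -> dict:
    """Return the name parts plus a 'full' string capped at max_length."""
    pool = NAME_POOLS[opts.culture]
    parts = {
        "first": rng.choice(pool["first"]),
        "middle": rng.choice(pool["first"]) if opts.middle_name else "",
        "last": rng.choice(pool["last"]),
        "suffix": rng.choice(opts.suffixes) if opts.suffixes else "",
    }
    full = " ".join(p for p in parts.values() if p)
    # Truncate rather than fail, so the result always fits the target column.
    parts["full"] = full[: opts.max_length]
    return parts
```

Truncating is only one policy; a generator could just as well reject and redraw names that exceed the column width.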

That's just the beginning of the problem space. It gets more complex
for, say, sample addresses. Addresses for which country? They're
structured wildly differently. What if your contact database is truly
international? Now we need to match the culture of the name to the
address of each contact and figure out easy ways for the script to
plug all that into different fields.
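One way to keep a contact's name and address consistent is to pick the country once and draw both from it. A minimal sketch, assuming hypothetical per-country data and simplified address templates (real postal formats have many more rules than this):

```python
import random

# Hypothetical per-country data; real address rules are far more involved.
COUNTRIES = {
    "US": {
        "first": ["John", "Mary"], "last": ["Smith", "Jones"],
        "streets": ["Main St", "Oak Ave"],
        "cities": [("Springfield", "IL", "62701")],
        "format": "{number} {street}\n{city}, {region} {postcode}",
    },
    "DE": {
        "first": ["Jürgen", "Anna"], "last": ["Müller", "Schmidt"],
        "streets": ["Hauptstraße", "Bahnhofstraße"],
        "cities": [("Berlin", "", "10115")],
        "format": "{street} {number}\n{postcode} {city}",
    },
}


def generate_contact(rng: random.Random, country=None) -> dict:
    # Pick the country once so the name and the address share the same culture.
    code = country or rng.choice(sorted(COUNTRIES))
    data = COUNTRIES[code]
    city, region, postcode = rng.choice(data["cities"])
    address = data["format"].format(
        number=rng.randint(1, 200), street=rng.choice(data["streets"]),
        city=city, region=region, postcode=postcode,
    )
    return {
        "country": code,
        "name": f'{rng.choice(data["first"])} {rng.choice(data["last"])}',
        "address": address,
    }
```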

Do you have the name as one long char field? Or two or three different
fields for each part of the name? Is Jr./Sr. a selector or just
appended? And with addresses, the structure could be wildly different
on the model level, ranging from (address, city, state, zip) to a
simple line-broken longtext. Whatever structure suits the needs of
the developer, the generators need to support it.
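A sketch of how one generated record might be mapped onto different model layouts. The layout names, field names, and the `map_record` helper are all hypothetical; the point is that generation and field mapping can be kept separate:

```python
# The same generated record, mapped onto different (hypothetical) model layouts.
record = {
    "first": "Mary", "middle": "Ann", "last": "Jones", "suffix": "Jr.",
    "street": "12 Oak Ave", "city": "Springfield", "region": "IL", "postcode": "62701",
}


def to_single_name(r):
    # One long char field; the suffix is simply appended.
    parts = (r["first"], r["middle"], r["last"], r["suffix"])
    return {"name": " ".join(p for p in parts if p)}


def to_split_name(r):
    # Separate fields, with the suffix kept as its own (selector-style) field.
    return {"first_name": r["first"], "middle_name": r["middle"],
            "last_name": r["last"], "suffix": r["suffix"]}


def to_structured_address(r):
    return {"address": r["street"], "city": r["city"],
            "state": r["region"], "zip": r["postcode"]}


def to_freeform_address(r):
    # A single line-broken text field.
    return {"address": f'{r["street"]}\n{r["city"]}, {r["region"]} {r["postcode"]}'}


LAYOUTS = {
    "single_name": to_single_name,
    "split_name": to_split_name,
    "structured_address": to_structured_address,
    "freeform_address": to_freeform_address,
}


def map_record(r, *layouts):
    """Merge the output of each named layout into one dict of model field values."""
    out = {}
    for name in layouts:
        out.update(LAYOUTS[name](r))
    return out
```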

Furthermore, what if you have a question like: is the performance cost
of having the name in three fields instead of one justified for my
application? What if we drop it to two fields? By having batch
creation, smart generators, and a performance tester already built in,
we can answer these questions with hard numbers. We can even include
such a batch set in the Django source for the purpose of testing new
design decisions.
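To illustrate the kind of comparison meant here, the sketch below times a lookup against a one-field and a three-field name table. It uses the standard-library `sqlite3` module and an in-memory database rather than the Django ORM, and the table names and queries are assumptions chosen for illustration; it reports timings and makes no claim about which layout wins:

```python
import sqlite3
import time


def time_query(conn, sql, params=(), repeat=200):
    """Run the same query `repeat` times and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


def compare_layouts(names, repeat=200):
    """names: list of (first, middle, last) tuples used to fill both tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE one_field (name TEXT)")
    conn.execute("CREATE TABLE three_fields (first TEXT, middle TEXT, last TEXT)")
    conn.executemany("INSERT INTO one_field VALUES (?)", [(" ".join(n),) for n in names])
    conn.executemany("INSERT INTO three_fields VALUES (?, ?, ?)", names)
    results = {
        # Last-name lookup needs a pattern match when the name is one field...
        "one_field": time_query(
            conn, "SELECT name FROM one_field WHERE name LIKE ?", ("% Smith",), repeat),
        # ...but a plain equality test when it has its own column.
        "three_fields": time_query(
            conn, "SELECT first, middle, last FROM three_fields WHERE last = ?",
            ("Smith",), repeat),
    }
    conn.close()
    return results
```

The numbers only mean something when both layouts are filled from the same generated batch, which is where the batch system comes in.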

If a Django patch comes in that says it makes queries faster, how do
the Django developers know if the claim is true? There are already
ways to test such a thing, but nothing beats seeing actual "on the
street" performance. When it comes to the performance of a complex
system like Django, mini-tests are contrived and don't usually reflect
reality. See any language shoot-out for an example of how unreal
contrived tests tend to be. ;)

So Django could have a rule that any performance claims need to be
accompanied by hard numbers on the chosen sample data batch.
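For such a rule to work, everyone has to benchmark against exactly the same data. A minimal sketch of one way to guarantee that, assuming a seeded generator plus a content hash (the `sample_batch` and `batch_fingerprint` names are hypothetical):

```python
import hashlib
import json
import random

FIRST = ["John", "Mary", "Grace", "Ahmed"]
LAST = ["Smith", "Jones", "Okafor", "Kim"]


def sample_batch(seed: int, size: int) -> list:
    # A private Random instance keeps the batch independent of global random state.
    rng = random.Random(seed)
    return [{"id": i, "name": f"{rng.choice(FIRST)} {rng.choice(LAST)}"}
            for i in range(size)]


def batch_fingerprint(batch) -> str:
    # Hash a canonical JSON dump so two machines can confirm they share the same data.
    payload = json.dumps(batch, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

A performance report could then quote the seed, the batch size, and the fingerprint alongside its timings. Note that the same seed is only guaranteed to reproduce the same batch on the same Python version.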

If I'm chosen, once the project is complete it'd also be easy to
provide "teaching samples" of various Django applications which can
then generate their own data on install. Right now we have
instructions on creating a polling site in the documentation (last I
checked) that's used to teach the basics. We could supplement this
with, say, a sample blog, fully-designed and commented, that will
create its own sample blog entries on install.

$ django-admin addsampleproject blog

This will give live data to the student so that they can mess around
in the admin and on the code level, getting a feel for what it's like
working with a real Django application. Now, if we make the teaching
sample also the performance data set for people to test their Django
code changes on, then we get more of that delightful synergy.
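As a rough idea of what such a sample blog's install hook might generate, here is a sketch in plain Python. It does not use Django models; `generate_sample_entries`, the field names, and the simplified `slugify` stand-in are all hypothetical:

```python
import random
import re
from datetime import datetime, timedelta

WORDS = ["django", "models", "testing", "admin", "queries", "templates", "caching"]


def slugify(text):
    # Simplified stand-in for a slug helper: lowercase, hyphen-separated ASCII words.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def generate_sample_entries(count, seed=0, start=datetime(2008, 1, 1)):
    """Return `count` blog-entry dicts with unique slugs and daily publish dates."""
    rng = random.Random(seed)
    entries = []
    for i in range(count):
        title = " ".join(rng.sample(WORDS, 3)).capitalize()
        entries.append({
            "title": title,
            # Appending the index keeps slugs unique even if titles repeat.
            "slug": f"{slugify(title)}-{i}",
            "body": " ".join(rng.choice(WORDS) for _ in range(40)),
            "pub_date": start + timedelta(days=i),
        })
    return entries
```

Because the entries are seeded, the same teaching sample can double as the shared performance data set.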

What if I want to create any sort of new feature (history tracking, etc.)
for Django? The batch system could make it easy to stamp a "clean" app
on a development box with valid but expendable data for the programmer
to work with while developing his feature. He can finish his feature
and even include performance metrics to prove he's not slowing
anything down.

So yeah, my idea is a lot more ambitious and useful than it would
appear at first glance. Especially if the glance is at my woefully
understated proposal. ;)

Let me know what you think!


Thanks!

-Jason L.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
