DavidA wrote:
> I am prepopulating my database with data from a number of flat files.
> I've written a small script to do this using my model class. While this
> works, there are a couple of kludges I made to get it to work and I was
> hoping someone could advise me on a better way to do this.

Hey Dave,

You didn't mention how many records you are importing nor how often you
will be re-importing your flat files.  I regularly have to import files
with ~2.5M records and I've found that using an ORM framework is too
slow for my requirements.

My first script to do the import used Django's ORM.  That script looped
over the lines of the file, reading in the data, creating a new Django
object out of each line and calling .save() on it.  It worked well and
the code was easy to read, but I needed more speed, as this script was
going to need to run at least weekly and had to run at night,
completing before my users started hammering on the web app in the
morning.

The second script I wrote dropped the existing indexes, then iterated
over the lines of the file, working directly with Django's db object to
INSERT the data into the (PostgreSQL) database.  The indexes were
recreated at the end of the script.  If I recall correctly, that script
took around 5-6 hours to complete the import.

The script I'm using now iterates over the lines of the file,
formatting the data into a new SQL file that is friendly to psql (the
PostgreSQL command-line tool).  The script takes four minutes to create
the SQL file.  It then uses Python's os.popen4 to load the
Python-generated SQL file into psql:

import os

cmd = "psql -U my_user my_db -f %s" % path_to_sql_file
# os.popen4 (Python 2 only) returns (stdin, combined stdout+stderr)
s_in, s_out = os.popen4(cmd)

This approach cut the import time down to just under 2 hours (the SQL
file also drops the indexes before importing the data, then recreates
them afterwards).
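
To make the write-file step concrete, here is a minimal sketch of
generating such a SQL file.  Everything specific in it is an assumption
for illustration, not the actual script: the pipe-delimited input, the
table, column and index names, and the choice of a COPY ... FROM stdin
block (the same layout pg_dump uses in plain-text dumps) rather than
individual INSERT statements.

```python
import csv
import io


def copy_escape(value):
    """Escape one field for PostgreSQL's COPY text format."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def write_sql(flat_lines, out,
              table="my_table",             # hypothetical table name
              columns=("id", "name"),       # hypothetical columns
              index="my_table_name_idx",    # hypothetical index name
              index_column="name"):
    """Write a psql-loadable script: drop index, bulk load, recreate index."""
    out.write("DROP INDEX IF EXISTS %s;\n" % index)
    out.write("COPY %s (%s) FROM stdin;\n" % (table, ", ".join(columns)))
    # Assumes a pipe-delimited flat file; adjust the delimiter as needed.
    for fields in csv.reader(flat_lines, delimiter="|"):
        if not fields:
            continue  # skip blank lines
        out.write("\t".join(copy_escape(f) for f in fields) + "\n")
    out.write("\\.\n")  # end-of-data marker for COPY ... FROM stdin
    out.write("CREATE INDEX %s ON %s (%s);\n" % (index, table, index_column))


buf = io.StringIO()
write_sql(["1|Alice\n", "2|Bob\n"], buf)
print(buf.getvalue())
```

In a real run, flat_lines would be the open flat file and out a file
opened for writing.  The sketch does not handle NULL values.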

So, if you are importing lots of data, have to do it frequently, and
time is of the essence, my recommendation is to give the
read-file/write-file/db-command-line three-step a try.
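
For the db-command-line step, note that os.popen4 was removed in
Python 3; the subprocess module is its replacement.  The sketch below
is one possible equivalent, not the script described above.  The
function names are hypothetical, my_user and my_db are the same
placeholders as in the command above, and the ON_ERROR_STOP setting is
an optional addition that makes psql stop at the first failing
statement.

```python
import subprocess


def build_psql_command(sql_path, user="my_user", database="my_db"):
    """Build the psql argument list (a list avoids shell quoting issues)."""
    return ["psql", "-U", user, "-d", database,
            "-v", "ON_ERROR_STOP=1",   # optional: abort on the first error
            "-f", sql_path]


def load_sql_file(sql_path):
    """Run psql on the generated file; return (exit code, combined output)."""
    result = subprocess.run(
        build_psql_command(sql_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,      # merge stderr into stdout, like popen4
        universal_newlines=True,       # decode output as text
    )
    return result.returncode, result.stdout
```

Calling load_sql_file(path_to_sql_file) and checking for a non-zero exit
code gives the nightly job a simple way to notice a failed import.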

Good luck and best regards,

Eric.

