> I'm developing a Django project that will handle big sets of
> data and I'd like your advice. I have 10 internal bureaus, each
> with a database of 1.5 million records, and it really looks
> like it will keep growing and growing. I intend to use
> Postgres.
> 
> The question: what's the best way to handle and store this
> data? I thought about breaking the app model into 10 smaller
> ones (Bureau_1, Bureau_2, Bureau_3, etc.) because the main
> reports are split by Bureau. Response time matters. What do you
> think?

I deal with fairly large datasets (my employer does cell-phone 
management, tracking tens of thousands of phones for hundreds of 
companies, with historical statement detail for each phone, and 
about 2.8e6 call-detail records for those clients that require 
the three months' worth that we keep...and it's only growing).

I can't say that splitting across multiple databases makes for a 
very useful partitioning, and it forces you to design your 
application around performance.  It also becomes a maintenance 
headache, as you have to touch each DB (or script it) when 
performing schema changes: rather than just adding a column to 
one table, you have to spew your ALTER TABLE statement across 
every DB.  It would also keep Postgres from aggressively caching 
common tables (unless each DB is on its own machine, where that 
doesn't matter).

Learning the ins and outs of Postgresql's EXPLAIN command can 
help you find bottlenecks (such as missing indexes), though I'm 
afraid I haven't become adroit at it myself.
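
You can poke at a plan right from Django; a minimal sketch, 
assuming the made-up table above:

    # Run EXPLAIN ANALYZE on a raw query through Django's
    # database connection and print the plan it returns.
    from django.db import connection

    cursor = connection.cursor()
    cursor.execute(
        "EXPLAIN ANALYZE "
        "SELECT * FROM app_register WHERE bureau_id = %s",
        [1],
    )
    for row in cursor.fetchall():
        # A "Seq Scan" on a big table here is a hint that an
        # index is missing.
        print(row[0])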

Running VACUUM ANALYZE regularly helps too: VACUUM reclaims the 
space held by dead rows, and ANALYZE refreshes the statistics 
the query planner uses to pick good plans.
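
Normally you'd just run it from psql or cron, but for 
completeness, a sketch of doing it from Python with psycopg2 
(connection string and table name are made up):

    # VACUUM can't run inside a transaction block, so
    # autocommit has to be on for this statement.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("VACUUM ANALYZE app_register;")
    conn.close()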

I have had some performance problems with that call-detail table 
(with its 2.8e6 rows or so), but I find that since it's indexed, 
as long as I pull from a joined table and only pull in the 
records I care about, it can be pretty snappy.  It's mostly 
sluggish when I try to do operations across the whole table 
rather than a subset of it, but even then, it's not too bad.
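
To sketch the difference (model and field names are 
hypothetical, not my actual schema):

    import datetime
    from myapp.models import CallDetail  # hypothetical model

    cutoff = datetime.date.today() - datetime.timedelta(days=90)

    # Snappy: the planner can walk indexes and touch only the
    # matching rows.
    recent = CallDetail.objects.filter(client_id=42,
                                       call_date__gte=cutoff)

    # Sluggish: forces a pass over the whole 2.8e6-row table.
    total_calls = CallDetail.objects.count()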

Fast disks (a RAID configuration helps) and loads of memory (as 
much as your server will hold, or at least a couple of gigs) 
will go a long way towards easing your data pains.  Multiple 
processors can help too, but mostly after you've eased the 
IO/memory bottlenecks.

Just my observations from the field,

-tim
