Re: multidb - partitioning

2010-01-10 Thread Adrian Maier
On Sat, Jan 9, 2010 at 15:20, tom  wrote:

> Hi Adrian,
>
> thanks for your answer!
> The problem is, that i don't know how many projects will be created.
> Every user
> of the application can create and delete new projects.
> Maybe it's possible to execute some raw sql commands whenever a
> project is created or deleted (to create the new table and insert
> trigger).
> Do you think that's possible?
> Do you have any good tutorial about postgres and partitioning? I'm
> completly new to this topic.
>
>
One possibility is to have one child table for each existing project. And
one additional table for (all) the newly created projects (just as a
temporary
step until someone or a script creates the new tables).  Later, a midnight
script can check for new projects, create the needed new tables and
rewrite the trigger(s) . I see no reason why automating everything wouldn't
be possible .

By the way : how many projects do you expect to have in the database?
Thousands of projects means thousands of tables ,  and a huge insert
trigger that tries to determine where to insert each new record.  PostgreSQL
should be able to handle lots of tables,  but that trigger will become a
bottleneck at some point in future ...

On the other hand,  try to avoid being stuck into the partitioning idea:
maybe improving the performance for your database can be done
in other ways (maybe you are simply missing an index ! ) .


Sorry, I have no links about partitioning to suggest.

However,  i suggest you to join the pgsql-performance mailing list
and ask there for advice.  That is the PostgreSQL list for discussions
about specific performance issues. You'll find information about how
to subscribe on the postgresql website .

Just make sure that you post there enough information about your database:
the columns of the table,  the estimated number of rows , and a specific
select query that is executed too slowly . And of course the execution
plan of that query  ( explain analyse select .  from    where 
).
So, make sure that you are describing a concrete problem :  for example
you didn't mention what is actual the problem that you hope to solve through
partitioning.  Obviously, you are seeking for faster database response time.
But when exactly : when you insert data, or when you query the data ?
There are many postgres experts on pgsql-performance  ,  but you need
to describe well what are you trying to do;)



-- 
Adrian Maier
-- 

You received this message because you are subscribed to the Google Groups "Django users" group.

To post to this group, send email to django-us...@googlegroups.com.

To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/django-users?hl=en.



Re: multidb - partitioning

2010-01-09 Thread Robert S
Hello, I'm new to django, so bear with me if this is a silly approach
but, could you assign the project name to db_table at run time?
Something like ...

models.py
...

# app_name and project_code may need to be declared with global
scope ...

APP_NAME = 'app_name'   # might be optional

global project_code = get_project_code(...)
def set_project_code(my_project_code):
  project_code = my_project_code

class Measurand(models.Model):
project = models.ForeignKey(Project)  # project is implied by
table name context, so may be unnecessary here?
avg_value = models.DecimalField(max_digits=10, decimal_places=4,
db_index=True)

class Meta:
   db_table = u'%s_%s_measurand'%( app_name, project_code )
   # app_name is usual django behaviour, but may not be necessary
if you are certain no other apps will conflict with the table name

   # another approach would be ...
   # db_table = u'%s_measurand'%( project_code )

...

Obviously, before you build project tables, you'd need to call
et_project_code

HTH
-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.




Re: multidb - partitioning

2010-01-09 Thread tom
Hi Adrian,

thanks for your answer!
The problem is, that i don't know how many projects will be created.
Every user
of the application can create and delete new projects.
Maybe it's possible to execute some raw sql commands whenever a
project is created or deleted (to create the new table and insert
trigger).
Do you think that's possible?
Do you have any good tutorial about postgres and partitioning? I'm
completly new to this topic.

Cheers Tom

On 8 Jan., 22:01, Adrian Maier  wrote:
> Hello Tom,
>
> My feeling is that you have almost defined the solution on your own
> already. I think that you are on the right track.
>
> First, forget about multiple databases : table partitioning has nothing to
> do with it.
>
> Secondly, the fact that django is not explicitly supporting partitioning is
> not an issue:  partitioning is transparent for the applications.  An
> apllication that executes select/insert/update/delete commands doesn't need
> to do anything special .
>
> Thirdly, perhaps you should simply try to apply partitioning for your table,
> starting from the examples given in the  "Partitioning" chapter that you've
> mentioned :
> - create the Measurands table
> - create several project specific tables :
>     create table Measurands_project1 (
>                check PROJECT_ID=1
>         ) inherits (Measurands);
>
>     create table Measurands_project2 (
>                check PROJECT_ID=2
>         ) inherits (Measurands);
>
>     create table Measurands_project3 (
>                check PROJECT_ID=3
>         ) inherits (Measurands);
>
> - create the trigger on Measurands that inserts each new record into the
> right child Measurands_projectXX table .
>
> When selecting records from Measurands , the table inheritance mechanism
> will automatically provide acces to the project-tables:
> -  select * from Measurands     will retrieve all the records (because
> of table inheritance)
> -  select * from Measurands where PROJECT_ID=2   will retrieve the
> records for project 2 .
>
> In the second situation, the 'constraint exclusion' optimization of the
> recent PostgreSQL versions will cause that the database engine will read
> data only from Measurands_Project2 .
>
> The Django aplication will automatically take advantage of this optimization
> without any modification  : the only catch is that you'll have to create the
> tables manually (not with manage.py syncdb).  Also, the number of projects
> will have to be fixed in general (every new project involves creation of a
> new table ,  and modifying the insert trigger).
>
> After creating the tables you should do some testing to see that using
> partitioning is indeed giving better performance. Adding some indexes could
> be an alternative solution,  and the only way for knowing what's the best
> solution is testing .
>
> Good luck with experimenting 
>
> Adrian Maier
-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.




Re: multidb - partitioning

2010-01-08 Thread Adrian Maier
On Fri, Jan 8, 2010 at 15:36, tom  wrote:

> Hi,
>
> i use postgres and want to use partitioning for tables (see
> http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html )
>
> My Model looks like this:
>
> class Measurand(models.Model):
>project = models.ForeignKey(Project)
>avg_value = models.DecimalField(max_digits=10, decimal_places=4,
> db_index=True)
>
>
> I never want to query over all measurands. I just want to query over
> all measurands from _one_ project like this: project.measurand_set.all
> ()
>
> So the best solution i think is to have one table for every project.
> Is this possible with django and the new multidb-api?
>


Hello Tom,

My feeling is that you have almost defined the solution on your own
already. I think that you are on the right track.

First, forget about multiple databases : table partitioning has nothing to
do with it.

Secondly, the fact that django is not explicitly supporting partitioning is
not an issue:  partitioning is transparent for the applications.  An
apllication that executes select/insert/update/delete commands doesn't need
to do anything special .

Thirdly, perhaps you should simply try to apply partitioning for your table,
starting from the examples given in the  "Partitioning" chapter that you've
mentioned :
- create the Measurands table
- create several project specific tables :
create table Measurands_project1 (
   check PROJECT_ID=1
) inherits (Measurands);

create table Measurands_project2 (
   check PROJECT_ID=2
) inherits (Measurands);

create table Measurands_project3 (
   check PROJECT_ID=3
) inherits (Measurands);

- create the trigger on Measurands that inserts each new record into the
right child Measurands_projectXX table .

When selecting records from Measurands , the table inheritance mechanism
will automatically provide acces to the project-tables:
-  select * from Measurands will retrieve all the records (because
of table inheritance)
-  select * from Measurands where PROJECT_ID=2   will retrieve the
records for project 2 .

In the second situation, the 'constraint exclusion' optimization of the
recent PostgreSQL versions will cause that the database engine will read
data only from Measurands_Project2 .

The Django aplication will automatically take advantage of this optimization
without any modification  : the only catch is that you'll have to create the
tables manually (not with manage.py syncdb).  Also, the number of projects
will have to be fixed in general (every new project involves creation of a
new table ,  and modifying the insert trigger).

After creating the tables you should do some testing to see that using
partitioning is indeed giving better performance. Adding some indexes could
be an alternative solution,  and the only way for knowing what's the best
solution is testing .



Good luck with experimenting 

Adrian Maier
-- 

You received this message because you are subscribed to the Google Groups "Django users" group.

To post to this group, send email to django-us...@googlegroups.com.

To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/django-users?hl=en.



Re: multidb - partitioning

2010-01-08 Thread tom
Hi Russell,

On 8 Jan., 15:40, Russell Keith-Magee  wrote:
> On Fri, Jan 8, 2010 at 9:36 PM, tom  wrote:
> Regardless of how nice and helpful they might be, Django doesn't
> provide explicit support for the features of specific databases.

i think partitioning is not a feature of a specific database.
postgres,
oracle and mysql all know partitioning.

The problem in my case ist, that i have a single table with about
60 Million rows. For example a simple Measurand.objects.count()
needs about 20-50 Seconds.

Cheers,

Tom
-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.




Re: multidb - partitioning

2010-01-08 Thread Russell Keith-Magee
On Fri, Jan 8, 2010 at 9:36 PM, tom  wrote:
> Hi,
>
> i use postgres and want to use partitioning for tables (see
> http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html )
>
> My Model looks like this:
>
> class Measurand(models.Model):
>    project = models.ForeignKey(Project)
>    avg_value = models.DecimalField(max_digits=10, decimal_places=4,
> db_index=True)
>
>
> I never want to query over all measurands. I just want to query over
> all measurands from _one_ project like this: project.measurand_set.all
> ()
>
> So the best solution i think is to have one table for every project.
> Is this possible with django and the new multidb-api?

The short answer is that you're on your own here.

Regardless of how nice and helpful they might be, Django doesn't
provide explicit support for the features of specific databases.

I'm not sure that multi-db will help you out here - you don't need
different connections to the database, you need different tables in a
single database. Your one-table-per-project approach sounds a little
closer to the mark, but there are plenty of details that will need to
be worked out (especially with regards to writing initial SQL scripts
to set up triggers and the like).

So - good luck, but you're in uncharted territory.

Yours,
Russ Magee %-)
-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.




multidb - partitioning

2010-01-08 Thread tom
Hi,

i use postgres and want to use partitioning for tables (see
http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html )

My Model looks like this:

class Measurand(models.Model):
project = models.ForeignKey(Project)
avg_value = models.DecimalField(max_digits=10, decimal_places=4,
db_index=True)


I never want to query over all measurands. I just want to query over
all measurands from _one_ project like this: project.measurand_set.all
()

So the best solution i think is to have one table for every project.
Is this possible with django and the new multidb-api?


cheers,

tom
-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.