Re: multidb - partitioning
On Sat, Jan 9, 2010 at 15:20, tom wrote: > Hi Adrian, > > thanks for your answer! > The problem is, that i don't know how many projects will be created. > Every user > of the application can create and delete new projects. > Maybe it's possible to execute some raw sql commands whenever a > project is created or deleted (to create the new table and insert > trigger). > Do you think that's possible? > Do you have any good tutorial about postgres and partitioning? I'm > completly new to this topic. > > One possibility is to have one child table for each existing project. And one additional table for (all) the newly created projects (just as a temporary step until someone or a script creates the new tables). Later, a midnight script can check for new projects, create the needed new tables and rewrite the trigger(s) . I see no reason why automating everything wouldn't be possible . By the way : how many projects do you expect to have in the database? Thousands of projects means thousands of tables , and a huge insert trigger that tries to determine where to insert each new record. PostgreSQL should be able to handle lots of tables, but that trigger will become a bottleneck at some point in future ... On the other hand, try to avoid being stuck into the partitioning idea: maybe improving the performance for your database can be done in other ways (maybe you are simply missing an index ! ) . Sorry, I have no links about partitioning to suggest. However, i suggest you to join the pgsql-performance mailing list and ask there for advice. That is the PostgreSQL list for discussions about specific performance issues. You'll find information about how to subscribe on the postgresql website . Just make sure that you post there enough information about your database: the columns of the table, the estimated number of rows , and a specific select query that is executed too slowly . And of course the execution plan of that query ( explain analyse select . from where ). So, make sure that you are describing a concrete problem : for example you didn't mention what is actual the problem that you hope to solve through partitioning. Obviously, you are seeking for faster database response time. But when exactly : when you insert data, or when you query the data ? There are many postgres experts on pgsql-performance , but you need to describe well what are you trying to do;) -- Adrian Maier -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: multidb - partitioning
Hello, I'm new to django, so bear with me if this is a silly approach but, could you assign the project name to db_table at run time? Something like ... models.py ... # app_name and project_code may need to be declared with global scope ... APP_NAME = 'app_name' # might be optional global project_code = get_project_code(...) def set_project_code(my_project_code): project_code = my_project_code class Measurand(models.Model): project = models.ForeignKey(Project) # project is implied by table name context, so may be unnecessary here? avg_value = models.DecimalField(max_digits=10, decimal_places=4, db_index=True) class Meta: db_table = u'%s_%s_measurand'%( app_name, project_code ) # app_name is usual django behaviour, but may not be necessary if you are certain no other apps will conflict with the table name # another approach would be ... # db_table = u'%s_measurand'%( project_code ) ... Obviously, before you build project tables, you'd need to call et_project_code HTH -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: multidb - partitioning
Hi Adrian, thanks for your answer! The problem is, that i don't know how many projects will be created. Every user of the application can create and delete new projects. Maybe it's possible to execute some raw sql commands whenever a project is created or deleted (to create the new table and insert trigger). Do you think that's possible? Do you have any good tutorial about postgres and partitioning? I'm completly new to this topic. Cheers Tom On 8 Jan., 22:01, Adrian Maier wrote: > Hello Tom, > > My feeling is that you have almost defined the solution on your own > already. I think that you are on the right track. > > First, forget about multiple databases : table partitioning has nothing to > do with it. > > Secondly, the fact that django is not explicitly supporting partitioning is > not an issue: partitioning is transparent for the applications. An > apllication that executes select/insert/update/delete commands doesn't need > to do anything special . > > Thirdly, perhaps you should simply try to apply partitioning for your table, > starting from the examples given in the "Partitioning" chapter that you've > mentioned : > - create the Measurands table > - create several project specific tables : > create table Measurands_project1 ( > check PROJECT_ID=1 > ) inherits (Measurands); > > create table Measurands_project2 ( > check PROJECT_ID=2 > ) inherits (Measurands); > > create table Measurands_project3 ( > check PROJECT_ID=3 > ) inherits (Measurands); > > - create the trigger on Measurands that inserts each new record into the > right child Measurands_projectXX table . > > When selecting records from Measurands , the table inheritance mechanism > will automatically provide acces to the project-tables: > - select * from Measurands will retrieve all the records (because > of table inheritance) > - select * from Measurands where PROJECT_ID=2 will retrieve the > records for project 2 . > > In the second situation, the 'constraint exclusion' optimization of the > recent PostgreSQL versions will cause that the database engine will read > data only from Measurands_Project2 . > > The Django aplication will automatically take advantage of this optimization > without any modification : the only catch is that you'll have to create the > tables manually (not with manage.py syncdb). Also, the number of projects > will have to be fixed in general (every new project involves creation of a > new table , and modifying the insert trigger). > > After creating the tables you should do some testing to see that using > partitioning is indeed giving better performance. Adding some indexes could > be an alternative solution, and the only way for knowing what's the best > solution is testing . > > Good luck with experimenting > > Adrian Maier -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: multidb - partitioning
On Fri, Jan 8, 2010 at 15:36, tom wrote: > Hi, > > i use postgres and want to use partitioning for tables (see > http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html ) > > My Model looks like this: > > class Measurand(models.Model): >project = models.ForeignKey(Project) >avg_value = models.DecimalField(max_digits=10, decimal_places=4, > db_index=True) > > > I never want to query over all measurands. I just want to query over > all measurands from _one_ project like this: project.measurand_set.all > () > > So the best solution i think is to have one table for every project. > Is this possible with django and the new multidb-api? > Hello Tom, My feeling is that you have almost defined the solution on your own already. I think that you are on the right track. First, forget about multiple databases : table partitioning has nothing to do with it. Secondly, the fact that django is not explicitly supporting partitioning is not an issue: partitioning is transparent for the applications. An apllication that executes select/insert/update/delete commands doesn't need to do anything special . Thirdly, perhaps you should simply try to apply partitioning for your table, starting from the examples given in the "Partitioning" chapter that you've mentioned : - create the Measurands table - create several project specific tables : create table Measurands_project1 ( check PROJECT_ID=1 ) inherits (Measurands); create table Measurands_project2 ( check PROJECT_ID=2 ) inherits (Measurands); create table Measurands_project3 ( check PROJECT_ID=3 ) inherits (Measurands); - create the trigger on Measurands that inserts each new record into the right child Measurands_projectXX table . When selecting records from Measurands , the table inheritance mechanism will automatically provide acces to the project-tables: - select * from Measurands will retrieve all the records (because of table inheritance) - select * from Measurands where PROJECT_ID=2 will retrieve the records for project 2 . In the second situation, the 'constraint exclusion' optimization of the recent PostgreSQL versions will cause that the database engine will read data only from Measurands_Project2 . The Django aplication will automatically take advantage of this optimization without any modification : the only catch is that you'll have to create the tables manually (not with manage.py syncdb). Also, the number of projects will have to be fixed in general (every new project involves creation of a new table , and modifying the insert trigger). After creating the tables you should do some testing to see that using partitioning is indeed giving better performance. Adding some indexes could be an alternative solution, and the only way for knowing what's the best solution is testing . Good luck with experimenting Adrian Maier -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: multidb - partitioning
Hi Russell, On 8 Jan., 15:40, Russell Keith-Magee wrote: > On Fri, Jan 8, 2010 at 9:36 PM, tom wrote: > Regardless of how nice and helpful they might be, Django doesn't > provide explicit support for the features of specific databases. i think partitioning is not a feature of a specific database. postgres, oracle and mysql all know partitioning. The problem in my case ist, that i have a single table with about 60 Million rows. For example a simple Measurand.objects.count() needs about 20-50 Seconds. Cheers, Tom -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: multidb - partitioning
On Fri, Jan 8, 2010 at 9:36 PM, tom wrote: > Hi, > > i use postgres and want to use partitioning for tables (see > http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html ) > > My Model looks like this: > > class Measurand(models.Model): > project = models.ForeignKey(Project) > avg_value = models.DecimalField(max_digits=10, decimal_places=4, > db_index=True) > > > I never want to query over all measurands. I just want to query over > all measurands from _one_ project like this: project.measurand_set.all > () > > So the best solution i think is to have one table for every project. > Is this possible with django and the new multidb-api? The short answer is that you're on your own here. Regardless of how nice and helpful they might be, Django doesn't provide explicit support for the features of specific databases. I'm not sure that multi-db will help you out here - you don't need different connections to the database, you need different tables in a single database. Your one-table-per-project approach sounds a little closer to the mark, but there are plenty of details that will need to be worked out (especially with regards to writing initial SQL scripts to set up triggers and the like). So - good luck, but you're in uncharted territory. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
multidb - partitioning
Hi, i use postgres and want to use partitioning for tables (see http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html ) My Model looks like this: class Measurand(models.Model): project = models.ForeignKey(Project) avg_value = models.DecimalField(max_digits=10, decimal_places=4, db_index=True) I never want to query over all measurands. I just want to query over all measurands from _one_ project like this: project.measurand_set.all () So the best solution i think is to have one table for every project. Is this possible with django and the new multidb-api? cheers, tom -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-us...@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.