Re: [PERFORM] Databases vs Schemas
On Fri, Oct 9, 2009 at 10:50 PM, Scott Carey sc...@richrelevance.com wrote: On 10/9/09 2:02 PM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis scott.o...@intand.com wrote: Over the next couple of months we will be creating an instance of our solution for each public school district in the US which is around 18,000. That means currently we would be creating 18,000 databases (all on one server right now which is running 8.4). I am assuming this is probably not the best way of doing things. Schema advantages: *) maintenance advantages; all functions/trigger functions can be shared. HUGE help if you use them *) can query shared data between schemas without major headaches *) a bit more efficiency especially if private data areas are small. kinda analogous to processes vs threads *) Can manage the complete system without changing database sessions. This is the kicker IMO. Database Advantages: *) More discrete. Easier to distinctly create, dump, drop, or move to separate server *) Smaller system catalogs might give efficiency benefits I'm concerned how a system with 57 * 18000 1M tables will function. I've got 200,000 tables in one db (8.4), and some tools barely work. The system catalogs get inefficient when large and psql especially has trouble. Tab completion takes forever, even if I make a schema s with one table in it and type s. and try and tab complete -- its as if its scanning all without a schema qualifier or using an index. Sometimes it does not match valid tables at all, and sometimes regex matching fails too ('\dt schema.*_*_*' intermittently flakes out if it returns a lot of matches). Other than that the number of tables doesn't seem to cause much performance trouble. The only exception is constraint exclusion which is fundamentally broken with too many tables on the performance and memory consumption side. Having a lot of tables really makes me wish VACUUM, ANALYZE, and other maintenance tools could partially matched object names with regex though. On the other hand, lots of databases probably has performance drawbacks too. And its maintenance drawbacks are even bigger. I certainly don't see any reason at all to try and put all of these in one schema. The only useful choices are schemas vs databases. I'd go for schemas unless the performance issues there are a problem. Schemas can be dumped/restored/backed up independent of one another easily too. They can, but: drop schema foo cascade; is a different operation than: drop database foo; The first is kinda surgical and the second is a rocket launcher. What would you rather have in battle? For the record, just about every database I've ever designed has had some of what I call 'de facto table partitioning' using schemas/search_path tricks. I'm working on a system right now that is going to get very large and if I started to run into psql problems I'd probably look at patching it, maybe \set an option to simplify some of the queries. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Databases vs Schemas
On Fri, Oct 9, 2009 at 11:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: I've got 200,000 tables in one db (8.4), and some tools barely work. The system catalogs get inefficient when large and psql especially has trouble. Tab completion takes forever, even if I make a schema s with one table in it and type s. and try and tab complete -- its as if its scanning all without a schema qualifier or using an index. The tab-completion queries have never been vetted for performance particularly :-( Just out of curiosity, how much does this help? alter function pg_table_is_visible(oid) cost 10; (You'll need to do it as superuser --- if it makes things worse, just set the cost back to 1.) Sometimes it does not match valid tables at all, and sometimes regex matching fails too ('\dt schema.*_*_*' intermittently flakes out if it returns a lot of matches). There are some arbitrary LIMIT 1000 clauses in those queries, which probably explains this ... but taking them out would likely cause libreadline to get indigestion ... regards, tom lane We ran into this exact situation with a pg 8.3 database and a very large number of tables. psql would wait for 20 to 30 seconds if the user was unlucky enough to hit the tab key. After doing some research with query logging, explain analyze and some trial and error, we came to the same conclusion. Altering the cost for the pg_table_is_visible function to 10 fixed our performance problem immediately. It appears that when the cost was set to 1, that the query optimizer first ran the function over the entire pg_class table. By increasing the cost, it now only runs the function over the rows returned by the other items in the where clause. -chris
Re: [PERFORM] Databases vs Schemas
On Sat, Oct 10, 2009 at 8:44 AM, Chris Kratz chris.kr...@vistashare.com wrote: alter function pg_table_is_visible(oid) cost 10; (You'll need to do it as superuser --- if it makes things worse, just set the cost back to 1.) Sometimes it does not match valid tables at all, and sometimes regex matching fails too ('\dt schema.*_*_*' intermittently flakes out if it returns a lot of matches). There are some arbitrary LIMIT 1000 clauses in those queries, which probably explains this ... but taking them out would likely cause libreadline to get indigestion ... regards, tom lane We ran into this exact situation with a pg 8.3 database and a very large number of tables. psql would wait for 20 to 30 seconds if the user was unlucky enough to hit the tab key. After doing some research with query logging, explain analyze and some trial and error, we came to the same conclusion. Altering the cost for the pg_table_is_visible function to 10 fixed our performance problem immediately. It appears that when the cost was set to 1, that the query optimizer first ran the function over the entire pg_class table. By increasing the cost, it now only runs the function over the rows returned by the other items in the where clause. We have a large number of objects in our db and this worked for me too! Thanks a lot. As a side note, it also makes slony create set stuff run really really slow as well, and I'm guessing there's a similar trick for the slony functions I can add and see if it helps. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Databases vs Schemas
I am seeking advice on what the best setup for the following would be. My company provides a hosted web calendaring solution for school districts. For each school district we have a separate database. Each database has 57 tables. There are a total of 649 fields in those tables. Here is a table of the different kinds of field and how many there are: time without time zone bytea date smallint boolean integer timestamp without time zone numeric text 9 4 8 1 79 195 36 8 309 Over the next couple of months we will be creating an instance of our solution for each public school district in the US which is around 18,000. That means currently we would be creating 18,000 databases (all on one server right now - which is running 8.4). I am assuming this is probably not the best way of doing things. I have read up on schemas and it looks like a good change to make would be to create 1 database with 18,000 schemas. Would that be a good idea? What sort of issues should I be aware of (administrative, management, performance, etc...)? Is that too many schemas to put into 1 database? What are the limits on the number of databases and schemas you can create? Should I try to re-engineer things so that all 18,000 instances only use 1 database and 1 schema? Let me know if you need any more info. Any advice and information would be greatly appreciated. Regards, Scott Otis CIO / Lead Developer Intand www.intand.com
Re: [PERFORM] Databases vs Schemas
Scott Otis wrote: I am seeking advice on what the best setup for the following would be. My company provides a hosted web calendaring solution for school districts. For each school district we have a separate database. Each database has 57 tables. Over the next couple of months we will be creating an instance of our solution for each public school district in the US which is around 18,000. Why are you trying to keep all this information on one server? It seems like you have such perfectly independent silos of data, why not take the opportunity to scale out horizontally? It's usually a lot cheaper to buy 4 machines of power x than one machine of power (4x). -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Databases vs Schemas
On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis scott.o...@intand.com wrote: Over the next couple of months we will be creating an instance of our solution for each public school district in the US which is around 18,000. That means currently we would be creating 18,000 databases (all on one server right now – which is running 8.4). I am assuming this is probably not the best way of doing things. Schema advantages: *) maintenance advantages; all functions/trigger functions can be shared. HUGE help if you use them *) can query shared data between schemas without major headaches *) a bit more efficiency especially if private data areas are small. kinda analogous to processes vs threads *) Can manage the complete system without changing database sessions. This is the kicker IMO. Database Advantages: *) More discrete. Easier to distinctly create, dump, drop, or move to separate server *) Smaller system catalogs might give efficiency benefits merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Databases vs Schemas
Scott Carey sc...@richrelevance.com writes: I've got 200,000 tables in one db (8.4), and some tools barely work. The system catalogs get inefficient when large and psql especially has trouble. Tab completion takes forever, even if I make a schema s with one table in it and type s. and try and tab complete -- its as if its scanning all without a schema qualifier or using an index. The tab-completion queries have never been vetted for performance particularly :-( Just out of curiosity, how much does this help? alter function pg_table_is_visible(oid) cost 10; (You'll need to do it as superuser --- if it makes things worse, just set the cost back to 1.) Sometimes it does not match valid tables at all, and sometimes regex matching fails too ('\dt schema.*_*_*' intermittently flakes out if it returns a lot of matches). There are some arbitrary LIMIT 1000 clauses in those queries, which probably explains this ... but taking them out would likely cause libreadline to get indigestion ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Databases Vs. Schemas
hi Josh Berkus wrote: Stalin, We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom grown app that we developed. Each app installs same set of tables for each service. And the service could easily be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a database cluster. What are the performance overhead of having multiple databases vs. schemas (if any). I'm leaning towards having schemas rather than databases but i would like to get others opinion on this. Appreciate your reply. No performance difference AFAIK. The real question is whether you have to have queries joining several databases. If yes, use Schema; if no, use databases. don't forget the pg_hba.conf :) You need 1000 declaration. Was a thread before, title: performance problem - 10.000 databases Check this: http://groups.google.com/groups?hl=enlr=ie=UTF-8threadm=1068039213.28814.116.camel%40franki-laptop.tpi.plrnum=10prev=/groups%3Fq%3D1000%2Bdatabase%2Bgroup:comp.databases.postgresql.*%26hl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.databases.postgresql.*%26selm%3D1068039213.28814.116.camel%2540franki-laptop.tpi.pl%26rnum%3D10 C. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Databases Vs. Schemas
Hi All, We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom grown app that we developed. Each app installs same set of tables for each service. And the service could easily be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a database cluster. What are the performance overhead of having multiple databases vs. schemas (if any). I'm leaning towards having schemas rather than databases but i would like to get others opinion on this. Appreciate your reply. Thanks, Stalin ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Databases Vs. Schemas
Stalin, We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom grown app that we developed. Each app installs same set of tables for each service. And the service could easily be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a database cluster. What are the performance overhead of having multiple databases vs. schemas (if any). I'm leaning towards having schemas rather than databases but i would like to get others opinion on this. Appreciate your reply. No performance difference AFAIK. The real question is whether you have to have queries joining several databases. If yes, use Schema; if no, use databases. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match