Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
>> So the question I would ask goes more like "do you really need 32K
>> databases in one installation? Have you considered using schemas
>> instead?" Databases are, by design, pretty heavyweight objects.
>
> I agree, but at the same time, we might: a) update our documentation to
> indicate it depends on the filesystem, and b) consider how we might
> work around this limit (and if we feel the effort to be worth it).

I don't feel it's worth the effort. I can think of lots of hosted
application configurations where one might need 33K tables. Note that
PostgreSQL *already* handles this better than Oracle or MySQL do -- I
know of at least one case where our ability to handle large numbers of
tables was a reason for migration from Oracle to PostgreSQL.

However, I can think of no legitimate reason to need 33K active
databases in a single instance. I think someone has confused databases
with schemas ... or even with tables. FileMaker developer, maybe?

Or maybe it's 10 active databases and 32.99K archive ones ... in which
case they should be dumped to compressed backup and dropped.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
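For the archive-database case Josh describes, a minimal sketch of the
dump-and-drop cycle (the database name is made up for illustration;
pg_dump's -Fc produces a compressed, pg_restore-able archive):

    -- From the shell, dump the idle database to a compressed archive:
    --   pg_dump -Fc archive_2008 > archive_2008.dump
    -- Once the backup is verified, reclaim the space:
    DROP DATABASE archive_2008;
    -- If it is ever needed again, -C recreates the database on restore:
    --   pg_restore -C -d postgres archive_2008.dump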
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Andrew Dunstan writes:
> Tom Lane wrote:
>> So the question I would ask goes more like "do you really need 32K
>> databases in one installation? Have you considered using schemas
>> instead?" Databases are, by design, pretty heavyweight objects.

> That's a fair question. OTOH, devising a scheme to get around it would
> not be terribly difficult, would it? I can imagine a scheme where the
> subdir for a database was lo/hi for some division of the database oid.
> I guess it could make matters ugly for pg_migrator, though.

As I said earlier, the number-of-subdirectories issue is not the
important thing. The OP was already up to 160GB worth of system
catalogs before his filesystem wimped out, and would be needing
terabytes if he wanted to go significantly past the filesystem limit.
So there is no point in devising some clever workaround for the
limitations of one filesystem unless you want to reconsider our system
catalog representation --- and that will carry actual user-visible
functional costs; it's not just a cute hack somewhere in the guts of
the system.

			regards, tom lane
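For a rough sense of where that 160GB comes from (using the ~5MB of
system catalogs per database that Tom cites elsewhere in the thread):

    32,766 databases x ~5 MB of catalogs each  =~  160 GB

so going an order of magnitude past the subdirectory limit would mean
on the order of 1.6 TB of catalogs before storing any user data.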
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Tom Lane wrote:
> So the question I would ask goes more like "do you really need 32K
> databases in one installation? Have you considered using schemas
> instead?" Databases are, by design, pretty heavyweight objects.

That's a fair question. OTOH, devising a scheme to get around it would
not be terribly difficult, would it? I can imagine a scheme where the
subdir for a database was lo/hi for some division of the database oid.
I guess it could make matters ugly for pg_migrator, though.

cheers

andrew
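As a rough illustration of the kind of lo/hi division Andrew has in
mind (the divisor 32768 here is an arbitrary assumption, not a
proposal):

    -- Hypothetical split of a database's directory name by its oid:
    -- a database with oid 90210 would live under base/2/24674
    -- instead of base/90210, keeping each directory under the limit.
    SELECT datname,
           oid::int8 / 32768 AS hi_subdir,
           oid::int8 % 32768 AS lo_subdir
    FROM pg_database;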
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
On 09/12/2009 04:17 PM, Stephen Frost wrote:
> * Mark Mielke (m...@mark.mielke.cc) wrote:
>> There is no technical requirement for PostgreSQL to separate data in
>> databases or tables on subdirectory or file boundaries. Nothing wrong
>> with having one or more large files that contain everything.
>
> Uhh, except where you run into system limitations on file size (eg- a
> 2G max file size..). You'll note PG creates files up to 1G and then
> splits them into separate files. It's not done just because it's fun.

This becomes a bit of a side thread - but note that I carefully didn't
say "exactly one file". I said "one or more large files that contain
everything". That is, if we have 3 databases each of size 50 Mbytes,
there is no technical reason why this cannot be stored within a single
150 Mbyte data file. Sure, if it goes beyond 2G, we can break it into a
set of files, and treat each file as a 2G "block" in a virtual larger
storage pool. VMware has this for storing virtual drives.

If we assume that 32k *databases* is reasonable for a single instance,
then for those databases to *require* 32k immediate sub-directories is
the real problem. This can be solved either by:

1) Adding additional depth to the directory hierarchy to work around
   this limit (what the OP and you are proposing), or
2) Storing multiple databases within the same files or sub-directories.

If you really must have this amount of scalability, I am suggesting
that you consider all of the resources required to access 32k worth of
sub-directories in the file system, specifically including file
descriptors, inodes, the backing bitmaps or extent mappings that
allocate from the file system free space, the rather inefficient
directory layouts of many file systems (many file systems still do
LINEAR searches for filenames, making file lookups linearly slower as
the directory becomes larger), and the kernel memory caches that track
all of these little details. The POSIX guarantees required are
certainly more heavyweight than the requirements that PostgreSQL has,
and I am certain it is possible to create a targeted solution to this
problem that is simpler and faster. For only a few databases and a few
files, the effort isn't worth it. But, if supporting 32k+ *databases*,
or even 32k+ tables and indexes, is a major requirement and a major
design target, then PostgreSQL should do this stuff itself.

Modern file systems don't have the 2G problem. ext2/ext3 with 4 Kbyte
blocks (the standard) supports files up to 2 Tbytes. This also matches
the practical limit on addressing a single physical disk, at least on
the platforms I am familiar with. The requirement to stay under 2G for
a single file is a bit outdated.

>> I guess I'm not seeing how using 32k tables is a sensible model.
>
> For one thing, there's partitioning. For another, there's a large user
> base. 32K tables is, to be honest, not all that many, especially for
> some of these databases which reach into the multi-TB range..

Talking philosophically - the need to use table-based partitioning to
achieve acceptable performance or storage requirements is somewhat of a
hacky workaround. It's effectively moving the database query logic back
into the application space, where the application must know which
tables contain which data. The inherited tables and automatic
constraint-based query planning help out, but it's still an elaborate
hack. It's exposing data that the application should not need to care
about, and then making it possible to hide some of it again. Table
partitioning should be far more automatic. I don't want to break my
theoretical table containing every call made on my network into
per-hour tables, each with a constraint for the time range it includes
data for. I want to create a table with a timestamp column, fill it
with billions of records, provide a few hints, and the database engine
should be smart enough to partition the table such that my queries
"just work".

Back to reality - maybe things have not reached this level of maturity
yet, and people with practical requirements today have found that they
need to use very complex manual partitioning schemes that chew up
thousands of tables. So yes, things can be done to reduce the cost -
but it seems like something is wrong if this is truly a requirement.

> I have no idea what you've been working with, but I hardly think it
> makes sense for PG to consider over 32k tables as not worth
> supporting.

I don't advocate any limits. However, I also don't advocate designing
PostgreSQL specifically for the case of 32k tables. If you want to use
32k tables, then you had better have a file system that supports 32k+
files in a single directory, and a kernel that is able to work
efficiently when postgres has thousands or more file descriptors open
and in use at the same time. The system *supports* 32k tables, but if
you look at the design, you'll see that it is not optimal for 32k
tables. Even
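For readers who haven't fought with it, a minimal sketch of the manual,
constraint-based partitioning Mark is lamenting (table and column names
are made up; this is the inheritance-plus-CHECK idiom of the era):

    -- Parent table; children carry non-overlapping CHECK constraints
    -- so constraint exclusion can skip irrelevant partitions.
    CREATE TABLE calls (
        call_id   bigint,
        placed_at timestamptz NOT NULL,
        caller    text
    );

    CREATE TABLE calls_2009_09_12_h00 (
        CHECK (placed_at >= '2009-09-12 00:00'
           AND placed_at <  '2009-09-12 01:00')
    ) INHERITS (calls);

    CREATE TABLE calls_2009_09_12_h01 (
        CHECK (placed_at >= '2009-09-12 01:00'
           AND placed_at <  '2009-09-12 02:00')
    ) INHERITS (calls);

    -- ...and so on, one child table per hour: thousands of tables.

    -- With constraint exclusion enabled, this scans only one child:
    SET constraint_exclusion = on;
    SELECT count(*) FROM calls
    WHERE placed_at >= '2009-09-12 00:30'
      AND placed_at <  '2009-09-12 00:45';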
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I believe the filesystem limit the OP is hitting is on the number of
> *subdirectories* per directory, not on the number of plain files.

Right, I'm not entirely sure how we got onto the question of number of
tables.

> So the question I would ask goes more like "do you really need 32K
> databases in one installation? Have you considered using schemas
> instead?" Databases are, by design, pretty heavyweight objects.

I agree, but at the same time, we might: a) update our documentation to
indicate it depends on the filesystem, and b) consider how we might
work around this limit (and if we feel the effort to be worth it).

	Thanks,

		Stephen
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Stephen Frost writes:
> * Mark Mielke (m...@mark.mielke.cc) wrote:
>> I guess I'm not seeing how using 32k tables is a sensible model.

> For one thing, there's partitioning. For another, there's a large user
> base. 32K tables is, to be honest, not all that many, especially for
> some of these databases which reach into the multi-TB range..

I believe the filesystem limit the OP is hitting is on the number of
*subdirectories* per directory, not on the number of plain files. If we
had a hard limit at 32K tables, many people would have hit it before
now.

So the question I would ask goes more like "do you really need 32K
databases in one installation? Have you considered using schemas
instead?" Databases are, by design, pretty heavyweight objects.

			regards, tom lane
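A minimal sketch of the schemas-instead-of-databases layout Tom
suggests (the tenant names are hypothetical):

    -- One database, many schemas: each tenant gets a schema rather
    -- than a whole database, so nothing multiplies under base/.
    CREATE SCHEMA tenant_0001;
    CREATE SCHEMA tenant_0002;

    CREATE TABLE tenant_0001.orders (id serial PRIMARY KEY, total numeric);

    -- A session acting for one tenant scopes itself via search_path:
    SET search_path = tenant_0001;
    SELECT * FROM orders;  -- resolves to tenant_0001.orders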
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
* Mark Mielke (m...@mark.mielke.cc) wrote:
> There is no technical requirement for PostgreSQL to separate data in
> databases or tables on subdirectory or file boundaries. Nothing wrong
> with having one or more large files that contain everything.

Uhh, except where you run into system limitations on file size (eg- a
2G max file size..). You'll note PG creates files up to 1G and then
splits them into separate files. It's not done just because it's fun.

> I guess I'm not seeing how using 32k tables is a sensible model.

For one thing, there's partitioning. For another, there's a large user
base. 32K tables is, to be honest, not all that many, especially for
some of these databases which reach into the multi-TB range..

> So yes, things can be done to reduce the cost - but it seems like
> something is wrong if this is truly a requirement.

I have no idea what you've been working with, but I hardly think it
makes sense for PG to consider over 32k tables as not worth supporting.

> There are alternative models of storage that would not require 32k
> tables, that likely perform better.

Eh? You would advocate combining tables for no reason other than you
think it's bad to have a lot?

> Do you agree with me that having 32k open file descriptors (or worse,
> open on demand file descriptors that need to be re-opened many times)
> is a problem?

Nope.

> Looking at PostgreSQL today - I don't think it's designed to scale to
> this. Looking at SQL today, I think I would find it difficult to
> justify creating a solution that requires this capability.

Actually, I find that PG handles it pretty well. And we used to be an
Oracle shop.

	Thanks,

		Stephen
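The 1GB segmenting Stephen refers to is easy to observe from SQL; a
small illustration (the table name is made up, and 1GB assumes the
default build-time segment size):

    -- Shows the first segment of the relation's on-disk storage, e.g.
    -- base/16384/16397.  Once the table grows past 1GB, sibling files
    -- named 16397.1, 16397.2, ... appear in the same directory.
    SELECT pg_relation_filepath('big_table');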
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Mark Mielke writes:
> My God - I thought 32k databases in the same directory was insane.
> 220M+???

Considering that the system catalogs alone occupy about 5MB per
database, that would require an impressive amount of storage... In
practice I think users would be complaining about our choice to
instantiate the catalogs per-database a lot sooner than they'd hit the
subdirectory-count limit.

BTW, there is another avenue that the OP could look into if he really
wants this many databases on a UFS filesystem: split them up into
multiple tablespaces.

			regards, tom lane
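A minimal sketch of the tablespace split Tom mentions (paths and names
are hypothetical; each LOCATION must be an existing directory owned by
the postgres user):

    -- Databases assigned to a tablespace get their subdirectories
    -- under that tablespace's directory rather than under base/, so
    -- no single directory has to hold all 32K+ entries.
    CREATE TABLESPACE dbspace1 LOCATION '/srv/pgdata/dbspace1';
    CREATE TABLESPACE dbspace2 LOCATION '/srv/pgdata/dbspace2';

    CREATE DATABASE customer_00001 TABLESPACE dbspace1;
    CREATE DATABASE customer_20000 TABLESPACE dbspace2;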
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Stephen Frost writes:
> * Mark Mielke (m...@mark.mielke.cc) wrote:
>> No matter what scheme PostgreSQL uses for storing the data, there can
>> be underlying file system limitations.

> This is true, but there's a reason we only create 1GB files too. I
> wouldn't be against a scheme such as described to minimize the impact
> to PG of these limitations.

There are plenty of filesystems available that do not have this silly
limitation, so I don't see a reason for us to work around it. If the OP
is on a platform that only offers UFS and ZFS, and he doesn't like
either of those, maybe he should find another platform.

			regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
On 09/12/2009 03:48 PM, Stephen Frost wrote:
> This would allow for 220M+ databases. I'm not sure how bad it'd be to
> introduce another field to pg_database which provides the directory
> (as it'd now be distinct from the oid..) or if that might require a
> lot of changes. Not sure how easy it'd be to implement something to
> address this problem while we continue to tie the directory name to
> the oid.

Other than bragging rights - what part of this would be a GOOD thing? :-)

My God - I thought 32k databases in the same directory was insane.
220M+???

Hehehe... If you can patch PostgreSQL to support such extremes without
hurting my performance - I'll shut up and leave you be. :-)

Cheers,
mark

--
Mark Mielke
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
On 09/12/2009 03:33 PM, Stephen Frost wrote:
> * Mark Mielke (m...@mark.mielke.cc) wrote:
>> No matter what scheme PostgreSQL uses for storing the data, there can
>> be underlying file system limitations.

> This is true, but there's a reason we only create 1GB files too. I
> wouldn't be against a scheme such as described to minimize the impact
> to PG of these limitations.

Ok - but "minimizing the impact" does not necessarily mean "keep doing
what we are doing, but work around the issue." One interpretation is
that the problem is PostgreSQL trying to use too many sub-directories
in the same containing directory. I would argue that the real problem
is that PostgreSQL requires so many sub-directories in the first place.
There are many database designs that do not require one file per
database. Berkeley DB JE, for instance, treats each "database" as one
root in a larger tree. The entire database is stored in one set of
files, where the files are created due to database volume, not database
quantity. Tables can be thought of similarly.

>> There are many ways PostgreSQL could work around this problem - your
>> suggestion of using sub-directories being one of them - but what
>> happens if this causes performance degradation for existing users,
>> due to the extra file system lookups required on every access?

> Ehhh, it's likely to be cached.. Sounds like a stretch to me that this
> would actually be a performance hit. If it turns out to really be one,
> we could just wait to move to subdirectories until some threshold (eg-
> 30k) is hit.

Cached does not eliminate the cost. It just means it doesn't have to go
to disk. It still needs to traverse an additional level of the VFS
tree. Sure, this is designed to be cheap - but this keeps the real cost
out of consideration - that of having so many subdirectories in the
first place.

>> Another solution would be to store everything in the same file.

> eh?

There is no technical requirement for PostgreSQL to separate data in
databases or tables on subdirectory or file boundaries. Nothing wrong
with having one or more large files that contain everything. PostgreSQL
doesn't happen to do this today - but it's bothered me at times that it
has so many files in the database directory - even very small tables
require their own files.

>> In any case, I think this would be a significant architecture change
>> for something that sounds like a bad idea. I would expect having 32k
>> databases to have significant performance degradations in other ways.

> Actually, I think some of the changes to remove flatfiles might
> improve our performance with large numbers of databases. I also don't
> see how this would be a significant architecture change at all. If
> there are still issues that make having lots of databases slow, we
> might want to look into fixing those issues rather than saying "well,
> just don't do that".

I guess I'm not seeing how using 32k tables is a sensible model. So
yes, things can be done to reduce the cost - but it seems like
something is wrong if this is truly a requirement. There are
alternative models of storage that would not require 32k tables, that
likely perform better. Although, I don't know your requirements, so
perhaps I am missing something. In particular, I am thinking about
having to open a file descriptor for each of these files. What sort of
database architecture requires 32k databases or tables for the same
PostgreSQL instance? Have you considered having an additional field for
your primary key and combining several tables into one?

> I've got a ton of instances that have >32K tables. My approach is
> generally to keep the number of databases low, while having lots of
> schemas, but there are distinct downsides to that (specifically
> related to hiding information.. something a lot of people care about,
> but thankfully I don't have to).

Do you agree with me that having 32k open file descriptors (or worse,
open on demand file descriptors that need to be re-opened many times)
is a problem?

Looking at PostgreSQL today - I don't think it's designed to scale to
this. Looking at SQL today, I think I would find it difficult to
justify creating a solution that requires this capability. Honestly -
it seems a bit insane. Sorry. :-)

Maybe I'm just naive...

Cheers,
mark

--
Mark Mielke
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
* Stephen Frost (sfr...@snowman.net) wrote:
> Ehhh, it's likely to be cached.. Sounds like a stretch to me that this
> would actually be a performance hit. If it turns out to really be one,
> we could just wait to move to subdirectories until some threshold (eg-
> 30k) is hit.

Thinking this through a bit more, I realized that I didn't explain my
thought here very well. My idea would be - do everything as we do now,
until we hit a threshold (perhaps an easy one would be '1'). Once we
hit the threshold, create a subdirectory first and then the new
database directory in that. eg:

  0/ 1/ 2/ 3/ 4/ [...]
  1/0/ 1/1/ 1/2/ 1/3/ 1/[...]
  2/0/ 2/1/ 2/2/ 2/[...]
  3/0/ 3/1/ [...]
  0/0/ 0/1/ 0/[...]

This would allow for 220M+ databases. I'm not sure how bad it'd be to
introduce another field to pg_database which provides the directory (as
it'd now be distinct from the oid..) or if that might require a lot of
changes. Not sure how easy it'd be to implement something to address
this problem while we continue to tie the directory name to the oid.

	Thanks,

		Stephen
[HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
(removed -committers)

Mark,

* Mark Mielke (m...@mark.mielke.cc) wrote:
> No matter what scheme PostgreSQL uses for storing the data, there can
> be underlying file system limitations.

This is true, but there's a reason we only create 1GB files too. I
wouldn't be against a scheme such as described to minimize the impact
to PG of these limitations.

> There are many ways PostgreSQL could work around this problem - your
> suggestion of using sub-directories being one of them - but what
> happens if this causes performance degradation for existing users, due
> to the extra file system lookups required on every access?

Ehhh, it's likely to be cached.. Sounds like a stretch to me that this
would actually be a performance hit. If it turns out to really be one,
we could just wait to move to subdirectories until some threshold (eg-
30k) is hit.

> Another solution would be to store everything in the same file.

eh?

> In any case, I think this would be a significant architecture change
> for something that sounds like a bad idea. I would expect having 32k
> databases to have significant performance degradations in other ways.

Actually, I think some of the changes to remove flatfiles might improve
our performance with large numbers of databases. I also don't see how
this would be a significant architecture change at all. If there are
still issues that make having lots of databases slow, we might want to
look into fixing those issues rather than saying "well, just don't do
that".

> In particular, I am thinking about having to open a file descriptor
> for each of these files. What sort of database architecture requires
> 32k databases or tables for the same PostgreSQL instance? Have you
> considered having an additional field for your primary key and
> combining several tables into one?

I've got a ton of instances that have >32K tables. My approach is
generally to keep the number of databases low, while having lots of
schemas, but there are distinct downsides to that (specifically related
to hiding information.. something a lot of people care about, but
thankfully I don't have to).

	thanks,

		Stephen
[HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Not sure that this really belongs on pgsql-committers - maybe
pgsql-hackers?

No matter what scheme PostgreSQL uses for storing the data, there can
be underlying file system limitations. One solution, for example, would
be to use a file system that does not have a limitation of 32k
subdirectories. Although ext3 and/or ufs has this limit, ext4 has
removed it.

There are many ways PostgreSQL could work around this problem - your
suggestion of using sub-directories being one of them - but what
happens if this causes performance degradation for existing users, due
to the extra file system lookups required on every access? Another
solution would be to store everything in the same file.

In any case, I think this would be a significant architecture change
for something that sounds like a bad idea. I would expect having 32k
databases to have significant performance degradations in other ways.
In particular, I am thinking about having to open a file descriptor for
each of these files. What sort of database architecture requires 32k
databases or tables for the same PostgreSQL instance? Have you
considered having an additional field for your primary key and
combining several tables into one?

Cheers,
mark

On 09/12/2009 02:49 PM, fulan Peng wrote:
> Hi, pgsql-committers!
> I cannot create more than 32766 databases with FreeBSD in one setup,
> not as the document says, "as many as you like". I found the problem
> is that the directory pgsql/data/base cannot hold more than 32766
> subdirectories. I suggest to make 32766 subdirectories in the base
> directory, say /base/0, /base/1, /base/32765. Then in each
> subdirectory to put the database description. This way, we can have
> 32766x32766 databases. This is kind of "as many as you like". The ZFS
> system is not a solution for me. It is snail slow.

--
Mark Mielke
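A minimal sketch of the consolidation Mark is asking about (names are
hypothetical): instead of one table per customer, one table with the
customer id carried in the key:

    -- Fold per-customer tables (orders_00001, orders_00002, ...) into
    -- a single table whose primary key includes the customer id.
    CREATE TABLE orders (
        customer_id integer NOT NULL,
        order_id    integer NOT NULL,
        total       numeric,
        PRIMARY KEY (customer_id, order_id)
    );

    -- Queries that once targeted a per-customer table filter instead:
    SELECT * FROM orders WHERE customer_id = 42;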