Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Mark Mielke Sat, 12 Sep 2009 12:52:44 -0700

On 09/12/2009 03:33 PM, Stephen Frost wrote:

* Mark Mielke (m...@mark.mielke.cc) wrote:

No matter what scheme PostgreSQL uses for storing the data, there can be
underlying file system limitations.

This is true, but there's a reason we only create 1GB files too.  I
wouldn't be against a scheme such as described to minimize the impact to
PG of these limitations.

Ok - but "minimizing the impact" does not necessarily mean "keep doingwhat we are doing, but work around the issue." One interpretation of theproblem is that the problem is that PostgreSQL is trying to use too manysub-directories in the same containing directory. I would argue that theproblem is that PostgreSQL requires so many sub-directories in the firstplace.

There are many database designs that do not require one file perdatabase. Berkeley DB JE, for instance, treats each "database" as oneroot in a larger tree. The entire database is stored in one set offiles, where the files are created due to database volume, not databasequantity. Tables can be thought of similarly.

There are many ways PostgreSQL could work around this problem - your
suggestion of using sub-directories being one of them - but what happens
if this causes performance degradation for existing users, due to the
extra file system lookups required on every access?

Ehhh, it's likely to be cached..  Sounds like a stretch to me that this
would actually be a performance hit.  If it turns out to really be one,
we could just wait to move to subdirectories until some threshold (eg-
30k) is hit.

Cached does not eliminate the cost. It just means it doesn't have to goto disk. It still needs to traverse an additional level of the VFS tree.Sure, this is designed to be cheap - but this avoids the real cost fromconsideration - that of having so many subdirectories in the first place.

Another solution would be to store everything in the same file.

eh?

There is no technical requirement for PostgreSQL to separate data indatabases or tables on subdirectory or file boundaries. Nothing wrongwith having one or more large files that contain everything. PostgreSQLdoesn't happen to do this today - but it's bothered me at times that ithas so many files in the database directory - even very small tablesrequire their own files.

In any case, I think this would be a significant architecture change for
something that sounds like a bad idea. I would expect having 32k
databases to have significant performance degradations in other ways.

Actually, I think some of the changes to remove flatfiles might improve
our performance with large numbers of databases.  I also don't see how
this would be a significant architecture change at all.  If there are
still issues that make having lots of databases slow, we might want to
look into fixing those issues rather than saying "well, just don't do
that".

I guess I'm not seeing how using 32k tables is a sensible model. So yes,things can be done to reduce the cost - but it seems like something iswrong if this is truly a requirement. There are alternative models ofstorage that would not require 32k tables, that likely perform better.Although, I don't know your requirements, so perhaps I am missing something.

In
particular, I am thinking about having to open a file descriptor for
each of these files. What sort of database architecture requires 32k
databases or tables for the same PostgreSQL instance? Have you
considered having an additional field for your primary key and combining
several tables into one?

I've got a ton of instances that have>32K tables.  My approach is
generally to keep the number of databases low, while having lots of
schemas, but there are distinct downsides to that (specifically related
to hiding information..  something alot of people care about, but
thankfully I don't have to).

Do you agree with me that having 32k open file descriptors (or worse,open on demand file descriptors that need to be re-opened many times) isa problem?

Looking at PostgreSQL today - I don't think it's designed to scale tothis. Looking at SQL today, I think I would find it difficult to justifycreating a solution that requires this capability.


Honestly - it seems a bit insane. Sorry. :-) Maybe I'm just naive...

Cheers,
mark

--
Mark Mielke<m...@mielke.cc>


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Reply via email to