On 01.03.2019 1:32, Thomas Munro wrote:
On Fri, Mar 1, 2019 at 10:41 AM Shawn Debnath <s...@amazon.com> wrote:
On Fri, Mar 01, 2019 at 10:33:06AM +1300, Thomas Munro wrote:
It doesn't make any sense to put things like clog or any other SLRU in
a non-default tablespace though.  It's perfectly OK if not all smgr
implementations know how to deal with tablespaces, and the SLRU
support should just not support that.
If the generic storage manager, or whatever we end up calling it, ends
up being generic enough - it's possible that the tablespace value would have
to be respected.
Right, you and I have discussed this a bit off-list, but for the
benefit of others, I think what you're getting at with "generic
storage manager" here is something like this: on the one hand, our
proposed revival of SMGR as a configuration point is about
supporting alternative file layouts for bufmgr data, but at the same
time there is some background noise about direct IO, block encryption,
... and who knows what alternative block storage someone might come up
with ... at the block level.  So although it sounds a bit
contradictory to be saying "let's make all these different SMGRs!" at
the same time as saying "but we'll eventually need a single generic
SMGR that is smart enough to be parameterised for all of these
layouts!", I see why you say it.  In fact, the prime motivation for
putting SLRUs into shared buffers is to get better buffering, because
(anecdotally) slru.c's mini-buffer scheme performs abysmally without
the benefit of an OS page cache.  If we add optional direct IO support
(something I really want), we need it to apply to SLRUs, undo and
relations, ideally without duplicating code, so we'd probably want to
chop things up differently.  At some point I think we'll need to
separate the questions "how to map blocks to filenames and offsets"
and "how to actually perform IO".  I think the first question would be
controlled by the SMGR IDs as discussed, but the second question
probably needs to be controlled by GUCs that control all IO, and/or
special per relation settings (supposing you can encrypt just one
table, as a random example I know nothing about); but that seems way
out of scope for the present projects.  IMHO the best path from here
is to leave md.c totally untouched for now as the SMGR for plain old
relations, while we work on getting these new kinds of bufmgr data
into the tree as a first step, and a later hypothetical direct IO or
whatever project can pay for the refactoring to separate IO from
layout.


I completely agree with this statement:

At some point I think we'll need to separate the questions "how to map blocks to filenames and 
offsets" and "how to actually perform IO".
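To make the first of those two questions concrete, here is a minimal sketch of the "layout" half only: turning a block number into a segment number, a byte offset and a file name, with no IO at all. It assumes the default build values (8 kB blocks, 1 GB segments) and the usual "name", "name.1", "name.2" segment naming. The names BlockLocation, map_block and segment_path are made up for this mail and are not taken from md.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed defaults: 8 kB blocks, 1 GB segments (131072 blocks each). */
#define BLCKSZ      8192u
#define RELSEG_SIZE 131072u

/* Hypothetical result type: where a block lives, independent of how IO is done. */
typedef struct BlockLocation
{
    uint32_t segno;   /* which segment file */
    uint64_t offset;  /* byte offset inside that segment */
} BlockLocation;

/* Pure layout computation: no file descriptors, no reads or writes. */
static BlockLocation
map_block(uint32_t blkno)
{
    BlockLocation loc;

    loc.segno = blkno / RELSEG_SIZE;
    loc.offset = (uint64_t) (blkno % RELSEG_SIZE) * BLCKSZ;
    return loc;
}

/*
 * Build the segment file name: segment 0 is the bare base name, later
 * segments get a ".N" suffix. Returns what snprintf returns.
 */
static int
segment_path(char *buf, size_t buflen, const char *base, uint32_t segno)
{
    assert(buf != NULL && base != NULL);
    if (segno == 0)
        return snprintf(buf, buflen, "%s", base);
    return snprintf(buf, buflen, "%s.%u", base, (unsigned) segno);
}
```

Nothing here knows whether the bytes are later written with buffered IO, direct IO, or through a compressing or encrypting layer, which is the point of keeping the two questions apart.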


There are two subsystems developed at PgPro that are integrated into Postgres at the file IO level: CFS (compressed file system) and SnapFS (fast database snapshots). The first provides page-level encryption and compression; the second provides a mechanism for quickly restoring database state. Both are implemented by patching fd.c. My first idea was to implement them as alternative storage devices (alternatives to md.c), but that would require duplicating all of the segment mapping logic from md.c plus the file descriptor cache from fd.c. It would be nice to be able to redefine the raw file operations (FileWrite, FileRead, ...) without affecting the segment mapping logic.
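A rough sketch of the kind of split I have in mind is below. It is not Postgres code: RawFileOps, MemStore, XorLayer and the page_* functions are hypothetical names, the page and segment sizes are tiny on purpose, and a one-byte XOR stands in for a real compression or encryption layer. The mapping code (page_write/page_read) is written once and only calls through a table of raw operations, so a different IO layer can be plugged in underneath without touching it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy sizes for illustration only. */
#define PAGE_BYTES    16u
#define PAGES_PER_SEG 4u
#define MAX_SEGS      4u
#define SEG_BYTES     (PAGE_BYTES * PAGES_PER_SEG)

/* Hypothetical table of raw file operations ("how to actually perform IO"). */
typedef struct RawFileOps
{
    int (*write_at)(void *state, uint32_t segno, size_t off, const void *buf, size_t len);
    int (*read_at)(void *state, uint32_t segno, size_t off, void *buf, size_t len);
    void *state;
} RawFileOps;

/* Backend 1: plain in-memory "files", standing in for ordinary file IO. */
typedef struct MemStore { unsigned char seg[MAX_SEGS][SEG_BYTES]; } MemStore;

static int
mem_write(void *state, uint32_t segno, size_t off, const void *buf, size_t len)
{
    MemStore *m = state;

    if (segno >= MAX_SEGS || off > SEG_BYTES || len > SEG_BYTES - off)
        return -1;
    memcpy(m->seg[segno] + off, buf, len);
    return 0;
}

static int
mem_read(void *state, uint32_t segno, size_t off, void *buf, size_t len)
{
    MemStore *m = state;

    if (segno >= MAX_SEGS || off > SEG_BYTES || len > SEG_BYTES - off)
        return -1;
    memcpy(buf, m->seg[segno] + off, len);
    return 0;
}

/* Backend 2: transforms data, then delegates to a lower layer. */
typedef struct XorLayer { const RawFileOps *lower; unsigned char key; } XorLayer;

static int
xor_write(void *state, uint32_t segno, size_t off, const void *buf, size_t len)
{
    XorLayer *x = state;
    unsigned char tmp[PAGE_BYTES];

    if (len > sizeof tmp)
        return -1;
    for (size_t i = 0; i < len; i++)
        tmp[i] = (unsigned char) (((const unsigned char *) buf)[i] ^ x->key);
    return x->lower->write_at(x->lower->state, segno, off, tmp, len);
}

static int
xor_read(void *state, uint32_t segno, size_t off, void *buf, size_t len)
{
    XorLayer *x = state;

    if (x->lower->read_at(x->lower->state, segno, off, buf, len) != 0)
        return -1;
    for (size_t i = 0; i < len; i++)
        ((unsigned char *) buf)[i] ^= x->key;
    return 0;
}

/* Segment mapping ("layout"): written once, unaware of which backend is used. */
static int
page_write(const RawFileOps *io, uint32_t pageno, const void *page)
{
    assert(io != NULL);
    return io->write_at(io->state, pageno / PAGES_PER_SEG,
                        (size_t) (pageno % PAGES_PER_SEG) * PAGE_BYTES,
                        page, PAGE_BYTES);
}

static int
page_read(const RawFileOps *io, uint32_t pageno, void *page)
{
    assert(io != NULL);
    return io->read_at(io->state, pageno / PAGES_PER_SEG,
                       (size_t) (pageno % PAGES_PER_SEG) * PAGE_BYTES,
                       page, PAGE_BYTES);
}
```

A caller would fill one RawFileOps with mem_write/mem_read and a MemStore, and optionally wrap it in a second RawFileOps using xor_write/xor_read and an XorLayer pointing at the first; page_write and page_read work unchanged with either.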

One more thing... From my point of view, one of the drawbacks of Postgres is that it requires an underlying file system and is not able to work with raw partitions. It seems to me that bypassing the file system layer could significantly improve performance and give more possibilities for IO performance tuning. Certainly it would require a lot of changes in the Postgres storage layer, so this is not something I suggest implementing or even discussing right now. But it may be useful to keep it in mind in discussions concerning the "generic storage manager".


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

