On Mon, 4 Dec 2023 at 22:03, Kirill Reshke <reshkekir...@gmail.com> wrote:
>
> On Mon, 4 Dec 2023 at 22:21, Matthias van de Meent
> <boekewurm+postg...@gmail.com> wrote:
>>
>> On Mon, 4 Dec 2023 at 17:51, Kirill Reshke <reshkekir...@gmail.com> wrote:
>> >
>> > So, 0002 patch uses the `get_tablespace` function, which searches Catalog
>> > to tablespace SMGR id. I wonder how `smgr_redo` would work with it?
>>
>> That's a very good point I hadn't considered in detail yet. Quite
>> clearly, the current code is wrong in assuming that the catalog is
>> accessible, and it should probably be stored in a way similar to
>> pg_filenode.map in a file managed outside the buffer pool.
>>
> Hmm, pg_filenode.map is a nice idea. So, simply maintain TableSpaceOId ->
> smgr id mapping in a separate file and update the whole file on any changes,
> right?
> Looks reasonable to me, but it is clear that this solution can be really slow
> in some patterns, like if we create many-many tablespaces (the way you
> suggested it in the per-relation SMGR feature). Maybe we can store data in
> files somehow separately, and only update one chunk per operation.

Yes, but that's a later issue... I'm not sure many-many tablespaces is
actually a good thing. There are already very few reasons to store
tables in more than just the default tablespace. For temporary
relations, there is indeed a GUC to automatically put them into one
tablespace, and I can see a similar thing being useful for unlogged
relations, too. Then I can also see high-performance local disks vs.
lower-performance (but cheaper) local disks as a reasonable split. But
that only gets us to ~6 tablespaces, assuming separate tablespaces for
each combination of (normal, temp, unlogged) * (fast, cheap). I'm not
sure there are many other reasons to add tablespaces, let alone to make
one for each table.

Note that you can select which tablespace a table is stored in, so I
see very little reason to actually do something about large numbers of
tablespaces being prohibitively expensive performance-wise.

Why do you want to have a whole new storage configuration for each of
your relations?

> Anyway, if we use a `pg_filenode.map` - like solution, we need to reuse its
> code infrastructure, right? For example, it seems that code that calculates
> checksums can be reused.
> So, we need to refactor code here, define something like FileMap API maybe.
> Or is it not really worth it? We can just write similar code twice.

I'm not sure about that. I really doubt we'll need things that are
that similar: right now, the tablespace->smgr mapping could be
considered to be implied by the symlinks in /pg_tblspc/. Non-MD
tablespaces could add a file <oid>.tblspc that details their
configuration, which would also fix the issue of the spcoid->smgr
mapping, since that file can be read without catalog access.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
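
A minimal sketch of how the <oid>.tblspc idea could resolve a tablespace's
storage manager without touching the catalog, which is what redo needs.
Everything concrete here is an assumption for illustration and not taken
from the patch: the file living directly under pg_tblspc/, the one-line
"smgr = <name>" format, and the function names tblspc_config_path,
parse_smgr_line and tblspc_lookup_smgr. A missing file is treated as a
plain MD tablespace, mirroring the "implied by the symlinks" case.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid type */

#define SMGR_NAME_MAX 64

/* Build "<datadir>/pg_tblspc/<oid>.tblspc" (hypothetical location/name). */
int
tblspc_config_path(char *buf, size_t buflen, const char *datadir, Oid spcoid)
{
	int			n;

	assert(buf != NULL && datadir != NULL);
	n = snprintf(buf, buflen, "%s/pg_tblspc/%u.tblspc", datadir, spcoid);
	return (n > 0 && (size_t) n < buflen) ? 0 : -1;
}

/*
 * Parse one line of the assumed "smgr = <name>" format.
 * Returns 1 and fills smgr if the line names a storage manager, else 0.
 */
int
parse_smgr_line(const char *line, char *smgr, size_t smgrlen)
{
	char		value[SMGR_NAME_MAX];

	if (sscanf(line, " smgr = %63s", value) != 1)
		return 0;
	snprintf(smgr, smgrlen, "%s", value);
	return 1;
}

/*
 * Resolve spcoid -> smgr name using only the filesystem.
 * Returns 0 on success, -1 on error (unreadable or malformed file).
 */
int
tblspc_lookup_smgr(const char *datadir, Oid spcoid, char *smgr, size_t smgrlen)
{
	char		path[1024];
	char		line[256];
	FILE	   *fp;
	int			found = 0;

	if (tblspc_config_path(path, sizeof(path), datadir, spcoid) != 0)
		return -1;

	fp = fopen(path, "r");
	if (fp == NULL)
	{
		if (errno != ENOENT)
			return -1;
		/* No config file: assume an ordinary MD tablespace. */
		snprintf(smgr, smgrlen, "%s", "md");
		return 0;
	}

	/* Stop at the first line that names a storage manager. */
	while (!found && fgets(line, sizeof(line), fp) != NULL)
		found = parse_smgr_line(line, smgr, smgrlen);

	fclose(fp);
	return found ? 0 : -1;
}
```

Because the lookup is keyed by one small file per tablespace, changing one
tablespace's configuration never rewrites the others, which sidesteps the
"update the whole file" concern raised for a pg_filenode.map-style map.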