Hello,

My 0.02€, some of which may just show some misunderstanding on my part:

 - you have clearly given quite a bit of thought to the what and how…
   which makes your message an interesting read.

 - Could this be proposed as some kind of extension, provided that enough
   hooks are available? ISTM that foreign tables and/or alternative
   storage engines (aka ACCESS METHODs) provide convenient APIs which
   could fit this need? Or are they not appropriate? You seem to
   suggest that they are not.

   If not, what could be done to improve the API to allow what you are
   seeking to do? Maybe you need a somewhat lower-level programmable API
   which does not exist yet, or at least is not exported yet, but could
   be specified and implemented with limited effort? Basically you would
   like to read/write pg pages to somewhere, and then there is the
   syncing issue to consider. Maybe such a "page storage" API could
   benefit some specialized hardware, e.g. persistent memory stores, so
   there would be more reason to define it anyway? I think it might be
   valuable to give it some thought.

 - Could you maybe elaborate on how your plan differs from [4] and [5]?

 - Have you considered keeping page headers and compressing only the
   tuple data?

 - I'm not sure there is a point in going below the underlying file
   system blocksize, quite often 4 KiB? Or maybe yes? Or is there
   a benefit to aiming at 1/4 even if most pages overflow?

 - ISTM that your approach entails 3 "files". Could it be done with 2?
   I'd suggest that the possible overflow pointers could be part of
   the page headers, so that when reading the 3.1 page the header would
   tell where to find the overflow 3.2, without requiring an additional
   independent structure holding very small data, most of it zeros.
   Possibly this is not possible, because it would require some available
   space in standard headers when the page is not compressible, and
   there is not enough. Maybe creating a little room for that in the
   existing headers (4 bytes could be enough?) would be a good
   compromise. Hmmm. Maybe the approach I suggest would only work for
   1/2 compression, but not for other target ratios; still, I think it
   could be made to work if the pointer can refer to several blocks in
   the overflow table.

 - If one page is split into 3 parts, could it create problems on
   syncing, if only 1 or 2 of the 3 parts get written? But maybe that
   is manageable with WAL, as it would note that the page was not
   synced, and that is enough for replay.

 - I'm unclear how you would manage the 2 representations of a page in
   memory. I'm afraid that an 8 KiB page compressed to 4 KiB would
   basically take 12 KiB, i.e. reduce the available memory for caching
   purposes. Hmmm. The current status is that a written page probably
   takes 16 KiB, once in shared buffers and once in the system cache,
   so it would be an improvement anyway.

 - Maybe the compressed and overflow tables could become bloated
   somehow, which would require some vacuuming implementation and add
   to the complexity of the whole?

 - External tools should be available to allow page inspection, e.g.
   for debugging purposes.

--
Fabien.
