At 2020-05-21 15:04:55, "Fabien COELHO" <> wrote:
>
> Hello,
>
> My 0.02, some of which may just show some misunderstanding on my part:
>
> - Could this be proposed as some kind of extension, provided that enough
> hooks are available? ISTM that foreign tables and/or alternative storage
> engine (aka ACCESS METHOD) provide convenient APIs which could fit the need
> for these? Or are they not appropriate? You seem to suggest that there are
> not.
>
> If not, what could be done to improve API to allow what you are seeking
> to do? Maybe you need a somehow lower-level programmable API which does
> not exist already, or at least is not exported already, but could be
> specified and implemented with limited effort? Basically you would like to
> read/write pg pages to somewhere, and then there is the syncing issue to
> consider. Maybe such a "page storage" API could provide benefit for some
> specialized hardware, eg persistent memory stores, so there would be more
> reason to define it anyway? I think it might be valuable to give it some
> thoughts.

Thank you for so many comments. In my opinion, developing a foreign table or a new storage engine requires a lot of extra work beyond the compression itself. Nikolay P. gave a similar explanation in his email. A "page storage" API may be a good choice, and I will consider it, but I have not yet figured out how to implement it.

> - Could you maybe elaborate on how your plan differs from [4] and [5]?

My solution is similar to CFS: it is also embedded in the file access layer (fd.c, md.c), which maps each block number to the file and location where its compressed data is stored. The most important difference is that I hope to avoid the need for GC (background garbage collection) through the design of the page layout.
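For reference, the block-number split that md.c already performs, and that a compressed layout embedded in md.c would start from, looks roughly like this. BLCKSZ and RELSEG_SIZE are redefined locally with PostgreSQL's default values (8KB pages, 1GB segments of 131072 blocks); the `BlockLocation` struct and `locate_block` function are hypothetical names used only for illustration. For an uncompressed relation the page is read at `raw_offset`; in a compressed layout, `slot` would instead index that segment's compressed address entries.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of PostgreSQL's default build-time constants. */
#define BLCKSZ      8192      /* 8 KB pages */
#define RELSEG_SIZE 131072    /* blocks per 1 GB segment file */

/* Hypothetical result type for this sketch. */
typedef struct BlockLocation {
    uint32_t segno;        /* segment file: 0 = base file, 1 = ".1", ... */
    uint32_t slot;         /* page index within that segment */
    uint64_t raw_offset;   /* byte offset of an uncompressed page in the segment */
} BlockLocation;

/*
 * Split a relation block number into segment and in-segment position,
 * the same arithmetic md.c uses. A compressed layout would look up
 * entry 'slot' in that segment's address file instead of reading
 * BLCKSZ bytes at raw_offset.
 */
static BlockLocation locate_block(uint32_t blkno)
{
    BlockLocation loc;

    loc.segno = blkno / RELSEG_SIZE;
    loc.slot = blkno % RELSEG_SIZE;
    loc.raw_offset = (uint64_t) loc.slot * BLCKSZ;
    /* Every page must lie entirely inside its 1 GB segment. */
    assert(loc.raw_offset + BLCKSZ <= (uint64_t) RELSEG_SIZE * BLCKSZ);
    return loc;
}
```

For example, block 131077 lands in segment 1 at slot 5, so only the per-segment lookup changes when compression is added; the relation-to-segment mapping stays as it is.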

>> The most difficult thing in CFS development is certainly
>> defragmentation. In CFS it is done using background garbage collection,
>> by one or more GC worker processes. The main challenges were to minimize
>> its interaction with normal work of the system, make it fault tolerant and
>> prevent unlimited growth of data segments.
>>
>> CFS is not introducing its own storage manager, it is mostly embedded in
>> the existing Postgres file access layer (fd.c, md.c). It allows reusing
>> code responsible for mapping relations and the file descriptors cache. As
>> it was recently discussed in hackers, it may be a good idea to separate the
>> questions "how to map blocks to filenames and offsets" and "how to
>> actually perform IO". In this way it will be easier to implement a
>> compressed storage manager.

> - Have you considered keeping page headers and compressing tuple data
> only?

In that case, we must add some additional information to the page header to identify whether it is a compressed or an uncompressed page. When a compressed page becomes an uncompressed page, or vice versa, the original page header must be modified. This is unacceptable because it requires modifying the shared buffer and recalculating the checksum.

However, it should be feasible to put this flag in the compress address file. The problem is that even if a page occupies only one compressed block, the address file must also be read, which turns 1 IO into 2 IOs. Since the address file is very small, reading it is basically a memory access, so this cost may not be as large as I had imagined.

> - I'm not sure there is a point in going below the underlying file
> system blocksize, quite often 4 KiB? Or maybe yes? Or is there
> a benefit to aim at 1/4 even if most pages overflow?

My solution is mainly optimized for the scenario where the original page can be compressed to fit into a single compressed block. Storing a page across multiple compressed blocks suits scenarios that are less sensitive to performance and more concerned with the compression ratio, such as cold data. In addition, users can choose to compile PostgreSQL with a 16KB or 32KB BLCKSZ.

> - ISTM that your approach entails 3 "files". Could it be done with 2?
> I'd suggest that the possible overflow pointers (coa) could be part of
> the headers so that when reading the 3.1 page, then the header would
> tell where to find the overflow 3.2, without requiring an additional
> independent structure with very small data in it, most of it zeros.
> Possibly this is not possible, because it would require some available
> space in standard headers when the page is not compressible, and
> there is not enough. Maybe creating a little room for that in
> existing headers (4 bytes could be enough?) would be a good compromise.
> Hmmm. Maybe the approach I suggest would only work for 1/2 compression,
> but not for other target ratios, but I think it could be made to work
> if the pointer can entail several blocks in the overflow table.

My solution is optimized for the scenario where the original page can be compressed to fit into one compressed block. In that scenario, reading or writing a page takes only 1 IO, and there is no need to access the additional overflow address file and overflow data file.

Your suggestion made me realize that the performance difference may not be as big as I thought (testing and comparison are required). If I give up the pursuit of "only one IO", the file layout can be simplified to only two files, for example as follows (the example uses a compressed block size of 4KB).

# Page storage(Plan B)

Use the compress address file to store the compressed block pointers, and the compress data file to store the compressed block data.

compress address file:

            0       1       2       3
+=======+=======+=======+=======+=======+
| head  |   1   |   2   |  3,4  |   5   |
+=======+=======+=======+=======+=======+

For each page, the compress address file saves the following information:

- Compressed size (0 means the page is stored in uncompressed format)
- Block numbers occupied in the compress data file

By the way, I want to access the compress address file through mmap, just like snapfs.
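To make the address-file idea concrete, here is a minimal C sketch under assumptions that are not part of the proposal: BLCKSZ is 8KB and the compressed block size is 4KB (so an uncompressed page needs two blocks), the `AddrHeader`/`AddrEntry` layouts are hypothetical, and the block numbers in the diagram are read as 1-based so that 0 can mean "unused". The file is mapped with POSIX `mmap` (shared mapping), so finding a page's entry is pointer arithmetic rather than an explicit read() call, and the number of blocks a page occupies follows from its recorded compressed size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ     8192              /* PostgreSQL default page size */
#define CHUNK_SIZE 4096              /* compressed block size in this example */
#define MAX_CHUNKS (BLCKSZ / CHUNK_SIZE)

/* Hypothetical on-disk layout of the compress address file. */
typedef struct AddrHeader {
    uint32_t magic;                  /* hypothetical format marker (unused here) */
    uint32_t npages;                 /* number of entries that follow */
} AddrHeader;

typedef struct AddrEntry {
    uint32_t compressed_size;        /* 0 = page stored uncompressed */
    uint32_t chunks[MAX_CHUNKS];     /* 1-based block numbers in the data file; 0 = unused */
} AddrEntry;

/* Size the file for npages entries and map it MAP_SHARED, so stores through
 * the mapping reach the file. Returns NULL on failure. */
static void *map_address_file(int fd, uint32_t npages, size_t *len_out)
{
    size_t len = sizeof(AddrHeader) + (size_t) npages * sizeof(AddrEntry);

    if (ftruncate(fd, (off_t) len) != 0) {
        perror("ftruncate");
        return NULL;
    }
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    /* Sketch assumption: a freshly created file, so we record the entry count. */
    ((AddrHeader *) base)->npages = npages;
    *len_out = len;
    return base;
}

/* Entry for page 'pageno': pointer arithmetic on the mapping, no read() call. */
static AddrEntry *addr_entry(void *base, uint32_t pageno)
{
    assert(pageno < ((AddrHeader *) base)->npages);
    return (AddrEntry *) ((char *) base + sizeof(AddrHeader)) + pageno;
}

/* Number of compressed blocks the page occupies in the data file. */
static uint32_t entry_nchunks(const AddrEntry *e)
{
    if (e->compressed_size == 0)
        return MAX_CHUNKS;           /* uncompressed: BLCKSZ bytes */
    return (e->compressed_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}
```

Under these assumptions, a page that compresses to 3000 bytes occupies one block (the single-IO case), while one that compresses to 5000 bytes, like the "3,4" entry, occupies two.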

compress data file:

     0         1         2         3         4
+=========+=========+=========+=========+=========+
| data1   | data2   | data3_1 | data3_2 | data4   |
+=========+=========+=========+=========+=========+
|   4K    |
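Given an address entry, finding the data in the compress data file is simple arithmetic. This sketch assumes 4KB blocks, 1-based block numbers (block N starts at byte (N-1) * 4KB, matching data1 at position 0 in the layout), and POSIX `pread`; `read_page_chunks` is a hypothetical helper. To illustrate why the layout matters, it merges runs of consecutive block numbers into a single read, so the page stored in blocks 3,4 still costs one read; that merging is a choice made in this sketch, not something the proposal specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 4096              /* compressed block size in this example */

/* Byte offset of block n in the compress data file (1-based numbering assumed). */
static off_t chunk_offset(uint32_t n)
{
    assert(n >= 1);                  /* 0 is reserved for "unused" in this sketch */
    return (off_t) (n - 1) * CHUNK_SIZE;
}

/*
 * Hypothetical helper: read the blocks listed for one page into 'out',
 * which must hold nchunks * CHUNK_SIZE bytes. Consecutive block numbers
 * are merged into a single pread(). Returns the number of pread() calls
 * issued, or -1 on an I/O error or short read.
 */
static int read_page_chunks(int fd, const uint32_t *chunks, uint32_t nchunks, char *out)
{
    int ncalls = 0;
    uint32_t i = 0;

    while (i < nchunks) {
        /* Extend the run while the next block number follows directly. */
        uint32_t run = 1;
        while (i + run < nchunks && chunks[i + run] == chunks[i] + run)
            run++;

        size_t want = (size_t) run * CHUNK_SIZE;
        ssize_t got = pread(fd, out + (size_t) i * CHUNK_SIZE, want, chunk_offset(chunks[i]));
        if (got < 0) {
            perror("pread");
            return -1;
        }
        if ((size_t) got != want)
            return -1;               /* short read: block lies past end of file */
        ncalls++;
        i += run;
    }
    return ncalls;
}
```

For a page that fits in one 4KB block this is a single pread() plus the address lookup, which is a memory access if the address file is mmap'ed. Pages whose blocks are scattered (e.g. 5 then 2) need one read per block.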

# Page storage(Plan C)

Furthermore, since the size of the compress address file is fixed, the address file and data file above can also be combined into one file:

            0       1      2   123071      0         1      2
+=======+=======+=======+     +=======+=========+=========+
| head  |   1   |   2   | ... |       |  data1  |  data2  | ...
+=======+=======+=======+     +=======+=========+=========+
  head  |           address           |          data

If the performance difference is negligible, maybe Plan C is a better solution. (Are there any other problems?)

> - Maybe the compressed and overflow table could become bloated somehow,
> which would require the vacuuming implementation and add to the
> complexity of the implementation?

Vacuuming is what I try to avoid. As I explained in the first email, even without vacuum, bloating should not become a serious problem.

>> However, fragmentation will only appear in the scenario where the
>> compressed size of the same block frequently changes greatly.
>> ...
>> And no matter how severe the fragmentation, the total space occupied by
>> the compressed table cannot be larger than the original table before
>> compression.

Best Regards
Chen Huajun
