On Thu, Mar 31, 2022 at 9:16 AM Gunnar "Nick" Bluth <gunnar.bl...@pro-open.de> wrote:
> That was meant to say "v10", sorry!

Hi,

From my point of view, at least, it would be preferable if you'd stop changing the subject line every time you post a new version.

Based on the test results in http://postgr.es/m/42bfa680-7998-e7dc-b50e-480cdd986...@pro-open.de and the comments from Andres in https://www.postgresql.org/message-id/20211212234113.6rhmqxi5uzgipwx2%40alap3.anarazel.de my judgement would be that, as things stand today, this patch has no chance of being accepted, due to overhead. Now, Andres is currently working on an overhaul of the statistics collector and perhaps that would reduce the overhead of something like this to an acceptable level. If it does, that would be great news; I just don't know whether that's the case.

As far as the statistics themselves are concerned, I am somewhat skeptical about whether it's really worth adding code for this. According to the documentation, the purpose of the patch is to allow you to assess choice of storage and compression method settings for a column and is not intended to be enabled permanently. However, it seems to me that you could assess that pretty easily without this patch: just create a couple of different tables with different settings, load up the same data via COPY into each one, and see what happens. Now you might answer that with the patch you would get more detailed and accurate statistics, and I think that's true, but it doesn't really look like the additional level of detail would be critical to have in order to make a proper assessment. You might also say that creating multiple copies of the table and loading the data multiple times would be expensive, and that's also true, but you don't really need to load it all. A representative sample of 1GB or so would probably suffice in most cases, and that doesn't seem likely to be a huge load on the system.

Also, as we add more compression options, it's going to be hard to assess this sort of thing without trying stuff anyway. For example if you can set the lz4 compression level, you're not going to know which level is actually going to work best without trying out a bunch of them and seeing what happens. If we allow access to other sorts of compression parameters like zstd's "long" option, similarly, if you really care, you're going to have to try it.

So my feeling is that this feels like a lot of machinery and a lot of worst-case overhead to solve a problem that's really pretty easy to solve without any new code at all, and therefore I'd be inclined to reject it. However, it's a well-known fact that sometimes my feelings about things are pretty stupid, and this might be one of those times. If so, I hope someone will enlighten me by telling me what I'm missing.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com
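
The "try a bunch of levels and see what happens" approach described above can be illustrated with a rough sketch. This is not PostgreSQL code and says nothing about TOAST behaviour: it uses Python's standard-library zlib purely as a stand-in for lz4 or zstd, and the sample data and the `compare_levels` helper are hypothetical. In practice the sample would be a representative slice of the real column data.

```python
import zlib


def compare_levels(sample, levels=(1, 6, 9)):
    """Return a dict mapping each zlib level to the compressed size in bytes."""
    return {level: len(zlib.compress(sample, level)) for level in levels}


# Hypothetical sample: repetitive JSON-like rows standing in for real column data.
sample = b"".join(
    f'{{"id": {i}, "status": "active", "note": "row {i % 50}"}}\n'.encode()
    for i in range(20_000)
)

sizes = compare_levels(sample)
for level, size in sizes.items():
    # Report each level's size relative to the uncompressed sample.
    print(f"level {level}: {size} bytes ({size / len(sample):.1%} of original)")
```

The same pattern would apply to the table-level comparison: load one sample into each candidate configuration and compare the resulting sizes, rather than trying to predict the outcome in advance.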