On Tue, Feb 11, 2014 at 10:07 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Wed, Feb 5, 2014 at 10:57:57AM -0800, Peter Geoghegan wrote:
>> On Wed, Feb 5, 2014 at 12:50 AM, Heikki Linnakangas
>> <hlinnakan...@vmware.com> wrote:
>> >> I think there's zero overlap. They're completely complimentary features.
>> >> It's not like normal WAL records have an irrelevant volume.
>> >
>> > Correct. Compressing a full-page image happens on the first update after a
>> > checkpoint, and the diff between old and new tuple is not used in that
>> > case.
>>
>> Uh, I really just meant that one thing that might overlap is
>> considerations around the choice of compression algorithm. I think
>> that there was some useful discussion of that on the other thread as
>> well.
>
> Yes, that was my point. I though the compression of full-page images
> was a huge win and that compression was pretty straight-forward, except
> for the compression algorithm. If the compression algorithm issue is
> resolved,
By "issue", I assume you mean the question of which compression algorithm is best for this patch. Two algorithms have been implemented for this patch so far, and results have been posted for both. As far as I understand, Heikki is fairly sure that the latest algorithm (compression using a prefix-suffix match between the old and new tuple) is better than the other one in terms of CPU overhead. However, the performance data I collected for the worst case shows that this algorithm has a CPU overhead as well. OTOH, the other algorithm (compression using the old tuple as history) can be a bigger win in terms of I/O reduction in a larger number of cases.

In short, it is still not decided which algorithm to choose, and whether it can be enabled by default or is better controlled by a table-level switch. So I think the decision to be taken here is about the points below:

1. Are we okay with I/O reduction at the expense of CPU for *worst*
   cases, and I/O reduction without impacting CPU (better overall tps)
   for *favourable* cases?
2. If we are not okay with the worst-case behaviour, can we provide a
   table-level switch, so that it can be decided by the user?
3. If neither of the above, is there any other way to mitigate the
   worst-case behaviour, or shall we just reject this patch and move on?

Given the choice, I would go with option 2, because I think for most UPDATE statements the old and new tuples will contain the same data except for some part of the tuple (columns holding large text data are generally not modified), so we will mostly end up in favourable cases; and for the worst cases we don't want the user to suffer from CPU overhead, so a table-level switch is also required.
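To make the prefix-suffix idea concrete, here is a minimal sketch (this is my own illustration in Python, not the actual patch's C code; the function names and the (prefix_len, suffix_len, middle) delta format are assumptions): the encoder finds the longest common prefix and suffix of the old and new tuple images and logs only the differing middle.

```python
def delta_encode(old: bytes, new: bytes):
    """Describe `new` relative to `old` as (prefix_len, suffix_len, middle).

    Only the differing middle section would need to go into the WAL
    record, plus the two match lengths.
    """
    # Longest common prefix.
    p = 0
    max_p = min(len(old), len(new))
    while p < max_p and old[p] == new[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    max_s = min(len(old), len(new)) - p
    while s < max_s and old[-1 - s] == new[-1 - s]:
        s += 1
    return p, s, new[p:len(new) - s]


def delta_decode(old: bytes, prefix_len: int, suffix_len: int,
                 middle: bytes) -> bytes:
    """Reconstruct the new tuple from the old tuple plus the delta."""
    return old[:prefix_len] + middle + old[len(old) - suffix_len:]


# Typical UPDATE: one column changes, the rest of the tuple is identical.
old = b"id=42|name=alice|status=active|blob=AAAA"
new = b"id=42|name=alice|status=closed|blob=AAAA"
p, s, mid = delta_encode(old, new)
assert delta_decode(old, p, s, mid) == new
assert len(mid) < len(new)  # favourable case: the delta is small
```

The worst case discussed above falls out directly from this sketch: when the old and new tuples share no prefix or suffix, the two scans burn CPU and the "delta" is the entire new tuple, so there is no I/O saving to pay for that cost.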
One might argue here that some users cannot feasibly predict whether the tuple data in their UPDATEs will be similar or completely different, and are not ready to accept any risk of CPU overhead, yet would be happy to see the I/O reduction; for them it is difficult to decide what the value of a table-level switch should be. Here I think the only answer is that "nothing is free" in this world: either verify the application's UPDATE behaviour before going to production, or just don't enable the switch and be happy with the current behaviour. On the other side, there will be users who are pretty certain about their usage of UPDATE statements, or at least are ready to evaluate their application if they can get such a huge gain, so it would be quite a useful feature for such users.

> can we move forward with the full-page compression patch?

In my opinion, it is not certain that whatever compression algorithm gets chosen for this patch (if any) can be used directly for full-page compression; some of its ideas could be reused, or perhaps the algorithm could be tweaked a bit to make it usable for full-page compression.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com