Thinking about the nearby thread[1] about overrunning MaxAllocSize during encoding conversion, it struck me that another thing we could usefully do to improve that situation is to be smarter about the growth factor: the existing one-size-fits-all choice of MAX_CONVERSION_GROWTH = 4 is leaving a lot on the table.
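To put rough numbers on that, here is a minimal standalone sketch of the arithmetic, not the actual backend code. It assumes MaxAllocSize is 0x3fffffff (1GB - 1) and that the conversion output buffer is sized as input length times growth factor, plus one byte for a trailing NUL. The names MAX_ALLOC_SIZE, worst_case_output() and max_input_len() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of MaxAllocSize (1GB - 1); check memutils.h to confirm. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper: worst-case output buffer size for an input of
 * "len" bytes, assuming every input byte can expand to "growth" output
 * bytes and one extra byte is needed for the trailing NUL.
 */
static size_t
worst_case_output(size_t len, size_t growth)
{
	assert(growth > 0);
	return len * growth + 1;
}

/*
 * Hypothetical helper: the longest input whose worst-case output buffer
 * still fits within MAX_ALLOC_SIZE for the given growth factor.
 */
static size_t
max_input_len(size_t growth)
{
	assert(growth > 0);
	return (MAX_ALLOC_SIZE - 1) / growth;
}
```

Under those assumptions, max_input_len(4) is about 256MB while max_input_len(2) is about 512MB and max_input_len(1) is just under 1GB, so each step down in the growth factor both shrinks the worst-case allocation and raises the largest string that can be converted at all.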
In particular, it seems like we could usefully frame things as having a constant max growth factor associated with each target encoding, stored as a new field in pg_wchar_table[]. By definition, the max growth factor cannot be more than the maximum character length in the target encoding. So this approach immediately gives us a growth factor of 1 with any single-byte output encoding, and even many of the multibyte encodings would have max growth 2 or 3 without having to think any harder than that.

But we can do better, I think, recognizing that all the supported encodings are ASCII extensions. The only possible way to expend 4 output bytes per input byte is if there is some 1-byte character that translates to a 4-byte character, and I think this is not the case for converting any of our encodings to UTF8. If you need at least a 2-byte character to produce a 3-byte or 4-byte UTF8 character, then UTF8 has max growth 2. I'm not quite sure if that's true for every source encoding, but I'm pretty certain it couldn't be worse than 3.

It might be worth getting a bit more complex and having a 2-D array indexed by both source and destination encodings to determine the max growth factor. I haven't run tests to empirically verify what is the max growth factor.

A fly in this ointment is: could a custom encoding conversion function violate our conclusions about what's the max growth factor? Maybe it would be worth treating the growth factor as a property of a particular conversion (i.e., add a column to pg_conversion) rather than making it a hard-wired property.

In any case, it seems likely that we could end up with a multiplier of 1, 2, or 3 rather than 4 in just about every case of practical interest. That sure seems like a win when converting long strings.

Thoughts?

			regards, tom lane

[1] https://www.postgresql.org/message-id/flat/20190816181418.GA898@alvherre.pgsql