On Sat, May 13, 2017 at 1:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Basically, this is simply saying that you're willing to ignore the
> hard cases, which reduces the problem to one of documenting the
> portability limitations.  You might as well not even bother with
> worrying about the integer case, because porting between little-
> and big-endian systems is surely far less common than cases you've
> already said you're okay with blowing off.
>
> That's not an unreasonable position to take, perhaps; doing better
> than that is going to be a lot more work and it's not very clear
> how much real-world benefit results.  But I can't follow the point
> of worrying about endianness but not encoding.

Encoding is a user choice, not a property of the machine. Or, looking at it from another point of view, the set of values that can be represented by an int4 is the same whether they are represented in big-endian form or in little-endian form, but the set of values that are representable changes when you switch encodings. You could argue that text-under-LATIN1 and text-under-UTF8 aren't really the same data type at all.

It's one thing to say "you can pick up your data and move it to a different piece of hardware and nothing will break". It's quite another thing to say "you can pick up your data and convert it to a different encoding and nothing will break". The latter is generally false already. Maybe LATIN1 -> UTF8 is no-fail, but what about UTF8 -> LATIN1 or SJIS -> anything? Based on previous mailing list discussions, I'm under the impression that it is sometimes debatable how a character in one encoding should be converted to some other encoding, either because it's not clear whether there is a mapping at all or it's unclear what mapping should be used. See, e.g., 2dbbf33f4a95cdcce66365bcdb47c885a8858d3c, or https://www.postgresql.org/message-id/1739a900-30ab-f48e-aec4-2b35475ecf02%402ndquadrant.com where it was discussed that being able to convert encoding A -> encoding B does not guarantee the ability to perform the reverse conversion.

Arguing that a given int4 value should hash to the same value on every platform seems like a request that is at least superficially reasonable, if possibly practically tricky in some cases. Arguing that every currently supported encoding should hash every character the same way when they don't all have the same set of characters and the mappings between them are occasionally debatable is asking for the impossible.
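
To make the distinction concrete, here is a rough sketch in Python rather than anything resembling the backend code. The helper names are made up for illustration, zlib.crc32 is only a stand-in for a real hash function, and Python's "latin-1" codec is assumed to play the role of LATIN1. It shows that an int4 survives a change of byte order with its value intact, so hashing the value in one fixed byte order is portable, whereas the same character has different bytes under different encodings and some characters have no representation at all in the target encoding.

```python
import struct
import zlib


def hash_int4_portable(value: int) -> int:
    """Hypothetical helper: hash an int4 by value, not by in-memory layout.

    The value is serialized in one fixed byte order (big-endian here)
    before hashing, so the result does not depend on the host machine.
    zlib.crc32 is a stand-in hash, not PostgreSQL's hash function.
    """
    return zlib.crc32(struct.pack(">i", value))


def text_bytes(text: str, encoding: str) -> bytes:
    """Hypothetical helper: the bytes a string would occupy under an encoding."""
    return text.encode(encoding)


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Endianness: two byte layouts, one set of representable values.
value = 0x12345678
little = struct.pack("<i", value)  # b'\x78\x56\x34\x12'
big = struct.pack(">i", value)     # b'\x12\x34\x56\x78'
from_little = struct.unpack("<i", little)[0]
from_big = struct.unpack(">i", big)[0]

# Encoding: the same character has different bytes ...
latin1_e_acute = text_bytes("\u00e9", "latin-1")  # b'\xe9'
utf8_e_acute = text_bytes("\u00e9", "utf-8")      # b'\xc3\xa9'

# ... and some characters cannot be represented at all.
euro_in_utf8 = can_encode("\u20ac", "utf-8")
euro_in_latin1 = can_encode("\u20ac", "latin-1")
```

The integer case only needs an agreed byte order. The text case would need an agreed mapping between every pair of encodings, and the euro sign shows that no such mapping exists even for UTF8 -> LATIN1.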
I certainly don't want to commit to a design for hash partitioning that involves a compatibility break any time somebody changes any encoding conversion in the system, even if a hash function that involved translating every character to some sort of universal code point before hashing it didn't seem likely to be horribly slow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company