Linus Heckemann wrote: > - there is a byte-array representation of text columns, which appears to > be independent of database encoding
Not sure what you're refering to. Both the on-disk and in-memory representations of text/varchar are encoding-dependent. > The obvious (to a naive user, like I was) approach, casting to bytea, > has exceptionally surprising behaviour: for many text strings, it does > exactly what the naive user might hope for, giving back the UTF-8 > representation. But multiple distinct text strings, like '\033' and > '\x1b', convert to the same byte string! And text strings containing a > backslash that doesn't fit the bytea hex format or the bytea escape > format will fail to convert completely! Yes. It seems a common mistake to forget or ignore that backslashes are special in the input text representation of bytea. It might be not obvious from reading the doc at [1] but we just need to quote backslashes by doubling them. AFAIK a working solution for the OP would be: sha256(replace(colname, '\', '\\')::bytea) The result is encoding-dependent, but that does not matter in the context of an expression. index. If the database ever needs to change its encoding, it will have to be recreated entirely anyway. [1] https://www.postgresql.org/docs/current/datatype-binary.html#DATATYPE-BINARY-BYTEA-ESCAPE-FORMAT Best regards, -- Daniel Vérité https://postgresql.verite.pro/
