Re: Can we get sha* function over text, that could be used in index?

Linus Heckemann Mon, 23 Feb 2026 07:16:36 -0800

"Daniel Verite" <[email protected]> writes:
>> - there is a byte-array representation of text columns, which appears to
>>  be independent of database encoding
>
> Not sure what you're refering to. Both the on-disk and in-memory
> representations of text/varchar are encoding-dependent.


Huh, I thought I'd come to the opposite conclusion from testing the
other day (md5('ÿ') giving the same results for both a utf-8 database
and a latin1 database), but I must have been holding something wrong,
because trying again does confirm this. Sorry about that!


>> The obvious (to a naive user, like I was) approach, casting to bytea,
>> has exceptionally surprising behaviour: for many text strings, it does
>> exactly what the naive user might hope for, giving back the UTF-8
>> representation. But multiple distinct text strings, like '\033' and
>> '\x1b', convert to the same byte string! And text strings containing a
>> backslash that doesn't fit the bytea hex format or the bytea escape
>> format will fail to convert completely!
>
> Yes. It seems a common mistake to forget or ignore that
> backslashes are special in the input text representation of bytea.
> It might be not obvious from reading the doc at [1]
> but we just need to quote backslashes by doubling them.
>
> AFAIK a working solution for the OP would be:
>  sha256(replace(colname, '\', '\\')::bytea)

That's so simple that it feels like it should have been
obvious... Thanks! Nevertheless, it feels like a weird and potentially
computationally expensive contortion for getting from a text value to
its bytea representation.

Given their encoding-dependence, shouldn't md5() (and ::bytea, for that
matter) be marked only STABLE, not IMMUTABLE? Or is IMMUTABLE's

> guaranteed to return the same results given the same arguments forever

only valid within the context of a single database?


md5() and ::bytea aside, it seems like it would be good to have both

 - a function for accessing the bytea representation of a text value
   without any extra steps, and

 - an IMMUTABLE function for getting a UTF-8 (specifically UTF-8,
   because it's the only encoding with a character set encompassing all
   other supported character sets) bytea representation of a text value
   regardless of the database encoding?

Cheers
Linus

Re: Can we get sha* function over text, that could be used in index?

Reply via email to