> On 29/07/2023 08:42 CEST Alex Shan <3341...@gmail.com> wrote:
>
> In my DB I have a large object over 4GB in size.
> I need to get its MD5 or SHA256 from within psql query, i.e. without
> exporting it to FS first.
>
> “SELECT md5(lo_get(loid));” doesnt work — “large object is too large”.
>
> Is there any other way to do it?

Is plpython3u [1] an option for you?  In that case you can use Python's hashlib
in a custom function and feed every page from pg_largeobject to a selected hash
function.

For example:

        BEGIN;

        CREATE EXTENSION plpython3u;

        CREATE OR REPLACE FUNCTION lo_hash(loid oid, name text)
            RETURNS bytea
            LANGUAGE plpython3u
        AS $$
            import hashlib
        
            hash = hashlib.new(name)
        
            # Check if large object exists.
            plan = plpy.prepare("""
                SELECT
                FROM pg_largeobject_metadata
                WHERE oid = $1
            """, ['oid'])
            rv = plpy.execute(plan, [loid])
        
            if rv.nrows() == 0:
                raise ValueError(f"large object {loid} does not exist")
        
            # Get all pages (possibly zero).
            plan = plpy.prepare("""
                SELECT data
                FROM pg_largeobject
                WHERE loid = $1
                ORDER BY pageno
            """, ['oid'])
            pages = plpy.cursor(plan, [loid])
        
            for page in pages:
                hash.update(page['data'])
        
            return hash.digest()
        $$;

        COMMIT;

Testing with 65 KiB null bytes:

        BEGIN;

        SELECT
          lo_from_bytea(0, decode(repeat('00', 1 << 16), 'hex')) AS test_loid
        \gset
        
        SELECT
          loid,
          count(*) AS n_pages,
          sum(length(data)) AS n_bytes
        FROM pg_largeobject
        WHERE loid = :test_loid
        GROUP BY loid;

          loid  | n_pages | n_bytes
        --------+---------+---------
         365958 |      32 |   65536
        (1 row)

        SELECT
          :test_loid AS loid,
          lo_hash(:test_loid, 'md5') AS md5;

          loid  |                md5
        --------+------------------------------------
         365958 | \xfcd6bcb56c1689fcef28b57c22475bad
        (1 row)

        SELECT
          :test_loid AS loid,
          lo_hash(:test_loid, 'sha256') AS sha256;

          loid  |                               sha256
        
--------+--------------------------------------------------------------------
         365958 | 
\xde2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31
        (1 row)

        ROLLBACK;

Verifying the hashes:

        $ head -c65536 /dev/zero | md5sum
        fcd6bcb56c1689fcef28b57c22475bad  -
        $ head -c65536 /dev/zero | sha256sum
        de2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31  -

[1] https://www.postgresql.org/docs/15/plpython.html

--
Erik


Reply via email to