Re: Optimize UUID parse using SIMD

Haibo Yan Thu, 25 Jun 2026 14:31:38 -0700

On Thu, Jun 25, 2026 at 11:28 AM Masahiko Sawada <[email protected]> wrote:

> Hi all,
>
> I'd like to propose the $subject.
>
> Since commit ec8719ccbfcd made hex_decode_safe() SIMD-aware, decoding
> a run of hex digits is now fast. The attached patch reuses
> hex_decode_safe() in the UUID input function to speed up parsing.
>
> We accept several textual forms of a UUID[1]. The fast path handles
> the common ones: 32 hex digits, the canonical 8x-4x-4x-4x-12x form
> (where "nx" means n hex digits), and either of those wrapped in
> braces. Otherwise, it falls back to the ordinary scalar UUID parse.
>
> I've benchmarked the parse speed using the following query:
>
> CREATE TEMP TABLE u AS SELECT gen_random_uuid()::text AS t FROM
> generate_series(1, 1000000);
> EXPLAIN (ANALYZE, TIMING OFF) SELECT t::uuid FROM u;
>
> I compared the execution time of the second query, which mainly
> exercises uuid_in(), with and without the SIMD optimization. Here are
> the results (the median of 5 runs):
>
> HEAD: 208.879 ms
> Patched: 40.983 ms
>
> The improvements look promising to me. But in a realistic pipeline the
> parse is a small fraction of the work, so end-to-end gains could be
> much smaller.
>
> Feedback is very welcome.
>

I may be missing something, but I wonder whether the fast path is relying on
slightly different input semantics from the existing UUID parser.

In particular, hex_decode_safe() is not a strict “32 hex characters only”
decoder.  It skips whitespace, which is fine for its existing callers, but I
don’t think UUID input should treat whitespace inside the UUID body as
ignorable.  Also, hex_decode_safe() returns the number of bytes it decoded,
and unless the fast path checks that value against UUID_LEN, it cannot
verify that exactly UUID_LEN bytes were produced.

So I think it would be safer either to pre-validate that the 32 source
characters are all hex digits before calling hex_decode_safe(), or to use a
UUID-specific strict hex decoder for this path.  After that, a comment
explaining why hex_decode_safe() is safe here would make the invariant much
clearer.
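
To illustrate the second option, here is a minimal sketch of what a strict
decoder could look like.  The names uuid_decode_hex32_strict() and
hex_nibble() are hypothetical and are not part of the patch or of the
existing tree; the sketch is plain scalar C, only meant to show the intended
semantics (exactly 32 hex digits, no whitespace), not to replace the SIMD
path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UUID_LEN 16				/* bytes in a decoded UUID */

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_nibble(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: decode exactly 2 * UUID_LEN hex digits from src
 * into dst.  Any other character, including whitespace, is rejected, and
 * so is any input whose length is not exactly 32.
 */
static bool
uuid_decode_hex32_strict(const char *src, size_t len, uint8_t *dst)
{
	if (len != 2 * UUID_LEN)
		return false;

	for (size_t i = 0; i < UUID_LEN; i++)
	{
		int			hi = hex_nibble((unsigned char) src[2 * i]);
		int			lo = hex_nibble((unsigned char) src[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return false;		/* caller falls back to the scalar parser */
		dst[i] = (uint8_t) ((hi << 4) | lo);
	}
	return true;
}
```

On failure the caller would simply fall through to the existing scalar
parser, so the error messages stay the same as today.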

Could you also add a few regression tests for invalid inputs that contain
whitespace inside otherwise fast-path-looking UUID strings?  For example:

---------------------------------------------------------------

SELECT 'a0eebc99 9c0b4ef8bb6d6bb9bd380a11'::uuid;
SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a1 '::uuid;
SELECT '{a0eebc999c0b4ef8bb6d6bb9bd380a1 }'::uuid;
SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1 '::uuid;
---------------------------------------------------------------

These should continue to be rejected, just as the scalar parser rejects them.

Regards,

Haibo

> Regards,
>
> [1] https://www.postgresql.org/docs/devel/datatype-uuid.html#DATATYPE-UUID
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>
