On Thu, Jun 4, 2026 at 2:17 AM Tom Lane <[email protected]> wrote:
>
> Ewan Young <[email protected]> writes:
> > I noticed that setweight() reports an internal error (SQLSTATE XX000)
> > when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> > the weight comes directly from user input. The two-argument variant
> > also prints the weight as a raw ASCII code, which is a bit unfriendly:
>
> > =# SELECT setweight('cat:1'::tsvector, 'p');
> > ERROR: unrecognized weight: 112
>
> I agree that these ought to be ereport()s. However, I suspect that
> the reason for printing bogus weights numerically was to avoid the
> risk of generating encoding-incorrect strings if the given char
> value has its high bit set. The existing code in tsvector_filter
> is failing to consider that hazard.
Ah, I hadn't considered that. You're right: in a multibyte encoding
the bogus byte could well be a fragment of a multibyte character, so
printing it with %c would inject an invalidly-encoded byte into the
error message. The style used in v2 (matching charout()) keeps
the message pure ASCII, which seems clearly safer.
>
> I experimented with making the error messages print non-ASCII
> characters differently, and soon decided that that added enough
> complexity that we shouldn't have three copies of it. So the
> attached proposed v2 also factors the code out into a new
> function parse_weight (maybe a different name would be better?).
Factoring it out looks like a clear improvement. parse_weight reads
fine to me; I don't think it's worth bikeshedding.
I tested v2 on top of current master:
- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:
=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: 22023: unrecognized weight: "p"
LOCATION: parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR: 22023: unrecognized weight: "\200"
LOCATION: parse_weight, tsvector_op.c:240
>
> I'm unconvinced that we really need a regression test case for
> this ...
Agreed, no objection to dropping it. The behavior worth checking is
the message formatting, which is easy enough to verify by hand.
>
> regards, tom lane
>
So v2 looks good to me. Thanks for improving the patch!
Best regards,
Ewan Young