Re: Use ereport() instead of elog() for invalid weights in setweight()

Ewan Young Wed, 03 Jun 2026 19:33:13 -0700

On Thu, Jun 4, 2026 at 2:17 AM Tom Lane <[email protected]> wrote:
>
> Ewan Young <[email protected]> writes:
> > I noticed that setweight() reports an internal error (SQLSTATE XX000)
> > when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> > the weight comes directly from user input.  The two-argument variant
> > also prints the weight as a raw ASCII code, which is a bit unfriendly:
>
> >     =# SELECT setweight('cat:1'::tsvector, 'p');
> >     ERROR:  unrecognized weight: 112
>
> I agree that these ought to be ereport()s.  However, I suspect that
> the reason for printing bogus weights numerically was to avoid the
> risk of generating encoding-incorrect strings if the given char
> value has its high bit set.  The existing code in tsvector_filter
> is failing to consider that hazard.


Ah, I hadn't considered that.  You're right: in a multibyte encoding
the bogus byte could well be a fragment of a multibyte character, so
printing it with %c would inject an invalidly-encoded byte into the
error message.  The style used in v2 (matching charout()) keeps
the message pure ASCII, which seems clearly safer.

>
> I experimented with making the error messages print non-ASCII
> characters differently, and soon decided that that added enough
> complexity that we shouldn't have three copies of it.  So the
> attached proposed v2 also factors the code out into a new
> function parse_weight (maybe a different name would be better?).

Factoring it out looks like a clear improvement.  parse_weight reads
fine to me; I don't think it's worth bikeshedding.
I tested v2 on top of current master:
- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:

=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR:  22023: unrecognized weight: "p"
LOCATION:  parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR:  22023: unrecognized weight: "\200"
LOCATION:  parse_weight, tsvector_op.c:240

>
> I'm unconvinced that we really need a regression test case for
> this ...

Agreed, no objection to dropping it.  The behavior worth checking is
the message formatting, which is easy enough to verify by hand.

>
>                         regards, tom lane
>

So v2 looks good to me.  Thanks for improving the patch!

Best regards,
Ewan Young

Re: Use ereport() instead of elog() for invalid weights in setweight()

Reply via email to