Re: [PATCH] json_lex_string: don't overread on bad UTF8

Jacob Champion Tue, 07 May 2024 14:06:37 -0700

On Mon, May 6, 2024 at 8:43 PM Michael Paquier <[email protected]> wrote:
> On Fri, May 03, 2024 at 07:05:38AM -0700, Jacob Champion wrote:
> > We could port something like that to src/common. IMO that'd be more
> > suited for an actual conversion routine, though, as opposed to a
> > parser that for the most part assumes you didn't lie about the input
> > encoding and is just trying not to crash if you're wrong. Most of the
> > time, the parser just copies bytes between delimiters around and it's
> > up to the caller to handle encodings... the exceptions to that are the
> > \uXXXX escapes and the error handling.
>
> Hmm.  That would still leave the backpatch issue at hand, which is
> kind of confusing to leave as it is.  Would it be complicated to
> truncate the entire byte sequence in the error message and just give
> up because we cannot do better if the input byte sequence is
> incomplete?


Maybe I've misunderstood, but isn't that what's being done in v2?

> > Maybe I'm missing
> > code somewhere, but I don't see a conversion routine from
> > json_errdetail() to the actual client/locale encoding. (And the parser
> > does not support multibyte input_encodings that contain ASCII in trail
> > bytes.)
>
> Referring to json_lex_string() that does UTF-8 -> ASCII -> give-up in
> its conversion for FRONTEND, I guess?  Yep.  This limitation looks
> like a problem, especially if plugging that to libpq.

Okay. How we deal with that will likely guide the "optimal" fix to
error reporting, I think...

Thanks,
--Jacob

Re: [PATCH] json_lex_string: don't overread on bad UTF8

Reply via email to