pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
Fix overread in JSON parsing errors for incomplete byte sequences json_lex_string() relies on pg_encoding_mblen_bounded() to point to the end of a JSON string when generating an error message, and the input it uses is not guaranteed to be null-terminated. It was possible to walk off the end of the input buffer by a few bytes when the last bytes consist of an incomplete multi-byte sequence, as token_terminator would point to a location defined by pg_encoding_mblen_bounded() rather than the end of the input. This commit switches token_terminator so as the error uses data up to the end of the JSON input. More work should be done so as this code could rely on an equivalent of report_invalid_encoding() so as incorrect byte sequences can show in error messages in a readable form. This requires work for at least two cases in the JSON parsing API: an incomplete token and an invalid escape sequence. A more complete solution may be too invasive for a backpatch, so this is left as a future improvement, taking care of the overread first. A test is added on HEAD as test_json_parser makes this issue straight-forward to check. Note that pg_encoding_mblen_bounded() no longer has any callers. This will be removed on HEAD with a separate commit, as this is proving to encourage unsafe coding. Author: Jacob Champion Discussion: https://postgr.es/m/caoymi+ncm7pwls3ankcsmoqqtpjva8wmcdobtka3zrb2hzg...@mail.gmail.com Backpatch-through: 13 Branch -- REL_13_STABLE Details --- https://git.postgresql.org/pg/commitdiff/377c25d32275a65043f9dcf3d4668a56c37319e2 Modified Files -- src/common/jsonapi.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
Fix overread in JSON parsing errors for incomplete byte sequences json_lex_string() relies on pg_encoding_mblen_bounded() to point to the end of a JSON string when generating an error message, and the input it uses is not guaranteed to be null-terminated. It was possible to walk off the end of the input buffer by a few bytes when the last bytes consist of an incomplete multi-byte sequence, as token_terminator would point to a location defined by pg_encoding_mblen_bounded() rather than the end of the input. This commit switches token_terminator so as the error uses data up to the end of the JSON input. More work should be done so as this code could rely on an equivalent of report_invalid_encoding() so as incorrect byte sequences can show in error messages in a readable form. This requires work for at least two cases in the JSON parsing API: an incomplete token and an invalid escape sequence. A more complete solution may be too invasive for a backpatch, so this is left as a future improvement, taking care of the overread first. A test is added on HEAD as test_json_parser makes this issue straight-forward to check. Note that pg_encoding_mblen_bounded() no longer has any callers. This will be removed on HEAD with a separate commit, as this is proving to encourage unsafe coding. Author: Jacob Champion Discussion: https://postgr.es/m/caoymi+ncm7pwls3ankcsmoqqtpjva8wmcdobtka3zrb2hzg...@mail.gmail.com Backpatch-through: 13 Branch -- master Details --- https://git.postgresql.org/pg/commitdiff/855517307db8efd397d49163d65a4fd3bdcc41bc Modified Files -- src/common/jsonapi.c | 4 ++-- src/test/modules/test_json_parser/t/002_inline.pl | 8 2 files changed, 10 insertions(+), 2 deletions(-)
pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
Fix overread in JSON parsing errors for incomplete byte sequences json_lex_string() relies on pg_encoding_mblen_bounded() to point to the end of a JSON string when generating an error message, and the input it uses is not guaranteed to be null-terminated. It was possible to walk off the end of the input buffer by a few bytes when the last bytes consist of an incomplete multi-byte sequence, as token_terminator would point to a location defined by pg_encoding_mblen_bounded() rather than the end of the input. This commit switches token_terminator so as the error uses data up to the end of the JSON input. More work should be done so as this code could rely on an equivalent of report_invalid_encoding() so as incorrect byte sequences can show in error messages in a readable form. This requires work for at least two cases in the JSON parsing API: an incomplete token and an invalid escape sequence. A more complete solution may be too invasive for a backpatch, so this is left as a future improvement, taking care of the overread first. A test is added on HEAD as test_json_parser makes this issue straight-forward to check. Note that pg_encoding_mblen_bounded() no longer has any callers. This will be removed on HEAD with a separate commit, as this is proving to encourage unsafe coding. Author: Jacob Champion Discussion: https://postgr.es/m/caoymi+ncm7pwls3ankcsmoqqtpjva8wmcdobtka3zrb2hzg...@mail.gmail.com Backpatch-through: 13 Branch -- REL_16_STABLE Details --- https://git.postgresql.org/pg/commitdiff/5396a2987cc679a3ab5db26b54c8b76149489e29 Modified Files -- src/common/jsonapi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
Fix overread in JSON parsing errors for incomplete byte sequences json_lex_string() relies on pg_encoding_mblen_bounded() to point to the end of a JSON string when generating an error message, and the input it uses is not guaranteed to be null-terminated. It was possible to walk off the end of the input buffer by a few bytes when the last bytes consist of an incomplete multi-byte sequence, as token_terminator would point to a location defined by pg_encoding_mblen_bounded() rather than the end of the input. This commit switches token_terminator so as the error uses data up to the end of the JSON input. More work should be done so as this code could rely on an equivalent of report_invalid_encoding() so as incorrect byte sequences can show in error messages in a readable form. This requires work for at least two cases in the JSON parsing API: an incomplete token and an invalid escape sequence. A more complete solution may be too invasive for a backpatch, so this is left as a future improvement, taking care of the overread first. A test is added on HEAD as test_json_parser makes this issue straight-forward to check. Note that pg_encoding_mblen_bounded() no longer has any callers. This will be removed on HEAD with a separate commit, as this is proving to encourage unsafe coding. Author: Jacob Champion Discussion: https://postgr.es/m/caoymi+ncm7pwls3ankcsmoqqtpjva8wmcdobtka3zrb2hzg...@mail.gmail.com Backpatch-through: 13 Branch -- REL_14_STABLE Details --- https://git.postgresql.org/pg/commitdiff/41adf9d960c0105ada615574126d5568d552f4db Modified Files -- src/common/jsonapi.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
Fix overread in JSON parsing errors for incomplete byte sequences json_lex_string() relies on pg_encoding_mblen_bounded() to point to the end of a JSON string when generating an error message, and the input it uses is not guaranteed to be null-terminated. It was possible to walk off the end of the input buffer by a few bytes when the last bytes consist of an incomplete multi-byte sequence, as token_terminator would point to a location defined by pg_encoding_mblen_bounded() rather than the end of the input. This commit switches token_terminator so as the error uses data up to the end of the JSON input. More work should be done so as this code could rely on an equivalent of report_invalid_encoding() so as incorrect byte sequences can show in error messages in a readable form. This requires work for at least two cases in the JSON parsing API: an incomplete token and an invalid escape sequence. A more complete solution may be too invasive for a backpatch, so this is left as a future improvement, taking care of the overread first. A test is added on HEAD as test_json_parser makes this issue straight-forward to check. Note that pg_encoding_mblen_bounded() no longer has any callers. This will be removed on HEAD with a separate commit, as this is proving to encourage unsafe coding. Author: Jacob Champion Discussion: https://postgr.es/m/caoymi+ncm7pwls3ankcsmoqqtpjva8wmcdobtka3zrb2hzg...@mail.gmail.com Backpatch-through: 13 Branch -- REL_15_STABLE Details --- https://git.postgresql.org/pg/commitdiff/8c3f30e675e972e3c2750ea82781f2c4ba77a9f8 Modified Files -- src/common/jsonapi.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)