Re: [BUGS] control character check in JSON type seems broken
Robert Haas writes: > On Mon, Jun 4, 2012 at 8:48 PM, Tom Lane wrote: >> And so is that. IMO the error reporting in this module could stand to >> be reviewed altogether for compliance with our message guidelines. >> (For starters, why is it using errdetail_internal?) I refrained from >> editorializing on-the-fly, but I'm not too pleased with what I saw. > Huh. I have no idea why I thought errdetail_internal was a good idea. > Should we just change all those to errdetail? Yeah, they're clearly user-facing so I see no reason why they shouldn't be translatable. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] control character check in JSON type seems broken
On Mon, Jun 4, 2012 at 8:48 PM, Tom Lane wrote: >> In addition, error message above seems corrupted in my environment. >> Here we check not-escaped control character, so printing it with %c >> formatting might break log files. How about using decimal or hex dump >> in such cases? > > And so is that. IMO the error reporting in this module could stand to > be reviewed altogether for compliance with our message guidelines. > (For starters, why is it using errdetail_internal?) I refrained from > editorializing on-the-fly, but I'm not too pleased with what I saw. Huh. I have no idea why I thought errdetail_internal was a good idea. Should we just change all those to errdetail? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] control character check in JSON type seems broken
Shigeru Hanada writes: > json_lex_string() misjudges that the token "ãã¼" contains naked > (not-escaped) control character, because its first byte is 0xe3 and it's > -29 for signed char interpreting, and it's less than 32. We need to > cast to unsigned char (or use unsigned variable) here. Yeah, that's wrong. > In addition, error message above seems corrupted in my environment. > Here we check not-escaped control character, so printing it with %c > formatting might break log files. How about using decimal or hex dump > in such cases? And so is that. IMO the error reporting in this module could stand to be reviewed altogether for compliance with our message guidelines. (For starters, why is it using errdetail_internal?) I refrained from editorializing on-the-fly, but I'm not too pleased with what I saw. Patch applied, thanks! regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] control character check in JSON type seems broken
Hi, A Japanese user found strange behavior in JSON type, so I'd like to share the issue here. He simply tested casting a string literal to json type, and got an unexpected error when he used a Japanese word as name and/or value of JSON object. In the example below, "キー" is a Japanese word which means "key", and its first letter has byte sequence "0xe3 0x82 0xad" in UTF-8. postgres=# select '{"キー":"value"}'::json; ERROR: invalid input syntax for type json LINE 1: select '{"キー":"value"}'::json; ^ DETAIL: line 1: Character " ・ must be escaped. With some debugging, I found that the problem is in json_lex_string(). json_lex_string() misjudges that the token "キー" contains naked (not-escaped) control character, because its first byte is 0xe3 and it's -29 for signed char interpreting, and it's less than 32. We need to cast to unsigned char (or use unsigned variable) here. In addition, error message above seems corrupted in my environment. Here we check not-escaped control character, so printing it with %c formatting might break log files. How about using decimal or hex dump in such cases? Please see attached patch which contains workaround for this issue. Regards, -- Shigeru HANADA diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index 8ab47de..1326d34 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -419,7 +419,7 @@ json_lex_string(JsonLexContext *lex) for (s = lex->token_start + 1; *s != '"'; ++s) { /* Per RFC4627, these characters MUST be escaped. */ - if (*s < 32) + if ((unsigned char) *s < 32) { /* A NUL byte marks the (premature) end of the string. */ if (*s == '\0') @@ -430,8 +430,8 @@ json_lex_string(JsonLexContext *lex) ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), -errdetail_internal("line %d: Character \"%c\" must be escaped.", - lex->line_number, *s))); +errdetail_internal("line %d: Character with value \"0x%02x\" must be escaped.", + lex->line_number, (unsigned char) *s))); } else if (*s == '\\') { diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out index ed8b237..4b1ad89 100644 --- a/src/test/regress/expected/json.out +++ b/src/test/regress/expected/json.out @@ -26,8 +26,7 @@ def"'::json; -- ERROR, unescaped newline in string constant ERROR: invalid input syntax for type json LINE 1: SELECT '"abc ^ -DETAIL: line 1: Character " -" must be escaped. +DETAIL: line 1: Character with value "0x0a" must be escaped. SELECT '"\n\"\\"'::json; -- OK, legal escapes json -- -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs