Re: [PHP-DEV] json_encode() bug
Hi! This should really be fixed similar to the iconv //IGNORE flag - so that bad characters are just replaced with '?' Maybe using iconv is a way to go in your scenario? -- Stanislav Malyshev, Zend Software Architect [EMAIL PROTECTED] http://www.zend.com/ (408)253-8829 MSN: [EMAIL PROTECTED] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
This should really be fixed similar to the iconv //IGNORE flag - so that bad characters are just replaced with '?' We use it to render spam email summaries, and dont really care if the encoding is incorrect, just as long as it shows something. Throwing a warning without having a fix/workaround, just reduces the usefulness of the function. Regards Alan Stanislav Malyshev wrote: Hi! Right now, if json_encode sees wrong UTF-8 data, it just cuts the string in the middle, no error returned, no message produced. Example: var_dump(json_encode("ab\xE0")); var_dump(json_encode("ab\xE0\"")); Both strings get cut at "ab". I think it's not a good idea to just silently cut the data. In fact, I think it is a bug caused by this code in ext/json/utf8_to_utf16.c: if (c < 0) { return UTF8_END ? the_index : UTF8_ERROR; } which inherited this bug from code published on json.org. It should be: if (c < 0) { return (c == UTF8_END) ? the_index : UTF8_ERROR; } Now this is an easy fix but would lead to bad strings silently converted to empty strings. The question is - should we have an error there? If so, which one - E_WARNING, E_NOTICE? I'm for E_WARNING. Also filed as bug #43941. Any comments? -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
On Tue, 5 Feb 2008, Richard Lynch wrote: > On Sun, February 3, 2008 7:51 pm, Stanislav Malyshev wrote: > >> Like I mentioned before (I think), it should not return an empty > >> string of course because programmatically it's not possible to > >> check for this. As most of our functions return false in those > >> cases, so should this function. > > > > AFAIR false is not valid JSON, so it would break a lot of code. > > Also, I am not sure we should change json_encode to return false on > > whole structure if one of the fields contains invalid utf-8. [snip] > Put partially decoded value somewhere in a variable to be accessed, so > that the programmer can make an informed decision about what to do. We're talking about encoding here though... regards, Derick -- Derick Rethans http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
On Sun, February 3, 2008 7:51 pm, Stanislav Malyshev wrote: >> Like I mentioned before (I think), it should not return an empty >> string >> of course because programmatically it's not possible to check for >> this. >> As most of our functions return false in those cases, so should this >> function. > > AFAIR false is not valid JSON, so it would break a lot of code. Also, > I > am not sure we should change json_encode to return false on whole > structure if one of the fields contains invalid utf-8. I see what you are saying, but... If you want to write good quality code, you have to do something reasonable here. PERHAPS: Return false E_WARNING Put partially decoded value somewhere in a variable to be accessed, so that the programmer can make an informed decision about what to do. -- Some people have a "gift" link here. Know what I want? I want you to buy a CD from some indie artist. http://cdbaby.com/from/lynch Yeah, I get a buck. So? -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
On Sun, 3 Feb 2008, Stanislav Malyshev wrote: > > Like I mentioned before (I think), it should not return an empty string of > > course because programmatically it's not possible to check for this. As most > > of our functions return false in those cases, so should this function. > > AFAIR false is not valid JSON, so it would break a lot of code. Also, I am not > sure we should change json_encode to return false on whole structure if one of > the fields contains invalid utf-8. Of course we should make it possible to detect if there is broken data. I think our normal mantra is "be strict with creating, be lenient while parsing". Because we're generating data here, we should *abort* if we find an error, and not return half baked results. To detect if we have broken data, the option is to check a return value. As normally this function returns a string, false (or null) is a good candidate for an error-result. regards, Derick -- Derick Rethans http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
Like I mentioned before (I think), it should not return an empty string of course because programmatically it's not possible to check for this. As most of our functions return false in those cases, so should this function. AFAIR false is not valid JSON, so it would break a lot of code. Also, I am not sure we should change json_encode to return false on whole structure if one of the fields contains invalid utf-8. -- Stanislav Malyshev, Zend Software Architect [EMAIL PROTECTED] http://www.zend.com/ (408)253-8829 MSN: [EMAIL PROTECTED] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
On Fri, 25 Jan 2008, Stanislav Malyshev wrote: > Hi! > > Right now, if json_encode sees wrong UTF-8 data, it just cuts the string in > the middle, no error returned, no message produced. Example: > > var_dump(json_encode("ab\xE0")); > var_dump(json_encode("ab\xE0\"")); > > Both strings get cut at "ab". I think it's not a good idea to just silently > cut the data. In fact, I think it is a bug caused by this code in > ext/json/utf8_to_utf16.c: > if (c < 0) { > return UTF8_END ? the_index : UTF8_ERROR; > } > which inherited this bug from code published on json.org. It should be: > if (c < 0) { > return (c == UTF8_END) ? the_index : UTF8_ERROR; > } > Now this is an easy fix but would lead to bad strings silently converted to > empty strings. The question is - should we have an error there? If so, which > one - E_WARNING, E_NOTICE? I'm for E_WARNING. > Also filed as bug #43941. > Any comments? Like I mentioned before (I think), it should not return an empty string of course because programmatically it's not possible to check for this. As most of our functions return false in those cases, so should this function. Derick -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
On Feb 1, 2008 9:06 PM, Stanislav Malyshev <[EMAIL PROTECTED]> wrote: > > Stupid question, who actually checks for E_* in his code at runtime > > after having called such functions? Not me and I would hate to. It > > sounds to me like a perfect exception use case. As this function can > > General policy in PHP as far as I know is that non-OO functions do not > do exceptions. > > > return nearly everything scalar we have (boolean, string, null, > > integer,...) we can't use our normal "returns FALSE on failure". > > This function can return only a string, but I'm not sure returning false > if somewhere in some value deep down your data is some bad utf-8 is > warranted. It is doable, though, but I'm afraid most of current code > never check for return of json_encode() so they are in for big surprises > when json_encode won't produce valid JSON. Oh right, I thought about/read json_decode. But the idea is the same yes. Not checking return values is a very bad habit, I can't count how many times I saw codes dying miserably only because they don't check returns values. Would it make sense to use exception for such cases? Aka to change our policy? I'm not an excpeption fan, but these cases are good examples of exception usages. We can invent a meta type "error" but it would be rather confusing ;) -- Pierre http://blog.thepimp.net | http://www.libgd.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
Stupid question, who actually checks for E_* in his code at runtime after having called such functions? Not me and I would hate to. It sounds to me like a perfect exception use case. As this function can General policy in PHP as far as I know is that non-OO functions do not do exceptions. return nearly everything scalar we have (boolean, string, null, integer,...) we can't use our normal "returns FALSE on failure". This function can return only a string, but I'm not sure returning false if somewhere in some value deep down your data is some bad utf-8 is warranted. It is doable, though, but I'm afraid most of current code never check for return of json_encode() so they are in for big surprises when json_encode won't produce valid JSON. -- Stanislav Malyshev, Zend Software Architect [EMAIL PROTECTED] http://www.zend.com/ (408)253-8829 MSN: [EMAIL PROTECTED] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
On Feb 1, 2008 8:13 PM, Cristian Rodriguez <[EMAIL PROTECTED]> wrote: > 2008/1/25, Stanislav Malyshev <[EMAIL PROTECTED]>: > > > Now this is an easy fix but would lead to bad strings silently converted > > to empty strings. The question is - should we have an error there? If > > so, which one - E_WARNING, E_NOTICE? I'm for E_WARNING. > > Yes , E_WARNING is the right thing to have IMHO. Stupid question, who actually checks for E_* in his code at runtime after having called such functions? Not me and I would hate to. It sounds to me like a perfect exception use case. As this function can return nearly everything scalar we have (boolean, string, null, integer,...) we can't use our normal "returns FALSE on failure". Cheers, -- Pierre http://blog.thepimp.net | http://www.libgd.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] json_encode() bug
2008/1/25, Stanislav Malyshev <[EMAIL PROTECTED]>: > Now this is an easy fix but would lead to bad strings silently converted > to empty strings. The question is - should we have an error there? If > so, which one - E_WARNING, E_NOTICE? I'm for E_WARNING. Yes , E_WARNING is the right thing to have IMHO. -- http://www.cristianrodriguez.net -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] json_encode() bug
Hi! Right now, if json_encode sees wrong UTF-8 data, it just cuts the string in the middle, no error returned, no message produced. Example: var_dump(json_encode("ab\xE0")); var_dump(json_encode("ab\xE0\"")); Both strings get cut at "ab". I think it's not a good idea to just silently cut the data. In fact, I think it is a bug caused by this code in ext/json/utf8_to_utf16.c: if (c < 0) { return UTF8_END ? the_index : UTF8_ERROR; } which inherited this bug from code published on json.org. It should be: if (c < 0) { return (c == UTF8_END) ? the_index : UTF8_ERROR; } Now this is an easy fix but would lead to bad strings silently converted to empty strings. The question is - should we have an error there? If so, which one - E_WARNING, E_NOTICE? I'm for E_WARNING. Also filed as bug #43941. Any comments? -- Stanislav Malyshev, Zend Software Architect [EMAIL PROTECTED] http://www.zend.com/ (408)253-8829 MSN: [EMAIL PROTECTED] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php