 ID:               40506
 User updated by:  php at koterov dot ru
-Summary:          Suggestion: json_encode() and non-UTF8 strings
 Reported By:      php at koterov dot ru
-Status:           Bogus
+Status:           Open
 Bug Type:         Feature/Change Request
 Operating System: all
 PHP Version:      5.2.1
 New Comment:

I understand that JSON is a UTF-8-based format. But the question was different: why does json_encode() waste CPU time analyzing the input data instead of passing it through?

A second thought. Assume that the output of json_encode() must be UTF-8, OK. But why should that restrict its input to UTF-8 as well? Conceptually, input != output.

The main disadvantage is that I cannot afford to iterate over all of the input data and call iconv() on every string before passing the resulting array to json_encode(), because that is very CPU-expensive (e.g. if I transfer more than 500 strings, each about 30 characters long, the slowdown is significant). The only thing that makes json_encode() irreplaceable is its fast execution and CPU savings, yet it is unusable on non-UTF8 sites. When speed is not needed, it is very easy to use a PHP version of this function instead.

I think that if we want to follow the RFC literally, it may be better to write json_encode() without any encoding analysis and, after that, call iconv() ONE TIME to convert the resulting string to UTF-8. That is much faster than calling iconv() for each input string.

Alternatively, json_encode() could accept a second optional parameter, $src_encoding, to specify the input encoding.


Previous Comments:
------------------------------------------------------------------------

[2007-02-17 15:50:14] [EMAIL PROTECTED]

http://www.ietf.org/rfc/rfc4627.txt?number=4627

see section 3

------------------------------------------------------------------------

[2007-02-16 10:47:31] php at koterov dot ru

Description:
------------
Could you please explain why json_encode() cares about the encoding at all? Why not treat all string data as a binary stream? The current behaviour is very inconvenient and prevents the use of json_encode() on non-UTF8 sites! :-(

I have written a small substitute for json_encode(), but note that it is of course much slower than json_encode() on big data arrays.

/**
 * Convert PHP scalar, array or hash to JS scalar/array/hash.
 */
function php2js($a)
{
    if (is_null($a)) return 'null';
    if ($a === false) return 'false';
    if ($a === true) return 'true';
    if (is_scalar($a)) {
        // Escape quotes, backslashes and line breaks for a JS string literal.
        $a = addslashes($a);
        $a = str_replace("\n", '\n', $a);
        $a = str_replace("\r", '\r', $a);
        // Split "</script" so it cannot terminate an enclosing <script> block.
        $a = preg_replace('{(</)(script)}i', "$1'+'$2", $a);
        return "'$a'";
    }
    // An array is a list only if its keys are exactly 0, 1, 2, ... in order.
    $isList = true;
    for ($i=0, reset($a); $i<count($a); $i++, next($a))
        if (key($a) !== $i) { $isList = false; break; }
    $result = array();
    if ($isList) {
        foreach ($a as $v) $result[] = php2js($v);
        return '[ ' . join(', ', $result) . ' ]';
    } else {
        foreach ($a as $k=>$v)
            $result[] = php2js($k) . ': ' . php2js($v);
        return '{ ' . join(', ', $result) . ' }';
    }
}

So, my suggestion is to remove all string analysis from the json_encode() code. That would also make the function faster.

Reproduce code:
---------------
<?php
$a = array('a' => 'проверка', 'b' => array('слуха', 'глухого'));
echo json_encode($a);
?>

Expected result:
----------------
Correctly encoded string in the source 1-byte encoding.

Actual result:
--------------
Empty strings everywhere (and sometimes notices that a string contains non-UTF8 characters).

------------------------------------------------------------------------
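
For reference, the per-string workaround that the comment above describes as too CPU-expensive might look roughly like the sketch below. The helper name to_utf8_deep() is hypothetical (it is not part of PHP), and Windows-1251 is only an assumed source encoding, since the report says no more than "1-byte encoding". The input string is written as raw bytes so the example does not depend on the encoding of the file it is saved in.

```php
<?php
// Hypothetical helper, not part of PHP: recursively convert every string
// (array keys and values) from $srcEncoding to UTF-8 with iconv().
// The default source encoding is an assumption made for this sketch.
function to_utf8_deep($data, $srcEncoding = 'Windows-1251')
{
    if (is_string($data)) {
        return iconv($srcEncoding, 'UTF-8', $data);
    }
    if (is_array($data)) {
        $out = array();
        foreach ($data as $key => $value) {
            if (is_string($key)) {
                $key = iconv($srcEncoding, 'UTF-8', $key);
            }
            $out[$key] = to_utf8_deep($value, $srcEncoding);
        }
        return $out;
    }
    return $data; // numbers, booleans and null need no conversion
}

// "проверка" spelled out as Windows-1251 bytes.
$a = array('a' => "\xEF\xF0\xEE\xE2\xE5\xF0\xEA\xE0");
echo json_encode(to_utf8_deep($a)), "\n";
```

This makes one iconv() call per string, which is exactly the cost the report objects to. The single-pass alternative proposed above (encode first, convert the whole result once) would need json_encode() itself to stop validating its input.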