From: jdolecek at NetBSD dot org
Operating system: Any
PHP version: 5.1.4
PHP Bug Type: WDDX related
Bug description: WDDX cannot deserialize serialized UTF-8 encoded non-ASCII text

Description:
------------
WDDX cannot be used to encode certain UTF-8-encoded iso-8859-1 text, particularly those iso-8859-1 characters which, after conversion to UTF-8, produce bytes with values in the 128-160 range; these bytes are recognized as control characters. WDDX turns control characters into a <char code="XX"/> sequence.

wddx_deserialize() expects a UTF-8 encoded string and implicitly converts the text back to iso-8859-1 before deserializing the structure. This is done _before_ the <char code="XX"/> is replaced by the character. The < is thus recognized as part of the UTF-8 sequence, the two-byte sequence is recoded to a single-byte character, and the result contains invalid XML (fragment 'char code="XX"/>'). Deserialization thus fails silently.

I.e.:
1. the iso-8859-1 character is Z (ord(Z) > 128)
2. the UTF-8 string is XY
3. WDDX serializes that as X<char code="ord(Y)"/>
4. the deserializer converts the UTF-8 input to iso-8859-1 before starting deserialization; the result is Bchar code="ord(Y)"/>
5. the deserializer detects invalid XML, aborts the decode, and returns an empty string

Fix:
Only recode ASCII control characters to the <char code="XX" /> sequence:

--- wddx.c.orig	2006-05-24 00:39:34.000000000 +0200
+++ wddx.c
@@ -399,7 +399,8 @@ static void php_wddx_serialize_string(wd
 				break;
 
 			default:
-				if (iscntrl((int)*(unsigned char *)p)) {
+				if (iscntrl((int)*(unsigned char *)p)
+				    && isascii((int)*(unsigned char *)p)) {
 					FLUSH_BUF();
 					sprintf(control_buf, WDDX_CHAR, *p);
 					php_wddx_add_chunk(packet, control_buf);

Note - this patch also makes the problem of Bug #37569 go away, but that patch is still useful to apply for code clarity. This bug is probably the same problem as Bug #35241.

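To make the effect of the patched condition concrete, here is a minimal standalone sketch, not the actual wddx.c code. The helpers latin1_iscntrl() and escape_controls() are hypothetical names introduced only for illustration. latin1_iscntrl() assumes that iscntrl() under an iso-8859-1 locale reports bytes 0x80-0x9F as control characters, and the format string simply mirrors the <char code="XX"/> form used in this report.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for iscntrl() under an iso-8859-1 locale
 * (assumption): C0 controls, DEL, and the C1 range 0x80-0x9F. */
static int latin1_iscntrl(unsigned char c)
{
    return c < 0x20 || c == 0x7f || (c >= 0x80 && c <= 0x9f);
}

/* Copy `in` to `out`, replacing control bytes with <char code="XX"/>.
 * ascii_only == 0 models the original condition, ascii_only != 0 the
 * patched one (control AND ASCII). Output is truncated if it does not
 * fit in outsz bytes. Returns `out`. */
static char *escape_controls(const char *in, char *out, size_t outsz,
                             int ascii_only)
{
    const unsigned char *p;
    size_t used = 0;

    if (outsz == 0)
        return out;
    out[0] = '\0';

    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        char chunk[32];
        int n;
        int escape = latin1_iscntrl(*p) && (!ascii_only || *p < 0x80);

        if (escape)
            n = snprintf(chunk, sizeof chunk, "<char code=\"%02X\"/>",
                         (unsigned)*p);
        else
            n = snprintf(chunk, sizeof chunk, "%c", *p);

        if (used + (size_t)n >= outsz)
            break;                              /* would overflow: stop */
        memcpy(out + used, chunk, (size_t)n + 1); /* include the NUL */
        used += (size_t)n;
    }
    return out;
}
```

Called on the two-byte UTF-8 input "\xC3\x88", the unpatched mode splits the sequence and emits the lead byte followed by <char code="88"/>, which is step 3 above. The patched mode leaves both bytes untouched, while a genuine ASCII control such as 0x01 is still escaped.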
Reproduce code:
---------------
On UNIX with iso-8859-1 locale or Windows with Windows-1250 locale:

var_dump( wddx_deserialize(wddx_serialize_value(utf8_encode(chr(200)))) );

Expected result:
----------------
string(1) "Č"

Actual result:
--------------
string(0) ""

-- 
Edit bug report at http://bugs.php.net/?id=37571&edit=1
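
For readers who want to see which bytes the reproduce code feeds into the serializer, the following sketch re-implements the iso-8859-1 to UTF-8 conversion that utf8_encode() performs. The function name latin1_to_utf8() is hypothetical, and the code is an illustration rather than PHP's own implementation.

```c
#include <stddef.h>

/* Encode `len` iso-8859-1 bytes from `in` as UTF-8 into `out`.
 * `out` must have room for at least 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                          /* ASCII: unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* continuation */
        }
    }
    out[n] = '\0';
    return n;
}
```

For chr(200), i.e. byte 0xC8, this yields the two bytes 0xC3 0x88. The continuation byte 0x88 lies in the range that gets treated as a control character, which is why this input triggers the bug. By the same arithmetic, any iso-8859-1 byte in 0x80-0x9F or 0xC0-0xDF produces a continuation byte in 0x80-0x9F.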