From: jdolecek at NetBSD dot org
Operating system: Any
PHP version: 5.1.4
PHP Bug Type: WDDX related
Bug description: WDDX cannot deserialize serialized UTF-8 encoded non-ASCII text
Description:
------------
WDDX cannot be used to encode certain iso-8859-1 text that has been
converted to UTF-8. The problem affects those iso-8859-1 characters whose
UTF-8 encoding contains a byte with a value in the 128-160 range, which is
recognized as a control character. WDDX turns control characters into a
<char code="XX"/> sequence.
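The serializer side can be sketched as follows. This is only an
illustration, not the code from wddx.c: latin1_iscntrl() is a hypothetical
stand-in for iscntrl() under an iso-8859-1 locale (assumed to classify
0x80-0x9F as control characters), and the escape format is the
<char code="XX"/> form quoted in this report.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: models iscntrl() under an iso-8859-1 locale, where the
 * C1 range 0x80-0x9F counts as control characters. */
static int latin1_iscntrl(unsigned char c)
{
    return c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F);
}

/* Encode one iso-8859-1 byte as UTF-8; returns the number of bytes written. */
static size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        out[0] = c;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}

/* Hypothetical model of the unpatched serializer: every byte the locale
 * calls a control character is replaced by a <char code="XX"/> element.
 * Returns the length of the string written to out. */
static size_t serialize_unpatched(const unsigned char *in, size_t len,
                                  char *out, size_t cap)
{
    size_t used = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        int n;
        if (latin1_iscntrl(in[i]))
            n = snprintf(out + used, cap - used,
                         "<char code=\"%02X\"/>", (unsigned)in[i]);
        else
            n = snprintf(out + used, cap - used, "%c", in[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            out[used] = '\0';   /* buffer full: drop the partial write */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

For chr(200) the UTF-8 bytes are 0xC3 0x88, and under the assumption above
the second byte is escaped, so the lead byte 0xC3 ends up directly in front
of the < of the inserted element.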
wddx_deserialize() expects a UTF-8 encoded string and implicitly converts
the text back to iso-8859-1 before deserializing the structure. This is
done _before_ the <char code="XX"/> is replaced by the character. The < is
thus treated as the second byte of the UTF-8 sequence, the two-byte
sequence is recoded to a single-byte character, and the result contains
invalid XML (the fragment 'char code="XX"/>'). Deserialization thus fails
silently.
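How the < gets swallowed can be illustrated with a small decoder sketch.
It assumes a lenient UTF-8 to iso-8859-1 conversion that trusts the lead
byte and does not check that the following byte is a valid continuation
byte; utf8_to_latin1_lenient() is a hypothetical name, not a PHP function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a lenient two-byte decoder. A lead byte in 0xC0-0xDF always
 * consumes the next byte, whatever it is. Returns the number of bytes
 * written to out. */
static size_t utf8_to_latin1_lenient(const unsigned char *in, size_t len,
                                     unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    assert(out != NULL);
    while (i < len && o < cap) {
        unsigned char c = in[i];
        if (c >= 0xC0 && c < 0xE0 && i + 1 < len) {
            /* Combine the lead byte with the low 6 bits of the next byte. */
            out[o++] = (unsigned char)(((c & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            out[o++] = c;   /* ASCII or unhandled byte: copy as-is */
            i += 1;
        }
    }
    return o;
}
```

Under that assumption, the input 0xC3 followed by <char code="88"/> comes
out one byte shorter: 0xC3 and < collapse into a single byte, and what
remains of the element no longer starts with <.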
I.e.:
1. the iso-8859-1 character is Z (ord(Z) > 128)
2. its UTF-8 encoding is the two-byte string XY
3. WDDX serializes that as X<char code="ord(Y)"/>
4. the deserializer converts the UTF-8 input to iso-8859-1 before
   starting deserialization; X< is recoded to a single byte B, so the
   result is Bchar code="ord(Y)"/>
5. the deserializer detects invalid XML, aborts the decode, and
   returns an empty string
Fix:
Only recode ASCII control characters to <char code="XX" /> sequence:
--- wddx.c.orig 2006-05-24 00:39:34.000000000 +0200
+++ wddx.c
@@ -399,7 +399,8 @@ static void php_wddx_serialize_string(wd
                 break;
             default:
-                if (iscntrl((int)*(unsigned char *)p)) {
+                if (iscntrl((int)*(unsigned char *)p)
+                    && isascii((int)*(unsigned char *)p)) {
                     FLUSH_BUF();
                     sprintf(control_buf, WDDX_CHAR, *p);
                     php_wddx_add_chunk(packet, control_buf);
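The effect of the extra isascii() test can be checked in isolation with a
sketch like the one below. It is not the patched wddx.c: latin1_iscntrl()
is a hypothetical stand-in for iscntrl() under an iso-8859-1 locale, and
the c < 0x80 comparison stands in for isascii().

```c
#include <assert.h>

/* Assumption: models iscntrl() under an iso-8859-1 locale, where the
 * C1 range 0x80-0x9F counts as control characters. */
static int latin1_iscntrl(unsigned char c)
{
    return c < 0x20 || c == 0x7F || (c >= 0x80 && c <= 0x9F);
}

/* Condition before the patch: any control character is escaped. */
static int needs_escape_unpatched(unsigned char c)
{
    return latin1_iscntrl(c);
}

/* Condition after the patch: only ASCII control characters are escaped.
 * c < 0x80 is used here in place of isascii(). */
static int needs_escape_patched(unsigned char c)
{
    return latin1_iscntrl(c) && c < 0x80;
}

/* Small self-check over the bytes discussed in this report. */
static void self_check(void)
{
    assert(needs_escape_unpatched(0x88));   /* UTF-8 continuation byte */
    assert(!needs_escape_patched(0x88));    /* now passed through untouched */
    assert(needs_escape_patched(0x0B));     /* ASCII control still escaped */
    assert(!needs_escape_patched('A'));     /* printable ASCII unaffected */
}
```

With the patched condition the continuation byte 0x88 from
utf8_encode(chr(200)) is written out unchanged, so the two-byte UTF-8
sequence stays intact.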
Note: this patch also makes the problem described in Bug #37569 go away,
but the patch from that report is still worth applying for code clarity.
This bug is probably the same problem as Bug #35241.
Reproduce code:
---------------
On UNIX with iso-8859-1 locale or Windows with Windows-1250 locale:
var_dump(
wddx_deserialize(wddx_serialize_value(utf8_encode(chr(200))))
);
Expected result:
----------------
string(1) "Č"
Actual result:
--------------
string(0) ""
--
Edit bug report at http://bugs.php.net/?id=37571&edit=1