ID: 49687
Comment by: sird at rckc dot at
Reported By: sird at rckc dot at
Status: Open
Bug Type: *Unicode Issues
Operating System: *
PHP Version: 5.2.11
Assigned To: scottmac
New Comment:
oh, my mistake:
else if (c < 0x800) {
newbuf[(*newlen)++] = (0xc0 | (c >> 6));
newbuf[(*newlen)++] = (0x80 | (c & 0x3f));
}
should be:
else if (c < 0x800) {
if ( (s[1]&0xC0!=0x80) ){
newbuf[(*newlen)++] = '?';
}else{
newbuf[(*newlen)++] = (0xc0 | (c >> 6));
newbuf[(*newlen)++] = (0x80 | (c & 0x3f));
}
}
Previous Comments:
------------------------------------------------------------------------
[2009-10-16 04:41:27] sird at rckc dot at
I disagree.. how slow can it be to add 2 bit operations..
} else if (c < 0x800) {
change to
} else if (c < 0x800) {
if ( (s[1]&0xC0!=0x80) ){ // this is a new operation
newbuf[(*newlen)++] = '?'; // this are not new operations
pos--; // this are not new operations
s++; // this are not new operations
continue;
}
}
Besides, considering all real implementations do what the spec say they
should do (it's not validate it's valid UNICODE, is that UNICODE says
that the algorithm SHOULD do the check).. not doing it on PHP is just
nuts.
------------------------------------------------------------------------
[2009-10-16 04:01:21] [email protected]
PHP 5 has binary strings, not utf-8 strings. It does not attempt to do
any validation on input, so expecting addslashes to magically validate
things as utf-8 is wrong, simple as.
I agree that utf8_decode should do proper validation here though the
overhead of doing that validation is going to be slow. So I've coded up
a utf8_validate function. Still need to sort out some of the behaviour
first.
------------------------------------------------------------------------
[2009-10-16 03:41:30] sird at rckc dot at
oops!
you are right, :) the code before was unsigned short.
still, the other vulnerabilities remain.
I've made a blogpost that explains the other issues ;)
http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html
I updated the post to note the last bug was fixed on 5.2.11
Greetings!!
------------------------------------------------------------------------
[2009-10-16 03:32:19] [email protected]
On a 16-bit processor an int might be 16-bit, if you can get PHP to
compile then well done :-)
Did you even try running the test code?
------------------------------------------------------------------------
[2009-10-16 01:36:27] sird at rckc dot at
: [email protected]
It has come to my attention that this hasn't been fixed..
unsigned int has a size of 16 bits, don't take my word for it
http://www.acm.uiuc.edu/webmonkeys/book/c_guide/1.2.html
Section: 1.2.2 Variables
unsigned int 16 bits
I just downloaded PHP 5.2.11, and I quote the code:
// php-5.2.11.tar.bz2/php-5.2.11/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode( // ...
{
int pos = len;
char *newbuf = emallo // ...
unsigned int c; // sizeof(unsigned int)==16 bits
char (*decoder)(unsig // ...
xml_encoding *enc = x // ...
// ...
// #580
c = (unsigned char)(*s);
if (c >= 0xf0) { /* four bytes encoded, 21 bits */
if(pos-4 >= 0) {
c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) |
(s[3]&63);
} else {
c = '?';
}
s += 4;
pos -= 4;
// ...
Also no checking at ALL is made on the leading bytes (they should be in
the form: 10xx xxxx, a check is very easy, to check if s[0] has the
correct form: you do an AND with 1100 0000 and then compare it with 1000
0000.
s[0]&0xC0==0x80
Also, Overlong UTF is not being taken care of, that's yeah, yet another
vulnerability.
Greetings!!
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/49687
--
Edit this bug report at http://bugs.php.net/?id=49687&edit=1