ID:               49687
 Updated by:       scott...@php.net
 Reported By:      sird at rckc dot at
 Status:           Open
 Bug Type:         *Unicode Issues
 Operating System: *
 PHP Version:      5.2.11
-Assigned To:      
+Assigned To:      scottmac
 New Comment:

PHP 5 has binary strings, not utf-8 strings. It does not attempt to do
any validation on input, so expecting addslashes to magically validate
things as utf-8 is wrong, simple as.

I agree that utf8_decode should do proper validation here though the
overhead of doing that validation is going to be slow. So I've coded up
a utf8_validate function. Still need to sort out some of the behaviour
first.


Previous Comments:
------------------------------------------------------------------------

[2009-10-16 03:41:30] sird at rckc dot at

oops!

you are right, :) the code before was unsigned short.

still, the other vulnerabilities remain.

I've made a blogpost that explains the other issues ;)

http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html

I updated the post to note the last bug was fixed on 5.2.11

Greetings!!

------------------------------------------------------------------------

[2009-10-16 03:32:19] scott...@php.net

On a 16-bit processor an int might be 16-bit, if you can get PHP to
compile then well done :-)

Did you even try running the test code?

------------------------------------------------------------------------

[2009-10-16 01:36:27] sird at rckc dot at

: ras...@php.net

It has come to my attention that this hasn't been fixed..

unsigned int has a size of 16 bits, don't take my word for it

http://www.acm.uiuc.edu/webmonkeys/book/c_guide/1.2.html

Section: 1.2.2 Variables

unsigned int    16 bits

I just downloaded PHP 5.2.11, and I quote the code:


//  php-5.2.11.tar.bz2/php-5.2.11/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode(    //  ...
{
        int pos = len;
        char *newbuf = emallo    //  ...
        unsigned int c;          // sizeof(unsigned int)==16 bits
        char (*decoder)(unsig    //  ...
        xml_encoding *enc = x    //  ...
//  ...
//  #580
        c = (unsigned char)(*s);
        if (c >= 0xf0) { /* four bytes encoded, 21 bits */
                if(pos-4 >= 0) {
                        c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | 
(s[3]&63);
                } else {
                        c = '?';        
                }
                s += 4;
                pos -= 4;
//  ...

Also no checking at ALL is made on the leading bytes (they should be in
the form: 10xx xxxx, a check is very easy, to check if s[0] has the
correct form: you do an AND with 1100 0000 and then compare it with 1000
0000.

s[0]&0xC0==0x80

Also, Overlong UTF is not being taken care of, that's yeah, yet another
vulnerability.

Greetings!!

------------------------------------------------------------------------

[2009-09-29 05:29:22] sird at rckc dot at

the rest is still dangerous.. eating chars without the 10xx xxxx is
against the spec, and overlong UTF.

------------------------------------------------------------------------

[2009-09-29 04:56:08] ras...@php.net

> there are several bugs in the code, one of them is that a variable
holding the value of the char is overflowed (trying to put 21 bits in
a
16 bits int)

That was fixed in 5.2.11

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/49687

-- 
Edit this bug report at http://bugs.php.net/?id=49687&edit=1

Reply via email to