ID:               49687
 Comment by:       sird at rckc dot at
 Reported By:      sird at rckc dot at
 Status:           Open
 Bug Type:         *Unicode Issues
 Operating System: *
 PHP Version:      5.2.11
 Assigned To:      scottmac
 New Comment:

My last post, I promise..

it should say:
        c = ((s[0]&63)<<6) | (s[1]&63);

Greetz!


Previous Comments:
------------------------------------------------------------------------

[2009-10-16 04:52:21] sird at rckc dot at

Oh, duh! I'm reading the wrong function.. :( Sorry

                        if(pos-2 >= 0 && (s[1]&0xC0)==0x80) {
                                c = ((s[0]&7)<<18) | ((s[1]&63)<<12) |
                                    ((s[2]&63)<<6) | (s[3]&63);
                        } else {
                                c = '?';
                        }

------------------------------------------------------------------------

[2009-10-16 04:45:25] sird at rckc dot at

oh, my mistake:
                else if (c < 0x800) {
                        newbuf[(*newlen)++] = (0xc0 | (c >> 6));
                        newbuf[(*newlen)++] = (0x80 | (c & 0x3f));
                }

should be:

                else if (c < 0x800) {
                        if ( (s[1]&0xC0)!=0x80 ){
                            newbuf[(*newlen)++] = '?';
                        }else{
                            newbuf[(*newlen)++] = (0xc0 | (c >> 6));
                            newbuf[(*newlen)++] = (0x80 | (c & 0x3f));
                        }
                }
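
The test these snippets hinge on is whether the second byte is a UTF-8
continuation byte (bit pattern 10xxxxxx). A minimal sketch of that test
as a standalone helper follows; the name is_continuation is made up for
illustration and is not part of the PHP source. Note that in C the
operators != and == bind tighter than &, so the mask has to be
parenthesised.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the PHP source): returns true when b has
 * the form 10xxxxxx, i.e. it is a UTF-8 continuation byte. */
static bool is_continuation(unsigned char b)
{
    /* Parentheses around the mask are required. Without them,
     * b & 0xC0 == 0x80 parses as b & (0xC0 == 0x80), which is b & 0. */
    return (b & 0xC0) == 0x80;
}
```

With this helper, the condition for emitting '?' would read
!is_continuation(s[1]).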

------------------------------------------------------------------------

[2009-10-16 04:41:27] sird at rckc dot at

I disagree.. how slow can it be to add 2 bit operations..

} else if (c < 0x800) {

change to

} else if (c < 0x800) {
    if ( (s[1]&0xC0)!=0x80 ){  // this is a new operation
        newbuf[(*newlen)++] = '?'; // these are not new operations
        pos--; // these are not new operations
        s++; // these are not new operations
        continue;
    }
}

Besides, all real implementations do what the spec says they should do
(the point is not to validate that the input is valid Unicode; it's that
Unicode says the decoding algorithm SHOULD do this check), so not doing
it in PHP is just nuts.
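
As a rough sketch of the behaviour being asked for, the function below
decodes one two-byte sequence and falls back to '?' when the second byte
is missing or is not a continuation byte. In the failure case it consumes
only the lead byte, so the following byte (a quote, for instance) is
decoded on its own afterwards. The function name and signature are
assumptions made for this example and do not match the PHP source. It
assumes s[0] is already known to be a two-byte lead (0xC0..0xDF) and does
not reject overlong forms.

```c
#include <stddef.h>

/* Hypothetical helper: decode a two-byte UTF-8 sequence starting at s,
 * with `remaining` bytes available (remaining >= 1).
 * Stores the code point, or '?' on a malformed sequence, in *out and
 * returns the number of input bytes consumed. */
static size_t decode_two_byte(const unsigned char *s, size_t remaining,
                              unsigned int *out)
{
    if (remaining >= 2 && (s[1] & 0xC0) == 0x80) {
        /* 110xxxxx 10yyyyyy -> xxxxxyyyyyy */
        *out = ((unsigned int)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    /* Malformed or truncated: emit a replacement and skip the lead byte
     * only, leaving s[1] to be handled as the start of the next unit. */
    *out = '?';
    return 1;
}
```

The extra cost on the valid path is one mask and one comparison per
two-byte sequence.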

------------------------------------------------------------------------

[2009-10-16 04:01:21] scott...@php.net

PHP 5 has binary strings, not utf-8 strings. It does not attempt to do
any validation on input, so expecting addslashes to magically validate
things as utf-8 is wrong, simple as.

I agree that utf8_decode should do proper validation here, though that
validation adds overhead and is going to slow it down. So I've coded up
a utf8_validate function. I still need to sort out some of the behaviour
first.
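
For illustration only, here is a minimal sketch of what a standalone
UTF-8 well-formedness check could look like in C. This is not the
utf8_validate function mentioned above, whose code and behaviour are not
shown in this thread; the name utf8_is_valid and its signature are
assumptions. It rejects stray or missing continuation bytes, truncated
sequences, overlong encodings, surrogates, and values above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator: returns true if the len bytes at s form
 * well-formed UTF-8. */
static bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned int cp;    /* code point being assembled */
        unsigned int min;   /* smallest code point allowed at this length */

        if (b < 0x80) {                    /* ASCII */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {   /* 110xxxxx */
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {   /* 1110xxxx */
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {   /* 11110xxx */
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead */
        }

        if (len - i - 1 < need)
            return false;   /* sequence truncated by end of input */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return false;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return false;   /* beyond the Unicode range */

        i += need + 1;
    }
    return true;
}
```

A caller could run such a check once on untrusted input before handing
the string to a decoder that does not validate.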

------------------------------------------------------------------------

[2009-10-16 03:41:30] sird at rckc dot at

oops!

you are right, :) the code before was unsigned short.

still, the other vulnerabilities remain.

I've made a blogpost that explains the other issues ;)

http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html

I updated the post to note the last bug was fixed in 5.2.11

Greetings!!

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/49687

-- 
Edit this bug report at http://bugs.php.net/?id=49687&edit=1
