#49687 [Opn]: utf8_decode xml_utf8_decode vuln

2009-10-15 Thread sird at rckc dot at
 ID:   49687
 User updated by:  sird at rckc dot at
 Reported By:  sird at rckc dot at
 Status:   Open
 Bug Type: *Unicode Issues
 Operating System: *
 PHP Version:  5.2.11
 Assigned To:  scottmac
 New Comment:

I disagree.. how slow can it be to add 2 bit operations..

} else if (c < 0x800) {

change to

} else if (c < 0x800) {
if ( (s[1]&0xC0!=0x80) ){  // this is a new operation
newbuf[(*newlen)++] = '?'; // this are not new operations
pos--; // this are not new operations
s++; // this are not new operations
continue;
}
}

Besides, considering all real implementations do what the spec say they
should do (it's not validate it's valid UNICODE, is that UNICODE says
that the algorithm SHOULD do the check).. not doing it on PHP is just
nuts.


Previous Comments:


[2009-10-16 04:01:21] scott...@php.net

PHP 5 has binary strings, not utf-8 strings. It does not attempt to do
any validation on input, so expecting addslashes to magically validate
things as utf-8 is wrong, simple as.

I agree that utf8_decode should do proper validation here though the
overhead of doing that validation is going to be slow. So I've coded up
a utf8_validate function. Still need to sort out some of the behaviour
first.



[2009-10-16 03:41:30] sird at rckc dot at

oops!

you are right, :) the code before was unsigned short.

still, the other vulnerabilities remain.

I've made a blogpost that explains the other issues ;)

http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html

I updated the post to note the last bug was fixed on 5.2.11

Greetings!!



[2009-10-16 03:32:19] scott...@php.net

On a 16-bit processor an int might be 16-bit, if you can get PHP to
compile then well done :-)

Did you even try running the test code?



[2009-10-16 01:36:27] sird at rckc dot at

: ras...@php.net

It has come to my attention that this hasn't been fixed..

unsigned int has a size of 16 bits, don't take my word for it

http://www.acm.uiuc.edu/webmonkeys/book/c_guide/1.2.html

Section: 1.2.2 Variables

unsigned int16 bits

I just downloaded PHP 5.2.11, and I quote the code:


//  php-5.2.11.tar.bz2/php-5.2.11/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode(//  ...
{
int pos = len;
char *newbuf = emallo//  ...
unsigned int c;  // sizeof(unsigned int)==16 bits
char (*decoder)(unsig//  ...
xml_encoding *enc = x//  ...
//  ...
//  #580
c = (unsigned char)(*s);
if (c >= 0xf0) { /* four bytes encoded, 21 bits */
if(pos-4 >= 0) {
c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | 
(s[3]&63);
} else {
c = '?';
}
s += 4;
pos -= 4;
//  ...

Also no checking at ALL is made on the leading bytes (they should be in
the form: 10xx , a check is very easy, to check if s[0] has the
correct form: you do an AND with 1100  and then compare it with 1000
.

s[0]&0xC0==0x80

Also, Overlong UTF is not being taken care of, that's yeah, yet another
vulnerability.

Greetings!!



[2009-09-29 05:29:22] sird at rckc dot at

the rest is still dangerous.. eating chars without the 10xx  is
against the spec, and overlong UTF.



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/49687

-- 
Edit this bug report at http://bugs.php.net/?id=49687&edit=1



#49687 [Opn]: utf8_decode xml_utf8_decode vuln

2009-10-15 Thread scottmac
 ID:   49687
 Updated by:   scott...@php.net
 Reported By:  sird at rckc dot at
 Status:   Open
 Bug Type: *Unicode Issues
 Operating System: *
 PHP Version:  5.2.11
-Assigned To:  
+Assigned To:  scottmac
 New Comment:

PHP 5 has binary strings, not utf-8 strings. It does not attempt to do
any validation on input, so expecting addslashes to magically validate
things as utf-8 is wrong, simple as.

I agree that utf8_decode should do proper validation here though the
overhead of doing that validation is going to be slow. So I've coded up
a utf8_validate function. Still need to sort out some of the behaviour
first.


Previous Comments:


[2009-10-16 03:41:30] sird at rckc dot at

oops!

you are right, :) the code before was unsigned short.

still, the other vulnerabilities remain.

I've made a blogpost that explains the other issues ;)

http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html

I updated the post to note the last bug was fixed on 5.2.11

Greetings!!



[2009-10-16 03:32:19] scott...@php.net

On a 16-bit processor an int might be 16-bit, if you can get PHP to
compile then well done :-)

Did you even try running the test code?



[2009-10-16 01:36:27] sird at rckc dot at

: ras...@php.net

It has come to my attention that this hasn't been fixed..

unsigned int has a size of 16 bits, don't take my word for it

http://www.acm.uiuc.edu/webmonkeys/book/c_guide/1.2.html

Section: 1.2.2 Variables

unsigned int16 bits

I just downloaded PHP 5.2.11, and I quote the code:


//  php-5.2.11.tar.bz2/php-5.2.11/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode(//  ...
{
int pos = len;
char *newbuf = emallo//  ...
unsigned int c;  // sizeof(unsigned int)==16 bits
char (*decoder)(unsig//  ...
xml_encoding *enc = x//  ...
//  ...
//  #580
c = (unsigned char)(*s);
if (c >= 0xf0) { /* four bytes encoded, 21 bits */
if(pos-4 >= 0) {
c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | 
(s[3]&63);
} else {
c = '?';
}
s += 4;
pos -= 4;
//  ...

Also no checking at ALL is made on the leading bytes (they should be in
the form: 10xx , a check is very easy, to check if s[0] has the
correct form: you do an AND with 1100  and then compare it with 1000
.

s[0]&0xC0==0x80

Also, Overlong UTF is not being taken care of, that's yeah, yet another
vulnerability.

Greetings!!



[2009-09-29 05:29:22] sird at rckc dot at

the rest is still dangerous.. eating chars without the 10xx  is
against the spec, and overlong UTF.



[2009-09-29 04:56:08] ras...@php.net

> there are several bugs in the code, one of them is that a variable
holding the value of the char is overflowed (trying to put 21 bits in
a
16 bits int)

That was fixed in 5.2.11



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/49687

-- 
Edit this bug report at http://bugs.php.net/?id=49687&edit=1



#49687 [Opn]: utf8_decode xml_utf8_decode vuln

2009-09-28 Thread rasmus
 ID:   49687
 Updated by:   ras...@php.net
 Reported By:  sird at rckc dot at
 Status:   Open
 Bug Type: *Unicode Issues
 Operating System: *
 PHP Version:  5.2.11
 New Comment:

> there are several bugs in the code, one of them is that a variable
holding the value of the char is overflowed (trying to put 21 bits in
a
16 bits int)

That was fixed in 5.2.11


Previous Comments:


[2009-09-29 01:58:37] sird at rckc dot at

it is a PHP bug, the function is not decoding correctly, check the ppt
and the acunetix blog for details.

there are several bugs in the code, one of them is that a variable
holding the value of the char is overflowed (trying to put 21 bits in a
16 bits int), also the code is not checking if it is a valid unicode
char (reading unicode specification should explain it).

the example r...@80sec gave you was an overlong utf representation of a
single quote. that is forbidden by unicode, and should transform the
char to ?.

also, the code is not checking if the chars are valid UTF, so stuff
like:  are going to
be transformed to 


this is a very serious vulnerability and there are several bugs in the
same function (there's even unreachable code).

you can check the implementation of utf by Mozilla or Webkit, they do
it right. dont use java as a reference since they are also flawed.

due to the fact that PHP is for web applications and utf is widely
used, and it allows an attacker to do all type of attacks (from sql
injection to xss) its imperative to fix that function.

Greetings!!



[2009-09-28 19:38:24] sjo...@php.net

Is this a bug in PHP or in scripts which do utf8_decode(addslashes())
instead of addslashes(utf8_decode())? What do you propose to solve this
bug?



[2009-09-27 11:20:30] sird at rckc dot at

Description:

Taken from: http://bugs.php.net/bug.php?id=48230
> 
> Description:
> 
> xml_utf8_decode function incorrectly decode.
> 
> Reproduce code:
> ---
>  $ill=chr(0xf0).chr(0xc0).chr(0xc0).chr(0xa7);
> $ill=addslashes($ill);
> echo utf8_decode("$ill");
> echo htmlspecialchars ($ill,ENT_QUOTES,"utf-8" );
> ?>
> 
> Expected result:
> 
> it will output a "'" incorrectly.
> 
> Actual result:
> --
> it will output a "'" incorrectly.


This is actually a PHP security vulnerability.

Timeline:
* Reported by r...@80sec.com: May 11
* Discovered by webmas...@lapstore.de: June 19
* Discovered by Giorgio Maone / Eduardo Vela: July 14
* Reported and Fixed on PHPIDS: July 14
* Microsoft notified of a XSS Filter bypass: July 14
* Fixed XSS Filter bypass on NoScript 1.9.6:  July 20
* Vulnerability disclosed on BlackHat USA 2009: July 29
* Added signature to Acunetix WVS: August 14

References:
*
http://www.blackhat.com/presentations/bh-usa-09/VELANAVA/BHUSA09-VelaNava-FavoriteXSS-SLIDES.pdf
*
http://www.acunetix.com/blog/web-security-articles/security-risks-associated-with-utf8_decode/
* http://us2.php.net/manual/en/function.utf8-decode.php#83935
* http://bugs.php.net/bug.php?id=48230
* http://noscript.net/changelog

Read the references for further details.

Reproduce code:
---


Expected result:

? or 1=1-- -

Actual result:
--
' or 1=1--





-- 
Edit this bug report at http://bugs.php.net/?id=49687&edit=1