Req #47494 [Nab]: htmlspecialchars does not throw E_WARNING on multibyte problems

2012-10-18 Thread rasmus

 ID:                 47494
 Updated by:         ras...@php.net
 Reported by:        philipp dot feigl at gmail dot com
 Summary:            htmlspecialchars does not throw E_WARNING on
                     multibyte problems
 Status:             Not a bug
 Type:               Feature/Change Request
 Package:            Strings related
 Operating System:   CentOS5
 PHP Version:        5.2.8
 Block user comment: N
 Private report:     N

 New Comment:

user at dudmail dot com, you seem confused. A properly configured server doesn't
have display_errors on in production, so I don't see how it is leaking in that
case.

And as was pointed out, this code has been revamped in 5.4 to give you a number
of options for what to do with invalid chars, which means there is no need for
the warning anymore.


Previous Comments:

[2012-10-19 06:11:33] user at dudmail dot com

Not showing the warning with display_errors = 1 to avoid leaks on badly configured
servers, while showing it, and thus leaking sensitive information, on properly
configured servers? This is lame.


[2012-09-13 18:53:41] lzsiga at freemail dot c3 dot hu

It would be a valid reason, if there were any plan to support utf16/32, as 
iso-8859-x and utf-8 are ASCII-compatible. But even then, the default value for 
the $encoding parameter still could be 'ascii(or compatible)'.

Or, like some other string operations, there could be a mb_htmlspecialchars 
function.


[2012-09-13 17:25:08] ras...@php.net

By "simple" I assume you mean an htmlspecialchars() function that doesn't check the
validity of the characters. The problem is that we have to do that. We can't
encode characters without understanding which charset we are dealing with, and we
need to make sure that the character we are looking at is a valid one. The world
has moved beyond 7-bit ASCII, sorry.


[2012-09-13 17:07:47] lzsiga at freemail dot c3 dot hu

If the name of the function were
'check_for_multibyte_validity_and_htmlspecialchars' then you'd be right, but
even then I'd lobby for a simple 'htmlspecialchars' function... Doing something
(i.e. a multibyte validity check) that the user (the PHP programmer in this case)
didn't specifically ask for doesn't seem to me to be a good idea (see magic_quotes
for another example).

PS: Of course I wouldn't be complaining (or even know about the whole question) if
the default value hadn't been changed to 'UTF-8' in 5.4.
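For data that really is in a legacy single-byte charset, you can sidestep the 5.4 default by naming the charset explicitly. Below is a minimal sketch, assuming PHP 5.4 or later. The sample string is hypothetical.

```php
<?php
// Hypothetical legacy input: "café <b>" stored as ISO-8859-1, where é is the single byte 0xE9.
$latin1 = "caf\xE9 <b>";

// Charset given explicitly: 0xE9 is a valid ISO-8859-1 character, so only < and > change.
$asLatin1 = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

// Same bytes read as UTF-8: 0xE9 followed by a space is not a valid UTF-8 sequence, and
// without ENT_IGNORE or ENT_SUBSTITUTE the function returns an empty string.
$asUtf8 = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

var_dump(strlen($asLatin1)); // int(14)
var_dump($asUtf8);           // string(0) ""
```

Passing the flags explicitly as well keeps the result independent of the default flags, which differ between PHP versions.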


[2012-09-06 15:33:13] ras...@php.net

Also note that many, if not most, apps use this as their only validity filter, and
if you output invalid UTF-8, for example, it can lead to security problems like
the well-known IE 0xE0 XSS exploit. So at some point along the line you have to
do a multi-byte check, and it may as well be here since we need to do it anyway.
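Here is a minimal sketch of the attribute-context risk, assuming PHP 5.4 or later. The input value and the `<a title>` template are hypothetical. A trailing 0xE0 byte announces a three-byte UTF-8 sequence, so a lenient decoder may treat the template's closing quote as part of that character. htmlspecialchars() does not pass the dangling byte through. By default it rejects the whole input, and with ENT_SUBSTITUTE it replaces the byte with U+FFFD.

```php
<?php
// Hypothetical user-supplied value ending in a lone UTF-8 lead byte (0xE0).
$input = "x\xE0";

// Naive template: the raw 0xE0 sits directly before the closing quote, which a lenient
// decoder could swallow as the continuation of a (broken) multi-byte character.
$naive = '<a title="' . $input . '">link</a>';

// Default handling of invalid input (no ENT_IGNORE/ENT_SUBSTITUTE): empty string.
$rejected = '<a title="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '">link</a>';

// ENT_SUBSTITUTE (PHP 5.4+): the bad byte becomes U+FFFD, encoded as EF BF BD.
$substituted = '<a title="'
    . htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
    . '">link</a>';

echo $rejected, "\n"; // <a title="">link</a>
```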




The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

https://bugs.php.net/bug.php?id=47494


Req #47494 [Nab]: htmlspecialchars does not throw E_WARNING on multibyte problems

2012-09-13 Thread rasmus

Previous Comments:

[2012-09-06 15:29:07] ras...@php.net

You assume ASCII7 compatibility for all encodings, which is a bad assumption.


[2012-09-06 11:39:19] lzsiga at freemail dot c3 dot hu

IMHO htmlspecialchars should not check for multi-byte validity at all, because
it only deals with a few characters that are all in ASCII7, so it could safely
ignore every byte between 0x80 and 0xFF. The third parameter could simply be
ignored (as if it were 'ISO-8859-1').


[2012-08-30 19:21:49] ni...@php.net

@the disappointed user: PHP 5.4 no longer throws said warning (it was just 
confusing). Instead there are several new options for dealing with incorrect 
encoding. Of particular interest is ENT_SUBSTITUTE, which will replace invalid 
code unit sequences with the Unicode Replacement Character (instead of 
returning a rather unhelpful empty string). This way you can easily spot where 
the string is incorrectly encoded. Furthermore, this option has the additional
advantage of being more graceful (it only replaces the individual incorrectly
encoded bytes instead of discarding the whole string).

Hope this helps you. More info in the docs: http://de2.php.net/htmlspecialchars
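The three ways of handling invalid input can be compared side by side. Below is a minimal sketch, assuming PHP 5.4 or later (for ENT_SUBSTITUTE). The input string is hypothetical. The flags are passed explicitly because the default flags changed in PHP 8.1 to include ENT_SUBSTITUTE.

```php
<?php
// Hypothetical input: "<b>Grüße</b>" in UTF-8 with one stray 0xFF byte spliced in after "ü".
$input = "<b>Gr\xC3\xBC\xFF\xC3\x9Fe</b>";

// No error-handling flag: any invalid sequence makes the whole result an empty string.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// ENT_IGNORE: the invalid byte is silently dropped (the manual discourages this flag).
$ignored = htmlspecialchars($input, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE: the invalid byte is replaced with U+FFFD (EF BF BD), so the damage stays visible.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

$results = array('strict' => $strict, 'ignored' => $ignored, 'substituted' => $substituted);
foreach ($results as $name => $out) {
    echo $name, ': ', $out, "\n";
}
```

ENT_SUBSTITUTE matches the behaviour described above: the escaped markup and the valid characters survive, and the single bad byte shows up as a replacement character.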




Req #47494 [Nab]: htmlspecialchars does not throw E_WARNING on multibyte problems

2012-09-06 Thread rasmus

Previous Comments:

[2012-08-30 19:01:22] another_disappointed_php_programmer at exam

This is very sad.

This is a bug, and it's sad that PHP core developers said that it's a feature 
and it won't be fixed. I'm disappointed.


[2012-07-01 15:34:03] ras...@php.net

This really isn't a bug. I do agree that the approach isn't ideal, but we 
shouldn't throw warnings on bad input here because htmlspecialchars() is 
explicitly designed to clean up bad input and it is run directly on user data 
most of the time. In order for someone to avoid this warning they would need to 
first call something like iconv('utf-8','utf-8') to clean up the input data and 
that doesn't make much sense since htmlspecialchars() essentially does that 
already. But, in order to help debugging there should be some way to see why an 
htmlspecialchars() call failed so a last_error() function similar to how it is 
handled for json decoding would make sense.
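Absent a last_error()-style function for htmlspecialchars(), you can get the debugging signal described above by validating the input yourself before escaping. Below is a minimal sketch, assuming PHP 5.4 or later. escape_html_reporting() is a hypothetical helper, not a PHP function. The preg_match('//u', ...) check relies on PCRE returning false for subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): escapes $s for HTML output and reports
// invalid UTF-8 through the $onInvalid callback instead of failing silently.
function escape_html_reporting($s, $onInvalid)
{
    // preg_match() with the /u modifier returns false when the subject is not valid UTF-8.
    if (preg_match('//u', $s) !== 1) {
        $onInvalid($s);
    }
    // ENT_SUBSTITUTE (PHP 5.4+) keeps the valid parts instead of returning ''.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$badInputs = array();
$log = function ($s) use (&$badInputs) {
    // A real app might call error_log() here; this sketch just records the raw bytes as hex.
    $badInputs[] = bin2hex($s);
};

$ok  = escape_html_reporting('Tom & "Jerry"', $log); // valid: no callback
$bad = escape_html_reporting("caf\xE9", $log);       // lone 0xE9: callback fires, U+FFFD in output
```

Because validation is separate from escaping, the page still renders the invalid input (with U+FFFD) instead of blanking it, and the callback supplies the log entry needed for debugging.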




Req #47494 [Nab]: htmlspecialchars does not throw E_WARNING on multibyte problems

2012-09-06 Thread rasmus

Previous Comments:

[2012-07-01 15:12:31] chris at cbsinteractive dot com

Happening on our production servers; can replicate. PHP 5.3.10, CentOS 5.6.




Req #47494 [Nab]: htmlspecialchars does not throw E_WARNING on multibyte problems

2012-08-30 Thread nikic

Previous Comments:

[2011-09-27 22:43:02] rudd-o at rudd-o dot com

Reported to /r/lolphp here: 
http://www.reddit.com/r/lolphp/comments/kso6p/if_error_reporting_is_on_htmlspecialchars_will/

Do you guys realize there is an ENTIRE COMMUNITY of people devoted to the 
collective practice of FACEPALMING at PHP's fails?

Hahaha.


[2011-06-01 18:36:28] larry at garfieldtech dot com

This bug should be reopened, not just documented.  Haven't we learned our 
lesson with magic_quotes and its ilk?  Designing PHP to try and save the user 
when the user does something stupid always backfires.  Always.  MySQL has the 
same problem, and it backfires there, too.

The current logic is simply backward.  "When display_errors is on, you get all 
errors except from this function.  When display_errors is off, you get no 
errors except from this one function."  There is no logical reason for that.

I'm working on a project that has been stalled for over a week while we try to 
figure out what's wrong with the character encoding configuration on our 
production server, only to realize that the data is (probably) bad but we 
didn't know it because of this bug.

This is a bug and should be fixed, not simply documented as dumb.

If a production server is misconfigured, that's not the job of the language to 
fix.  All that does is, as another commenter noted, punish those who configure 
their servers properly.  If anything, it is a security hole for people who DO 
configure their server properly by turning off display_errors, as then these 
strings would get echoed in production.  How is that helpful to anyone?




Bug->Req #47494 [Nab]: htmlspecialchars does not throw E_WARNING on multibyte problems

2012-07-01 Thread rasmus

 ID:                 47494
 Updated by:         ras...@php.net
 Reported by:        philipp dot feigl at gmail dot com
 Summary:            htmlspecialchars does not throw E_WARNING on
                     multibyte problems
 Status:             Not a bug
-Type:               Bug
+Type:               Feature/Change Request
 Package:            Strings related
 Operating System:   CentOS5
 PHP Version:        5.2.8
 Block user comment: N
 Private report:     N

Previous Comments:

[2011-05-03 17:48:02] pinkgothic at gmail dot com

Could this bug please get REOPENED as a documentation bug
then? As already stated, the absence of the information in
the documentation can be crippling for people who do things
-right-. (Admittedly, right now the "htmlspecialchars" page
has my comment on the subject, but that's hardly official...)

(Sidenote: You might also want to close Bug #54109 as bogus
for consistency.)


[2011-05-03 17:33:35] ras...@php.net

This isn't a logic error. The idea is to prevent a user-triggered information 
leak by not showing this error to the user in case a production server is 
misconfigured and running with display_errors turned on.



