ID:          28646
 Comment by:  papercrane at reversefold dot com
 Reported By: php at richardneill dot org
 Status:      Open
 Bug Type:    Feature/Change Request
 PHP Version: 4.3.6
 New Comment:

If you're really worried about magic_quotes (which I don't use
anyway...), then str_demoronise() should be magic_quotes aware, escaping
quotes only if magic_quotes_runtime is on.

Or perhaps a second parameter to the function should control whether
quotes are escaped. Hard-wiring either behaviour would break *someone's*
scripts.
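
To make the second-parameter idea concrete, here is a rough sketch. The
name demoronise_quotes() and the $escape_quotes flag are made up for
illustration (nothing like this exists in PHP), and it only covers the
four quote bytes 0x91 to 0x94. On PHP 4 a caller could presumably pass
the result of get_magic_quotes_runtime() as the flag.

```php
<?php
// Hypothetical helper: name and signature are assumptions, not a PHP built-in.
// Assumes single-byte (Windows-1252) input, not UTF-8.
function demoronise_quotes($text, $escape_quotes = false)
{
    // Plain ASCII replacements for the Microsoft quote bytes.
    $plain = array(
        "\x91" => '`',
        "\x92" => "'",
        "\x93" => '"',
        "\x94" => '"',
    );
    // Same replacements, each prefixed with a backslash.
    $escaped = array(
        "\x91" => '\\`',
        "\x92" => "\\'",
        "\x93" => '\\"',
        "\x94" => '\\"',
    );
    // strtr() with an array swaps every key in a single pass.
    return strtr($text, $escape_quotes ? $escaped : $plain);
}

echo demoronise_quotes("it\x92s"), "\n";        // unescaped quote
echo demoronise_quotes("it\x92s", true), "\n";  // backslash-escaped quote
```

Defaulting the flag to false keeps the output free of backslashes unless
the caller asks for them, so neither camp gets surprised.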


Previous Comments:
------------------------------------------------------------------------

[2004-06-06 00:59:16] php at richardneill dot org

For safety's sake, it's probably wiser to have

&lt;
&gt;
\`
\'
\"

as the replacements.

Otherwise, we have a nice big security hole, since magic_quotes gets
bypassed.

------------------------------------------------------------------------

[2004-06-05 23:55:34] php at richardneill dot org

Description:
------------
Feature request: str_demoronise()

On my website, I often find users pasting content that was written in
Microsoft Word, and which contains undisplayable "ASCII" characters
where there should be single/double quotes. Anyone viewing the result
on a non-MS platform gets to see rectangles instead of quotes.

The problem has been solved in Perl here:
http://www.fourmilab.ch/webtools/demoroniser/
I quote: 
============
Microsoft use their own "extension" to Latin-1, in which a variety of
characters which do not appear in Latin-1 are inserted in the range
0x82 through 0x95--this having the merit of being incompatible with
both Latin-1 and Unicode, which reserve this region for additional
control characters.
=============

I'd like to suggest the addition of a str_demoronise() function which
replaces these nonstandard characters with the correct ASCII equivalents.




Reproduce code:
---------------
From the source of demoroniser, here are the substitutions made. The MS
column is what Microsoft use (in hex); the FIX column is the
replacement:

MS      FIX

0x82    ,
0x83    <em>f</em>
0x84    ,,
0x85    ...
0x88    ^
0x89    ' °/°°'            <-- whitespace; no '' quotes
0x8B    <
0x8C    Oe
0x91    `
0x92    '
0x93    "
0x94    "
0x95    *
0x96    -
0x97    --
0x98    <sup>~</sup>
0x99    <sup>TM</sup>
0x9B    >
0x9C    oe



------------------------------------------------------------------------

