ID:               18521
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
-Status:           Open
+Status:           Closed
 Bug Type:         Strings related
 Operating System: Linux (RedHat 7.2)
 PHP Version:      4.2.2
 New Comment:

htmlentities does not support iso-8859-7.
As a workaround, use mb_convert_encoding to translate the
string to utf-8 and then apply htmlentities to the result.
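
A minimal sketch of that workaround, assuming the mbstring extension is available in the build. The helper name escape_greek() is hypothetical and only for illustration; mb_convert_encoding() and htmlentities() are the functions named above.

```php
<?php
// Hypothetical helper, not part of PHP or of the original report.
function escape_greek($iso88597)
{
    // Translate the ISO-8859-7 bytes to UTF-8, which htmlentities() supports.
    $utf8 = mb_convert_encoding($iso88597, 'UTF-8', 'ISO-8859-7');

    // Convert to entities, telling htmlentities() the input is now UTF-8.
    return htmlentities($utf8, ENT_COMPAT, 'UTF-8');
}

// "\xE1\xE2\xE3" is alpha, beta, gamma in ISO-8859-7.
echo nl2br(escape_greek("\xE1\xE2\xE3 < \xE1"));
```

Characters that have a named entity come out as plain ASCII entities, so they display the same whatever charset the page declares. Characters without a named entity would be left as UTF-8 bytes, so it is probably safest to serve the page as UTF-8 when using this approach.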

htmlentities should emit a warning when an unsupported
charset/encoding is requested, instead of silently falling back to
Latin-1. Fixed in CVS.


Previous Comments:
------------------------------------------------------------------------

[2002-07-24 21:29:00] [EMAIL PROTECTED]

Attempting to insert a sample script here, but since the problem
results from Greek text input, I don't know whether or not the Greek I
paste in here will be submitted intact. If not, the script works as a
functioning submission form if you can input Greek characters. See
http://alt.baltimoreimc.org/test.php for an example.

<?
$body = "����������� �������� �������� ���������� ������������� �����
��� �������, ���� ��� ���������� �� ����������� �� �� ����-�������
�������, �������� ������� �����, ��� �� ������ ������ ��� ���
����������� ��� ���� ���� �� ������.� �� ������, ������������
���������� ���������� ��� ����������� �������, ����������� ��
���������� ���� ����������� ����� ���������� ��������� ���
�������������.";
if ($body) {
        echo ('<hr>RAW (should be clean)');
        echo ('<hr>');
        echo ($body);
        echo ('<hr>HTMLENTITIES (should appear as accented Latin 1)');
        echo ('<hr>');
        echo (htmlentities($body,ENT_COMPAT,'ISO-8859-7'));
        echo ('<hr>HTMLSPECIALCHARS (should be clean)');
        echo ('<hr>');
        echo (htmlspecialchars($body,ENT_COMPAT,'ISO-8859-7'));
        echo ('<hr>');
} else {
?>
<form action="<? echo $PHP_SELF; ?>" method="post"
accept-charset="iso-8859-1,iso-8859-7">
        <textarea name="body" rows="4" cols="50"></textarea>
        <br>
        <input type="submit" name="submit" value="submit">
</form>
<?
}
?>

------------------------------------------------------------------------

[2002-07-24 19:26:34] [EMAIL PROTECTED]

Please provide a short but complete example script for us to test with
(including the sample text).


------------------------------------------------------------------------

[2002-07-23 23:29:50] [EMAIL PROTECTED]

PHP compiled with:
./configure --with-config-file-path=/usr/local --with-mysql --with-gd
--with-gettext=/usr/bin --with-jpeg-dir --with-png-dir --with-tiff-dir
--with-ttf --with--enable-bcmath --enable-inline-optimization
--enable-sysvsem --enable-sysvshm --enable-trans-sid
--enable-shared-pdflib --with-regex=system --with-zlib
--with-curl=/usr/include --enable-sockets --with-apxs=/usr/sbin/apxs

I have been attempting to internationalize my PHP application
(specifically to support Greek). Data is stored in MySQL, and displayed
on the page using

nl2br(htmlentities($body))

For internationalization, I changed this to

nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'))

...but the Greek text appears on the page as accented Latin 1
characters, rather than Greek. (This line is the only "short script"
necessary to reproduce the problem, as long as the $body variable
contains Greek text).
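
The symptom is consistent with the Greek bytes being read as Latin-1, which is the silent fallback described in the closing comment. The sketch below reproduces that effect by passing 'ISO-8859-1' explicitly; the sample bytes and variable names are assumptions chosen for illustration, not taken from the report.

```php
<?php
// "\xC1\xE8\xDE\xED\xE1" is a five-letter Greek word ("Athina")
// when the bytes are read as ISO-8859-7.
$greek = "\xC1\xE8\xDE\xED\xE1";

// Reading the same bytes as Latin-1 maps each one to an accented
// Latin character, which htmlentities() then turns into an entity.
$as_latin1 = htmlentities($greek, ENT_COMPAT, 'ISO-8859-1');
echo $as_latin1, "\n"; // &Aacute;&egrave;&THORN;&iacute;&aacute;

// htmlspecialchars() only rewrites & < > and ", so the high bytes
// pass through untouched and still display as Greek.
$untouched = htmlspecialchars($greek, ENT_COMPAT, 'ISO-8859-1');
```

Because htmlspecialchars() leaves bytes above 0x7F alone, it would explain why the text survives with that function but not with htmlentities().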

If I use htmlspecialchars() instead, still indicating the charset, the
text appears fine (in Greek). If I simply use 

nl2br($body)

it also appears in Greek (but, obviously, won't escape any special
characters). So it appears to be directly related to htmlentities().

The web page itself contains a META header indicating
content="text/html; charset=ISO-8859-7"

...so the page should know what charset to expect, and shouldn't be
part of the problem.

This bug was also present in PHP 4.2.1.


------------------------------------------------------------------------

