ID:               18521
Updated by:       [EMAIL PROTECTED]
Reported By:      [EMAIL PROTECTED]
-Status:          Open
+Status:          Closed
Bug Type:         Strings related
Operating System: Linux (RedHat 7.2)
PHP Version:      4.2.2
New Comment:

htmlentities does not support iso-8859-7.
As a workaround, use mb_convert_encoding to translate the string to utf-8 and then apply htmlentities to the result.

htmlentities should emit a warning if you request an unsupported charset/encoding instead of silently falling back on latin-1. Fixed in CVS.


Previous Comments:
------------------------------------------------------------------------

[2002-07-24 21:29:00] [EMAIL PROTECTED]

Attempting to insert a sample script here, but since the problem results from Greek text input, I don't know whether or not the Greek I paste in here will be submitted intact. If not, the script works as a functioning submission form if you can input Greek characters.

See http://alt.baltimoreimc.org/test.php for an example.

<?
$body = "[Greek sample text, garbled in transmission]";
if ($body) {
    echo ('<hr>RAW (should be clean)');
    echo ('<hr>');
    echo ($body);
    echo ('<hr>HTMLENTITIES (should appear as accented Latin 1)');
    echo ('<hr>');
    echo (htmlentities($body,ENT_COMPAT,'ISO-8859-7'));
    echo ('<hr>HTMLSPECIALCHARS (should be clean)');
    echo ('<hr>');
    echo (htmlspecialchars($body,ENT_COMPAT,'ISO-8859-7'));
    echo ('<hr>');
} else {
?>
<form action="<? echo $PHP_SELF; ?>" method="post" accept-charset="iso-8859-1,iso-8859-7">
<textarea name="body" rows="4" cols="50"></textarea>
<br>
<input type="submit" name="submit" value="submit">
</form>
<? } ?>

------------------------------------------------------------------------

[2002-07-24 19:26:34] [EMAIL PROTECTED]

Provide a short but complete example script for us to test with.
(the sample text also!)

------------------------------------------------------------------------

[2002-07-23 23:29:50] [EMAIL PROTECTED]

PHP compiled with:

./configure --with-config-file-path=/usr/local --with-mysql --with-gd --with-gettext=/usr/bin --with-jpeg-dir --with-png-dir --with-tiff-dir --with-ttf --with--enable-bcmath --enable-inline-optimization --enable-sysvsem --enable-sysvshm --enable-trans-sid --enable-shared-pdflib --with-regex=system --with-zlib --with-curl=/usr/include --enable-sockets --with-apxs=/usr/sbin/apxs

I have been attempting to internationalize my PHP application (specifically to support Greek). Data is stored in MySQL and displayed on the page using

nl2br(htmlentities($body))

For internationalization, I changed this to

nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'))

...but the Greek text appears on the page as accented Latin 1 characters, rather than Greek. (This line is the only "short script" necessary to reproduce the problem, as long as the $body variable contains Greek text.)

If I use htmlspecialchars() instead, still indicating the charset, the text appears fine (in Greek). If I simply use nl2br($body), it also appears in Greek (but won't, obviously, escape any special characters). So it appears to be directly related to htmlentities().

The web page itself contains a META header indicating

content="text/html; charset=ISO-8859-7"

...so the page should know what charset to expect, and shouldn't be part of the problem.

This bug was also present in PHP 4.2.1.

------------------------------------------------------------------------


--
Edit this bug report at http://bugs.php.net/?id=18521&edit=1
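
The workaround suggested in the new comment (convert to utf-8 with mb_convert_encoding, then call htmlentities on the result) might look roughly like the sketch below. It assumes the mbstring extension is compiled in and that the input really is ISO-8859-7. The function name greek_to_entities is made up for illustration and is not part of PHP or of this report.

```php
<?php
// Hypothetical helper (name invented for this sketch, not a PHP built-in).
// Assumes $text is encoded as ISO-8859-7 and that mbstring is available.
function greek_to_entities($text)
{
    // Step 1: translate the Greek single-byte string to UTF-8,
    // an encoding that htmlentities() does understand.
    $utf8 = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-7');

    // Step 2: escape markup characters and map characters that have
    // a named HTML entity (e.g. plain Greek letters) to that entity.
    return htmlentities($utf8, ENT_COMPAT, 'UTF-8');
}

// "\xE1\xE2\xE3" is alpha, beta, gamma in ISO-8859-7.
echo nl2br(greek_to_entities("\xE1\xE2\xE3 <b>\n"));
```

Note that the result is a UTF-8 string: characters without a named entity (accented Greek letters, as far as I know) pass through as UTF-8 bytes, so the page would presumably need to declare charset=UTF-8 rather than ISO-8859-7 for them to display correctly.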
