ID:               43549
 Updated by:       [EMAIL PROTECTED]
 Reported By:      mariusads at helpedia dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         Strings related
 Operating System: Redhat?,  Linux
 PHP Version:      5.2.5
 New Comment:

You never specified the charset for the page. This works fine:

<html>
<head> 
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
<body>
<pre>
<?php
$text = isset($_REQUEST['text']) ? $_REQUEST['text'] : '';
  
var_dump($text);
var_dump(htmlentities($text,ENT_QUOTES,'UTF-8'));

?>
</pre>
<form name="A" method="post">
<textarea name="text"></textarea>
<input name="sub" type="submit" value="submit"/>
</form>
</body></html>
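
The same charset can also be declared in the HTTP response header rather than only in the meta tag. A rough sketch, assuming nothing has been sent to the browser yet; content_type_header is a hypothetical helper used only for illustration and is not part of the script above:

```php
<?php
// Hypothetical helper: builds the header line for a given charset.
function content_type_header($charset = 'utf-8')
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only takes effect before any output has been sent,
// so guard the call with headers_sent().
if (!headers_sent()) {
    header(content_type_header());
}
```

With the header in place the browser should submit the form in the same charset the page was served in, which is what the htmlentities call with 'UTF-8' expects.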



Previous Comments:
------------------------------------------------------------------------

[2007-12-10 11:45:38] mariusads at helpedia dot com

Just downloaded it on my computer (Windows 2003, PHP 5.2.5 from the
website) and the same problem occurs.

For example this one works: 

hxtp://devtgdb.definethis.org:90/pc/faq/5842/Diablo-page1.html

but this one doesn't:

hxtp://devtgdb.definethis.org:90/pc/faq/5845/Diablo-page1.html

The source code is identical; the only difference is that ads are
disabled in the site config.
Also, if the links don't work, sorry, you may read this while I'm
sleeping and my computer is turned off. Otherwise, it's cable
4mbps/512kbps so they should work.

(again, please replace hxtp with http)

------------------------------------------------------------------------

[2007-12-10 11:24:16] mariusads at helpedia dot com

Here are several pages that show this problem with htmlentities:

hxtp://www.tgdb.net/pc/cheats/19556/18_Wheels_of_Steel_Convoy-page1.html
hxtp://www.tgdb.net/pc/faq/5845/Diablo-page1.html

The content on the second link worked fine up until the PHP version was
upgraded.

This page and lots of others work:

hxtp://www.tgdb.net/pc/faq/5841/Diablo-page1.html

So it's not a badly coded script in the sense that it worked as I
planned.

You can see the text right before it's sent to htmlentities in an HTML
comment on all pages; you just have to view the source (the only
difference is that I've replaced '--' with '==', as -- is not allowed
in comments).

When I reported the problem to the hosting company, I uploaded the
test script from the first post to two of their servers and to a
server at Dreamhost.
PHP 5.2.5 hxtp://www.helpedia.com/test2.php
PHP 5.2.5 hxtp://www.tgdb.net/test2.php
PHP 5.2.2 hxtp://www.definethis.org/test2.php

I've opened the file a.txt in Firefox, pressed Ctrl+A to select all
text, copied to Clipboard and pasted it to the form. Result is an empty
string on PHP 5.2.5 and the correct string on PHP 5.2.2. Correct result
also on my work computer with PHP 5.2.4

I didn't manage to download 5.2.5 on my work computer and test it, so I
guess it could be a bad build on the hosting company's servers. Will try
in the following hour.

(replace hxtp with http, this page thinks I'm spamming)

------------------------------------------------------------------------

[2007-12-10 09:32:57] [EMAIL PROTECTED]

Works fine for me. Are you sure you have everything as UTF-8, i.e. the
page you're sending the form from has its content-type charset set to
utf-8?

------------------------------------------------------------------------

[2007-12-09 23:59:03] mariusads at helpedia dot com

Description:
------------
I run a website that accepts game cheats submissions from users and
displays them in categories and so on.
Users submit .txt files, which are saved on the drive; a certain page
on the website reads the text file or a fragment of it, runs
htmlentities on it and displays it on the screen.

Recently, the hosting company upgraded PHP to PHP 5.2.5, and since then
htmlentities returns an empty string when trying to escape the text.

I understand this is probably because of the fix regarding multi-byte
characters in strings, which makes htmlentities ignore the input.
That seems a bit dumb; shouldn't it return at least the part of the
string that comes before that multibyte character?

Anyway, the submitted file is plain text and I honestly don't know
which characters are wrong such that htmlentities would ignore the
text.
The file is uploaded here: http://www.tgdb.net/a.txt

In the scripts I have the following code:

function htmlesc($text)
{
// decode first so already-escaped text is not escaped twice
$s = html_entity_decode($text,ENT_QUOTES,'UTF-8');
return htmlentities($s,ENT_QUOTES,'UTF-8');
}

The text passes html_entity_decode with no problems but htmlentities
returns an empty string.

If possible, could you please tell me how I could check in the future
whether a string contains multibyte characters, so that I don't have
this problem?
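
One way such a check could look, assuming the empty result comes from byte sequences that are not valid UTF-8 (the report itself does not confirm this): the PCRE u modifier treats the subject as UTF-8, so even an empty pattern fails to match when the bytes are invalid. This is a minimal sketch that assumes the pcre extension is available; is_valid_utf8 is a hypothetical helper name, not a PHP built-in and not code from this report.

```php
<?php
// Hypothetical helper: true when $text is a valid UTF-8 byte sequence.
function is_valid_utf8($text)
{
    // With the /u modifier PCRE validates the subject as UTF-8.
    // The empty pattern matches any valid string (returns 1);
    // invalid input makes the call fail instead of matching.
    return preg_match('//u', $text) === 1;
}

// Illustrative inputs only (not taken from a.txt).
var_dump(is_valid_utf8("plain ascii"));   // valid
var_dump(is_valid_utf8("caf\xC3\xA9"));   // valid two-byte sequence
var_dump(is_valid_utf8("caf\xE9"));       // lone Latin-1 byte, invalid
```

A caller could run this before htmlentities and, when it fails, convert the text from its real encoding or log the file for inspection instead of displaying an empty string.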

Right now, the only solution the hosting company gave to me was to add
a rule in .htaccess which makes the server process the PHP files with
PHP4.

Thank you for your help.
Marius Hudea

PS. The captcha doesn't seem to work right; I'm sure I didn't get the
captcha wrong 8 times in a row.

Reproduce code:
---------------
I've used the code below uploaded on several web servers to test:

<html><body>
<?
$text = $_REQUEST['text'];
echo htmlentities($text,ENT_QUOTES,'UTF-8');
?>
<form name="A" method="post">
<textarea name="text"></textarea>
<input name="sub" type="submit" value="submit"/>
</form>
</body></html>

Test file: http://www.tgdb.net/a.txt

Expected result:
----------------
Expected to have the text displayed on the screen, to have the function
return a non-empty string.
Expected at least a partial string, up to that error, not having to
check scripts for 5 minutes to see what went wrong.

Actual result:
--------------
Copy and paste text from a.txt results in an empty string.
Any other text is processed correctly.


------------------------------------------------------------------------

