At 8:01 AM +0200 10/1/07, Crab Hunt wrote:
Hi,
Is there a fix for removing the junk characters that appear when we copy and
paste some text from Microsoft Word into a PHP form? For example, the double
quotes "" turn into something like *รข??*

Thanks in advance.

Crab:

That's something I'm working on as well. Part of the problem is that not only does M$ inject junk, but the user may actually be entering something other than ASCII. So some of the strange stuff may not be junk.
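One rough way to tell legitimate non-ASCII input from junk is to check whether the submitted bytes are well-formed UTF-8. The sketch below is only an illustration: the helper name looks_like_utf8() is made up, and it assumes the form is expected to submit UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): reports whether a
// string is well-formed UTF-8. With the /u modifier, PCRE validates the
// subject string and preg_match() returns false when it is not valid UTF-8.
function looks_like_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// A curly quote encoded as UTF-8 (U+201C) is valid, so it is not junk.
var_dump(looks_like_utf8("\xE2\x80\x9Cquoted\xE2\x80\x9D"));

// The same quote as a single Windows-1252 byte (0x93) is not valid UTF-8.
var_dump(looks_like_utf8("\x93quoted\x94"));
```

Text that fails this check is probably in some other encoding, and would need converting rather than stripping.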

As such, I found this (use both):

$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);

$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);

This is supposed to replace multibyte UTF-8 characters (two- and three-byte sequences) with their numeric HTML entities.

This is untested by me, but shows promise.
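Note that the /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so those two lines will not run on current PHP. A minimal sketch of the same arithmetic using preg_replace_callback() might look like this. The function name utf8_to_numeric_entities() is made up for illustration, and the patterns are tightened to require proper continuation bytes (0x80-0xBF) instead of "any byte", which is a small departure from the snippet above.

```php
<?php
// Hypothetical helper (not from the original post): converts two- and
// three-byte UTF-8 sequences into numeric character references.
function utf8_to_numeric_entities(string $text): string
{
    // Two-byte sequences: lead byte 0xC0-0xDF plus one continuation byte.
    $text = preg_replace_callback(
        '/[\xc0-\xdf][\x80-\xbf]/',
        function (array $m): string {
            $code = (ord($m[0][0]) - 192) * 64 + (ord($m[0][1]) - 128);
            return '&#' . $code . ';';
        },
        $text
    );

    // Three-byte sequences: lead byte 0xE0-0xEF plus two continuation bytes.
    return preg_replace_callback(
        '/[\xe0-\xef][\x80-\xbf]{2}/',
        function (array $m): string {
            $code = (ord($m[0][0]) - 224) * 4096
                  + (ord($m[0][1]) - 128) * 64
                  + (ord($m[0][2]) - 128);
            return '&#' . $code . ';';
        },
        $text
    );
}

// U+201C and U+201D are the curly quotes Word typically inserts.
echo utf8_to_numeric_entities("\xE2\x80\x9CHi\xE2\x80\x9D"), "\n";
// prints: &#8220;Hi&#8221;
```

Like the original, this leaves plain ASCII and four-byte sequences untouched.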

If you find a simpler solution, please keep me in the loop.

Cheers,

tedd

--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
