On Fri, Aug 1, 2008 at 9:50 AM, Yeti <[EMAIL PROTECTED]> wrote: > <?php > *# Hello Community > # Internationalisation, a topic discussed more than enough and YES, I am > looking forward to PHP6. > # But in reality I still have to develop for PHP4 and that's where the dog > is burried ^^ > # We have a customer here who is running a small site, but still in five > different languages. > # Lately he started complaining about some strange site behaviours: > > # He has a discussion board where people can post their ideas, comments etc. > Nothing special > # Every post has a maximum length of 2048 characters, which is checked by > JavaScript at the Browser > # and after submitting the form by PHP. > > # Our mistake was to use strlen();* > global $cc_strlen; global $cc_mb; > $cc_strlen = $cc_mb = 0; > if (array_key_exists('text', $_POST)) { > $cc_strlen = strlen($_POST['text']); > $cc_mb = mb_strlen($_POST['text'], 'UTF-8'); *// new code* > if ($cc_strlen > 2048) { /* snip */ } // do something > } > > /* snip */ // do something > > *#this works fine as long as the user only submits single byte charachters, > but with UTF-8 the whole thing changes ..* > ?> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" " > http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> > <html xmlns="http://www.w3.org/1999/xhtml"> > <head> > <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> > <title>test</title> > </head> > <body> > <p>You submitted <?php echo $cc_strlen; ?> characters (STRLEN).</p> > <p>You submitted <?php echo $cc_mb; ?> characters (MB_STRLEN).</p> > <p>Characters Left:<span id="remainder">2048</span></p> > <form action="" method="post" onsubmit="return false;" id="post_form"> > <textarea id="post_text" name="text" onkeydown="check_length();" > onchange="check_length();" rows="10" cols="50">œŸŒ‡Ņ</textarea><br /> > <input type="submit" value="Submit" id="post_button" > onclick="submit_form();" /> > </form> > <script type="text/javascript"> > <!-- > var the_form = document.getElementById('post_form'); > var textarea = document.getElementById('post_text'); > var counter = document.getElementById('remainder'); > function check_length() { > var remainder = 2048 - textarea.value.length; > var length_alert = false; > if (remainder < 0) { > remainder = 0; > for (var count = textarea.value.length; (count >= 2048); (count -= 1)) { > textarea.value = textarea.value.substr(0, 2047); > counter.style.color = 'red' > length_alert = true; > } > } > if (length_alert) alert('You are already using 2048 characters.'); > if (document.all) { > counter.innerText = remainder; > } else { > counter.textContent = remainder; > } > } > function submit_form() { > check_length(); > the_form.submit(); > alert ('You submitted ' + textarea.value.length + ' characters'); > return true; > } > --> > </script> > <?php > *# Now as soon as one is starting to submit UTF-8 characters strlen is not > working proberly any more > # So we had to work through thousands of lines of code, replacing strlen() > with mb_strlen(); > # We also found mb_strlen to take about 8 times longer than strlen().* > > $s_t = microtime(); > mb_strlen('œŸŒ‡Ņ', 'UTF-8'); > $e_t = microtime(); > echo '<p>MB_STRLEN took : '.(($e_t - $s_t)*1000).' milliseconds</p>'; > $s_t = microtime(); > strlen('œŸŒ‡Ņ'); > $e_t = microtime(); > echo '<p>STRLEN took : '.(($e_t - $s_t)*1000).' milliseconds</p>'; > > *# So much for internationalisation. > # Just writing this as a reminder for everyone who is facing similar > situations.* > ?> > </body> > </html> >
You can't determine timing by simply calling each function one time. I changed your script to the following: <?php $iterations = 10000; $s_t = microtime(true); for ($i = 0; $i < $iterations; ++$i) { mb_strlen('œŸŒ‡Ņ', 'UTF-8'); } $e_t = microtime(true); echo '<p>MB_STRLEN took : '.(($e_t - $s_t)*1000/$iterations).' milliseconds</p>'; $s_t = microtime(true); for ($i = 0; $i < $iterations; ++$i) { strlen('œŸŒ‡Ņ'); } $e_t = microtime(true); echo '<p>STRLEN took : '.(($e_t - $s_t)*1000/$iterations).' milliseconds</p>'; ?> I ran this script several times, and the results below are fairly typical: MB_STRLEN took : 0.054733037948608 milliseconds STRLEN took : 0.037568092346191 milliseconds The multi-byte function is slower, but not even by a factor of 2 on my development machine. Andrew