The subject of character set detection (yes, I know, a hard problem to solve) came up on SO chat, and Niki noticed that we don't yet wrap the ICU UCharsetDetector API, so I volunteered to put something together.
https://github.com/php/php-src/compare/master...sgolemon:intl.charsetdetector

The trouble is, for the VAST majority of my test cases so far, ICU is really bad at detecting character sets correctly (as I said, it's a tough problem). In fact, the ICU manual admits that the detector doesn't even look at all of the input text, and that the "language detection" is a byproduct not meant for actual language detection.

Given all that, I'm inclined to reject the idea of rolling this into PHP, for fear of just confusing users without actually adding any value.

Thoughts?

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php