Finally: The Unicode 3.0.1 standard changes the definition of UTF-8 such that overlong sequences must be signalled as an error condition by a conforming decoder, which is what we had long recommended anyway for security reasons:

  http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html

Please check all your decoders. Test cases are at:

  http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
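
For readers checking their own decoders, the overlong-sequence rule above can be illustrated roughly as follows. This is only a minimal sketch in Python, not code from the corrigendum or from the test file; the names `decode_one`, `OverlongError` and `MIN_CODE_POINT` are made up for this example. It assumes sequences of at most four bytes (longer lead bytes are simply rejected) and does not check for surrogates or for values above U+10FFFF.

```python
# Hypothetical helper names; a sketch of the overlong check only.

# Smallest code point that genuinely needs a sequence of each length.
MIN_CODE_POINT = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


class OverlongError(ValueError):
    """Raised when a code point is encoded with more bytes than necessary."""


def decode_one(data: bytes, pos: int = 0):
    """Decode one UTF-8 sequence at data[pos]; return (code_point, length)."""
    lead = data[pos]
    if lead < 0x80:
        return lead, 1                      # plain ASCII byte
    elif 0xC0 <= lead <= 0xDF:
        length, cp = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:
        length, cp = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF7:
        length, cp = 4, lead & 0x07
    else:
        # Stray continuation byte (0x80..0xBF) or a lead byte >= 0xF8.
        raise ValueError("invalid lead byte 0x%02X" % lead)

    if pos + length > len(data):
        raise ValueError("truncated sequence")

    # Each continuation byte must look like 10xxxxxx and adds 6 payload bits.
    for b in data[pos + 1:pos + length]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte 0x%02X" % b)
        cp = (cp << 6) | (b & 0x3F)

    # The actual overlong test: the value must need this many bytes.
    if cp < MIN_CODE_POINT[length]:
        raise OverlongError(
            "U+%04X encoded in %d bytes (overlong)" % (cp, length))
    return cp, length
```

With this sketch, `decode_one(b"/")` returns `(0x2F, 1)`, while the two-byte form `b"\xc0\xaf"` of the same character raises `OverlongError` instead of quietly decoding to a slash. That second case is the kind of input that can slip past a check performed on the raw bytes.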