On 2015-06-03 2:55 PM, Robert Voliva wrote:
We're finding that, when working with the em dash character, the LEFT and
LENGTH functions don't work well together.  This query shows trying to
strip off the last character from a string containing an em dash:

mysql> select LEFT('031492349−0002,', LENGTH('031492349−0002,') - 1),
LENGTH('031492349−0002,'), LENGTH('031492349-0002,');
+--------------------------------------------------------+---------------------------+---------------------------+
| LEFT('031492349−0002,', LENGTH('031492349−0002,') - 1) | LENGTH('031492349−0002,') | LENGTH('031492349-0002,') |
+--------------------------------------------------------+---------------------------+---------------------------+
| 031492349−0002,                                        |                        17 |                        15 |
+--------------------------------------------------------+---------------------------+---------------------------+
1 row in set (0.06 sec)

Is this a bug?  If it's a "feature", what could we do instead to get around
this issue?

The last of the four '031...' strings in your query differs from the other three at the dash. In the first three strings, the dash is a multibyte character whose first byte is hex E2, whereas the dash in the last string is the ASCII hyphen, hex 2D.

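Since the dash in the first three strings is a 3-byte character, LENGTH(), which counts bytes just as OCTET_LENGTH() does, returns 17 instead of 15. LEFT() counts characters, not bytes, so asking it for the first 16 characters of a 15-character string returns the whole string, trailing comma included.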
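
As for getting around it, one possible approach is to measure the string in characters rather than bytes. The sketch below is untested against your server and assumes the connection character set is utf8 (or utf8mb4), which the output above suggests. CHAR_LENGTH() counts characters, so it agrees with how LEFT() counts:

```sql
-- CHAR_LENGTH() counts characters, so the multibyte dash counts once.
SELECT CHAR_LENGTH('031492349−0002,');

-- Strip the last character: both functions now count in characters.
SELECT LEFT('031492349−0002,', CHAR_LENGTH('031492349−0002,') - 1);

-- HEX() shows the raw bytes, handy for checking which dash is actually stored.
SELECT HEX('031492349−0002,');
```

Under that assumption the first query should report 15 rather than 17, and the second should return the string without its trailing comma.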

PB

-----

Thanks,
Robert Voliva



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql
