Re: Issue with em dash character

2015-06-03 Thread Peter Brawley

On 2015-06-03 2:55 PM, Robert Voliva wrote:

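> We're finding that, when working with the em dash character, the LEFT and
> LENGTH functions don't work well together.  This query shows trying to
> strip off the last character from a string containing an em dash:
>
> mysql> select LEFT('031492349−0002,', LENGTH('031492349−0002,') - 1),
> LENGTH('031492349−0002,'), LENGTH('031492349-0002,');
> +--------------------------------------------------------+---------------------------+---------------------------+
> | LEFT('031492349−0002,', LENGTH('031492349−0002,') - 1) | LENGTH('031492349−0002,') | LENGTH('031492349-0002,') |
> +--------------------------------------------------------+---------------------------+---------------------------+
> | 031492349−0002,                                        |                        17 |                        15 |
> +--------------------------------------------------------+---------------------------+---------------------------+
> 1 row in set (0.06 sec)
>
> Is this a bug?  If it's a "feature", what could we do instead to get around
> this issue?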


The last of the four '031...' strings in your query differs from the
others at the dash. In the first three strings, the dash is a multibyte
character whose first byte is hex E2, whereas the dash in the last string
is the ASCII hyphen, hex 2D.


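Since the earlier dashes are 3-byte chars, LENGTH() (a synonym for
octet_length()) returns 17 instead of 15. LEFT() counts characters, so
asking for the leftmost 16 characters of a 15-character string returns
the whole string.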
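
One way to check this directly, as a minimal sketch: it assumes the
connection character set is utf8 (or utf8mb4), so that the string literals
reach the server as UTF-8. HEX() shows the bytes behind each dash, and
CHAR_LENGTH() counts characters where LENGTH() counts bytes.

```sql
-- Assumption: connection character set is utf8 or utf8mb4.
SELECT HEX('−')                       AS multibyte_dash_bytes, -- three bytes, the first being E2
       HEX('-')                       AS ascii_dash_bytes,     -- 2D
       LENGTH('031492349−0002,')      AS byte_length,          -- counts bytes
       CHAR_LENGTH('031492349−0002,') AS char_length;          -- counts characters
```

Under that assumption the last two columns should differ by two, which
matches the 17 versus 15 reported in the query output.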


PB

> Thanks,
> Robert Voliva




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql



Re: Issue with em dash character

2015-06-03 Thread Rik Wasmus
LENGTH() measures bytes, CHAR_LENGTH() measures characters. LENGTH() is
of little use for anything other than raw bytes.
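
Applied to the query from the original post, a minimal sketch of the
workaround is to count characters instead of bytes. It assumes a utf8
connection character set; the table name `t` and column name `code` in
the second statement are hypothetical, used only for illustration.

```sql
-- Strip the last character by counting characters, not bytes.
SELECT LEFT('031492349−0002,', CHAR_LENGTH('031492349−0002,') - 1) AS trimmed;

-- The same idea for a column (hypothetical table `t`, column `code`):
-- SELECT LEFT(code, CHAR_LENGTH(code) - 1) AS trimmed FROM t;
```

Because LEFT() and CHAR_LENGTH() both work in characters, the trailing
comma should be removed regardless of how many bytes the dash occupies.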

On Wed, Jun 3, 2015 at 10:29 PM, Robert Voliva wrote:
> information_schema.columns reports a character_set_name of 'utf8' and a
> collation_name of 'utf8_general_ci'
>
> On Wed, Jun 3, 2015 at 3:14 PM, Emil Oppeln-Bronikowski wrote:
>
>>> Is this a bug?  If it's a "feature", what could we do instead to get
>>> around this issue?
>>
>> Is your column set to unicode?




Re: Issue with em dash character

2015-06-03 Thread Robert Voliva
information_schema.columns reports a character_set_name of 'utf8' and a
collation_name of 'utf8_general_ci'

On Wed, Jun 3, 2015 at 3:14 PM, Emil Oppeln-Bronikowski wrote:

>> Is this a bug?  If it's a "feature", what could we do instead to get
>> around this issue?
>
> Is your column set to unicode?


Re: Issue with em dash character

2015-06-03 Thread Emil Oppeln-Bronikowski



> Is this a bug?  If it's a "feature", what could we do instead to get around
> this issue?


Is your column set to unicode?




Issue with em dash character

2015-06-03 Thread Robert Voliva
We're finding that, when working with the em dash character, the LEFT and
LENGTH functions don't work well together.  This query shows trying to
strip off the last character from a string containing an em dash:

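mysql> select LEFT('031492349−0002,', LENGTH('031492349−0002,') - 1),
LENGTH('031492349−0002,'), LENGTH('031492349-0002,');
+--------------------------------------------------------+---------------------------+---------------------------+
| LEFT('031492349−0002,', LENGTH('031492349−0002,') - 1) | LENGTH('031492349−0002,') | LENGTH('031492349-0002,') |
+--------------------------------------------------------+---------------------------+---------------------------+
| 031492349−0002,                                        |                        17 |                        15 |
+--------------------------------------------------------+---------------------------+---------------------------+
1 row in set (0.06 sec)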

Is this a bug?  If it's a "feature", what could we do instead to get around
this issue?

Thanks,
Robert Voliva