Re: duplicate rows in spite of multi-column unique constraint

2015-03-24 Thread Johan De Meersman
Please do 

  select id, customer_id, concat('-', group_id, '-') 
from app_customergroupmembership 
where customer_id ='ajEiQA';

I suspect one of those group IDs has a trailing space or similar 'invible' 
character that makes it not identical.


- Original Message -
 From: Chris Hornung chris.horn...@klaviyo.com
 To: MySql mysql@lists.mysql.com
 Sent: Monday, 23 March, 2015 18:20:36
 Subject: duplicate rows in spite of multi-column unique constraint

 Hello,
 
 I'm come across a situation where a table in our production DB has a
 relatively small number of duplicative rows that seemingly defy the
 unique constraint present on that table.
 
 We're running MySQL 5.6.19a via Amazon RDS. The table in question is
 ~250M rows.
 
 `show create table` gives:
 
 app_customergroupmembership | CREATE TABLE `app_customergroupmembership` (
   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
   `customer_id` varchar(6) COLLATE utf8_bin NOT NULL,
   `group_id` varchar(6) COLLATE utf8_bin NOT NULL,
   `created` datetime NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `app_customergroupmembership_customer_id_31afe160_uniq`
 (`customer_id`,`group_id`),
   KEY `app_customergroupmembership_group_id_18aedd38e3f8a4a0`
 (`group_id`,`created`)
 ) ENGINE=InnoDB AUTO_INCREMENT=21951158253 DEFAULT CHARSET=utf8
 COLLATE=utf8_bin
 
 
 Despite that, records with duplicate customer_id/group_id do exist:
 
 mysql select * from app_customergroupmembership where customer_id =
 'ajEiQA';
 +-+-+--+-+
| id  | customer_id | group_id | created |
 +-+-+--+-+
| 20279608258 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
| 20279608269 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
 +-+-+--+-+
 
 Interestingly, these dupe records can't seem to be queried when using
 both columns from the unique constraint in the WHERE clause:
 
 mysql select * from app_customergroupmembership where customer_id =
 'ajEiQA' and group_id = 'ddH6Ev';
 +-+-+--+-+
| id  | customer_id | group_id | created |
 +-+-+--+-+
| 20279608258 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
 +-+-+--+-+
 
 
 Any thoughts on how this situation came to pass, and how to prevent it
 from happening?
 
 Thanks,
 --
 Chris Hornung

-- 
Unhappiness is discouraged and will be corrected with kitten pictures.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: duplicate rows in spite of multi-column unique constraint

2015-03-24 Thread Chris Hornung
Thanks for the suggestions regarding non-printing characters, definitely 
makes sense as a likely culprit!


However, the data really does seem to be identical in this case:

mysql select id, customer_id, concat('-', group_id, '-') from 
app_customergroupmembership  where customer_id ='ajEiQA';

+-+-++
| id  | customer_id | concat('-', group_id, '-') |
+-+-++
| 20279608258 | ajEiQA  | -ddH6Ev-   |
| 20279608269 | ajEiQA  | -ddH6Ev-   |
+-+-++
2 rows in set (0.00 sec)


I also ran the data through hexdump as a secondary check, also looks 
identical:


 mysql  --defaults-extra-file=~/.customers_mysql.cnf app -s -e select 
id, customer_id, group_id from app_customergroupmembership  where 
customer_id ='ajEiQA';  | hexdump -c

000   2   0   2   7   9   6   0   8   2   5   8  \t   a   j   E   i
010   Q   A  \t   d   d   H   6   E   v  \n   2   0   2   7   9   6
020   0   8   2   6   9  \t   a   j   E   i   Q   A  \t   d   d   H
030   6   E   v  \n


Any other suggestions given this info?


Thanks,
--
Chris Hornung

Johan De Meersman mailto:vegiv...@tuxera.be
March 24, 2015 at 6:08 AM
Please do

   select id, customer_id, concat('-', group_id, '-')
 from app_customergroupmembership
 where customer_id ='ajEiQA';

I suspect one of those group IDs has a trailing space or similar 'invible' 
character that makes it not identical.


- Original Message -

From: Chris Hornungchris.horn...@klaviyo.com
To: MySqlmysql@lists.mysql.com
Sent: Monday, 23 March, 2015 18:20:36
Subject: duplicate rows in spite of multi-column unique constraint



Hello,

I'm come across a situation where a table in our production DB has a
relatively small number of duplicative rows that seemingly defy the
unique constraint present on that table.

We're running MySQL 5.6.19a via Amazon RDS. The table in question is
~250M rows.

`show create table` gives:

app_customergroupmembership | CREATE TABLE `app_customergroupmembership` (
   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
   `customer_id` varchar(6) COLLATE utf8_bin NOT NULL,
   `group_id` varchar(6) COLLATE utf8_bin NOT NULL,
   `created` datetime NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `app_customergroupmembership_customer_id_31afe160_uniq`
(`customer_id`,`group_id`),
   KEY `app_customergroupmembership_group_id_18aedd38e3f8a4a0`
(`group_id`,`created`)
) ENGINE=InnoDB AUTO_INCREMENT=21951158253 DEFAULT CHARSET=utf8
COLLATE=utf8_bin


Despite that, records with duplicate customer_id/group_id do exist:

mysql  select * from app_customergroupmembership where customer_id =
'ajEiQA';
+-+-+--+-+
| id  | customer_id | group_id | created |
+-+-+--+-+
| 20279608258 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
| 20279608269 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
+-+-+--+-+

Interestingly, these dupe records can't seem to be queried when using
both columns from the unique constraint in the WHERE clause:

mysql  select * from app_customergroupmembership where customer_id =
'ajEiQA' and group_id = 'ddH6Ev';
+-+-+--+-+
| id  | customer_id | group_id | created |
+-+-+--+-+
| 20279608258 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
+-+-+--+-+


Any thoughts on how this situation came to pass, and how to prevent it
from happening?

Thanks,
--
Chris Hornung






Re: duplicate rows in spite of multi-column unique constraint

2015-03-24 Thread shawn l.green

Hi Chris,

On 3/24/2015 10:07 AM, Chris Hornung wrote:

Thanks for the suggestions regarding non-printing characters, definitely
makes sense as a likely culprit!

However, the data really does seem to be identical in this case:

mysql select id, customer_id, concat('-', group_id, '-') from
app_customergroupmembership  where customer_id ='ajEiQA';
+-+-++
| id  | customer_id | concat('-', group_id, '-') |
+-+-++
| 20279608258 | ajEiQA  | -ddH6Ev-   |
| 20279608269 | ajEiQA  | -ddH6Ev-   |
+-+-++
2 rows in set (0.00 sec)


I also ran the data through hexdump as a secondary check, also looks
identical:

  mysql  --defaults-extra-file=~/.customers_mysql.cnf app -s -e select
id, customer_id, group_id from app_customergroupmembership  where
customer_id ='ajEiQA';  | hexdump -c
000   2   0   2   7   9   6   0   8   2   5   8  \t   a   j   E   i
010   Q   A  \t   d   d   H   6   E   v  \n   2   0   2   7   9   6
020   0   8   2   6   9  \t   a   j   E   i   Q   A  \t   d   d   H
030   6   E   v  \n


Any other suggestions given this info?




This reminded me of something so I went digging.  Turns out to be a bug 
introduced by a fix applied to a different bug in 5.6.12. We hate 
creating regressions but they do sometimes happen.


http://bugs.mysql.com/bug.php?id=73170

The fix was published in 5.5.40, 5.6.21, and 5.7.5.  You will need to 
upgrade to that release (or any later release) to avoid this happening 
in the future.


For now, manually resolve the duplication by deciding which id value you 
want to keep and discard the other copy of the row.


--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



duplicate rows in spite of multi-column unique constraint

2015-03-23 Thread Chris Hornung

Hello,

I'm come across a situation where a table in our production DB has a 
relatively small number of duplicative rows that seemingly defy the 
unique constraint present on that table.


We're running MySQL 5.6.19a via Amazon RDS. The table in question is 
~250M rows.


`show create table` gives:

app_customergroupmembership | CREATE TABLE `app_customergroupmembership` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` varchar(6) COLLATE utf8_bin NOT NULL,
  `group_id` varchar(6) COLLATE utf8_bin NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `app_customergroupmembership_customer_id_31afe160_uniq` 
(`customer_id`,`group_id`),
  KEY `app_customergroupmembership_group_id_18aedd38e3f8a4a0` 
(`group_id`,`created`)
) ENGINE=InnoDB AUTO_INCREMENT=21951158253 DEFAULT CHARSET=utf8 
COLLATE=utf8_bin



Despite that, records with duplicate customer_id/group_id do exist:

mysql select * from app_customergroupmembership where customer_id = 
'ajEiQA';

+-+-+--+-+
| id  | customer_id | group_id | created |
+-+-+--+-+
| 20279608258 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
| 20279608269 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
+-+-+--+-+

Interestingly, these dupe records can't seem to be queried when using 
both columns from the unique constraint in the WHERE clause:


mysql select * from app_customergroupmembership where customer_id = 
'ajEiQA' and group_id = 'ddH6Ev';

+-+-+--+-+
| id  | customer_id | group_id | created |
+-+-+--+-+
| 20279608258 | ajEiQA  | ddH6Ev   | 2015-02-17 00:14:54 |
+-+-+--+-+


Any thoughts on how this situation came to pass, and how to prevent it 
from happening?


Thanks,
--
Chris Hornung