Re: Case confusion

2006-05-09 Thread Marcus Bointon

On 9 May 2006, at 02:22, Chris wrote:


I would run this query:

SELECT
   *
FROM mytable
WHERE LOWER(emailaddress) IN
   (SELECT
   LOWER(emailaddress)
   FROM mytable
   GROUP BY 1
   HAVING COUNT(emailaddress)  1)

This would show all duplicate emails, I would use the info this  
displays to choose which records to change/keep/delete.


May not be the best way, but it would work.


I'm giving it a go, but it's been running for about 4 hours at 60%  
CPU so far! Would it be quicker to tell it to use a case insensitive  
collation to locate the duplicates?


Marcus
--
Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Case confusion

2006-05-09 Thread Marcus Bointon

On 9 May 2006, at 13:12, Marcus Bointon wrote:


I'm giving it a go, but it's been running for about 4 hours at 60%  
CPU so far! Would it be quicker to tell it to use a case  
insensitive collation to locate the duplicates?




I managed to come up with a variation on my original attempt that got  
me what I needed.


Thanks,

Marcus
--
Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Re: Case confusion

2006-05-09 Thread 彭一凡
Though I do not know what your schema is, I think you may try to decompose it 
into several BCNFs 
which can erase the redundancy of emailaddress.
  
- Original Message - 
From: Marcus Bointon [EMAIL PROTECTED]
To: mysql@lists.mysql.com
Cc: Chris [EMAIL PROTECTED]
Sent: Tuesday, May 09, 2006 8:12 PM
Subject: Re: Case confusion


 On 9 May 2006, at 02:22, Chris wrote:
 
 I would run this query:

 SELECT
*
 FROM mytable
 WHERE LOWER(emailaddress) IN
(SELECT
LOWER(emailaddress)
FROM mytable
GROUP BY 1
HAVING COUNT(emailaddress)  1)

 This would show all duplicate emails, I would use the info this  
 displays to choose which records to change/keep/delete.

 May not be the best way, but it would work.
 
 I'm giving it a go, but it's been running for about 4 hours at 60%  
 CPU so far! Would it be quicker to tell it to use a case insensitive  
 collation to locate the duplicates?
 
 Marcus
 -- 
 Marcus Bointon
 Synchromedia Limited: Putting you in the picture
 [EMAIL PROTECTED] | http://www.synchromedia.co.uk

Re: Case confusion

2006-05-09 Thread Peter Brawley

Marcus,
I just noticed that a key field (emailaddress) in my db is case 
sensitive when it should not have been, so now I've got a bunch of 
what are effectively duplicate records. I'm having trouble picking 
them out so I can manually merge/delete them before changing the 
collation on the field to be case insensitive.


SELECT * FROM mytable group by lower(emailaddress) having 
count(emailaddress)  1


This is ok, but it only shows me the records with lower case addresses 
(I can't tell which case version is the correct one without looking at 
them) when I want to see the records with all cases, that is all of 
'[EMAIL PROTECTED]', '[EMAIL PROTECTED]' and '[EMAIL PROTECTED]'. I'm 
confusing myself with the case sensitivity and self-references!


I think there are about 45 duplicates out of about 200,000.

How can I find these pesky things?
If you need to inspect the dupes, rather than eliminate them on some 
criterion without inspection, this will fetch them:


SELECT LOWER(emailaddress), ...
FROM mytable
GROUP BY LOWER(emailaddress)
having COUNT(*)  1;

PB



thanks,

Marcus
--Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk


--MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]




--No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.392 / Virus Database: 268.5.5/333 - Release Date: 5/5/2006





--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.392 / Virus Database: 268.5.5/334 - Release Date: 5/8/2006


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Case confusion

2006-05-08 Thread Marcus Bointon
I just noticed that a key field (emailaddress) in my db is case  
sensitive when it should not have been, so now I've got a bunch of  
what are effectively duplicate records. I'm having trouble picking  
them out so I can manually merge/delete them before changing the  
collation on the field to be case insensitive.


SELECT * FROM mytable group by lower(emailaddress) having count 
(emailaddress)  1


This is ok, but it only shows me the records with lower case  
addresses (I can't tell which case version is the correct one without  
looking at them) when I want to see the records with all cases, that  
is all of '[EMAIL PROTECTED]', '[EMAIL PROTECTED]' and '[EMAIL PROTECTED]'.  
I'm confusing myself with the case sensitivity and self-references!


I think there are about 45 duplicates out of about 200,000.

How can I find these pesky things?

thanks,

Marcus
--
Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Case confusion

2006-05-08 Thread Chris



I would run this query:

SELECT
   *
FROM mytable
WHERE LOWER(emailaddress) IN
   (SELECT
   LOWER(emailaddress)
   FROM mytable
   GROUP BY 1
   HAVING COUNT(emailaddress)  1)

This would show all duplicate emails, I would use the info this displays 
to choose which records to change/keep/delete.


May not be the best way, but it would work.

Chris

Marcus Bointon wrote:
I just noticed that a key field (emailaddress) in my db is case 
sensitive when it should not have been, so now I've got a bunch of 
what are effectively duplicate records. I'm having trouble picking 
them out so I can manually merge/delete them before changing the 
collation on the field to be case insensitive.


SELECT * FROM mytable group by lower(emailaddress) having 
count(emailaddress)  1


This is ok, but it only shows me the records with lower case addresses 
(I can't tell which case version is the correct one without looking at 
them) when I want to see the records with all cases, that is all of 
'[EMAIL PROTECTED]', '[EMAIL PROTECTED]' and '[EMAIL PROTECTED]'. I'm 
confusing myself with the case sensitivity and self-references!


I think there are about 45 duplicates out of about 200,000.

How can I find these pesky things?

thanks,

Marcus
--Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk


--MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]