Re: Case confusion
On 9 May 2006, at 02:22, Chris wrote: I would run this query: SELECT * FROM mytable WHERE LOWER(emailaddress) IN (SELECT LOWER(emailaddress) FROM mytable GROUP BY 1 HAVING COUNT(emailaddress) 1) This would show all duplicate emails, I would use the info this displays to choose which records to change/keep/delete. May not be the best way, but it would work. I'm giving it a go, but it's been running for about 4 hours at 60% CPU so far! Would it be quicker to tell it to use a case insensitive collation to locate the duplicates? Marcus -- Marcus Bointon Synchromedia Limited: Putting you in the picture [EMAIL PROTECTED] | http://www.synchromedia.co.uk -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Case confusion
On 9 May 2006, at 13:12, Marcus Bointon wrote: I'm giving it a go, but it's been running for about 4 hours at 60% CPU so far! Would it be quicker to tell it to use a case insensitive collation to locate the duplicates? I managed to come up with a variation on my original attempt that got me what I needed. Thanks, Marcus -- Marcus Bointon Synchromedia Limited: Putting you in the picture [EMAIL PROTECTED] | http://www.synchromedia.co.uk -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Re: Case confusion
Though I do not know what your schema is, I think you may try to decompose it into several BCNFs which can erase the redundancy of emailaddress. - Original Message - From: Marcus Bointon [EMAIL PROTECTED] To: mysql@lists.mysql.com Cc: Chris [EMAIL PROTECTED] Sent: Tuesday, May 09, 2006 8:12 PM Subject: Re: Case confusion On 9 May 2006, at 02:22, Chris wrote: I would run this query: SELECT * FROM mytable WHERE LOWER(emailaddress) IN (SELECT LOWER(emailaddress) FROM mytable GROUP BY 1 HAVING COUNT(emailaddress) 1) This would show all duplicate emails, I would use the info this displays to choose which records to change/keep/delete. May not be the best way, but it would work. I'm giving it a go, but it's been running for about 4 hours at 60% CPU so far! Would it be quicker to tell it to use a case insensitive collation to locate the duplicates? Marcus -- Marcus Bointon Synchromedia Limited: Putting you in the picture [EMAIL PROTECTED] | http://www.synchromedia.co.uk
Re: Case confusion
Marcus, I just noticed that a key field (emailaddress) in my db is case sensitive when it should not have been, so now I've got a bunch of what are effectively duplicate records. I'm having trouble picking them out so I can manually merge/delete them before changing the collation on the field to be case insensitive. SELECT * FROM mytable group by lower(emailaddress) having count(emailaddress) 1 This is ok, but it only shows me the records with lower case addresses (I can't tell which case version is the correct one without looking at them) when I want to see the records with all cases, that is all of '[EMAIL PROTECTED]', '[EMAIL PROTECTED]' and '[EMAIL PROTECTED]'. I'm confusing myself with the case sensitivity and self-references! I think there are about 45 duplicates out of about 200,000. How can I find these pesky things? If you need to inspect the dupes, rather than eliminate them on some criterion without inspection, this will fetch them: SELECT LOWER(emailaddress), ... FROM mytable GROUP BY LOWER(emailaddress) having COUNT(*) 1; PB thanks, Marcus --Marcus Bointon Synchromedia Limited: Putting you in the picture [EMAIL PROTECTED] | http://www.synchromedia.co.uk --MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] --No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.392 / Virus Database: 268.5.5/333 - Release Date: 5/5/2006 -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.392 / Virus Database: 268.5.5/334 - Release Date: 5/8/2006 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Case confusion
I just noticed that a key field (emailaddress) in my db is case sensitive when it should not have been, so now I've got a bunch of what are effectively duplicate records. I'm having trouble picking them out so I can manually merge/delete them before changing the collation on the field to be case insensitive. SELECT * FROM mytable group by lower(emailaddress) having count (emailaddress) 1 This is ok, but it only shows me the records with lower case addresses (I can't tell which case version is the correct one without looking at them) when I want to see the records with all cases, that is all of '[EMAIL PROTECTED]', '[EMAIL PROTECTED]' and '[EMAIL PROTECTED]'. I'm confusing myself with the case sensitivity and self-references! I think there are about 45 duplicates out of about 200,000. How can I find these pesky things? thanks, Marcus -- Marcus Bointon Synchromedia Limited: Putting you in the picture [EMAIL PROTECTED] | http://www.synchromedia.co.uk -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Case confusion
I would run this query: SELECT * FROM mytable WHERE LOWER(emailaddress) IN (SELECT LOWER(emailaddress) FROM mytable GROUP BY 1 HAVING COUNT(emailaddress) 1) This would show all duplicate emails, I would use the info this displays to choose which records to change/keep/delete. May not be the best way, but it would work. Chris Marcus Bointon wrote: I just noticed that a key field (emailaddress) in my db is case sensitive when it should not have been, so now I've got a bunch of what are effectively duplicate records. I'm having trouble picking them out so I can manually merge/delete them before changing the collation on the field to be case insensitive. SELECT * FROM mytable group by lower(emailaddress) having count(emailaddress) 1 This is ok, but it only shows me the records with lower case addresses (I can't tell which case version is the correct one without looking at them) when I want to see the records with all cases, that is all of '[EMAIL PROTECTED]', '[EMAIL PROTECTED]' and '[EMAIL PROTECTED]'. I'm confusing myself with the case sensitivity and self-references! I think there are about 45 duplicates out of about 200,000. How can I find these pesky things? thanks, Marcus --Marcus Bointon Synchromedia Limited: Putting you in the picture [EMAIL PROTECTED] | http://www.synchromedia.co.uk --MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]