Re: Eliminating duplicates from self join results

Joerg Bruehe Wed, 12 Dec 2007 08:41:03 -0800

Hi Yashesh, all !


Yashesh Bhatia wrote:

Hello:

    I have the following table

select * from addressbook
+----+--------------------+
| id | email              |
+----+--------------------+
|  1 | [EMAIL PROTECTED] |
|  2 | [EMAIL PROTECTED] |
|  3 | [EMAIL PROTECTED] |
+----+--------------------+
3 rows in set (0.00 sec)

Now i wanted to find a list of duplicate contacts wherein i can get
the 1st contact with same email
and merge with the others. so in the above case id 1 has duplicates 2
and 3, 2 has 1 & 3 and 3 and 1 & 2.

I'm only interested in getting the first set of duplicates i.e. "1 has
duplicates 2 & 3".

[[...]]

select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1,
       if(t1.id < t2.id, t1.email, t2.email) as email1,
       if(t1.id < t2.id, t2.id, t1.id) as id2,
       if(t1.id < t2.id, t2.email, t1.email) as email2
from addressbook t1, addressbook t2
where t1.email = t2.email and t1.id != t2.id
order by t1.id


Using                       and t1.id <  t2.id

would make your life a lot easier, allow you to drop the "if" and eventhe "distinct" (assuming your "id" values are unique).

+-----+--------------------+-----+--------------------+
| id1 | email1             | id2 | email2             |
+-----+--------------------+-----+--------------------+
|   1 | [EMAIL PROTECTED] |   2 | [EMAIL PROTECTED] |
|   1 | [EMAIL PROTECTED] |   3 | [EMAIL PROTECTED] |
|   2 | [EMAIL PROTECTED] |   3 | [EMAIL PROTECTED] |
+-----+--------------------+-----+--------------------+

I'm stuck trying to get a query that will give me only
+-----+--------------------+-----+--------------------+
| id1 | email1             | id2 | email2             |
+-----+--------------------+-----+--------------------+
|   1 | [EMAIL PROTECTED] |   2 | [EMAIL PROTECTED] |
|   1 | [EMAIL PROTECTED] |   3 | [EMAIL PROTECTED] |
+-----+--------------------+-----+--------------------+

Any help, feeback is deeply appreciated. Thanks a bunch in advance.

Tasks related to duplicate values often might be solved by usinggrouping and groupwise counting: "... GROUP BY ... HAVING COUNT(*) ..."

If you don't insist on getting it in this tabular form, you might use"group_concat()". Try along these lines (untested):


SELECT MIN(id), email, GROUP_CONCAT(id)
   FROM addressbook
   GROUP BY email
   HAVING COUNT(*) > 1

This would repeat the minimum id ("1" in your example data) in theconcatenated list.



For tabular form, try along these lines (untested):

SELECT MIN(id), email, id
   FROM addressbook
   WHERE id > MIN(id)
   GROUP BY email
   HAVING COUNT(*) > 1


I wish you success,
Joerg

--
Joerg Bruehe, Senior Production Engineer
MySQL AB, www.mysql.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Eliminating duplicates from self join results

Reply via email to