RE: removing duplicate entries

Magnus Smith Thu, 07 Aug 2008 07:34:45 -0700

I think got it in the end by doing a union and a join.

delete AA, PA from ACCOUNTACTION AA, ACCOUNTPAYMENTACTION PA 
where AA.ID = PA.ID and AA.ID in
    (select D.ID from
 
        (select A1.ID from ACCOUNTACTION A1 
 
        left join
 
        (
        select * from ACCOUNTACTION A2
        where A2.ACTIONDATE like '2008-08-01 02:00%'
        group by A2.ACCOUNT_ID
        having count(A2.ACCOUNT_ID) > 1
 
        union
 
        select * from ACCOUNTACTION A3
        where A3.ACTIONDATE like '2008-08-01 02:00%'
        group by A3.ACCOUNT_ID having count(A3.ACCOUNT_ID) = 1
        ) as U1
 
        on A1.ID = U1.ID
        where A1.ACTIONDATE like '2008-08-01 02:00%'
        and U1.ID is NULL
    ) as D
);


Thanks for the pointers ;-)




-----Original Message-----
From: Magnus Smith [mailto:[EMAIL PROTECTED] 
Sent: 07 August 2008 10:35
To: Ananda Kumar
Cc: mysql@lists.mysql.com
Subject: RE: removing duplicate entries

Yes I can see you are correct. I tried setting up a little test case
myself.
 
CREATE TABLE ACCOUNTACTION (
         ID INT NOT NULL PRIMARY KEY,
         ACTIONDATE DATETIME,
         ACCOUNT_ID INT NOT NULL
       );
 
CREATE TABLE ACCOUNTPAYMENTACTION (
        ID INT NOT NULL PRIMARY KEY,
        AMOUNT INT
        );
 
INSERT INTO ACCOUNTACTION (ID, ACTIONDATE, ACCOUNT_ID)
VALUES('001', '2008-08-01 02:00:00', '101'),
('002', '2008-08-01 02:00:00', '101'),
('003', '2008-08-01 02:00:00', '101'),
('004', '2008-08-01 02:00:00', '102'),
('005', '2008-08-01 02:00:00', '103'),
('006', '2008-08-01 02:00:00', '104'),
('007', '2008-08-01 02:00:00', '104'),
('008', '2008-08-01 02:00:00', '105'),
('009', '2008-08-01 03:00:00', '104'),
('010', '2008-08-01 03:00:00', '105'),
('011', '2008-08-01 02:00:00', '106');
 
INSERT INTO ACCOUNTPAYMENTACTION (ID, AMOUNT)
VALUES('001', '1000'),
('002', '1000'),
('003', '1000'),
('004', '1000'),
('005', '1000'),
('006', '1000'),
('007', '1000'),
('008', '1000'),
('009', '1000'),
('010', '1000'),
('011', '1000');
 
 
I got the following query that seems to work on my test case.
 
I create a union of everything that is not a duplicate and then take the
ones that are not in this to be the duplicates
 
 
select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION 
where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' 
and (ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID) 
not in
(select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
group by ACCOUNTACTION.ACCOUNT_ID
having count(ACCOUNTACTION.ACCOUNT_ID) > 1
union
select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID)
= 1);
 
 
The problem is that when I try to apply this to my real database tables
that are quite large then the query does not return. 
 
I am thinking that there must be a more efficient way to write the
query? 
 
I would be pleased to hear any suggestions - thanks
 
 
 

________________________________

From: Ananda Kumar [mailto:[EMAIL PROTECTED] 
Sent: 06 August 2008 13:51
To: Magnus Smith
Cc: mysql@lists.mysql.com
Subject: Re: removing duplicate entries


I just did a test case here
 

select * from amc_25;
+------+
| id   |
+------+
|    2 |
|   14 |
|    1 |
|    2 |
+------+
4 rows in set (0.01 sec)

select id from amc_25 group by id having count(id) >1 and id!=min(id);

Empty set (0.00 sec)

 

It does not give me any rows. 

R u sure the rows returned, are the one you want to keep are indeed
duplicates 



 
On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote: 

        When I try the first suggestion (i) then I get all the 1682
duplicate rows.  The thing is that I need to keep the originals which
are the ones with the lowest ACCOUNTACTION.ID <http://accountaction.id/>
value.
         
        The second suggestion (ii) gives me 563 rows that are the
duplicates with the lowest ACCOUNTACTION.ID <http://accountaction.id/>
which are the ones I wish to keep
         
        So the ones I want to delete are the ones in (i) and not (ii)
         
        When I use 
        
         
        select ACCOUNTACTION.ID <http://accountaction.id/>  from
ACCOUNTACTION
        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
        group by ACCOUNTACTION.ACCOUNT_ID
        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 
        and ACCOUNTACTION.ID <http://accountaction.id/>  !=
min(ACCOUNTACTION.ID <http://accountaction.id/> ));
         
        then I get 1119 rows which is all the duplicates in (i) less the
originals in (ii)
         
         
        The problem I'm having is using this in a delete statement.

         
________________________________

        From: Ananda Kumar [mailto:[EMAIL PROTECTED] 
        Sent: 06 August 2008 10:11
        To: Magnus Smith
        Cc: mysql@lists.mysql.com
        Subject: Re: removing duplicate entries
        
         
        
        I doubt the belwo sql will give you duplcates
         
        select ACCOUNTACTION.ID <http://accountaction.id/>  from
ACCOUNTACTION
        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
        group by ACCOUNTACTION.ACCOUNT_ID
        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID
<http://accountaction.id/>  !=
        min(ACCOUNTACTION.ID <http://accountaction.id/> ));
         
        The reason being, for duplicates records accountaction.id
<http://accountaction.id/>  will always equal to min(accountaction.id
<http://accountaction.id/> ).
         
         
        try this
        select ACCOUNTACTION.ID <http://accountaction.id/>  from
ACCOUNTACTION
        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
        group by ACCOUNTACTION.ACCOUNT_ID
        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1
         
        or 
         
        select ACCOUNTACTION.ID <http://accountaction.id/>  from
ACCOUNTACTION
        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
        group by ACCOUNTACTION.ACCOUNT_ID
        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID
<http://accountaction.id/> =min(accountaction.id
<http://accountaction.id/> );
         
        I would use the first select statement.
         

         
        On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote: 

                I have the following two tables
                
                ACCOUNTACTION
        
+-------------------+--------------+------+-----+---------+-------+
                | Field             | Type         | Null | Key |
Default | Extra |
        
+-------------------+--------------+------+-----+---------+-------+
                | ID                | bigint(20)   | NO   | PRI |
|       |
                | AccountActionType | varchar(31)  | YES  |     | NULL
|       |
                | DESCRIPTION       | varchar(255) | YES  |     | NULL
|       |
                | ACTIONDATE        | datetime     | YES  |     | NULL
|       |
                | ACCOUNT_ID        | bigint(20)   | YES  | MUL | NULL
|       |
        
+-------------------+--------------+------+-----+---------+-------+
                
                and
                
                ACCOUNTPAYMENTACTION
        
+---------------+------------+------+-----+---------+-------+
                | Field         | Type       | Null | Key | Default |
Extra |
        
+---------------+------------+------+-----+---------+-------+
                | ID            | bigint(20) | NO   | PRI |         |
|
                | AMOUNTINPENCE | bigint(20) | YES  |     | NULL    |
|
        
+---------------+------------+------+-----+---------+-------+
                
                ACCOUNTPAYMENTACTION shares the primary key with
ACCOUNTACTION
                
                I need to remove duplicate entries that occured at a
specific time in
                ACCOUNTACTION I then plan to remove the rows in
ACCOUNTPAYMENTACTION
                that are no longer referenced in ACCOUNTACTION by using
an outer join
                
                I can select the duplicate records in ACCOUNTACTION
using
                
                select ACCOUNTACTION.ID <http://accountaction.id/>  from
ACCOUNTACTION
                where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
                group by ACCOUNTACTION.ACCOUNT_ID
                having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and
ACCOUNTACTION.ID <http://accountaction.id/>  !=
                min(ACCOUNTACTION.ID <http://accountaction.id/> ));
                
                I am trying to delete these records but am having
trouble with the sql
                delete
                
                I tried the following but nothing happened
                
                delete ACCOUNTACTION where ACCOUNTACTION.ID
<http://accountaction.id/>  in
                (select ACCOUNTACTION.ID <http://accountaction.id/>
from ACCOUNTACTION
                where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
                group by ACCOUNTACTION.ACCOUNT_ID
                having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and
ACCOUNTACTION.ID <http://accountaction.id/>  !=
                min(ACCOUNTACTION.ID <http://accountaction.id/> )));
                
                Can anyone help me?
                





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

RE: removing duplicate entries

Reply via email to