Re: removing duplicate entries

Rob Wultsch Thu, 07 Aug 2008 07:16:44 -0700

On Thu, Aug 7, 2008 at 2:34 AM, Magnus Smith
<[EMAIL PROTECTED]> wrote:
> Yes I can see you are correct. I tried setting up a little test case
> myself.
>
> CREATE TABLE ACCOUNTACTION (
>         ID INT NOT NULL PRIMARY KEY,
>         ACTIONDATE DATETIME,
>         ACCOUNT_ID INT NOT NULL
>       );
>
> CREATE TABLE ACCOUNTPAYMENTACTION (
>        ID INT NOT NULL PRIMARY KEY,
>        AMOUNT INT
>        );
>
> INSERT INTO ACCOUNTACTION (ID, ACTIONDATE, ACCOUNT_ID)
> VALUES('001', '2008-08-01 02:00:00', '101'),
> ('002', '2008-08-01 02:00:00', '101'),
> ('003', '2008-08-01 02:00:00', '101'),
> ('004', '2008-08-01 02:00:00', '102'),
> ('005', '2008-08-01 02:00:00', '103'),
> ('006', '2008-08-01 02:00:00', '104'),
> ('007', '2008-08-01 02:00:00', '104'),
> ('008', '2008-08-01 02:00:00', '105'),
> ('009', '2008-08-01 03:00:00', '104'),
> ('010', '2008-08-01 03:00:00', '105'),
> ('011', '2008-08-01 02:00:00', '106');
>
> INSERT INTO ACCOUNTPAYMENTACTION (ID, AMOUNT)
> VALUES('001', '1000'),
> ('002', '1000'),
> ('003', '1000'),
> ('004', '1000'),
> ('005', '1000'),
> ('006', '1000'),
> ('007', '1000'),
> ('008', '1000'),
> ('009', '1000'),
> ('010', '1000'),
> ('011', '1000');
>
>
> I got the following query that seems to work on my test case.
>
> I create a union of everything that is not a duplicate and then take the
> ones that are not in this to be the duplicates
>
>
> select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> and (ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID)
> not in
> (select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having count(ACCOUNTACTION.ACCOUNT_ID) > 1
> union
> select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID)
> = 1);
>
>
> The problem is that when I try to apply this to my real database tables
> that are quite large then the query does not return.
>
> I am thinking that there must be a more efficient way to write the
> query?
>
> I would be pleased to hear any suggestions - thanks


Several suggestions:
1. Don't use string comparisons on datatypes that are not strings .
This causes every row to be examined and converted to a string.
2. Sub queries suck in MySQL. Consider converting to joins in general.
2.1 Derived tables in my experience often are not good for
performance. Consider materializing into temporary tables which are
properly indexed.

-- 
Rob Wultsch
[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: removing duplicate entries

Reply via email to