[EMAIL PROTECTED] wrote:
Hi!
I have a problem that may only be solvable using a programming approach, but I'm wondering if there is an SQL method in MySQL.
The problem is that I have a table that looks like this:
CREATE TABLE `test` ( `uid` mediumint(6) NOT NULL auto_increment,
`perid` varchar(10) default NULL, `prog` char(2) default NULL,
`startdt` date default NULL,
`stopdt` date default NULL, PRIMARY KEY (`uid`) ) TYPE=MyISAM AUTO_INCREMENT=1 ;
Please see a listing (below) of some records, for this is my problem. The
perid identifies people (person id is perid). The next code says what
program they are in (always a 2 digit code). Then there is a startdate and
a stopdate. Some records (id# 471) are repeated because people are in the
same program and the startdates are on the same day, and the stopdates are
the only way to cull out the "repeats". When I say "repeats" I mean they
have the same perid, program and startdate.
This doesn't make sense to me. Are you saying that in your data model it is appropriate for the same person to start the same program on the same day 2 or 3 times? Or do you mean that it isn't, but your application has allowed that to happen, and now you are trying to clean up? It appears that you are changing a person's stopdt by adding a new row, rather than by updating the existing row. Ususally, that's not what you want.
Going forward, after you've removed the duplicates, I'd suggest adding a unique constraint to prevent duplicates:
ALTER TABLE test ADD UNIQUE per_prog_start (perid, prog, startdt);
Then change the stopdt by updating the existing row, rather than adding a new row.
I would very much like to get a listing for each perid that shows their perid, program, startdate and then the latest (only the latest) stopdate. This would give unique records if I were only to examine the perid, program code and startdate.
1 222 JP 2004-01-26 2006-01-26 2 38 SW 2004-01-21 2004-04-21 3 471 BP 2004-01-09 2004-06-01 4 471 BP 2004-01-09 2004-06-06 5 471 JP 2004-06-01 2004-06-08 6 471 JP 2004-06-08 2006-06-08 7 471 JP 2004-06-01 2006-06-15 8 471 JP 2004-11-10 2006-11-10 9 440 OT 2004-02-19 2004-04-01 10 479 JP 2003-11-14 2003-12-02
So is there some way to do this using an SQL query?
Yes.
SELECT perid, prog, startdt, MAX(stopdt) FROM test GROUP BY perid, prog, startdt;
Note that you can't get the uid with this simple query, because it isn't one of the grouped columns. If you need the uid, as I expect you do, it becomes a little more complicated.
CREATE TEMPORARY TABLE tmp ( `perid` varchar(10) default NULL, `prog` char(2) default NULL, `startdt` date default NULL, `stopdt` date default NULL );
LOCK TABLES test READ;
INSERT INTO tmp SELECT perid, prog, startdt, MAX(stopdt) FROM test GROUP BY perid, prog, startdt;
SELECT test.* FROM test JOIN tmp USING (perid, prog, startdt, stopdt) ORDER BY uid;
UNLOCK TABLES; DROP TABLE tmp;
This method is documented in the manual <http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html>, along with a simpler version using subqueries if you have mysql 4.1.
Of course, that just selects the rows you want. If you want to remove the duplicates from test, you have a few options. If there aren't too many, you could
SELECT perid, prog, startdt FROM test GROUP BY perid, prog, startdt HAVING COUNT(stopdt) > 1;
to find the groups with duplicates, then manually remove the bad rows.
To have mysql do it, we modify the temp table solution from above:
CREATE TEMPORARY TABLE tmp ( `perid` varchar(10) default NULL, `prog` char(2) default NULL, `startdt` date default NULL, `stopdt` date default NULL );
LOCK TABLES test WRITE;
INSERT INTO tmp SELECT perid, prog, startdt, MAX(stopdt) FROM test GROUP BY perid, prog, startdt;
DELETE test FROM test LEFT JOIN tmp USING (perid, prog, startdt, stopdt) WHERE tmp.perid IS NULL;
UNLOCK TABLES; DROP TABLE tmp;
Thanks so much for looking at this!
Cheers!
-Alex
Michael
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]