On Tuesday 28 June 2005 11:20 am, C. Ed Felt wrote: > Fellow Open Software Supporters: > > I would like to ask for some input on a database issue I am working on. > > I am getting repeat records in my billing database for a VoIP company. > There is no fix for this as it is a SIP, (Session Initiation Protocol), > specific problem so I am working on a post process solution. > > I am writing a script in Python and the database is MySQL. The > programming language doesn't matter since I am only looking for a > suggestion on my sql (MySQL) query. My current plan is to pull all > records (called Call Data Records or CDRs) from the database for the > last 24 hours in to a big array. If any of the records match on a field > called 'sessid' then they are copies (though not exact copies as there > can be timestamp differences). > > So, in short, (since a paragraph will get confusing): > > 1. SELECT all records in the CDR table after a requested date (usually > 24 hours). > 2. Store these records in a huge array. > 3. Find all repeats on the 'sessid' field and store this in an array. > 4. Delete all repeats (save one copy of each repeat CDR). > > Is there a MySQL query, (version 3), to select all rows that have one or > more matching rows on a specific field ('sessid')? This would > essentially combine steps 1, 2 and 3 in one MySQL query.
Yes. There a few ways, some depending on your version of MySQL. If you have 4.1 or newer with subselects you can do: You'll need a primary key column in your table to identify rows uniquely for this method: DELETE t.* FROM <table> t LEFT JOIN ( SELECT sessid, MIN(<uid>) AS <uid>, # any expression pick the uid of the row to keep COUNT(*) AS cnt FROM <table> GROUP BY sessid HAVING cnt > 1 ) dups USING(sessid) WHERE t.uid != dups.uid; With MySQL < 4.1, you can simply replace the subquery with a temp table populated with the same--only one extra query. If you don't have a primary key column in your table, it gets a bit more difficult--and might require an iterative approach, unless you have some other way of singling out rows, such as min(timestamp) != timestamp. -- Respectfully, Nicholas Leippe Sales Team Automation, LLC 1335 West 1650 North, Suite C Springville, UT 84663 +1 801.853.4090 http://www.salesteamautomation.com .===================================. | This has been a P.L.U.G. mailing. | | Don't Fear the Penguin. | | IRC: #utah at irc.freenode.net | `==================================='