Scott:


Rather than make the email address column unique, since you want to allow multiple email address instances, you can make a unique index based on email_address+group. Wouldn't that help, or is that still too simple for your situation?


Regarding temporary tables, from the MySQL manual:
From MySQL 3.23 on, you can use the TEMPORARY keyword when creating a table. A TEMPORARY table is visible only to the current connection, and is dropped automatically when the connection is closed. This means that two different connections can use the same temporary table name without conflicting with each other or with an existing non-TEMPORARY table of the same name. (The existing table is hidden until the temporary table is dropped.) From MySQL 4.0.2 on, you must have the CREATE TEMPORARY TABLES privilege to be able to create temporary tables.


I don't understand well enough how the group by function works, or select distinct. Would altering the table order to be ordered by bounce count DESC be enough to mean that when you do select distinct records, the record that comes first is the record that is selected? I don't know if it works reliably like that or if the selection is more random.

Wes


On Jul 20, 2004, at 6:37 PM, Scott Haneda wrote:

Mysql 4

I have a slightly tougher than usual condition in which I want to remove
duplicates. I am not sure how to deal with this and am looking for your
suggestions.


The table that stores my data is called addresses, it contains among others,
a field called email_address. Within this table emails are grouped on a
unique id, so for example, select email_address from addresses where
group_id = '5' AND user_id = '2' would show me all the addresses that I want
to work on.


The case is that users will always be adding more addresses to this group,
this is for a mailing list manager I am working on. The trouble is that I
suspect users will upload a batch of addresses, then a few weeks later, they
will upload a new set, but they will contain the old set as well. This
would pile up the duplicates in short order, and I don't want multiple
emails sent to the same person over and over again.


My first option is when they upload new addresses, to select and test for
the existence of that address. If it exists, do not add it, otherwise I
will add it. The size of some of these lists are large, in the 10's of
thousands. I suspect this will add too much overhead to the import time. I
can not make the column unique as there is good reason to have the email
address in the column more than once, since they are "groups" of email
addresses.


My thought is to allow the import of all the addresses, allow all the dupes,
then take out the dupes, I suspect this will be faster than a select for
every email address I want to import. I think this involves selecting
distinct() into a temp table, deleting the addresses from the main table,
then selecting into the old table from the temp table and then destroying
the temp table. If anyone can suggest a tricky way to do this with perhaps
a group by clause to simpy remove the dupes in one go, I would love to hear
it.


In regards to the temp table, is it up to me to maintain a unique temp table
name to not collide in the event 2 users were to hit the page at the same
time?



And now, the other "rub"....
Another field in the addresses table I used to track bounced emails, lets
call it 'bounces', which I increment by 1 on every bounce. Here is the
other issue, I can not simply remove the dupes without first determining
which dupe to remove. Basically, I want to remove all dupes where the
bounce count is 0 (default), but if there are more than 2 dupes, I want to
keep the one with the highest bounce count.


Any idea how I should be approaching this?
--
-------------------------------------------------------------
Scott Haneda                                Tel: 415.898.2602
http://www.newgeo.com                       Fax: 313.557.5052
[EMAIL PROTECTED]                            Novato, CA U.S.A.



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to