Indexes to speed up a duplicates query.

Tom Cunningham Sun, 27 Jul 2003 19:39:36 -0700

OK, say you're trying to find all the non-exact duplicates in a table, and especially
do it with a single query, so that the check can be part of a user interface.



Then naturally you're going to have a slow query, because it has to compare
every row with every other row.

You ought to be able to cut out exactly half of the records in the second table,
by just saying:

 SELECT * FROM TABLE
 LEFT JOIN users dups ON users.created<dups.created

But I can't get MySQL to use the index I have on 'created' (same results for
integer-primary key, & a string field).

My best guess why not is that (according to the manual) "MySQL will not use
an index when ... the use of the index would require MySQL to access more than
30% of the rows in the table" - is this the reason?


Another option: say that I thought that my duplicates will always have the same
letter in their surname, then I could make an index on the first letter of the
surname, and make the join just on *that*. This should make the number of rows
needed up to 26 times less, right?

But exactly how is this done? MySQL won't use my index on the following query:


 SELECT * FROM TABLE
 LEFT JOIN users dups 
   ON SUBSTRING(users.name,0,1)=SUBSTRING(dups.name,0,1)

Thank you all in advance.

Tom Cunningham.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Indexes to speed up a duplicates query.

Reply via email to