Hi,

i've a rather large table (~ 1.800.000 rows) with five CHAR columns - let's say col1, 
col2, ...., col5. Col1has the primary key. The columns col2,col3,col4,col5 hold 
strings of variable length. I need to find duplicate entries that have the same value 
for col2,col3,col4 & col5 but (and thats the tricky part :-)) must use the LIKE 
operator as the columns contain whitespaces.

e.g.

0000000001      xufnc   qj3n5   qcm2    ....
0000000002      x%      qj3n%   qcm2    ....
0000000003      xufnc   qj3n5   qcm2    ....


the rows 1 and 3 are duplicates to each other (both directions), while row 2 is 
duplicate to row 1 and row 3 (single direction).


currently i'm using a java tool that selects all rows from the table and then 
generates SELECT queries for every single one of them. the table has two indexes on 
(col2,col3,col4,col5) and (col3,col2,col4,col5). As I use one query for each row, the 
server is able to pick the most valueable key each time.

SELECT COL1 FROM myTable WHERE COL2 = 'xufnc' AND COL3 = 'qj3n5' AND COL4 = 'qcm2'  
AND ... AND COL1 != '0000000001';
SELECT COL1 FROM myTable WHERE COL2 LIKE 'x%' AND COL3 LIKE 'qj3n%' AND COL4 = 'qcm2' 
AND ... AND COL1 != '0000000002';


well, as you probably think yourself sending 1.800.000 queries to the server sux a lot 
:-) on the other hand the LIKE operator that some of the queries require makes 
choosing the most valueable key essential.

anyone with a better idea to solve this? maybe the whole whitespaces-in-rows-thing can 
be solved differently?

regards
kst





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to