On Mon, 11 Apr 2005, Dan Bolser wrote: > >Requirement: > >Given two columns of a table (Column1 and Column2) of length x, return two >columns (Column1_Scram and Column2_Scram) such that the distributions of >values in Column1 and Column2 are preserved in Column1_Scram and >Column2_Scram, but the pairs of values are randomized. > > >Solution suggested by Shawn Green: > >Create a table with two columns, and populate this table with random pairs >of primary keys picked from the original table. Additionally, allow no >duplicate primary keys within either column. Select x rows from this >table, and link both primary keys (the primary key pair) back to the >original table to get the appropriate number of randomized pairs of >Column1 and Column2. > >He suggests doing the above like this (more or less): > >OriginalTable >PK A B >1 a c >2 a d >3 b e >... > >CREATE TEMPORARY TABLE IntermediateTable ( > PK1 INT NOT NULL, > A CHAR(1) NOT NULL, > PK2 INT NOT NULL, > B CHAR(1) NOT NULL, > # > UNIQUE INDEX (PK1,A), > UNIQUE INDEX (PK2,B) >); > >INSERT IGNORE INTO IntermediateTable >SELECT > x.PK, x.A, > y.PK, y.B >FROM > OriginalTable x, > OriginalTable y >ORDER BY > RAND(); > >SELECT > x.A, > y.B >FROM > IntermediateTable >INNER JOIN > OriginalTable x ON (PK1 = x.PK) INNER JOIN > OriginalTable y ON (PK2 = y.PK) >LIMIT > the_length_of_OriginalTable; > > >The problem with this solution: > >Its too slow on reasonable sized tables!
Their is also a problem with the way RAND() works... SELECT x.PK, x.A, y.PK, y.B FROM OriginalTable x, OriginalTable y ORDER BY RAND() LIMIT 1; This takes soooooo long to pick a random row. Some cleaver 'LIMIT' optimization could pick a results set almost instantly, instead of taking in excess of half an hour with ~50,000 rows. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]