Requirement: Given two columns of a table (Column1 and Column2) of length x, return two columns (Column1_Scram and Column2_Scram) such that the distributions of values in Column1 and Column2 are preserved in Column1_Scram and Column2_Scram, but the pairs of values are randomized.
Solution suggested by Shawn Green: Create a table with two columns, and populate this table with random pairs of primary keys picked from the original table. Additionally, allow no duplicate primary keys within either column. Select x rows from this table, and link both primary keys (the primary key pair) back to the original table to get the appropriate number of randomized pairs of Column1 and Column2. He suggests doing the above like this (more or less): OriginalTable PK A B 1 a c 2 a d 3 b e ... CREATE TEMPORARY TABLE IntermediateTable ( PK1 INT NOT NULL, A CHAR(1) NOT NULL, PK2 INT NOT NULL, B CHAR(1) NOT NULL, # UNIQUE INDEX (PK1,A), UNIQUE INDEX (PK2,B) ); INSERT IGNORE INTO IntermediateTable SELECT x.PK, x.A, y.PK, y.B FROM OriginalTable x, OriginalTable y ORDER BY RAND(); SELECT x.A, y.B FROM IntermediateTable INNER JOIN OriginalTable x ON (PK1 = x.PK) INNER JOIN OriginalTable y ON (PK2 = y.PK) LIMIT the_length_of_OriginalTable; The problem with this solution: Its too slow on reasonable sized tables! I never get past the second step with my data after 10 minutes! I have 52699 rows in my 'OriginalTable' leading to ~2.7 billion checks when inserting into the IntermediateTable... or rather 5.4 billion, as I guess it has to check both rows for the UNIQUE key constraint on every attempted insert. Ideally I would like to be able to do several thousand randomizations over my data, and at 10 mins a pop that would take all week. (assuming the query was about to finish when I killed it after 10 mins.) Is their a faster way to do this randomization in SQL? Am I doing something really dumb that was never intended by Shawn? I can easily get the data I need with a quick step into perl, but it would be really neat if I could do all this in MySQL. I can imagine a general way to create 'random' joins (over scrambled data) would have some interesting applications. Dan. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]