On Mon, 11 Apr 2005, Dan Bolser wrote:

>
>Requirement: 
>
>Given two columns of a table (Column1 and Column2) of length x, return two
>columns (Column1_Scram and Column2_Scram) such that the distributions of
>values in Column1 and Column2 are preserved in Column1_Scram and
>Column2_Scram, but the pairs of values are randomized.
>
>
>Solution suggested by Shawn Green: 
>
>Create a table with two columns, and populate this table with random pairs
>of primary keys picked from the original table. Additionally, allow no
>duplicate primary keys within either column. Select x rows from this
>table, and link both primary keys (the primary key pair) back to the
>original table to get the appropriate number of randomized pairs of
>Column1 and Column2.
>
>He suggests doing the above like this (more or less):
>
>OriginalTable
>PK     A       B
>1      a       c
>2      a       d
>3      b       e
>...
>
>CREATE TEMPORARY TABLE IntermediateTable (
>  PK1  INT     NOT NULL,
>  A    CHAR(1) NOT NULL,
>  PK2  INT     NOT NULL,
>  B    CHAR(1) NOT NULL,
>  #
>  UNIQUE INDEX (PK1,A),
>  UNIQUE INDEX (PK2,B)
>);
>
>INSERT IGNORE INTO IntermediateTable
>SELECT 
>  x.PK, x.A,
>  y.PK, y.B
>FROM 
>  OriginalTable x,
>  OriginalTable y
>ORDER BY
>  RAND();
>
>SELECT 
>  x.A,
>  y.B
>FROM
>  IntermediateTable 
>INNER JOIN 
>  OriginalTable x ON (PK1 = x.PK) INNER JOIN
>  OriginalTable y ON (PK2 = y.PK)
>LIMIT 
>  the_length_of_OriginalTable;
>
>
>The problem with this solution: 
>
>Its too slow on reasonable sized tables! 


Their is also a problem with the way RAND() works...

SELECT
  x.PK, x.A,
  y.PK, y.B
FROM
  OriginalTable x,
  OriginalTable y
ORDER BY
  RAND()
LIMIT 
  1;

This takes soooooo long to pick a random row. Some cleaver 'LIMIT'
optimization could pick a results set almost instantly, instead of taking
in excess of half an hour with ~50,000 rows.





-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to