Hi, 

I have a request for a UDF that I would find really useful. I am
duplicating this question on the community list as well as the MySQL list,
because I am not sure where best to make this kind of request (see the
previous post http://lists.mysql.com/community/97).

I think the following syntax would be really cool...

SELECT 
  a.*,
  b.*
FROM
  table_one a 
RANDOM JOIN       -- <-- :)
  table_two b
USING 
  (ID)
;

Lets say table_one and table_two have a primary key called ID in common
(although of course that isn't necessary).

The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
then scramble the 'links' (or rows) between the two tables. The result
would be the same number of rows overall, the same number of distinct
a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
associated with the marginals given by the correct inner join.

Hopefully that makes sense.

I think this function would be really useful for statistical analysis of
scientific data within MySQL (using randomized versions of the
associations within the tables).

Not sure if the above syntax is optimal, because I would like to be able
to do things like this...

table_one
Dept    Person  Gender
A       1       M
A       2       F
A       3       M
B       4       F
B       5       M
B       6       F

table_two
Dept    Person  Spending
A       1       10
A       2       20
A       3       30
B       4       40
B       5       50
B       6       60

SELECT 
  Dept,
  Gender,
  AVG(Spending)
FROM
  table_one a
INNER JOIN
  table_two b
USING 
  (Dept,Person)
GROUP BY
  Dept, Gender
;


With the above query (which I hope is sensible) I would like to keep the
departments fixed, and randomize the genders of the people in the
departments (keeping the number of each sex in each department the same).

So we could INNER JOIN using Dept and then RANDOM JOIN (as described
above) using Person - all in one 'JOIN'. All else being the same this
should randomize the Gender, but keep the marginals.

I guess this is overly complex given the problem, and it actually raises
more questions instantly (about statistical analysis), but the function is
basic (I hope), and like all SQL it is the simplicity that allows you to
build complex and robust statements.

Does the random join make any sense? Is this UDF material?

Any feedback is welcome,

Dan.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to