On Thu, 7 Apr 2005, Sean Nolan wrote: >I think you'll find you can do what you want with a cross join. A cross join >will join every row from the first table with every row from the second >table. It will not randomly do this, so you'd have to be creative in >"randomly" selecting rows in the WHERE clause. Here's how you could do this >with your data, pardon the highly original and very scientific >randomization, but it is just an example :-)
Perhaps I don't understand your "randomization" (because I really don't understand it ;), but I don't think a CROSS join does the trick, because I want to randomly pick the same rows from either side of the join that I would have gotten with an INNER JOIN using the 1:1 primary key between the two tables. This is analagous to sampling without replacement. If I do a cross join and then just restrict the number of rows returned (hey! I just worked out what you are doing below) I don't necessarily get the same 'marginals' (or to randomly pick the same rows from either side of the join) . This is analagous to sampling with replacement. Do you see what I mean? > >SELECT > Dept, > Gender, > AVG(Spending) >FROM > table_one a >CROSS JOIN > table_two b >WHERE (a.Person * b.Person) % 4 = 3 >GROUP BY > Dept, Gender; > >Sean > >----- Original Message ------ > >Hi, > >I have a request for a UDF that I would find really useful. I am >duplicating this question on the community list as well as the MySQL list, >because I am not sure where best to make this kind of request (see the >previous post http://lists.mysql.com/community/97). > >I think the following syntax would be really cool... > >SELECT > a.*, > b.* >FROM > table_one a >RANDOM JOIN -- <-- :) > table_two b >USING > (ID) >; > >Lets say table_one and table_two have a primary key called ID in common >(although of course that isn't necessary). > >The idea is that the RANDOM JOIN would first calculate the INNER JOIN, >then scramble the 'links' (or rows) between the two tables. The result >would be the same number of rows overall, the same number of distinct >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly >associated with the marginals given by the correct inner join. > >Hopefully that makes sense. > >I think this function would be really useful for statistical analysis of >scientific data within MySQL (using randomized versions of the >associations within the tables). > >Not sure if the above syntax is optimal, because I would like to be able >to do things like this... > >table_one >Dept Person Gender >A 1 M >A 2 F >A 3 M >B 4 F >B 5 M >B 6 F > >table_two >Dept Person Spending >A 1 10 >A 2 20 >A 3 30 >B 4 40 >B 5 50 >B 6 60 > >SELECT > Dept, > Gender, > AVG(Spending) >FROM > table_one a >INNER JOIN > table_two b >USING > (Dept,Person) >GROUP BY > Dept, Gender >; > > >With the above query (which I hope is sensible) I would like to keep the >departments fixed, and randomize the genders of the people in the >departments (keeping the number of each sex in each department the same). > >So we could INNER JOIN using Dept and then RANDOM JOIN (as described >above) using Person - all in one 'JOIN'. All else being the same this >should randomize the Gender, but keep the marginals. > >I guess this is overly complex given the problem, and it actually raises >more questions instantly (about statistical analysis), but the function is >basic (I hope), and like all SQL it is the simplicity that allows you to >build complex and robust statements. > >Does the random join make any sense? Is this UDF material? > >Any feedback is welcome, > >Dan. > >Thread >UDF request? - Dan Bolser, April 1 2005 12:10am > >_________________________________________________________________ >Don’t just search. Find. Check out the new MSN Search! >http://search.msn.click-url.com/go/onm00200636ave/direct/01/ > > > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]