Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM: > On Thu, 7 Apr 2005, Sean Nolan wrote: > > >I think you'll find you can do what you want with a cross join. A cross join > >will join every row from the first table with every row from the second
> >table. It will not randomly do this, so you'd have to be creative in > >"randomly" selecting rows in the WHERE clause. Here's how you could do this > >with your data, pardon the highly original and very scientific > >randomization, but it is just an example :-) > > Perhaps I don't understand your "randomization" (because I really don't > understand it ;), but I don't think a CROSS join does the trick, because I > want to randomly pick the same rows from either side of the join that I > would have gotten with an INNER JOIN using the 1:1 primary key between the > two tables. This is analagous to sampling without replacement. If I do a > cross join and then just restrict the number of rows returned (hey! I just > worked out what you are doing below) I don't necessarily get the same > 'marginals' (or to randomly pick the same rows from either side of the > join) . This is analagous to sampling with replacement. > > Do you see what I mean? > > > > > > > >SELECT > > Dept, > > Gender, > > AVG(Spending) > >FROM > > table_one a > >CROSS JOIN > > table_two b > >WHERE (a.Person * b.Person) % 4 = 3 > >GROUP BY > > Dept, Gender; > > > >Sean > > > >----- Original Message ------ > > > >Hi, > > > >I have a request for a UDF that I would find really useful. I am > >duplicating this question on the community list as well as the MySQL list, > >because I am not sure where best to make this kind of request (see the > >previous post http://lists.mysql.com/community/97). > > > >I think the following syntax would be really cool... > > > >SELECT > > a.*, > > b.* > >FROM > > table_one a > >RANDOM JOIN -- <-- :) > > table_two b > >USING > > (ID) > >; > > > >Lets say table_one and table_two have a primary key called ID in common > >(although of course that isn't necessary). > > > >The idea is that the RANDOM JOIN would first calculate the INNER JOIN, > >then scramble the 'links' (or rows) between the two tables. The result > >would be the same number of rows overall, the same number of distinct > >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly > >associated with the marginals given by the correct inner join. > > > >Hopefully that makes sense. > > > >I think this function would be really useful for statistical analysis of > >scientific data within MySQL (using randomized versions of the > >associations within the tables). > > > >Not sure if the above syntax is optimal, because I would like to be able > >to do things like this... > > > >table_one > >Dept Person Gender > >A 1 M > >A 2 F > >A 3 M > >B 4 F > >B 5 M > >B 6 F > > > >table_two > >Dept Person Spending > >A 1 10 > >A 2 20 > >A 3 30 > >B 4 40 > >B 5 50 > >B 6 60 > > > >SELECT > > Dept, > > Gender, > > AVG(Spending) > >FROM > > table_one a > >INNER JOIN > > table_two b > >USING > > (Dept,Person) > >GROUP BY > > Dept, Gender > >; > > > > > >With the above query (which I hope is sensible) I would like to keep the > >departments fixed, and randomize the genders of the people in the > >departments (keeping the number of each sex in each department the same). > > > >So we could INNER JOIN using Dept and then RANDOM JOIN (as described > >above) using Person - all in one 'JOIN'. All else being the same this > >should randomize the Gender, but keep the marginals. > > > >I guess this is overly complex given the problem, and it actually raises > >more questions instantly (about statistical analysis), but the function is > >basic (I hope), and like all SQL it is the simplicity that allows you to > >build complex and robust statements. > > > >Does the random join make any sense? Is this UDF material? > > > >Any feedback is welcome, > > > >Dan. > > > >Thread > >UDF request? - Dan Bolser, April 1 2005 12:10am > > > >_________________________________________________________________ > >Don?t just search. Find. Check out the new MSN Search! > >http://search.msn.click-url.com/go/onm00200636ave/direct/01/ > > > > > > > I think I get it. It sounds to me like you want to take a random sampling (possibly to include the entire set) and analyze each sample. You can return the results of a query in random order if you use the RAND() function: http://dev.mysql.com/doc/mysql/en/mathematical-functions.html <quote from site> As of MySQL 3.23, you can retrieve rows in random order like this: mysql> SELECT * FROM tbl_name ORDER BY RAND(); ORDER BY RAND() combined with LIMIT is useful for selecting a random sample of a set of rows: mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d -> ORDER BY RAND() LIMIT 1000; <end quote> So if you want to randomly pick 1900 rows of data from a query you would say (you need to fill in the blanks with your query): SELECT ... FROM ... WHERE ... ORDER BY RAND() LIMIT 1900; As to randomly matching rows from one table with corresponding rows from a second table, that's just an illusion. For any two tables JOINed together, the same row combinations will exist regardless of order or sequence. TABLE A 1 red 2 blue 3 green Table B 1 ball 2 horse 3 wagon SELECT CONCAT(a.color,' ', b.item) FROM tablea a INNER JOIN tableb b ORDER BY RAND(); I just made a cross product (Cartesian product/cross join) of TableA with TableB. There will be 9 rows in some random order (as determined by the engine). However, if we sort the results (outside of MySQL) we would always get the same list in the same order because there would always be 3 red items (red ball, red horse, red wagon) and 3 blue items (blue ball, blue horse, blue wagon) and 3 green items (green ball, green horse, green wagon). However if you do not use the entire result set (you apply a LIMIT clause) so that you leave out some of the combinations, I think you achieve the results you are looking for. Extending this to our sample data, if I LIMIT the results to just 6 rows I get many different possible result sets (how many unique lists exist of 9 things when taken 6 at a time? - I would do the math but I can't remember which formula to use....). One set may have no red things while another could have 2 reds, 2 greens, and 2 blues and another 3 reds, 2 greens, and a blue. Is this the randomizing effect you were going for? Shawn Green Database Administrator Unimin Corporation - Spruce Pine