On Fri, 8 Apr 2005 [EMAIL PROTECTED] wrote: >Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM: > >> On Thu, 7 Apr 2005, Sean Nolan wrote: >> >> >I think you'll find you can do what you want with a cross join. A cross >join >> >will join every row from the first table with every row from the second > >> >table. It will not randomly do this, so you'd have to be creative in >> >"randomly" selecting rows in the WHERE clause. Here's how you could do >this >> >with your data, pardon the highly original and very scientific >> >randomization, but it is just an example :-) >> >> Perhaps I don't understand your "randomization" (because I really don't >> understand it ;), but I don't think a CROSS join does the trick, because >I >> want to randomly pick the same rows from either side of the join that I >> would have gotten with an INNER JOIN using the 1:1 primary key between >the >> two tables. This is analagous to sampling without replacement. If I do a >> cross join and then just restrict the number of rows returned (hey! I >just >> worked out what you are doing below) I don't necessarily get the same >> 'marginals' (or to randomly pick the same rows from either side of the >> join) . This is analagous to sampling with replacement. >> >> Do you see what I mean? >> >> >> >> >> > >> >SELECT >> > Dept, >> > Gender, >> > AVG(Spending) >> >FROM >> > table_one a >> >CROSS JOIN >> > table_two b >> >WHERE (a.Person * b.Person) % 4 = 3 >> >GROUP BY >> > Dept, Gender; >> > >> >Sean >> > >> >----- Original Message ------ >> > >> >Hi, >> > >> >I have a request for a UDF that I would find really useful. I am >> >duplicating this question on the community list as well as the MySQL >list, >> >because I am not sure where best to make this kind of request (see the >> >previous post http://lists.mysql.com/community/97). >> > >> >I think the following syntax would be really cool... >> > >> >SELECT >> > a.*, >> > b.* >> >FROM >> > table_one a >> >RANDOM JOIN -- <-- :) >> > table_two b >> >USING >> > (ID) >> >; >> > >> >Lets say table_one and table_two have a primary key called ID in common >> >(although of course that isn't necessary). >> > >> >The idea is that the RANDOM JOIN would first calculate the INNER JOIN, >> >then scramble the 'links' (or rows) between the two tables. The result >> >would be the same number of rows overall, the same number of distinct >> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly >> >associated with the marginals given by the correct inner join. >> > >> >Hopefully that makes sense. >> > >> >I think this function would be really useful for statistical analysis >of >> >scientific data within MySQL (using randomized versions of the >> >associations within the tables). >> > >> >Not sure if the above syntax is optimal, because I would like to be >able >> >to do things like this... >> > >> >table_one >> >Dept Person Gender >> >A 1 M >> >A 2 F >> >A 3 M >> >B 4 F >> >B 5 M >> >B 6 F >> > >> >table_two >> >Dept Person Spending >> >A 1 10 >> >A 2 20 >> >A 3 30 >> >B 4 40 >> >B 5 50 >> >B 6 60 >> > >> >SELECT >> > Dept, >> > Gender, >> > AVG(Spending) >> >FROM >> > table_one a >> >INNER JOIN >> > table_two b >> >USING >> > (Dept,Person) >> >GROUP BY >> > Dept, Gender >> >; >> > >> > >> >With the above query (which I hope is sensible) I would like to keep >the >> >departments fixed, and randomize the genders of the people in the >> >departments (keeping the number of each sex in each department the >same). >> > >> >So we could INNER JOIN using Dept and then RANDOM JOIN (as described >> >above) using Person - all in one 'JOIN'. All else being the same this >> >should randomize the Gender, but keep the marginals. >> > >> >I guess this is overly complex given the problem, and it actually >raises >> >more questions instantly (about statistical analysis), but the function >is >> >basic (I hope), and like all SQL it is the simplicity that allows you >to >> >build complex and robust statements. >> > >> >Does the random join make any sense? Is this UDF material? >> > >> >Any feedback is welcome, >> > >> >Dan. >> > >> >Thread >> >UDF request? - Dan Bolser, April 1 2005 12:10am >> > >> >_________________________________________________________________ >> >Don?t just search. Find. Check out the new MSN Search! >> >http://search.msn.click-url.com/go/onm00200636ave/direct/01/ >> > >> > >> > >> >I think I get it. It sounds to me like you want to take a random sampling >(possibly to include the entire set) and analyze each sample. You can >return the results of a query in random order if you use the RAND() >function: > >http://dev.mysql.com/doc/mysql/en/mathematical-functions.html > ><quote from site> >As of MySQL 3.23, you can retrieve rows in random order like this: > >mysql> SELECT * FROM tbl_name ORDER BY RAND(); > >ORDER BY RAND() combined with LIMIT is useful for selecting a random >sample of a set of rows: > >mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d > -> ORDER BY RAND() LIMIT 1000; ><end quote> > >So if you want to randomly pick 1900 rows of data from a query you would >say (you need to fill in the blanks with your query): > >SELECT ... >FROM ... >WHERE ... >ORDER BY RAND() >LIMIT 1900; > >As to randomly matching rows from one table with corresponding rows from a >second table, that's just an illusion. For any two tables JOINed together, >the same row combinations will exist regardless of order or sequence. > >TABLE A >1 red >2 blue >3 green > >Table B >1 ball >2 horse >3 wagon > >SELECT CONCAT(a.color,' ', b.item) >FROM tablea a >INNER JOIN tableb b >ORDER BY RAND(); > >I just made a cross product (Cartesian product/cross join) of TableA with >TableB. There will be 9 rows in some random order (as determined by the >engine). However, if we sort the results (outside of MySQL) we would >always get the same list in the same order because there would always be 3 >red items (red ball, red horse, red wagon) and 3 blue items (blue ball, >blue horse, blue wagon) and 3 green items (green ball, green horse, green >wagon). > >However if you do not use the entire result set (you apply a LIMIT clause) >so that you leave out some of the combinations, I think you achieve the >results you are looking for. Extending this to our sample data, if I LIMIT >the results to just 6 rows I get many different possible result sets (how >many unique lists exist of 9 things when taken 6 at a time? - I would do >the math but I can't remember which formula to use....). One set may have >no red things while another could have 2 reds, 2 greens, and 2 blues and >another 3 reds, 2 greens, and a blue.
n choose k (for combinations), fact(9)/(fact(6)*fact(3))... >Is this the randomizing effect you were going for? I am still trying to work out what I do want.... OK, I think I worked it out, I want to *scramble* a list of pairs. Table1 A B 1 10 2 20 3 30 select scramble(A,B) from Table1; A_scram B_scram 1 30 2 10 3 20 (as an example, naturally the results could differ each time the query is run) The problem with just limiting the cartesian product is that it dosn't guarante that every value of A and every value of B is represented the correct number of times. Here is another example... A B 1 10 1 20 1 30 2 20 3 30 select scramble(A,B) from Table1; A_scram B_scram 1 30 1 20 1 30 2 10 3 20 (for example). How can I do this with SQL? Could a SCRAMBLE(A,B) UDF be a reasonable request? Cheers, Dan. > >Shawn Green >Database Administrator >Unimin Corporation - Spruce Pine > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]