RE: UDF request?

Dan Bolser Fri, 08 Apr 2005 15:46:37 -0700

On Fri, 8 Apr 2005 [EMAIL PROTECTED] wrote:

>Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM:
>
>> On Thu, 7 Apr 2005, Sean Nolan wrote:
>> 
>> >I think you'll find you can do what you want with a cross join. A cross 
>join 
>> >will join every row from the first table with every row from the second 
>
>> >table. It will not randomly do this, so you'd have to be creative in 
>> >"randomly" selecting rows in the WHERE clause. Here's how you could do 
>this 
>> >with your data, pardon the highly original and very scientific 
>> >randomization, but it is just an example :-)
>> 
>> Perhaps I don't understand your "randomization" (because I really don't
>> understand it ;), but I don't think a CROSS join does the trick, because 
>I
>> want to randomly pick the same rows from either side of the join that I
>> would have gotten with an INNER JOIN using the 1:1 primary key between 
>the
>> two tables. This is analagous to sampling without replacement. If I do a
>> cross join and then just restrict the number of rows returned (hey! I 
>just
>> worked out what you are doing below) I don't necessarily get the same
>> 'marginals' (or to randomly pick the same rows from either side of the
>> join) . This is analagous to sampling with replacement.
>> 
>> Do you see what I mean?
>> 
>> 
>> 
>> 
>> >
>> >SELECT
>> >  Dept,
>> >  Gender,
>> >  AVG(Spending)
>> >FROM
>> >  table_one a
>> >CROSS JOIN
>> >  table_two b
>> >WHERE (a.Person * b.Person) % 4 = 3
>> >GROUP BY
>> >  Dept, Gender;
>> >
>> >Sean
>> >
>> >----- Original Message ------
>> >
>> >Hi,
>> >
>> >I have a request for a UDF that I would find really useful. I am
>> >duplicating this question on the community list as well as the MySQL 
>list,
>> >because I am not sure where best to make this kind of request (see the
>> >previous post http://lists.mysql.com/community/97).
>> >
>> >I think the following syntax would be really cool...
>> >
>> >SELECT
>> >  a.*,
>> >  b.*
>> >FROM
>> >  table_one a
>> >RANDOM JOIN       -- <-- :)
>> >  table_two b
>> >USING
>> >  (ID)
>> >;
>> >
>> >Lets say table_one and table_two have a primary key called ID in common
>> >(although of course that isn't necessary).
>> >
>> >The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
>> >then scramble the 'links' (or rows) between the two tables. The result
>> >would be the same number of rows overall, the same number of distinct
>> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
>> >associated with the marginals given by the correct inner join.
>> >
>> >Hopefully that makes sense.
>> >
>> >I think this function would be really useful for statistical analysis 
>of
>> >scientific data within MySQL (using randomized versions of the
>> >associations within the tables).
>> >
>> >Not sure if the above syntax is optimal, because I would like to be 
>able
>> >to do things like this...
>> >
>> >table_one
>> >Dept   Person   Gender
>> >A   1   M
>> >A   2   F
>> >A   3   M
>> >B   4   F
>> >B   5   M
>> >B   6   F
>> >
>> >table_two
>> >Dept    Person  Spending
>> >A       1       10
>> >A       2       20
>> >A       3       30
>> >B       4       40
>> >B       5       50
>> >B       6       60
>> >
>> >SELECT
>> >  Dept,
>> >  Gender,
>> >  AVG(Spending)
>> >FROM
>> >  table_one a
>> >INNER JOIN
>> >  table_two b
>> >USING
>> >  (Dept,Person)
>> >GROUP BY
>> >  Dept, Gender
>> >;
>> >
>> >
>> >With the above query (which I hope is sensible) I would like to keep 
>the
>> >departments fixed, and randomize the genders of the people in the
>> >departments (keeping the number of each sex in each department the 
>same).
>> >
>> >So we could INNER JOIN using Dept and then RANDOM JOIN (as described
>> >above) using Person - all in one 'JOIN'. All else being the same this
>> >should randomize the Gender, but keep the marginals.
>> >
>> >I guess this is overly complex given the problem, and it actually 
>raises
>> >more questions instantly (about statistical analysis), but the function 
>is
>> >basic (I hope), and like all SQL it is the simplicity that allows you 
>to
>> >build complex and robust statements.
>> >
>> >Does the random join make any sense? Is this UDF material?
>> >
>> >Any feedback is welcome,
>> >
>> >Dan.
>> >
>> >Thread
>> >UDF request? - Dan Bolser, April 1 2005 12:10am
>> >
>> >_________________________________________________________________
>> >Don?t just search. Find. Check out the new MSN Search! 
>> >http://search.msn.click-url.com/go/onm00200636ave/direct/01/
>> >
>> >
>> >
>> 
>I think I get it. It sounds to me like you want to take a random sampling 
>(possibly to include the entire set) and analyze each sample. You can 
>return the results of a query in random order if you use the RAND() 
>function:
>
>http://dev.mysql.com/doc/mysql/en/mathematical-functions.html
>
><quote from site>
>As of MySQL 3.23, you can retrieve rows in random order like this:
>
>mysql> SELECT * FROM tbl_name ORDER BY RAND();
>
>ORDER BY RAND() combined with LIMIT is useful for selecting a random 
>sample of a set of rows:
>
>mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d
>    -> ORDER BY RAND() LIMIT 1000;
><end quote>
>
>So if you want to randomly pick 1900 rows of data from a query you would 
>say (you need to fill in the blanks with your query):
>
>SELECT ...
>FROM ...
>WHERE ...
>ORDER BY RAND()
>LIMIT 1900;
>
>As to randomly matching rows from one table with corresponding rows from a 
>second table, that's just an illusion. For any two tables JOINed together, 
>the same row combinations will exist regardless of order or sequence.
>
>TABLE A
>1 red
>2 blue
>3 green
>
>Table B
>1 ball
>2 horse
>3 wagon
>
>SELECT CONCAT(a.color,' ', b.item)
>FROM tablea a
>INNER JOIN tableb b
>ORDER BY RAND();
>
>I just made a cross product (Cartesian product/cross join) of TableA with 
>TableB. There will be 9 rows in some random order (as determined by the 
>engine). However, if we sort the results (outside of MySQL) we would 
>always get the same list in the same order because there would always be 3 
>red items (red ball, red horse, red wagon) and 3 blue items (blue ball, 
>blue horse, blue wagon) and 3 green items (green ball, green horse, green 
>wagon).
>
>However if you do not use the entire result set (you apply a LIMIT clause) 
>so that you leave out some of the combinations, I think you achieve the 
>results you are looking for. Extending this to our sample data, if I LIMIT 
>the results to just 6 rows I get many different possible result sets (how 
>many unique lists exist of 9 things when taken 6 at a time? - I would do 
>the math but I can't remember which formula to use....). One set may have 
>no red things while another could have 2 reds, 2 greens, and 2 blues and 
>another 3 reds, 2 greens, and a blue.


n choose k (for combinations), fact(9)/(fact(6)*fact(3))...


>Is this the randomizing effect you were going for?

I am still trying to work out what I do want....

OK, I think I worked it out, I want to *scramble* a list of pairs.

Table1
A       B
1       10
2       20
3       30

select scramble(A,B) from Table1;
A_scram B_scram
1       30
2       10
3       20

(as an example, naturally the results could differ each time the query is 
run)

The problem with just limiting the cartesian product is that it dosn't
guarante that every value of A and every value of B is represented the
correct number of times.

Here is another example...

A       B
1       10
1       20
1       30
2       20
3       30

select scramble(A,B) from Table1;
A_scram B_scram
1       30
1       20
1       30
2       10
3       20

(for example).

How can I do this with SQL? Could a SCRAMBLE(A,B) UDF be a reasonable
request?

Cheers,
Dan.


>
>Shawn Green
>Database Administrator
>Unimin Corporation - Spruce Pine
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

RE: UDF request?

Reply via email to