RE: UDF request?

SGreen Fri, 08 Apr 2005 10:13:36 -0700

Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM:

> On Thu, 7 Apr 2005, Sean Nolan wrote:
> 
> >I think you'll find you can do what you want with a cross join. A cross 
join 
> >will join every row from the first table with every row from the second


> >table. It will not randomly do this, so you'd have to be creative in 
> >"randomly" selecting rows in the WHERE clause. Here's how you could do 
this 
> >with your data, pardon the highly original and very scientific 
> >randomization, but it is just an example :-)
> 
> Perhaps I don't understand your "randomization" (because I really don't
> understand it ;), but I don't think a CROSS join does the trick, because 
I
> want to randomly pick the same rows from either side of the join that I
> would have gotten with an INNER JOIN using the 1:1 primary key between 
the
> two tables. This is analagous to sampling without replacement. If I do a
> cross join and then just restrict the number of rows returned (hey! I 
just
> worked out what you are doing below) I don't necessarily get the same
> 'marginals' (or to randomly pick the same rows from either side of the
> join) . This is analagous to sampling with replacement.
> 
> Do you see what I mean?
> 
> 
> 
> 
> >
> >SELECT
> >  Dept,
> >  Gender,
> >  AVG(Spending)
> >FROM
> >  table_one a
> >CROSS JOIN
> >  table_two b
> >WHERE (a.Person * b.Person) % 4 = 3
> >GROUP BY
> >  Dept, Gender;
> >
> >Sean
> >
> >----- Original Message ------
> >
> >Hi,
> >
> >I have a request for a UDF that I would find really useful. I am
> >duplicating this question on the community list as well as the MySQL 
list,
> >because I am not sure where best to make this kind of request (see the
> >previous post http://lists.mysql.com/community/97).
> >
> >I think the following syntax would be really cool...
> >
> >SELECT
> >  a.*,
> >  b.*
> >FROM
> >  table_one a
> >RANDOM JOIN       -- <-- :)
> >  table_two b
> >USING
> >  (ID)
> >;
> >
> >Lets say table_one and table_two have a primary key called ID in common
> >(although of course that isn't necessary).
> >
> >The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
> >then scramble the 'links' (or rows) between the two tables. The result
> >would be the same number of rows overall, the same number of distinct
> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
> >associated with the marginals given by the correct inner join.
> >
> >Hopefully that makes sense.
> >
> >I think this function would be really useful for statistical analysis 
of
> >scientific data within MySQL (using randomized versions of the
> >associations within the tables).
> >
> >Not sure if the above syntax is optimal, because I would like to be 
able
> >to do things like this...
> >
> >table_one
> >Dept   Person   Gender
> >A   1   M
> >A   2   F
> >A   3   M
> >B   4   F
> >B   5   M
> >B   6   F
> >
> >table_two
> >Dept    Person  Spending
> >A       1       10
> >A       2       20
> >A       3       30
> >B       4       40
> >B       5       50
> >B       6       60
> >
> >SELECT
> >  Dept,
> >  Gender,
> >  AVG(Spending)
> >FROM
> >  table_one a
> >INNER JOIN
> >  table_two b
> >USING
> >  (Dept,Person)
> >GROUP BY
> >  Dept, Gender
> >;
> >
> >
> >With the above query (which I hope is sensible) I would like to keep 
the
> >departments fixed, and randomize the genders of the people in the
> >departments (keeping the number of each sex in each department the 
same).
> >
> >So we could INNER JOIN using Dept and then RANDOM JOIN (as described
> >above) using Person - all in one 'JOIN'. All else being the same this
> >should randomize the Gender, but keep the marginals.
> >
> >I guess this is overly complex given the problem, and it actually 
raises
> >more questions instantly (about statistical analysis), but the function 
is
> >basic (I hope), and like all SQL it is the simplicity that allows you 
to
> >build complex and robust statements.
> >
> >Does the random join make any sense? Is this UDF material?
> >
> >Any feedback is welcome,
> >
> >Dan.
> >
> >Thread
> >UDF request? - Dan Bolser, April 1 2005 12:10am
> >
> >_________________________________________________________________
> >Don?t just search. Find. Check out the new MSN Search! 
> >http://search.msn.click-url.com/go/onm00200636ave/direct/01/
> >
> >
> >
> 
I think I get it. It sounds to me like you want to take a random sampling 
(possibly to include the entire set) and analyze each sample. You can 
return the results of a query in random order if you use the RAND() 
function:

http://dev.mysql.com/doc/mysql/en/mathematical-functions.html

<quote from site>
As of MySQL 3.23, you can retrieve rows in random order like this:

mysql> SELECT * FROM tbl_name ORDER BY RAND();

ORDER BY RAND() combined with LIMIT is useful for selecting a random 
sample of a set of rows:

mysql> SELECT * FROM table1, table2 WHERE a=b AND c<d
    -> ORDER BY RAND() LIMIT 1000;
<end quote>

So if you want to randomly pick 1900 rows of data from a query you would 
say (you need to fill in the blanks with your query):

SELECT ...
FROM ...
WHERE ...
ORDER BY RAND()
LIMIT 1900;

As to randomly matching rows from one table with corresponding rows from a 
second table, that's just an illusion. For any two tables JOINed together, 
the same row combinations will exist regardless of order or sequence.

TABLE A
1 red
2 blue
3 green

Table B
1 ball
2 horse
3 wagon

SELECT CONCAT(a.color,' ', b.item)
FROM tablea a
INNER JOIN tableb b
ORDER BY RAND();

I just made a cross product (Cartesian product/cross join) of TableA with 
TableB. There will be 9 rows in some random order (as determined by the 
engine). However, if we sort the results (outside of MySQL) we would 
always get the same list in the same order because there would always be 3 
red items (red ball, red horse, red wagon) and 3 blue items (blue ball, 
blue horse, blue wagon) and 3 green items (green ball, green horse, green 
wagon).

However if you do not use the entire result set (you apply a LIMIT clause) 
so that you leave out some of the combinations, I think you achieve the 
results you are looking for. Extending this to our sample data, if I LIMIT 
the results to just 6 rows I get many different possible result sets (how 
many unique lists exist of 9 things when taken 6 at a time? - I would do 
the math but I can't remember which formula to use....). One set may have 
no red things while another could have 2 reds, 2 greens, and 2 blues and 
another 3 reds, 2 greens, and a blue.

Is this the randomizing effect you were going for?

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

RE: UDF request?

Reply via email to