Re: UDF Request "AGGLOM()"

2005-12-05 Thread dmb
> Hi Dan,
>
> Dan Bolser wrote:
 Who can I prod about setting up a UDF repo at MySQL. I think 'they'
 should
 do this ;)
>>> Yep it's an existing idea, a very good one, and it's on the todo.
>>> Putting such an infrastructure into place will take some time though.
>>
>> I can imagine it isn't trivial to set up.
>>
>>> Would a special forum for this purpose perhaps be a good intermediate
>>> solution? Routines could be posted there, it's searchable...
>>
>> Forum = list?
>>
>> I think it would be a good start, if only to discuss things like this :)
>
> forums.mysql.com


Thanks. Don't know if it is just me, but I can't find anything to do with
UDF's on that forum. I was actually thinking that you meant an archived
mailing list, but a BBS is OK.




> Regards,
> Arjen.
> --
> Arjen Lentz, Community Relations Manager, MySQL AB
> Program Chair, MySQL Users Conference
>
> MySQL Users Conference 2006 (Santa Clara CA, 24-27 April)
> http://www.mysqluc.com/
>



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: UDF Request "AGGLOM()"

2005-12-05 Thread Arjen Lentz

Hi Dan,

Dan Bolser wrote:

Who can I prod about setting up a UDF repo at MySQL. I think 'they' should
do this ;)

Yep it's an existing idea, a very good one, and it's on the todo.
Putting such an infrastructure into place will take some time though.


I can imagine it isn't trivial to set up.


Would a special forum for this purpose perhaps be a good intermediate
solution? Routines could be posted there, it's searchable...


Forum = list?

I think it would be a good start, if only to discuss things like this :)


forums.mysql.com


Regards,
Arjen.
--
Arjen Lentz, Community Relations Manager, MySQL AB
Program Chair, MySQL Users Conference

MySQL Users Conference 2006 (Santa Clara CA, 24-27 April)
http://www.mysqluc.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: UDF request?

2005-04-11 Thread SGreen
Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 06:45:42 PM:

> On Fri, 8 Apr 2005 [EMAIL PROTECTED] wrote:
> 
> >Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM:
> >
> >> On Thu, 7 Apr 2005, Sean Nolan wrote:
> >> 
> >> >I think you'll find you can do what you want with a cross join. A 
cross 
> >join 
> >> >will join every row from the first table with every row from the 
second 
> >
> >> >table. It will not randomly do this, so you'd have to be creative in 

> >> >"randomly" selecting rows in the WHERE clause. Here's how you could 
do 
> >this 
> >> >with your data, pardon the highly original and very scientific 
> >> >randomization, but it is just an example :-)
> >> 
> >> Perhaps I don't understand your "randomization" (because I really 
don't
> >> understand it ;), but I don't think a CROSS join does the trick, 
because 
> >I
> >> want to randomly pick the same rows from either side of the join that 
I
> >> would have gotten with an INNER JOIN using the 1:1 primary key 
between 
> >the
> >> two tables. This is analagous to sampling without replacement. If I 
do a
> >> cross join and then just restrict the number of rows returned (hey! I 

> >just
> >> worked out what you are doing below) I don't necessarily get the same
> >> 'marginals' (or to randomly pick the same rows from either side of 
the
> >> join) . This is analagous to sampling with replacement.
> >> 
> >> Do you see what I mean?
> >> 
> >> 
> >> 
> >> 
> >> >
> >> >SELECT
> >> >  Dept,
> >> >  Gender,
> >> >  AVG(Spending)
> >> >FROM
> >> >  table_one a
> >> >CROSS JOIN
> >> >  table_two b
> >> >WHERE (a.Person * b.Person) % 4 = 3
> >> >GROUP BY
> >> >  Dept, Gender;
> >> >
> >> >Sean
> >> >
> >> >- Original Message --
> >> >
> >> >Hi,
> >> >
> >> >I have a request for a UDF that I would find really useful. I am
> >> >duplicating this question on the community list as well as the MySQL 

> >list,
> >> >because I am not sure where best to make this kind of request (see 
the
> >> >previous post http://lists.mysql.com/community/97).
> >> >
> >> >I think the following syntax would be really cool...
> >> >
> >> >SELECT
> >> >  a.*,
> >> >  b.*
> >> >FROM
> >> >  table_one a
> >> >RANDOM JOIN   -- <-- :)
> >> >  table_two b
> >> >USING
> >> >  (ID)
> >> >;
> >> >
> >> >Lets say table_one and table_two have a primary key called ID in 
common
> >> >(although of course that isn't necessary).
> >> >
> >> >The idea is that the RANDOM JOIN would first calculate the INNER 
JOIN,
> >> >then scramble the 'links' (or rows) between the two tables. The 
result
> >> >would be the same number of rows overall, the same number of 
distinct
> >> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
> >> >associated with the marginals given by the correct inner join.
> >> >
> >> >Hopefully that makes sense.
> >> >
> >> >I think this function would be really useful for statistical 
analysis 
> >of
> >> >scientific data within MySQL (using randomized versions of the
> >> >associations within the tables).
> >> >
> >> >Not sure if the above syntax is optimal, because I would like to be 
> >able
> >> >to do things like this...
> >> >
> >> >table_one
> >> >Dept   Person   Gender
> >> >A   1   M
> >> >A   2   F
> >> >A   3   M
> >> >B   4   F
> >> >B   5   M
> >> >B   6   F
> >> >
> >> >table_two
> >> >DeptPerson  Spending
> >> >A   1   10
> >> >A   2   20
> >> >A   3   30
> >> >B   4   40
> >> >B   5   50
> >> >B   6   60
> >> >
> >> >SELECT
> >> >  Dept,
> >> >  Gender,
> >> >  AVG(Spending)
> >> >FROM
> >> >  table_one a
> >

Re: UDF Request "AGGLOM()"

2005-04-11 Thread Dan Bolser
On Mon, 11 Apr 2005, Arjen Lentz wrote:

>Hi Dan,
>
>On Sat, 2005-04-09 at 02:59, Dan Bolser wrote:
>> Who can I prod about setting up a UDF repo at MySQL. I think 'they' should
>> do this ;)
>
>Yep it's an existing idea, a very good one, and it's on the todo.
>Putting such an infrastructure into place will take some time though.

I can imagine it isn't trivial to set up.

>Would a special forum for this purpose perhaps be a good intermediate
>solution? Routines could be posted there, it's searchable...

Forum = list?

I think it would be a good start, if only to discuss things like this :)

All the best,
Dan.


>
>
>Regards,
>Arjen.
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: UDF Request "AGGLOM()"

2005-04-10 Thread Arjen Lentz
Hi Dan,

On Sat, 2005-04-09 at 02:59, Dan Bolser wrote:
> Who can I prod about setting up a UDF repo at MySQL. I think 'they' should
> do this ;)

Yep it's an existing idea, a very good one, and it's on the todo.
Putting such an infrastructure into place will take some time though.

Would a special forum for this purpose perhaps be a good intermediate
solution? Routines could be posted there, it's searchable...


Regards,
Arjen.
-- 
Arjen Lentz, Community Relations Manager
MySQL AB, www.mysql.com

MySQL Users Conference (Santa Clara CA, 18-21 April 2005)
Only 8 days until the event!  http://www.mysqluc.com/



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: UDF request?

2005-04-08 Thread Dan Bolser
On Fri, 8 Apr 2005 [EMAIL PROTECTED] wrote:

>Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM:
>
>> On Thu, 7 Apr 2005, Sean Nolan wrote:
>> 
>> >I think you'll find you can do what you want with a cross join. A cross 
>join 
>> >will join every row from the first table with every row from the second 
>
>> >table. It will not randomly do this, so you'd have to be creative in 
>> >"randomly" selecting rows in the WHERE clause. Here's how you could do 
>this 
>> >with your data, pardon the highly original and very scientific 
>> >randomization, but it is just an example :-)
>> 
>> Perhaps I don't understand your "randomization" (because I really don't
>> understand it ;), but I don't think a CROSS join does the trick, because 
>I
>> want to randomly pick the same rows from either side of the join that I
>> would have gotten with an INNER JOIN using the 1:1 primary key between 
>the
>> two tables. This is analagous to sampling without replacement. If I do a
>> cross join and then just restrict the number of rows returned (hey! I 
>just
>> worked out what you are doing below) I don't necessarily get the same
>> 'marginals' (or to randomly pick the same rows from either side of the
>> join) . This is analagous to sampling with replacement.
>> 
>> Do you see what I mean?
>> 
>> 
>> 
>> 
>> >
>> >SELECT
>> >  Dept,
>> >  Gender,
>> >  AVG(Spending)
>> >FROM
>> >  table_one a
>> >CROSS JOIN
>> >  table_two b
>> >WHERE (a.Person * b.Person) % 4 = 3
>> >GROUP BY
>> >  Dept, Gender;
>> >
>> >Sean
>> >
>> >- Original Message --
>> >
>> >Hi,
>> >
>> >I have a request for a UDF that I would find really useful. I am
>> >duplicating this question on the community list as well as the MySQL 
>list,
>> >because I am not sure where best to make this kind of request (see the
>> >previous post http://lists.mysql.com/community/97).
>> >
>> >I think the following syntax would be really cool...
>> >
>> >SELECT
>> >  a.*,
>> >  b.*
>> >FROM
>> >  table_one a
>> >RANDOM JOIN   -- <-- :)
>> >  table_two b
>> >USING
>> >  (ID)
>> >;
>> >
>> >Lets say table_one and table_two have a primary key called ID in common
>> >(although of course that isn't necessary).
>> >
>> >The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
>> >then scramble the 'links' (or rows) between the two tables. The result
>> >would be the same number of rows overall, the same number of distinct
>> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
>> >associated with the marginals given by the correct inner join.
>> >
>> >Hopefully that makes sense.
>> >
>> >I think this function would be really useful for statistical analysis 
>of
>> >scientific data within MySQL (using randomized versions of the
>> >associations within the tables).
>> >
>> >Not sure if the above syntax is optimal, because I would like to be 
>able
>> >to do things like this...
>> >
>> >table_one
>> >Dept   Person   Gender
>> >A   1   M
>> >A   2   F
>> >A   3   M
>> >B   4   F
>> >B   5   M
>> >B   6   F
>> >
>> >table_two
>> >DeptPerson  Spending
>> >A   1   10
>> >A   2   20
>> >A   3   30
>> >B   4   40
>> >B   5   50
>> >B   6   60
>> >
>> >SELECT
>> >  Dept,
>> >  Gender,
>> >  AVG(Spending)
>> >FROM
>> >  table_one a
>> >INNER JOIN
>> >  table_two b
>> >USING
>> >  (Dept,Person)
>> >GROUP BY
>> >  Dept, Gender
>> >;
>> >
>> >
>> >With the above query (which I hope is sensible) I would like to keep 
>the
>> >departments fixed, and randomize the genders of the people in the
>> >departments (keeping the number of each sex in each department the 
>same).
>> >
>> >So we could INNER JOIN using Dept and then RANDOM JOIN (as described
>> >above) using Person - all in one 'JOIN'. All else being the same this
>> >should randomize the Ge

RE: UDF request?

2005-04-08 Thread SGreen
Dan Bolser <[EMAIL PROTECTED]> wrote on 04/08/2005 12:41:35 PM:

> On Thu, 7 Apr 2005, Sean Nolan wrote:
> 
> >I think you'll find you can do what you want with a cross join. A cross 
join 
> >will join every row from the first table with every row from the second 

> >table. It will not randomly do this, so you'd have to be creative in 
> >"randomly" selecting rows in the WHERE clause. Here's how you could do 
this 
> >with your data, pardon the highly original and very scientific 
> >randomization, but it is just an example :-)
> 
> Perhaps I don't understand your "randomization" (because I really don't
> understand it ;), but I don't think a CROSS join does the trick, because 
I
> want to randomly pick the same rows from either side of the join that I
> would have gotten with an INNER JOIN using the 1:1 primary key between 
the
> two tables. This is analagous to sampling without replacement. If I do a
> cross join and then just restrict the number of rows returned (hey! I 
just
> worked out what you are doing below) I don't necessarily get the same
> 'marginals' (or to randomly pick the same rows from either side of the
> join) . This is analagous to sampling with replacement.
> 
> Do you see what I mean?
> 
> 
> 
> 
> >
> >SELECT
> >  Dept,
> >  Gender,
> >  AVG(Spending)
> >FROM
> >  table_one a
> >CROSS JOIN
> >  table_two b
> >WHERE (a.Person * b.Person) % 4 = 3
> >GROUP BY
> >  Dept, Gender;
> >
> >Sean
> >
> >- Original Message --
> >
> >Hi,
> >
> >I have a request for a UDF that I would find really useful. I am
> >duplicating this question on the community list as well as the MySQL 
list,
> >because I am not sure where best to make this kind of request (see the
> >previous post http://lists.mysql.com/community/97).
> >
> >I think the following syntax would be really cool...
> >
> >SELECT
> >  a.*,
> >  b.*
> >FROM
> >  table_one a
> >RANDOM JOIN   -- <-- :)
> >  table_two b
> >USING
> >  (ID)
> >;
> >
> >Lets say table_one and table_two have a primary key called ID in common
> >(although of course that isn't necessary).
> >
> >The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
> >then scramble the 'links' (or rows) between the two tables. The result
> >would be the same number of rows overall, the same number of distinct
> >a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
> >associated with the marginals given by the correct inner join.
> >
> >Hopefully that makes sense.
> >
> >I think this function would be really useful for statistical analysis 
of
> >scientific data within MySQL (using randomized versions of the
> >associations within the tables).
> >
> >Not sure if the above syntax is optimal, because I would like to be 
able
> >to do things like this...
> >
> >table_one
> >Dept   Person   Gender
> >A   1   M
> >A   2   F
> >A   3   M
> >B   4   F
> >B   5   M
> >B   6   F
> >
> >table_two
> >DeptPerson  Spending
> >A   1   10
> >A   2   20
> >A   3   30
> >B   4   40
> >B   5   50
> >B   6   60
> >
> >SELECT
> >  Dept,
> >  Gender,
> >  AVG(Spending)
> >FROM
> >  table_one a
> >INNER JOIN
> >  table_two b
> >USING
> >  (Dept,Person)
> >GROUP BY
> >  Dept, Gender
> >;
> >
> >
> >With the above query (which I hope is sensible) I would like to keep 
the
> >departments fixed, and randomize the genders of the people in the
> >departments (keeping the number of each sex in each department the 
same).
> >
> >So we could INNER JOIN using Dept and then RANDOM JOIN (as described
> >above) using Person - all in one 'JOIN'. All else being the same this
> >should randomize the Gender, but keep the marginals.
> >
> >I guess this is overly complex given the problem, and it actually 
raises
> >more questions instantly (about statistical analysis), but the function 
is
> >basic (I hope), and like all SQL it is the simplicity that allows you 
to
> >build complex and robust statements.
> >
> >Does the random join make any sense? Is this UDF material?
> >
> >Any feedback is welcome,
> >
> >Dan.
> >
> >Thread
> >UDF request? - Dan Bolser, April 1 2005 12:10am
>

UDF Request "AGGLOM()"

2005-04-08 Thread Dan Bolser

Who can I prod about setting up a UDF repo at MySQL. I think 'they' should
do this ;)

http://lists.mysql.com/community/97

Anyway I am posting this request to 'community' because I still don't know
the appropriate place to post UDF related stuff.

This is anoter (potentially crazy) idea for a UDF that I would find very
usefull in my research...

AGGLOM - Simple agglomerative clustering for MySQL ...

The UDF would work on any NUMBER column, and return the
number of 'clusters' using agglomerative clustering
with a certain threshold as an input.

Agglomerative clustering merges any two numbers that
are within the 'threshold', and replaces those numbers
with the average of the two. The clustering proceedes
smallest 'gap' first, and stops when no two numbers are
within the threshold.

The result would be the number (or perhaps the values) of the
remaining clusters.

Syntax (suggested) 

AGGLOM(THRESH,expr (returning a number))

For example

Table1

C1 C2
A 1
A 2
A 3
A 4
A 5
A 6
A 7
B 10
B 11
B 12
B 56
B 57
B 58
B 99
B 101


SELECT C1, AGGLOM(C2,1) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 4
B 6


SELECT C1, AGGLOM(C2,2) AS C3 FROM Table1 GROUP BY C1;

C1 C3
A 3
B 3


SELECT C1, AGGLOM(C2,3) AS C3 Table1 GROUP BY C1;

C1 C3
A 2
B 3


SELECT C1, AGGLOM(C2,4) AS C3 Table1 GROUP BY C1;

C1 C3
A 1
B 3


SELECT C1, AGGLOM(C2,50) AS C3 Table1 GROUP BY C1;

C1 C3
A 1
B 1



Remember, merge numbers with the smallest difference
first, and replace each pair with the average of the
two. Recalculate the differences for the new number,
and repeat until no distance is smaller than the threshold.

This is a usefull clustering 'hack' to see if a distribution 
is bi-modal or multi modal for example. It is very quick to
calculate using a hash table, and could be a great
function to add.

Is this idea as crazy as I think it might be?




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: UDF request?

2005-04-08 Thread Dan Bolser
On Thu, 7 Apr 2005, Sean Nolan wrote:

>I think you'll find you can do what you want with a cross join. A cross join 
>will join every row from the first table with every row from the second 
>table. It will not randomly do this, so you'd have to be creative in 
>"randomly" selecting rows in the WHERE clause. Here's how you could do this 
>with your data, pardon the highly original and very scientific 
>randomization, but it is just an example :-)

Perhaps I don't understand your "randomization" (because I really don't
understand it ;), but I don't think a CROSS join does the trick, because I
want to randomly pick the same rows from either side of the join that I
would have gotten with an INNER JOIN using the 1:1 primary key between the
two tables. This is analagous to sampling without replacement. If I do a
cross join and then just restrict the number of rows returned (hey! I just
worked out what you are doing below) I don't necessarily get the same
'marginals' (or to randomly pick the same rows from either side of the
join) . This is analagous to sampling with replacement.

Do you see what I mean?




>
>SELECT
>  Dept,
>  Gender,
>  AVG(Spending)
>FROM
>  table_one a
>CROSS JOIN
>  table_two b
>WHERE (a.Person * b.Person) % 4 = 3
>GROUP BY
>  Dept, Gender;
>
>Sean
>
>- Original Message --
>
>Hi,
>
>I have a request for a UDF that I would find really useful. I am
>duplicating this question on the community list as well as the MySQL list,
>because I am not sure where best to make this kind of request (see the
>previous post http://lists.mysql.com/community/97).
>
>I think the following syntax would be really cool...
>
>SELECT
>  a.*,
>  b.*
>FROM
>  table_one a
>RANDOM JOIN   -- <-- :)
>  table_two b
>USING
>  (ID)
>;
>
>Lets say table_one and table_two have a primary key called ID in common
>(although of course that isn't necessary).
>
>The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
>then scramble the 'links' (or rows) between the two tables. The result
>would be the same number of rows overall, the same number of distinct
>a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
>associated with the marginals given by the correct inner join.
>
>Hopefully that makes sense.
>
>I think this function would be really useful for statistical analysis of
>scientific data within MySQL (using randomized versions of the
>associations within the tables).
>
>Not sure if the above syntax is optimal, because I would like to be able
>to do things like this...
>
>table_one
>Dept   Person  Gender
>A  1   M
>A  2   F
>A  3   M
>B  4   F
>B  5   M
>B  6   F
>
>table_two
>DeptPerson  Spending
>A   1   10
>A   2   20
>A   3   30
>B   4   40
>B   5   50
>B   6   60
>
>SELECT
>  Dept,
>  Gender,
>  AVG(Spending)
>FROM
>  table_one a
>INNER JOIN
>  table_two b
>USING
>  (Dept,Person)
>GROUP BY
>  Dept, Gender
>;
>
>
>With the above query (which I hope is sensible) I would like to keep the
>departments fixed, and randomize the genders of the people in the
>departments (keeping the number of each sex in each department the same).
>
>So we could INNER JOIN using Dept and then RANDOM JOIN (as described
>above) using Person - all in one 'JOIN'. All else being the same this
>should randomize the Gender, but keep the marginals.
>
>I guess this is overly complex given the problem, and it actually raises
>more questions instantly (about statistical analysis), but the function is
>basic (I hope), and like all SQL it is the simplicity that allows you to
>build complex and robust statements.
>
>Does the random join make any sense? Is this UDF material?
>
>Any feedback is welcome,
>
>Dan.
>
>Thread
>UDF request? - Dan Bolser, April 1 2005 12:10am
>
>_
>Don’t just search. Find. Check out the new MSN Search! 
>http://search.msn.click-url.com/go/onm00200636ave/direct/01/
>
>
>


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: UDF request?

2005-04-07 Thread Sean Nolan
I think you'll find you can do what you want with a cross join. A cross join 
will join every row from the first table with every row from the second 
table. It will not randomly do this, so you'd have to be creative in 
"randomly" selecting rows in the WHERE clause. Here's how you could do this 
with your data, pardon the highly original and very scientific 
randomization, but it is just an example :-)

SELECT
 Dept,
 Gender,
 AVG(Spending)
FROM
 table_one a
CROSS JOIN
 table_two b
WHERE (a.Person * b.Person) % 4 = 3
GROUP BY
 Dept, Gender;
Sean
- Original Message --
Hi,
I have a request for a UDF that I would find really useful. I am
duplicating this question on the community list as well as the MySQL list,
because I am not sure where best to make this kind of request (see the
previous post http://lists.mysql.com/community/97).
I think the following syntax would be really cool...
SELECT
 a.*,
 b.*
FROM
 table_one a
RANDOM JOIN   -- <-- :)
 table_two b
USING
 (ID)
;
Lets say table_one and table_two have a primary key called ID in common
(although of course that isn't necessary).
The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
then scramble the 'links' (or rows) between the two tables. The result
would be the same number of rows overall, the same number of distinct
a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
associated with the marginals given by the correct inner join.
Hopefully that makes sense.
I think this function would be really useful for statistical analysis of
scientific data within MySQL (using randomized versions of the
associations within the tables).
Not sure if the above syntax is optimal, because I would like to be able
to do things like this...
table_one
DeptPerson  Gender
A   1   M
A   2   F
A   3   M
B   4   F
B   5   M
B   6   F
table_two
DeptPerson  Spending
A   1   10
A   2   20
A   3   30
B   4   40
B   5   50
B   6   60
SELECT
 Dept,
 Gender,
 AVG(Spending)
FROM
 table_one a
INNER JOIN
 table_two b
USING
 (Dept,Person)
GROUP BY
 Dept, Gender
;
With the above query (which I hope is sensible) I would like to keep the
departments fixed, and randomize the genders of the people in the
departments (keeping the number of each sex in each department the same).
So we could INNER JOIN using Dept and then RANDOM JOIN (as described
above) using Person - all in one 'JOIN'. All else being the same this
should randomize the Gender, but keep the marginals.
I guess this is overly complex given the problem, and it actually raises
more questions instantly (about statistical analysis), but the function is
basic (I hope), and like all SQL it is the simplicity that allows you to
build complex and robust statements.
Does the random join make any sense? Is this UDF material?
Any feedback is welcome,
Dan.
Thread
UDF request? - Dan Bolser, April 1 2005 12:10am
_
FREE pop-up blocking with the new MSN Toolbar – get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


RE: UDF request?

2005-04-07 Thread Sean Nolan
I think you'll find you can do what you want with a cross join. A cross join 
will join every row from the first table with every row from the second 
table. It will not randomly do this, so you'd have to be creative in 
"randomly" selecting rows in the WHERE clause. Here's how you could do this 
with your data, pardon the highly original and very scientific 
randomization, but it is just an example :-)

SELECT
 Dept,
 Gender,
 AVG(Spending)
FROM
 table_one a
CROSS JOIN
 table_two b
WHERE (a.Person * b.Person) % 4 = 3
GROUP BY
 Dept, Gender;
Sean
- Original Message --
Hi,
I have a request for a UDF that I would find really useful. I am
duplicating this question on the community list as well as the MySQL list,
because I am not sure where best to make this kind of request (see the
previous post http://lists.mysql.com/community/97).
I think the following syntax would be really cool...
SELECT
 a.*,
 b.*
FROM
 table_one a
RANDOM JOIN   -- <-- :)
 table_two b
USING
 (ID)
;
Lets say table_one and table_two have a primary key called ID in common
(although of course that isn't necessary).
The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
then scramble the 'links' (or rows) between the two tables. The result
would be the same number of rows overall, the same number of distinct
a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
associated with the marginals given by the correct inner join.
Hopefully that makes sense.
I think this function would be really useful for statistical analysis of
scientific data within MySQL (using randomized versions of the
associations within the tables).
Not sure if the above syntax is optimal, because I would like to be able
to do things like this...
table_one
DeptPerson  Gender
A   1   M
A   2   F
A   3   M
B   4   F
B   5   M
B   6   F
table_two
DeptPerson  Spending
A   1   10
A   2   20
A   3   30
B   4   40
B   5   50
B   6   60
SELECT
 Dept,
 Gender,
 AVG(Spending)
FROM
 table_one a
INNER JOIN
 table_two b
USING
 (Dept,Person)
GROUP BY
 Dept, Gender
;
With the above query (which I hope is sensible) I would like to keep the
departments fixed, and randomize the genders of the people in the
departments (keeping the number of each sex in each department the same).
So we could INNER JOIN using Dept and then RANDOM JOIN (as described
above) using Person - all in one 'JOIN'. All else being the same this
should randomize the Gender, but keep the marginals.
I guess this is overly complex given the problem, and it actually raises
more questions instantly (about statistical analysis), but the function is
basic (I hope), and like all SQL it is the simplicity that allows you to
build complex and robust statements.
Does the random join make any sense? Is this UDF material?
Any feedback is welcome,
Dan.
Thread
UDF request? - Dan Bolser, April 1 2005 12:10am
_
Don’t just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


UDF request?

2005-03-31 Thread Dan Bolser

Hi, 

I have a request for a UDF that I would find really useful. I am
duplicating this question on the community list as well as the MySQL list,
because I am not sure where best to make this kind of request (see the
previous post http://lists.mysql.com/community/97).

I think the following syntax would be really cool...

SELECT 
  a.*,
  b.*
FROM
  table_one a 
RANDOM JOIN   -- <-- :)
  table_two b
USING 
  (ID)
;

Lets say table_one and table_two have a primary key called ID in common
(although of course that isn't necessary).

The idea is that the RANDOM JOIN would first calculate the INNER JOIN,
then scramble the 'links' (or rows) between the two tables. The result
would be the same number of rows overall, the same number of distinct
a.ID's and b.ID's, but those a.ID's and b.ID's would be randomly
associated with the marginals given by the correct inner join.

Hopefully that makes sense.

I think this function would be really useful for statistical analysis of
scientific data within MySQL (using randomized versions of the
associations within the tables).

Not sure if the above syntax is optimal, because I would like to be able
to do things like this...

table_one
DeptPerson  Gender
A   1   M
A   2   F
A   3   M
B   4   F
B   5   M
B   6   F

table_two
DeptPerson  Spending
A   1   10
A   2   20
A   3   30
B   4   40
B   5   50
B   6   60

SELECT 
  Dept,
  Gender,
  AVG(Spending)
FROM
  table_one a
INNER JOIN
  table_two b
USING 
  (Dept,Person)
GROUP BY
  Dept, Gender
;


With the above query (which I hope is sensible) I would like to keep the
departments fixed, and randomize the genders of the people in the
departments (keeping the number of each sex in each department the same).

So we could INNER JOIN using Dept and then RANDOM JOIN (as described
above) using Person - all in one 'JOIN'. All else being the same this
should randomize the Gender, but keep the marginals.

I guess this is overly complex given the problem, and it actually raises
more questions instantly (about statistical analysis), but the function is
basic (I hope), and like all SQL it is the simplicity that allows you to
build complex and robust statements.

Does the random join make any sense? Is this UDF material?

Any feedback is welcome,

Dan.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]