Re: grouping by the difference between values in rows

Takeshi Hashimoto Tue, 21 Jan 2014 15:09:58 -0800

For me, it seems just use [case - when ] on the difference between x and y, and 
group by with output.


Good luck *\(^o^)/*

Sent from my iPhone

On Jan 21, 2014, at 15:38, h...@tbbs.net wrote:

>>>>> 2014/01/12 14:17 -0500, Larry Martell >>>>
> I've been asked to do something that I do not think is possible in SQL.
> 
> I have a query that has this basic form:
> 
> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
> FROM t
> GROUP BY a, b, c, d, f
> 
> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
> 10053.490, 2542.094).
> 
> The business issue is that if either x or y in 2 rows that are in the
> same a, b, c, d group are within 1 of each other then they should be
> grouped together. And to make it more complicated, the tolerance is
> applied as a rolling continuum. For example, if the x and y in a set
> of grouped rows are:
> 
> row 1: 1.5, 9.5
> row 2: 2.4, 20.8
> row 3: 3.3, 40.6
> row 4: 4.2, 2.5
> row 5: 5.1, 10.1
> row 6: 6.0, 7.9
> row 7: 8.0, 21.0
> row 8: 100, 200
> 
> 1 through 6 get combined because all their X values are within the
> tolerance of some other X in the set that's been combined. 7's Y value
> is within the tolerance of 2's Y, so that should be combined as well.
> 8 is not combined because neither the X or Y value is within the
> tolerance of any X or Y in the set that was combined.
> 
> In python I can easily parse the data and identify the rows that need
> to be combined, but then I've lost the ability to calculate the
> average and std. The only way I can think of to do this is to remove
> the grouping from the SQL and do all the grouping and aggregating
> myself. But this query often returns 20k to 30k rows after grouping.
> It could easily be 80k to 100k rows that I have to process if I remove
> the grouping and I think that will be very slow.
> 
> Anyone have any ideas?
> <<<<<<<<
> I suspect you can carry out their ideas by something like this, in an SQL 
> procedure:
> 
> Besides your table "t", there are tables "t1", "t2", "t3", "tpair", and "tq".
> 
> With a cursor copy records from "t" to "t1" (with all of "t" s fields and an  
> "g1" besides) ordered by a, b, c, d, x, going through all the complications 
> of deciding where a group boundary falls, numbering the groups by "g1".
> 
> Repeat this copying from "t1" to "t2" (which has besides "g1" also "g2"), 
> ordered by a, b, c, d, y, numbering the groups by "g2". Now "t1" no longer is 
> needed.
> 
> Copy all distinct pairs of "g1" and "g2" to "tpair".
> 
> Until g = MIN(g1) of "tpair" is null, move (INSERT ... SELECT; DELETE ...) 
> from "tpair" to "tq" all pairs where g1=g, and as long as there is anything 
> to move from "tpair" to "tq" alternatly move records where any in "tpair" s 
> "g2" match any already in "tq" and any in "tpair" s "g1" match any already in 
> "tq". Now all the pairs in "tq" represent the same group: every record in 
> "t2" with a pair in "tq" is copied into "t3" with a new number "g3" instead 
> of the pair.
> 
> After this one may group "t3" by "g3".
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql
> 

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: grouping by the difference between values in rows

Reply via email to