On Wed, Jan 15, 2014 at 4:06 PM, shawn l.green <shawn.l.gr...@oracle.com> wrote: > Hello Larry, > > > On 1/13/2014 2:43 AM, Peter Brawley wrote: >> >> >> On 2014-01-12 9:13 PM, Larry Martell wrote: >>> >>> On Sun, Jan 12, 2014 at 2:47 PM, Peter Brawley >>> <peter.braw...@earthlink.net> wrote: >>>> >>>> On 2014-01-12 1:17 PM, Larry Martell wrote: >>>>> >>>>> I've been asked to do something that I do not think is possible in SQL. >>>>> >>>>> I have a query that has this basic form: >>>>> >>>>> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f >>>>> FROM t >>>>> GROUP BY a, b, c, d, f >>>>> >>>>> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or >>>>> 10053.490, 2542.094). >>>>> >>>>> The business issue is that if either x or y in 2 rows that are in the >>>>> same a, b, c, d group are within 1 of each other then they should be >>>>> grouped together. And to make it more complicated, the tolerance is >>>>> applied as a rolling continuum. For example, if the x and y in a set >>>>> of grouped rows are: >>>>> >>>>> row 1: 1.5, 9.5 >>>>> row 2: 2.4, 20.8 >>>>> row 3: 3.3, 40.6 >>>>> row 4: 4.2, 2.5 >>>>> row 5: 5.1, 10.1 >>>>> row 6: 6.0, 7.9 >>>>> row 7: 8.0, 21.0 >>>>> row 8: 100, 200 >>>>> >>>>> 1 through 6 get combined because all their X values are within the >>>>> tolerance of some other X in the set that's been combined. 7's Y value >>>>> is within the tolerance of 2's Y, so that should be combined as well. >>>>> 8 is not combined because neither the X or Y value is within the >>>>> tolerance of any X or Y in the set that was combined. >>>>> >>>>> In python I can easily parse the data and identify the rows that need >>>>> to be combined, but then I've lost the ability to calculate the >>>>> average and std. The only way I can think of to do this is to remove >>>>> the grouping from the SQL and do all the grouping and aggregating >>>>> myself. But this query often returns 20k to 30k rows after grouping. >>>>> It could easily be 80k to 100k rows that I have to process if I remove >>>>> the grouping and I think that will be very slow. >>>>> >>>>> Anyone have any ideas? >>>> >>>> >>>> Could you compute the row-to-row values & write them to a temp table, >>>> then >>>> run the SQL that incorporates that result column? >>> >>> I thought of temp tables, but I could not come up with a way to use >>> them for this. How can I apply the x/y tolerance grouping in sql? >> >> >> Run the query you showed, saving the result to a temp table. In an sproc >> or your preferred app language, do the row-to-row processing to generate >> a new column in the temp table from the biz rules you outlined, now >> query the revised temp table as desired. >> > > This is a very interesting problem but I am not sure what to do with a > grouping. To me this appears to be a two-dimensional, nearest-neighbor-type > sorting problem. > > The way you outline it you could end up with all of the elements in your > analysis plane in a single group depending on how closely they are to each > other. Are long strings of points and blobs (clusters of points) in your 2-D > search space (your plane is defined by your {a,b,c,d} tuple) acceptable > results of your re-grouping process? > > If I had to draw this out graphically, you are stamping each of your {x,y} > points with a 2x2 boundary box (each point is in intersection in the middle) > and you want to know which sets of those boxes form a contiguous > intersection area (they overlap). > > Is that what you are looking for? One technique would be to start with > Peter's suggestion. Then reprocess that list to generate an acyclic directed > graph (by eliminating the reverse matches from your set), then tracing down > the tree. If point 2 links to point 7,then 7 also links to 2 (both would be > found by your nearness test. Eliminate any matches where the first point > appears after the second point from your list.
Thanks much for the reply Shawn. Unfortunately this project was put on the back burner. When I get back to it I'll be sure to try the suggestions given here and I will report back as to what worked. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql