OK, I think I have this working. The last join was this:

JOIN (select id, target_name_id, ep,date_time from data_cst WHERE
target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117,
118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04
00:00:00' AND '2010-03-04 23:59:59' GROUP BY  target_name_id, ep
HAVING count(*) < 999 ORDER BY  ABS(data_cst.bottom - AVG(bottom))) cm
on cm.target_name_id = q1.target_name_id and cm.ep = q1.ep and
cm.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59'

I have do to more testing to ensure I'm getting the correct rows back.

Thanks for all the help!

On Wed, Sep 19, 2012 at 10:07 AM, Larry Martell <larry.mart...@gmail.com> wrote:
> On Tue, Sep 18, 2012 at 7:41 PM, Rick James <rja...@yahoo-inc.com> wrote:
>> SELECT  ((the appropriate id))          -- <--
>>     FROM  data_cst, rollup, data_target
>>     WHERE  data_target.name = rollup.Target
>>       AND  data_cst.ep = rollup.EP
>>       AND  data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 
>> 23:59:59'
>>       AND  data_target.id = data_cst.target_name_id
>>     GROUP BY  rollup.Target, rollup.EP
>>     ORDER BY  ABS(data_cst.bottom - rollup.`Avg Bottom`)
>>     LIMIT  1               -- <--
>
> rollup is the name of the temp table that I was testing with. It
> contains the output of the original query. I don't have that in the
> real world. That's my issue - I can't figure out how to join with the
> result set and apply the order by to the join condition. What would I
> use in place of rollup to get the 'Avg Bottom'?
>
> Here's what my query looks like now:
>
> SELECT data_target.name as Target,
>        q1.ep as EP,
>        COUNT(*) as Wafers,
>        Lots,
>        SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
> ELSE 0 END)) as 'Sites/Wafer',
>        MAX(LastRun) as "Last Run",
>        SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
>        COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
>        avgbottom as "Avg Bottom",
>        3*stdbottom as "3 Sig",
>        maxbottom as Max,
>        minbottom as Min,
>        SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs,
>        100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% Success",
>        3*stdbottom/avgbottom as "3Sig/Avg",
>        AVG(avgbottom) as 'Wafer Avg',
>        AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>        AVG(Ranges) as 'Avg Range',
>        3*STD(Ranges) as '3Sig of Ranges',
>        MAX(Ranges) as 'Max Range',
>        MIN(Ranges) as 'Min Range', (SUM(numonep) -
> SUM(numbottoms))/COUNT(*) as 'NAs/Wafer',
>        mx.id as maxID,
>        mn.id as minID,
>        lr.id as lrID
> FROM (SELECT target_name_id,
>              ep,
>              wafer_id,
>              COUNT(bottom) as numbottoms,
>              AVG(bottom) as avgbottom,
>              STD(bottom) as stdbottom,
>              MAX(bottom) as maxbottom,
>              MIN(bottom) as minbottom,
>              MAX(date_time) as "LastRun",
>              COUNT(*) as numonep,
>              COUNT(DISTINCT target_name_id, ep, lot_id,
> data_file_id)-1 as reruns,
>              COUNT(DISTINCT(lot_id)) as Lots,
>              3*STD(bottom) as Wafer3Sigma,
>              MAX(bottom) - MIN(bottom) as Ranges
>       FROM data_cst
>       WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
> 116, 117, 118, 119, 120, 121)
>       AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
> '2010-03-04 23:59:59'
>       GROUP BY target_name_id, ep, wafer_id
>       HAVING count(*) < 999) q1
> join data_cst mn ON mn.bottom = q1.minbottom
> join data_cst mx on mx.bottom = q1.maxbottom
> join data_cst lr on lr.date_time = q1.LastRun,
> data_target
> WHERE data_target.id = q1.target_name_id
> GROUP BY q1.target_name_id, q1.ep;
>
>>
>>> -----Original Message-----
>>> From: Larry Martell [mailto:larry.mart...@gmail.com]
>>> Sent: Tuesday, September 18, 2012 2:57 PM
>>> To: Rick James
>>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com
>>> Subject: Re: getting certain rows from a group by
>>>
>>> On Tue, Sep 18, 2012 at 4:01 PM, Rick James <rja...@yahoo-inc.com>
>>> wrote:
>>> > SELECT mn.*
>>> >        mx.*
>>> >> >    FROM ( SELECT @min = MIN(bottom),
>>> >> >                  @max = MAX(bottom), ... )
>>> >> >    JOIN data_cst mn  ON bottom = @min
>>> >> >    JOIN data_cst mx  ON bottom = @max
>>> >
>>> > That is, the FROM finds the value for the desired row.
>>> > The JOIN then gets to the rest of the fields.
>>> > Caveat:  If two rows have bottom = @min, it returns both.
>>>
>>>
>>> Thanks a lot Rick! This was super helpful. I've got it working for 3 of
>>> the 4 cases - min(bottom), max(bottom), and max(date_time). But I can't
>>> figure out how to work in the last case - where bottom is closest to
>>> avg(bottom). In an individual query I can get it with an order by, like
>>> this:
>>>
>>> SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id,
>>> data_cst.bottom FROM data_cst, rollup, data_target WHERE
>>> data_target.name = rollup.Target AND data_cst.ep = rollup.EP AND
>>> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04
>>> 23:59:59'
>>> AND data_target.id = data_cst.target_name_id GROUP BY rollup.Target,
>>> rollup.EP ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`);
>>>
>>> Any way to work that into another join?
>>>
>>> Thanks!
>>>
>>> >
>>> >> -----Original Message-----
>>> >> From: Larry Martell [mailto:larry.mart...@gmail.com]
>>> >> Sent: Tuesday, September 18, 2012 12:54 PM
>>> >> To: Rick James
>>> >> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com
>>> >> Subject: Re: getting certain rows from a group by
>>> >>
>>> >> On Tue, Sep 18, 2012 at 2:05 PM, Rick James <rja...@yahoo-inc.com>
>>> >> wrote:
>>> >> > For single query...
>>> >> >
>>> >> > Plan A:
>>> >> > Repeat the SELECT as once for each (MIN, AVG, etc) as a subquery
>>> in
>>> >> >    WHERE bottom = ( SELECT MIN(bottom) ... )
>>> >> >
>>> >> > Plan B:
>>> >> >    FROM ( SELECT @min = MIN(bottom),
>>> >> >                  @max = MAX(bottom), ... )
>>> >> >    JOIN data_cst mn  ON bottom = @min
>>> >> >    JOIN data_cst mx  ON bottom = @max
>>> >>
>>> >>
>>> >> Can you elaborate on this? I don't see how this will give me what I
>>> >> need. I'm not looking for the min or max bottom - I already have
>>> that
>>> >> - I'm looking for the row from each group that has the min and max
>>> >> bottom.
>>> >>
>>> >> > Plan C:
>>> >> > Get rid of 3rd party packages that eventually get in your way
>>> >> > instead
>>> >> of 'helping'.
>>> >>
>>> >> It's not django that is getting in the way. django is a fantastic
>>> >> framework for web development. It's just changing requirements. The
>>> >> original developer (who no longer works here) created a few
>>> different
>>> >> classes, and then based on the original requirements, chose to use
>>> >> the one that supports a single query. That worked then, but may not
>>> >> work now. There are other classes that allow multiple queries, but
>>> >> then you have to parse the data and stuff it into the context that
>>> >> will be picked up by the browser yourself. The single query class
>>> >> does that for you. I may yet go that way, but I'm trying to avoid a
>>> >> big rewrite if I can.
>>> >>
>>> >> >> -----Original Message-----
>>> >> >> From: Larry Martell [mailto:larry.mart...@gmail.com]
>>> >> >> Sent: Tuesday, September 18, 2012 4:57 AM
>>> >> >> To: peter.braw...@earthlink.net
>>> >> >> Cc: mysql@lists.mysql.com
>>> >> >> Subject: Re: getting certain rows from a group by
>>> >> >>
>>> >> >> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley
>>> >> >> <peter.braw...@earthlink.net> wrote:
>>> >> >> > On 2012-09-18 5:53 AM, Larry Martell wrote:
>>> >> >> >>
>>> >> >> >> I have this query:
>>> >> >> >>
>>> >> >> >> SELECT data_target.name as Target,
>>> >> >> >>         q1.ep as EP,
>>> >> >> >>         COUNT(*) as Wafers,
>>> >> >> >>         Lots,
>>> >> >> >>         SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN
>>> >> >> >> reruns ELSE 0 END)) as 'Sites/Wafer',
>>> >> >> >>         MAX(LastRun) as "Last Run",
>>> >> >> >>         SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as
>>> >> Rerun,
>>> >> >> >>         COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0
>>> >> >> >> END) as
>>> >> >> Runs,
>>> >> >> >>         avgbottom as "Avg Bottom",
>>> >> >> >>         3*stdbottom as "3 Sig",
>>> >> >> >>         maxbottom as Max,
>>> >> >> >>         minbottom as Min,
>>> >> >> >>         SUM(numonep) as Count,
>>> >> >> >>         SUM(numonep) - SUM(numbottoms) as NAs,
>>> >> >> >>         100-((SUM(numonep) -
>>> SUM(numbottoms))/SUM(numonep)*100)
>>> >> as
>>> >> >> "%
>>> >> >> >> Success",
>>> >> >> >>         3*stdbottom/avgbottom as "3Sig/Avg",
>>> >> >> >>         AVG(avgbottom) as 'Wafer Avg',
>>> >> >> >>         AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>>> >> >> >>         AVG(Ranges) as 'Avg Range',
>>> >> >> >>         3*STD(Ranges) as '3Sig of Ranges',
>>> >> >> >>         MAX(Ranges) as 'Max Range',
>>> >> >> >>         MIN(Ranges) as 'Min Range',
>>> >> >> >>         (SUM(numonep) - SUM(numbottoms))/COUNT(*) as
>>> 'NAs/Wafer'
>>> >> >> >> FROM (SELECT target_name_id,
>>> >> >> >>               ep,
>>> >> >> >>               wafer_id,
>>> >> >> >>               COUNT(bottom) as numbottoms,
>>> >> >> >>               AVG(bottom) as avgbottom,
>>> >> >> >>               STD(bottom) as stdbottom,
>>> >> >> >>               MAX(bottom) as maxbottom,
>>> >> >> >>               MIN(bottom) as minbottom,
>>> >> >> >>               MAX(date_time) as "LastRun",
>>> >> >> >>               COUNT(*) as numonep,
>>> >> >> >>               COUNT(DISTINCT target_name_id, ep, lot_id,
>>> >> >> >> data_file_id)-1 as reruns,
>>> >> >> >>               COUNT(DISTINCT(lot_id)) as Lots,
>>> >> >> >>               3*STD(bottom) as Wafer3Sigma,
>>> >> >> >>               MAX(bottom) - MIN(bottom) as Ranges
>>> >> >> >>        FROM data_cst
>>> >> >> >>        WHERE target_name_id IN (775, 776, 777, 778, 779, 780,
>>> >> >> >> 45,
>>> >> >> 44,
>>> >> >> >> 116, 117, 118, 119, 120, 121)
>>> >> >> >>        AND data_cst.date_time BETWEEN '2010-03-04 00:00:00'
>>> AND
>>> >> >> >> '2010-03-04 23:59:59'
>>> >> >> >>        GROUP BY target_name_id, ep, wafer_id
>>> >> >> >>        HAVING count(*) < 999) q1,
>>> >> >> >>       data_target
>>> >> >> >> WHERE data_target.id = target_name_id GROUP BY
>>> >> >> >> q1.target_name_id, q1.ep;
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> This works fine. But now I need to get a certain column
>>> >> >> >> (image_measurer_id) with each row returned that corresponds to
>>> >> the
>>> >> >> >> row from the group that has bottom = Min(bottom), bottom =
>>> >> >> >> Max(bottom), bottom closest to Avg(bottom), and bottom from
>>> the
>>> >> >> >> row where date_time = Max(date_time).
>>> >> >> >>
>>> >> >> >> Is this even possible from one query?
>>> >> >> >
>>> >> >> >
>>> >> >> > Might be, but what's the importance of doing it as one query?
>>> >> >>
>>> >> >> Because it's part of a django based web app, and the class that
>>> >> >> this is part of only supports having a single query. To use
>>> >> >> multiple queries will require a fairly major rewrite of the
>>> server
>>> >> >> side of
>>> >> that app.
>>> >> >>
>>> >> >> > I'd start by
>>> >> >> > saving this result to a temp table and developing the new
>>> query.
>>> >> >> > When that's running, see if you can to optimise a query built
>>> by
>>> >> >> > replacing the reference to the temp table with the original
>>> query.
>>> >> >>
>>> >> >> Thanks, I'll look into this.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to