Well, I'm getting the proper rows from the 4 joins I added, but now I'm seeing a weird side effect - my counts are 3 times to much.
The query is really huge and nasty now, but I'm going to paste it below. In the outer most select, Wafers, Rerun, Runs, and Count are 3 times what they should be. If I take off the joins, I get the correct counts. I can't see why that's happening. SELECT data_target.name as Target, q1.ep as EP, COUNT(*) as Wafers, Lots, FORMAT(SUM(numonep)/(COUNT(*)+(SUM(GREATEST(q1.reruns, 0)))), 1) as 'Sites/Wafer', DATE_FORMAT(MAX(LastRun), '%m/%d/%y') as "Last Run", SUM(GREATEST(q1.reruns, 0)) as Rerun, COUNT(*)+SUM(GREATEST(q1.reruns, 0)) as Runs, FORMAT(avgbottom, 1) as "Avg Bottom", FORMAT(3*stdbottom, 2) as "3 Sig", FORMAT(maxbottom, 1) as Max, FORMAT(minbottom, 1) as Min, SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs, CONCAT(FORMAT(100*SUM(numbottoms)/SUM(numonep), 1), ' %') as "% Success", FORMAT(3*stdbottom/avgbottom, 2) as "3Sig/Avg", FORMAT(AVG(avgbottom), 1) as 'Wafer Avg', FORMAT(AVG(Wafer3Sigma), 2) as 'Wafer 3 Sigma', FORMAT(AVG(Ranges), 1) as 'Avg Range', FORMAT(3*STD(Ranges), 2) as '3Sig of Ranges', FORMAT(MAX(Ranges), 1) as 'Max Range', FORMAT(MIN(Ranges), 1) as 'Min Range', FORMAT((SUM(numonep) - SUM(numbottoms))/COUNT(*), 1) as 'NAs/Wafer', mx.maxImage as maxImage, mn.minImage as minImage, lr.lrImage as lrlrImage, cm.cmImage as cmcmImage FROM (SELECT target_name_id, ep, wafer_id, COUNT(bottom) as numbottoms, AVG(bottom) as avgbottom, STD(bottom) as stdbottom, MAX(bottom) as maxbottom, MIN(bottom) as minbottom, MAX(date_time) as "LastRun", COUNT(*) as numonep, COUNT(DISTINCT target_name_id, ep, lot_id, data_file_id)-1 as reruns, COUNT(DISTINCT(lot_id)) as Lots, 3*STD(bottom) as Wafer3Sigma, MAX(bottom) - MIN(bottom) as Ranges FROM data_cst WHERE TRUE AND data_cst.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' GROUP BY target_name_id, ep, wafer_id HAVING count(*) < 999) q1 JOIN (SELECT data_cstimage.name as minImage, data_cst.bottom, data_cst.target_name_id, data_cst.date_time, data_cst.lot_id, data_cst.tool_id, data_cst.roiname, data_cst.recipe_id, data_cst.ep FROM data_cstimage, data_cst WHERE data_cstimage.id = data_cst.image_measurer_id) mn ON mn.bottom = q1.minbottom AND mn.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND mn.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' JOIN (SELECT data_cstimage.name as maxImage, data_cst.bottom, data_cst.target_name_id, data_cst.date_time, data_cst.lot_id, data_cst.tool_id, data_cst.roiname, data_cst.recipe_id, data_cst.ep FROM data_cstimage, data_cst WHERE data_cstimage.id = data_cst.image_measurer_id) mx ON mx.bottom = q1.maxbottom AND mx.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND mx.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' JOIN (SELECT data_cstimage.name as lrImage, data_cst.bottom, data_cst.target_name_id, data_cst.date_time, data_cst.lot_id, data_cst.tool_id, data_cst.roiname, data_cst.recipe_id, data_cst.ep FROM data_cstimage, data_cst WHERE data_cstimage.id = data_cst.image_measurer_id) lr ON lr.date_time = q1.LastRun AND lr.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND lr.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' JOIN (SELECT data_cstimage.name as cmImage, data_cst.bottom, data_cst.target_name_id, data_cst.date_time, data_cst.lot_id, data_cst.tool_id, data_cst.roiname, data_cst.recipe_id, data_cst.ep FROM data_cstimage, data_cst WHERE data_cstimage.id = data_cst.image_measurer_id AND data_cst.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' GROUP BY target_name_id, ep HAVING count(*) < 999 ORDER BY ABS(bottom - AVG(bottom))) cm ON cm.target_name_id = q1.target_name_id AND cm.ep = q1.ep AND cm.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND cm.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' , data_target WHERE data_target.id = q1.target_name_id GROUP BY q1.target_name_id, q1.ep On Wed, Sep 19, 2012 at 3:27 PM, Larry Martell <larry.mart...@gmail.com> wrote: > OK, I think I have this working. The last join was this: > > JOIN (select id, target_name_id, ep,date_time from data_cst WHERE > target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, > 118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04 > 00:00:00' AND '2010-03-04 23:59:59' GROUP BY target_name_id, ep > HAVING count(*) < 999 ORDER BY ABS(data_cst.bottom - AVG(bottom))) cm > on cm.target_name_id = q1.target_name_id and cm.ep = q1.ep and > cm.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' > > I have do to more testing to ensure I'm getting the correct rows back. > > Thanks for all the help! > > On Wed, Sep 19, 2012 at 10:07 AM, Larry Martell <larry.mart...@gmail.com> > wrote: >> On Tue, Sep 18, 2012 at 7:41 PM, Rick James <rja...@yahoo-inc.com> wrote: >>> SELECT ((the appropriate id)) -- <-- >>> FROM data_cst, rollup, data_target >>> WHERE data_target.name = rollup.Target >>> AND data_cst.ep = rollup.EP >>> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 >>> 23:59:59' >>> AND data_target.id = data_cst.target_name_id >>> GROUP BY rollup.Target, rollup.EP >>> ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`) >>> LIMIT 1 -- <-- >> >> rollup is the name of the temp table that I was testing with. It >> contains the output of the original query. I don't have that in the >> real world. That's my issue - I can't figure out how to join with the >> result set and apply the order by to the join condition. What would I >> use in place of rollup to get the 'Avg Bottom'? >> >> Here's what my query looks like now: >> >> SELECT data_target.name as Target, >> q1.ep as EP, >> COUNT(*) as Wafers, >> Lots, >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns >> ELSE 0 END)) as 'Sites/Wafer', >> MAX(LastRun) as "Last Run", >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun, >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs, >> avgbottom as "Avg Bottom", >> 3*stdbottom as "3 Sig", >> maxbottom as Max, >> minbottom as Min, >> SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs, >> 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% >> Success", >> 3*stdbottom/avgbottom as "3Sig/Avg", >> AVG(avgbottom) as 'Wafer Avg', >> AVG(Wafer3Sigma) as 'Wafer 3 Sigma', >> AVG(Ranges) as 'Avg Range', >> 3*STD(Ranges) as '3Sig of Ranges', >> MAX(Ranges) as 'Max Range', >> MIN(Ranges) as 'Min Range', (SUM(numonep) - >> SUM(numbottoms))/COUNT(*) as 'NAs/Wafer', >> mx.id as maxID, >> mn.id as minID, >> lr.id as lrID >> FROM (SELECT target_name_id, >> ep, >> wafer_id, >> COUNT(bottom) as numbottoms, >> AVG(bottom) as avgbottom, >> STD(bottom) as stdbottom, >> MAX(bottom) as maxbottom, >> MIN(bottom) as minbottom, >> MAX(date_time) as "LastRun", >> COUNT(*) as numonep, >> COUNT(DISTINCT target_name_id, ep, lot_id, >> data_file_id)-1 as reruns, >> COUNT(DISTINCT(lot_id)) as Lots, >> 3*STD(bottom) as Wafer3Sigma, >> MAX(bottom) - MIN(bottom) as Ranges >> FROM data_cst >> WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, >> 116, 117, 118, 119, 120, 121) >> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> GROUP BY target_name_id, ep, wafer_id >> HAVING count(*) < 999) q1 >> join data_cst mn ON mn.bottom = q1.minbottom >> join data_cst mx on mx.bottom = q1.maxbottom >> join data_cst lr on lr.date_time = q1.LastRun, >> data_target >> WHERE data_target.id = q1.target_name_id >> GROUP BY q1.target_name_id, q1.ep; >> >>> >>>> -----Original Message----- >>>> From: Larry Martell [mailto:larry.mart...@gmail.com] >>>> Sent: Tuesday, September 18, 2012 2:57 PM >>>> To: Rick James >>>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com >>>> Subject: Re: getting certain rows from a group by >>>> >>>> On Tue, Sep 18, 2012 at 4:01 PM, Rick James <rja...@yahoo-inc.com> >>>> wrote: >>>> > SELECT mn.* >>>> > mx.* >>>> >> > FROM ( SELECT @min = MIN(bottom), >>>> >> > @max = MAX(bottom), ... ) >>>> >> > JOIN data_cst mn ON bottom = @min >>>> >> > JOIN data_cst mx ON bottom = @max >>>> > >>>> > That is, the FROM finds the value for the desired row. >>>> > The JOIN then gets to the rest of the fields. >>>> > Caveat: If two rows have bottom = @min, it returns both. >>>> >>>> >>>> Thanks a lot Rick! This was super helpful. I've got it working for 3 of >>>> the 4 cases - min(bottom), max(bottom), and max(date_time). But I can't >>>> figure out how to work in the last case - where bottom is closest to >>>> avg(bottom). In an individual query I can get it with an order by, like >>>> this: >>>> >>>> SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id, >>>> data_cst.bottom FROM data_cst, rollup, data_target WHERE >>>> data_target.name = rollup.Target AND data_cst.ep = rollup.EP AND >>>> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 >>>> 23:59:59' >>>> AND data_target.id = data_cst.target_name_id GROUP BY rollup.Target, >>>> rollup.EP ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`); >>>> >>>> Any way to work that into another join? >>>> >>>> Thanks! >>>> >>>> > >>>> >> -----Original Message----- >>>> >> From: Larry Martell [mailto:larry.mart...@gmail.com] >>>> >> Sent: Tuesday, September 18, 2012 12:54 PM >>>> >> To: Rick James >>>> >> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com >>>> >> Subject: Re: getting certain rows from a group by >>>> >> >>>> >> On Tue, Sep 18, 2012 at 2:05 PM, Rick James <rja...@yahoo-inc.com> >>>> >> wrote: >>>> >> > For single query... >>>> >> > >>>> >> > Plan A: >>>> >> > Repeat the SELECT as once for each (MIN, AVG, etc) as a subquery >>>> in >>>> >> > WHERE bottom = ( SELECT MIN(bottom) ... ) >>>> >> > >>>> >> > Plan B: >>>> >> > FROM ( SELECT @min = MIN(bottom), >>>> >> > @max = MAX(bottom), ... ) >>>> >> > JOIN data_cst mn ON bottom = @min >>>> >> > JOIN data_cst mx ON bottom = @max >>>> >> >>>> >> >>>> >> Can you elaborate on this? I don't see how this will give me what I >>>> >> need. I'm not looking for the min or max bottom - I already have >>>> that >>>> >> - I'm looking for the row from each group that has the min and max >>>> >> bottom. >>>> >> >>>> >> > Plan C: >>>> >> > Get rid of 3rd party packages that eventually get in your way >>>> >> > instead >>>> >> of 'helping'. >>>> >> >>>> >> It's not django that is getting in the way. django is a fantastic >>>> >> framework for web development. It's just changing requirements. The >>>> >> original developer (who no longer works here) created a few >>>> different >>>> >> classes, and then based on the original requirements, chose to use >>>> >> the one that supports a single query. That worked then, but may not >>>> >> work now. There are other classes that allow multiple queries, but >>>> >> then you have to parse the data and stuff it into the context that >>>> >> will be picked up by the browser yourself. The single query class >>>> >> does that for you. I may yet go that way, but I'm trying to avoid a >>>> >> big rewrite if I can. >>>> >> >>>> >> >> -----Original Message----- >>>> >> >> From: Larry Martell [mailto:larry.mart...@gmail.com] >>>> >> >> Sent: Tuesday, September 18, 2012 4:57 AM >>>> >> >> To: peter.braw...@earthlink.net >>>> >> >> Cc: mysql@lists.mysql.com >>>> >> >> Subject: Re: getting certain rows from a group by >>>> >> >> >>>> >> >> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley >>>> >> >> <peter.braw...@earthlink.net> wrote: >>>> >> >> > On 2012-09-18 5:53 AM, Larry Martell wrote: >>>> >> >> >> >>>> >> >> >> I have this query: >>>> >> >> >> >>>> >> >> >> SELECT data_target.name as Target, >>>> >> >> >> q1.ep as EP, >>>> >> >> >> COUNT(*) as Wafers, >>>> >> >> >> Lots, >>>> >> >> >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN >>>> >> >> >> reruns ELSE 0 END)) as 'Sites/Wafer', >>>> >> >> >> MAX(LastRun) as "Last Run", >>>> >> >> >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as >>>> >> Rerun, >>>> >> >> >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 >>>> >> >> >> END) as >>>> >> >> Runs, >>>> >> >> >> avgbottom as "Avg Bottom", >>>> >> >> >> 3*stdbottom as "3 Sig", >>>> >> >> >> maxbottom as Max, >>>> >> >> >> minbottom as Min, >>>> >> >> >> SUM(numonep) as Count, >>>> >> >> >> SUM(numonep) - SUM(numbottoms) as NAs, >>>> >> >> >> 100-((SUM(numonep) - >>>> SUM(numbottoms))/SUM(numonep)*100) >>>> >> as >>>> >> >> "% >>>> >> >> >> Success", >>>> >> >> >> 3*stdbottom/avgbottom as "3Sig/Avg", >>>> >> >> >> AVG(avgbottom) as 'Wafer Avg', >>>> >> >> >> AVG(Wafer3Sigma) as 'Wafer 3 Sigma', >>>> >> >> >> AVG(Ranges) as 'Avg Range', >>>> >> >> >> 3*STD(Ranges) as '3Sig of Ranges', >>>> >> >> >> MAX(Ranges) as 'Max Range', >>>> >> >> >> MIN(Ranges) as 'Min Range', >>>> >> >> >> (SUM(numonep) - SUM(numbottoms))/COUNT(*) as >>>> 'NAs/Wafer' >>>> >> >> >> FROM (SELECT target_name_id, >>>> >> >> >> ep, >>>> >> >> >> wafer_id, >>>> >> >> >> COUNT(bottom) as numbottoms, >>>> >> >> >> AVG(bottom) as avgbottom, >>>> >> >> >> STD(bottom) as stdbottom, >>>> >> >> >> MAX(bottom) as maxbottom, >>>> >> >> >> MIN(bottom) as minbottom, >>>> >> >> >> MAX(date_time) as "LastRun", >>>> >> >> >> COUNT(*) as numonep, >>>> >> >> >> COUNT(DISTINCT target_name_id, ep, lot_id, >>>> >> >> >> data_file_id)-1 as reruns, >>>> >> >> >> COUNT(DISTINCT(lot_id)) as Lots, >>>> >> >> >> 3*STD(bottom) as Wafer3Sigma, >>>> >> >> >> MAX(bottom) - MIN(bottom) as Ranges >>>> >> >> >> FROM data_cst >>>> >> >> >> WHERE target_name_id IN (775, 776, 777, 778, 779, 780, >>>> >> >> >> 45, >>>> >> >> 44, >>>> >> >> >> 116, 117, 118, 119, 120, 121) >>>> >> >> >> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' >>>> AND >>>> >> >> >> '2010-03-04 23:59:59' >>>> >> >> >> GROUP BY target_name_id, ep, wafer_id >>>> >> >> >> HAVING count(*) < 999) q1, >>>> >> >> >> data_target >>>> >> >> >> WHERE data_target.id = target_name_id GROUP BY >>>> >> >> >> q1.target_name_id, q1.ep; >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> This works fine. But now I need to get a certain column >>>> >> >> >> (image_measurer_id) with each row returned that corresponds to >>>> >> the >>>> >> >> >> row from the group that has bottom = Min(bottom), bottom = >>>> >> >> >> Max(bottom), bottom closest to Avg(bottom), and bottom from >>>> the >>>> >> >> >> row where date_time = Max(date_time). >>>> >> >> >> >>>> >> >> >> Is this even possible from one query? >>>> >> >> > >>>> >> >> > >>>> >> >> > Might be, but what's the importance of doing it as one query? >>>> >> >> >>>> >> >> Because it's part of a django based web app, and the class that >>>> >> >> this is part of only supports having a single query. To use >>>> >> >> multiple queries will require a fairly major rewrite of the >>>> server >>>> >> >> side of >>>> >> that app. >>>> >> >> >>>> >> >> > I'd start by >>>> >> >> > saving this result to a temp table and developing the new >>>> query. >>>> >> >> > When that's running, see if you can to optimise a query built >>>> by >>>> >> >> > replacing the reference to the temp table with the original >>>> query. >>>> >> >> >>>> >> >> Thanks, I'll look into this. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql