On Wed, Sep 19, 2012 at 6:56 PM, Rick James <rja...@yahoo-inc.com> wrote: >> my counts are 3 times too much. > Without studying the code, I would guess that there is a JOIN between he data > that needs COUNTing and the GROUP BY for the COUNT. That is, it collects > more 'joined' rows before counting. Fixing it will probably make the query > even messier.
The issue was with the last run join. I had to change the where clause in the join from: lr.target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) to: lr.target_name_id = q1.target_name_id AND lr.ep = q1.ep In fact I had to do this for all the joins. Thanks again for all your help! > >> -----Original Message----- >> From: Larry Martell [mailto:larry.mart...@gmail.com] >> Sent: Wednesday, September 19, 2012 3:22 PM >> To: Rick James >> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com >> Subject: Re: getting certain rows from a group by >> >> Well, I'm getting the proper rows from the 4 joins I added, but now I'm >> seeing a weird side effect - my counts are 3 times to much. >> >> The query is really huge and nasty now, but I'm going to paste it >> below. >> >> In the outer most select, Wafers, Rerun, Runs, and Count are 3 times >> what they should be. If I take off the joins, I get the correct counts. >> I can't see why that's happening. >> >> >> SELECT data_target.name as Target, >> q1.ep as EP, >> COUNT(*) as Wafers, >> Lots, >> FORMAT(SUM(numonep)/(COUNT(*)+(SUM(GREATEST(q1.reruns, >> 0)))), 1) as 'Sites/Wafer', >> DATE_FORMAT(MAX(LastRun), '%m/%d/%y') as "Last Run", >> SUM(GREATEST(q1.reruns, 0)) as Rerun, >> COUNT(*)+SUM(GREATEST(q1.reruns, 0)) as Runs, >> FORMAT(avgbottom, 1) as "Avg Bottom", >> FORMAT(3*stdbottom, 2) as "3 Sig", >> FORMAT(maxbottom, 1) as Max, >> FORMAT(minbottom, 1) as Min, >> SUM(numonep) as Count, >> SUM(numonep) - SUM(numbottoms) as NAs, >> CONCAT(FORMAT(100*SUM(numbottoms)/SUM(numonep), 1), ' >> %') as "% Success", >> FORMAT(3*stdbottom/avgbottom, 2) as "3Sig/Avg", >> FORMAT(AVG(avgbottom), 1) as 'Wafer Avg', >> FORMAT(AVG(Wafer3Sigma), 2) as 'Wafer 3 Sigma', >> FORMAT(AVG(Ranges), 1) as 'Avg Range', >> FORMAT(3*STD(Ranges), 2) as '3Sig of Ranges', >> FORMAT(MAX(Ranges), 1) as 'Max Range', >> FORMAT(MIN(Ranges), 1) as 'Min Range', >> FORMAT((SUM(numonep) - SUM(numbottoms))/COUNT(*), 1) as >> 'NAs/Wafer', >> mx.maxImage as maxImage, >> mn.minImage as minImage, >> lr.lrImage as lrlrImage, >> cm.cmImage as cmcmImage >> FROM (SELECT target_name_id, >> ep, >> wafer_id, >> COUNT(bottom) as numbottoms, >> AVG(bottom) as avgbottom, >> STD(bottom) as stdbottom, >> MAX(bottom) as maxbottom, >> MIN(bottom) as minbottom, >> MAX(date_time) as "LastRun", >> COUNT(*) as numonep, >> COUNT(DISTINCT target_name_id, ep, lot_id, >> data_file_id)-1 as reruns, >> COUNT(DISTINCT(lot_id)) as Lots, >> 3*STD(bottom) as Wafer3Sigma, >> MAX(bottom) - MIN(bottom) as Ranges >> FROM data_cst >> WHERE TRUE >> AND data_cst.target_name_id IN (775, 776, 777, 778, 779, 780, >> 45, 44, 116, 117, 118, 119, 120, 121) >> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> GROUP BY target_name_id, ep, wafer_id >> HAVING count(*) < 999) q1 >> JOIN (SELECT data_cstimage.name as minImage, >> data_cst.bottom, >> data_cst.target_name_id, >> data_cst.date_time, >> data_cst.lot_id, >> data_cst.tool_id, >> data_cst.roiname, >> data_cst.recipe_id, >> data_cst.ep >> FROM data_cstimage, data_cst >> WHERE data_cstimage.id = >> data_cst.image_measurer_id) mn >> ON mn.bottom = q1.minbottom >> AND mn.target_name_id IN (775, 776, 777, 778, 779, 780, >> 45, 44, 116, 117, 118, 119, 120, 121) >> AND mn.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> JOIN (SELECT data_cstimage.name as maxImage, >> data_cst.bottom, >> data_cst.target_name_id, >> data_cst.date_time, >> data_cst.lot_id, >> data_cst.tool_id, >> data_cst.roiname, >> data_cst.recipe_id, >> data_cst.ep >> FROM data_cstimage, data_cst >> WHERE data_cstimage.id = >> data_cst.image_measurer_id) mx >> ON mx.bottom = q1.maxbottom >> AND mx.target_name_id IN (775, 776, 777, 778, 779, 780, >> 45, 44, 116, 117, 118, 119, 120, 121) >> AND mx.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> JOIN (SELECT data_cstimage.name as lrImage, >> data_cst.bottom, >> data_cst.target_name_id, >> data_cst.date_time, >> data_cst.lot_id, >> data_cst.tool_id, >> data_cst.roiname, >> data_cst.recipe_id, >> data_cst.ep >> FROM data_cstimage, data_cst >> WHERE data_cstimage.id = >> data_cst.image_measurer_id) lr >> ON lr.date_time = q1.LastRun >> AND lr.target_name_id IN (775, 776, 777, 778, 779, 780, >> 45, 44, 116, 117, 118, 119, 120, 121) >> AND lr.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> JOIN (SELECT data_cstimage.name as cmImage, >> data_cst.bottom, >> data_cst.target_name_id, >> data_cst.date_time, >> data_cst.lot_id, >> data_cst.tool_id, >> data_cst.roiname, >> data_cst.recipe_id, >> data_cst.ep >> FROM data_cstimage, data_cst >> WHERE data_cstimage.id = data_cst.image_measurer_id >> AND data_cst.target_name_id IN (775, 776, 777, 778, >> 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) >> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' >> AND '2010-03-04 23:59:59' >> GROUP BY target_name_id, ep >> HAVING count(*) < 999 >> ORDER BY ABS(bottom - AVG(bottom))) cm >> ON cm.target_name_id = q1.target_name_id >> AND cm.ep = q1.ep >> AND cm.target_name_id IN (775, 776, 777, 778, 779, 780, >> 45, 44, 116, 117, 118, 119, 120, 121) >> AND cm.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' , >> data_target >> WHERE data_target.id = q1.target_name_id >> GROUP BY q1.target_name_id, q1.ep >> >> >> On Wed, Sep 19, 2012 at 3:27 PM, Larry Martell >> <larry.mart...@gmail.com> wrote: >> > OK, I think I have this working. The last join was this: >> > >> > JOIN (select id, target_name_id, ep,date_time from data_cst WHERE >> > target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, >> > 118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04 >> > 00:00:00' AND '2010-03-04 23:59:59' GROUP BY target_name_id, ep >> > HAVING count(*) < 999 ORDER BY ABS(data_cst.bottom - AVG(bottom))) >> cm >> > on cm.target_name_id = q1.target_name_id and cm.ep = q1.ep and >> > cm.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' >> > >> > I have do to more testing to ensure I'm getting the correct rows >> back. >> > >> > Thanks for all the help! >> > >> > On Wed, Sep 19, 2012 at 10:07 AM, Larry Martell >> <larry.mart...@gmail.com> wrote: >> >> On Tue, Sep 18, 2012 at 7:41 PM, Rick James <rja...@yahoo-inc.com> >> wrote: >> >>> SELECT ((the appropriate id)) -- <-- >> >>> FROM data_cst, rollup, data_target >> >>> WHERE data_target.name = rollup.Target >> >>> AND data_cst.ep = rollup.EP >> >>> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> >>> AND data_target.id = data_cst.target_name_id >> >>> GROUP BY rollup.Target, rollup.EP >> >>> ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`) >> >>> LIMIT 1 -- <-- >> >> >> >> rollup is the name of the temp table that I was testing with. It >> >> contains the output of the original query. I don't have that in the >> >> real world. That's my issue - I can't figure out how to join with >> the >> >> result set and apply the order by to the join condition. What would >> I >> >> use in place of rollup to get the 'Avg Bottom'? >> >> >> >> Here's what my query looks like now: >> >> >> >> SELECT data_target.name as Target, >> >> q1.ep as EP, >> >> COUNT(*) as Wafers, >> >> Lots, >> >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns >> >> ELSE 0 END)) as 'Sites/Wafer', >> >> MAX(LastRun) as "Last Run", >> >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun, >> >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as >> Runs, >> >> avgbottom as "Avg Bottom", >> >> 3*stdbottom as "3 Sig", >> >> maxbottom as Max, >> >> minbottom as Min, >> >> SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs, >> >> 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% >> Success", >> >> 3*stdbottom/avgbottom as "3Sig/Avg", >> >> AVG(avgbottom) as 'Wafer Avg', >> >> AVG(Wafer3Sigma) as 'Wafer 3 Sigma', >> >> AVG(Ranges) as 'Avg Range', >> >> 3*STD(Ranges) as '3Sig of Ranges', >> >> MAX(Ranges) as 'Max Range', >> >> MIN(Ranges) as 'Min Range', (SUM(numonep) - >> >> SUM(numbottoms))/COUNT(*) as 'NAs/Wafer', >> >> mx.id as maxID, >> >> mn.id as minID, >> >> lr.id as lrID >> >> FROM (SELECT target_name_id, >> >> ep, >> >> wafer_id, >> >> COUNT(bottom) as numbottoms, >> >> AVG(bottom) as avgbottom, >> >> STD(bottom) as stdbottom, >> >> MAX(bottom) as maxbottom, >> >> MIN(bottom) as minbottom, >> >> MAX(date_time) as "LastRun", >> >> COUNT(*) as numonep, >> >> COUNT(DISTINCT target_name_id, ep, lot_id, >> >> data_file_id)-1 as reruns, >> >> COUNT(DISTINCT(lot_id)) as Lots, >> >> 3*STD(bottom) as Wafer3Sigma, >> >> MAX(bottom) - MIN(bottom) as Ranges >> >> FROM data_cst >> >> WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, >> >> 116, 117, 118, 119, 120, 121) >> >> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND >> >> '2010-03-04 23:59:59' >> >> GROUP BY target_name_id, ep, wafer_id >> >> HAVING count(*) < 999) q1 >> >> join data_cst mn ON mn.bottom = q1.minbottom join data_cst mx on >> >> mx.bottom = q1.maxbottom join data_cst lr on lr.date_time = >> >> q1.LastRun, data_target WHERE data_target.id = q1.target_name_id >> >> GROUP BY q1.target_name_id, q1.ep; >> >> >> >>> >> >>>> -----Original Message----- >> >>>> From: Larry Martell [mailto:larry.mart...@gmail.com] >> >>>> Sent: Tuesday, September 18, 2012 2:57 PM >> >>>> To: Rick James >> >>>> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com >> >>>> Subject: Re: getting certain rows from a group by >> >>>> >> >>>> On Tue, Sep 18, 2012 at 4:01 PM, Rick James <rja...@yahoo-inc.com> >> >>>> wrote: >> >>>> > SELECT mn.* >> >>>> > mx.* >> >>>> >> > FROM ( SELECT @min = MIN(bottom), >> >>>> >> > @max = MAX(bottom), ... ) >> >>>> >> > JOIN data_cst mn ON bottom = @min >> >>>> >> > JOIN data_cst mx ON bottom = @max >> >>>> > >> >>>> > That is, the FROM finds the value for the desired row. >> >>>> > The JOIN then gets to the rest of the fields. >> >>>> > Caveat: If two rows have bottom = @min, it returns both. >> >>>> >> >>>> >> >>>> Thanks a lot Rick! This was super helpful. I've got it working for >> >>>> 3 of the 4 cases - min(bottom), max(bottom), and max(date_time). >> >>>> But I can't figure out how to work in the last case - where bottom >> >>>> is closest to avg(bottom). In an individual query I can get it >> with >> >>>> an order by, like >> >>>> this: >> >>>> >> >>>> SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id, >> >>>> data_cst.bottom FROM data_cst, rollup, data_target WHERE >> >>>> data_target.name = rollup.Target AND data_cst.ep = rollup.EP AND >> >>>> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 >> >>>> 23:59:59' >> >>>> AND data_target.id = data_cst.target_name_id GROUP BY >> >>>> rollup.Target, rollup.EP ORDER BY ABS(data_cst.bottom - >> rollup.`Avg >> >>>> Bottom`); >> >>>> >> >>>> Any way to work that into another join? >> >>>> >> >>>> Thanks! >> >>>> >> >>>> > >> >>>> >> -----Original Message----- >> >>>> >> From: Larry Martell [mailto:larry.mart...@gmail.com] >> >>>> >> Sent: Tuesday, September 18, 2012 12:54 PM >> >>>> >> To: Rick James >> >>>> >> Cc: peter.braw...@earthlink.net; mysql@lists.mysql.com >> >>>> >> Subject: Re: getting certain rows from a group by >> >>>> >> >> >>>> >> On Tue, Sep 18, 2012 at 2:05 PM, Rick James >> >>>> >> <rja...@yahoo-inc.com> >> >>>> >> wrote: >> >>>> >> > For single query... >> >>>> >> > >> >>>> >> > Plan A: >> >>>> >> > Repeat the SELECT as once for each (MIN, AVG, etc) as a >> >>>> >> > subquery >> >>>> in >> >>>> >> > WHERE bottom = ( SELECT MIN(bottom) ... ) >> >>>> >> > >> >>>> >> > Plan B: >> >>>> >> > FROM ( SELECT @min = MIN(bottom), >> >>>> >> > @max = MAX(bottom), ... ) >> >>>> >> > JOIN data_cst mn ON bottom = @min >> >>>> >> > JOIN data_cst mx ON bottom = @max >> >>>> >> >> >>>> >> >> >>>> >> Can you elaborate on this? I don't see how this will give me >> >>>> >> what I need. I'm not looking for the min or max bottom - I >> >>>> >> already have >> >>>> that >> >>>> >> - I'm looking for the row from each group that has the min and >> >>>> >> max bottom. >> >>>> >> >> >>>> >> > Plan C: >> >>>> >> > Get rid of 3rd party packages that eventually get in your way >> >>>> >> > instead >> >>>> >> of 'helping'. >> >>>> >> >> >>>> >> It's not django that is getting in the way. django is a >> >>>> >> fantastic framework for web development. It's just changing >> >>>> >> requirements. The original developer (who no longer works here) >> >>>> >> created a few >> >>>> different >> >>>> >> classes, and then based on the original requirements, chose to >> >>>> >> use the one that supports a single query. That worked then, but >> >>>> >> may not work now. There are other classes that allow multiple >> >>>> >> queries, but then you have to parse the data and stuff it into >> >>>> >> the context that will be picked up by the browser yourself. The >> >>>> >> single query class does that for you. I may yet go that way, >> but >> >>>> >> I'm trying to avoid a big rewrite if I can. >> >>>> >> >> >>>> >> >> -----Original Message----- >> >>>> >> >> From: Larry Martell [mailto:larry.mart...@gmail.com] >> >>>> >> >> Sent: Tuesday, September 18, 2012 4:57 AM >> >>>> >> >> To: peter.braw...@earthlink.net >> >>>> >> >> Cc: mysql@lists.mysql.com >> >>>> >> >> Subject: Re: getting certain rows from a group by >> >>>> >> >> >> >>>> >> >> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley >> >>>> >> >> <peter.braw...@earthlink.net> wrote: >> >>>> >> >> > On 2012-09-18 5:53 AM, Larry Martell wrote: >> >>>> >> >> >> >> >>>> >> >> >> I have this query: >> >>>> >> >> >> >> >>>> >> >> >> SELECT data_target.name as Target, >> >>>> >> >> >> q1.ep as EP, >> >>>> >> >> >> COUNT(*) as Wafers, >> >>>> >> >> >> Lots, >> >>>> >> >> >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 >> >>>> >> >> >> THEN reruns ELSE 0 END)) as 'Sites/Wafer', >> >>>> >> >> >> MAX(LastRun) as "Last Run", >> >>>> >> >> >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) >> >>>> >> >> >> as >> >>>> >> Rerun, >> >>>> >> >> >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns >> ELSE >> >>>> >> >> >> 0 >> >>>> >> >> >> END) as >> >>>> >> >> Runs, >> >>>> >> >> >> avgbottom as "Avg Bottom", >> >>>> >> >> >> 3*stdbottom as "3 Sig", >> >>>> >> >> >> maxbottom as Max, >> >>>> >> >> >> minbottom as Min, >> >>>> >> >> >> SUM(numonep) as Count, >> >>>> >> >> >> SUM(numonep) - SUM(numbottoms) as NAs, >> >>>> >> >> >> 100-((SUM(numonep) - >> >>>> SUM(numbottoms))/SUM(numonep)*100) >> >>>> >> as >> >>>> >> >> "% >> >>>> >> >> >> Success", >> >>>> >> >> >> 3*stdbottom/avgbottom as "3Sig/Avg", >> >>>> >> >> >> AVG(avgbottom) as 'Wafer Avg', >> >>>> >> >> >> AVG(Wafer3Sigma) as 'Wafer 3 Sigma', >> >>>> >> >> >> AVG(Ranges) as 'Avg Range', >> >>>> >> >> >> 3*STD(Ranges) as '3Sig of Ranges', >> >>>> >> >> >> MAX(Ranges) as 'Max Range', >> >>>> >> >> >> MIN(Ranges) as 'Min Range', >> >>>> >> >> >> (SUM(numonep) - SUM(numbottoms))/COUNT(*) as >> >>>> 'NAs/Wafer' >> >>>> >> >> >> FROM (SELECT target_name_id, >> >>>> >> >> >> ep, >> >>>> >> >> >> wafer_id, >> >>>> >> >> >> COUNT(bottom) as numbottoms, >> >>>> >> >> >> AVG(bottom) as avgbottom, >> >>>> >> >> >> STD(bottom) as stdbottom, >> >>>> >> >> >> MAX(bottom) as maxbottom, >> >>>> >> >> >> MIN(bottom) as minbottom, >> >>>> >> >> >> MAX(date_time) as "LastRun", >> >>>> >> >> >> COUNT(*) as numonep, >> >>>> >> >> >> COUNT(DISTINCT target_name_id, ep, lot_id, >> >>>> >> >> >> data_file_id)-1 as reruns, >> >>>> >> >> >> COUNT(DISTINCT(lot_id)) as Lots, >> >>>> >> >> >> 3*STD(bottom) as Wafer3Sigma, >> >>>> >> >> >> MAX(bottom) - MIN(bottom) as Ranges >> >>>> >> >> >> FROM data_cst >> >>>> >> >> >> WHERE target_name_id IN (775, 776, 777, 778, 779, >> >>>> >> >> >> 780, 45, >> >>>> >> >> 44, >> >>>> >> >> >> 116, 117, 118, 119, 120, 121) >> >>>> >> >> >> AND data_cst.date_time BETWEEN '2010-03-04 >> 00:00:00' >> >>>> AND >> >>>> >> >> >> '2010-03-04 23:59:59' >> >>>> >> >> >> GROUP BY target_name_id, ep, wafer_id >> >>>> >> >> >> HAVING count(*) < 999) q1, >> >>>> >> >> >> data_target >> >>>> >> >> >> WHERE data_target.id = target_name_id GROUP BY >> >>>> >> >> >> q1.target_name_id, q1.ep; >> >>>> >> >> >> >> >>>> >> >> >> >> >>>> >> >> >> This works fine. But now I need to get a certain column >> >>>> >> >> >> (image_measurer_id) with each row returned that >> >>>> >> >> >> corresponds to >> >>>> >> the >> >>>> >> >> >> row from the group that has bottom = Min(bottom), bottom >> = >> >>>> >> >> >> Max(bottom), bottom closest to Avg(bottom), and bottom >> >>>> >> >> >> from >> >>>> the >> >>>> >> >> >> row where date_time = Max(date_time). >> >>>> >> >> >> >> >>>> >> >> >> Is this even possible from one query? >> >>>> >> >> > >> >>>> >> >> > >> >>>> >> >> > Might be, but what's the importance of doing it as one >> query? >> >>>> >> >> >> >>>> >> >> Because it's part of a django based web app, and the class >> >>>> >> >> that this is part of only supports having a single query. To >> >>>> >> >> use multiple queries will require a fairly major rewrite of >> >>>> >> >> the >> >>>> server >> >>>> >> >> side of >> >>>> >> that app. >> >>>> >> >> >> >>>> >> >> > I'd start by >> >>>> >> >> > saving this result to a temp table and developing the new >> >>>> query. >> >>>> >> >> > When that's running, see if you can to optimise a query >> >>>> >> >> > built >> >>>> by >> >>>> >> >> > replacing the reference to the temp table with the >> original >> >>>> query. >> >>>> >> >> >> >>>> >> >> Thanks, I'll look into this. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql