On Wed, Sep 19, 2012 at 10:26 AM, Larry Martell <larry.mart...@gmail.com> wrote: > On Wed, Sep 19, 2012 at 12:04 AM, <h...@tbbs.net> wrote: >>>>>> 2012/09/18 06:53 -0400, Larry Martell >>>> >> This works fine. But now I need to get a certain column >> (image_measurer_id) with each row returned that corresponds to the row >> from the group that has bottom = Min(bottom), bottom = Max(bottom), >> bottom closest to Avg(bottom), and bottom from the row where date_time >> = Max(date_time). >> <<<<<<<< >> There is a standard and ugly way of getting such a thing, but it entails >> repeating the table reference (supposing you want any, not every, to match): >> >> SELECT data_cst.target_name_id, image_measurer_id, nb, xb, vb, xt, bottom >> FROM (SELECT Min(bottom) AS nb, Max(bottom) AS xb, MIN(ABS(Avg(bottom) - >> bottom)) AS vb, Max(date_time) AS xt >> FROM data_cst >> WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, >> 117, 118, 119, 120, 121) >> AND DATE(data_cst.date_time) = '2010-03-04' >> GROUP BY target_name_id, ep, wafer_id >> HAVING count(*) < 999) AS st >> JOIN data_cst ON st.target_name_id = data_cst.target_name_id AND (nb = >> bottom OR xb = bottom OR vb = bottom OR date_time = xt) >> >> One record will be returned for every row that holds a relevant extremum, >> not guaranteed to be unique. >> >> This query pertains only to your original subquery, not the whole query. To >> get the result to which you refer, join this to your original query, to >> which you have added something like vb: >> >> SELECT * >> FROM (original query) JOIN (this query) ON nb = minbottom AND xb = maxbottom >> AND xt = "Last Run" AND vb = .... AND (original query).target_name_id = >> (this_query).target_name_id > > Thanks. I will look into doing it like this. > > I got the min, max, and date conditions to work by adding the joins > that Rick suggested: > > join data_cst mn ON mn.bottom = q1.minbottom > join data_cst mx ON mx.bottom = q1.maxbottom > join data_cst lr ON lr.date_time = q1.LastRun > > But now you have me wondering - if this just matching on that one > field, then it can be returning a row from data_cst that is not part > of the original result set? Do I need to add all the same select > criteria to each join?
I verified that this is the case. I was not getting the correct data back. I had to add: and mn.target_name_id = q1.target_name_id and mn.ep = q1.ep and mn.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' to each join so I got the correct rows back. So now my only issue is still how to work the bottom is closest to avg(bottom) join into this. > >> --but I am not confident in the result. There are problems in the original >> query, the biggest that of avgbottom, stdbottom, maxbottom, minbottom none >> are aggregated over wafer_id. > > The inner query aggregates over target_name_id, ep, wafer_id, and the > the outer query aggregates that result set over target_name_id, ep. > The avg and std are what the client wants - average of averages and > standards deviations. As are the min and max - the outer query returns > the min and max of all the wafers for each target, ep. > >> Therefore, it is not certain from which record from q1 they are returned. >> MySQL tends to pick the first that fits, but not only is nothing guaranteed, >> it is explicitly written that if such not aggregated fields appear, the >> output is not determinate unless all pertinent are equal, the possibility >> wherof the reason for allowing it. > > Yes, if case with dups probably had not been thought of by them, and > will have to be addressed at some point. > >> >> When that has been handled, it is needful to change the foregoing query to >> match that one in two levels, because averaging is not associative (there is >> also the problem of equality-testing on generated floating-point numbers). >> If it were only MAX and MIN, one level of not GROUPing BY wafer_id would be >> all right. >> >> By the way, in the original query, I suggest instead of >> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND >> '2010-03-04 23:59:59' >> this, >> DATE(data_cst.date_time) = '2010-03-04' > > The date_time was just an example. Depending on user input the query > could have times in it. > >> Instead of >> >> ... q1, >> data_target >> WHERE data_target.id = target_name_id >> >> this would be better: >> >> ... q1 JOIN data_target ON data_target.id = target_name_id > > Why is one better then the other? > >> I believe that >> 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) >> = 100*SUM(numbottoms)/SUM(numonep) >> and >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) >> = SUM(GREATEST(reruns, 0)) > > Yes, thanks - yours is clearer and I've made those changes. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql