On 12/31/2015 0:51, Larry Martell wrote:
I need to count the number of rows in a table that are grouped by a
list of columns, but I also need to exclude rows that have more then
some count when grouped by a different set of columns. Conceptually,
this is not hard, but I am having trouble doing this efficiently.
My first counting query would be this:
SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id
But from this count I need to subtract the count of rows that have
more then 50 rows with a different grouping:
SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) >= 50
As you can see, the second query has wafer_id, but the first query does not.
Currently I am doing this in python, and it's slow. In my current
implementation I have one query, and it selects the columns (i.e.
doesn't just count), and I have added wafer_id:
SELECT target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id, wafer_id
FROM cst_rollup
Then I go through the result set (which can be over 200k rows) and I
count the number of rows with matching (target_name_id, ep, wafer_id).
Then I go through the rows again and regroup them without wafer_id,
but skipping the rows that have more then 50 rows for that row's
(target_name_id, ep, wafer_id).
Is this clear to everyone what I am trying to do?
If I've understand this correctly, the resultset you wish to aggregate
on is ...
select target_name_id, ep, wafer_id
from cst_rollup a
left join ( -- exclude rows for which wafer_id count >= 50
select name_id, ep, wafer, count(*) n
from cst_rollup
group by target_name_id, ep, wafer_id
having n >= 50
) b using ( target_name_id, ep, wafer_id )
where b.target_name is null ;
If that's so, you could assemble that resultset in a temp table then run
the desired aggregate query on it, or you could aggregate on it directly
as a subquery.
PB
-----
I'd like to do this all in sql with count because then I do not have
to actually return and parse the data in python.
Can anyone think of a way to do this in sql in a way that will be more
efficient then my current implementation?
Thanks!
-Larry
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql