On 12/31/2015 0:51, Larry Martell wrote:
I need to count the number of rows in a table that are grouped by a
list of columns, but I also need to exclude rows that have more then
some count when grouped by a different set of columns. Conceptually,
this is not hard, but I am having trouble doing this efficiently.

My first counting query would be this:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id

But from this count I need to subtract the count of rows that have
more then 50 rows with a different grouping:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) >= 50

As you can see, the second query has wafer_id, but the first query does not.

Currently I am doing this in python, and it's slow. In my current
implementation I have one query, and it selects the columns (i.e.
doesn't just count), and I have added wafer_id:

SELECT target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id, wafer_id
FROM cst_rollup

Then I go through the result set (which can be over 200k rows) and I
count the number of rows with matching (target_name_id, ep, wafer_id).
Then I go through the rows again and regroup them without wafer_id,
but skipping the rows that have more then 50 rows for that row's
(target_name_id, ep, wafer_id).

Is this clear to everyone what I am trying to do?

If I've understand this correctly, the resultset you wish to aggregate on is ...

select target_name_id, ep, wafer_id
from cst_rollup a
left join (               -- exclude rows for which wafer_id count >= 50
  select name_id, ep, wafer, count(*) n
  from cst_rollup
  group by target_name_id, ep, wafer_id
  having n >= 50
) b using ( target_name_id, ep, wafer_id )
where b.target_name is null ;

If that's so, you could assemble that resultset in a temp table then run the desired aggregate query on it, or you could aggregate on it directly as a subquery.

PB

-----

I'd like to do this all in sql with count because then I do not have
to actually return and parse the data in python.

Can anyone think of a way to do this in sql in a way that will be more
efficient then my current implementation?


Thanks!
-Larry



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to