Re: help with query to count rows while excluding certain rows
On Fri, Jan 1, 2016 at 9:31 PM, Peter Brawley wrote: > On 1/1/2016 19:24, Larry Martell wrote: >> >> On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley >> wrote: >>> >>> On 12/31/2015 0:51, Larry Martell wrote: I need to count the number of rows in a table that are grouped by a list of columns, but I also need to exclude rows that have more then some count when grouped by a different set of columns. Conceptually, this is not hard, but I am having trouble doing this efficiently. My first counting query would be this: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id But from this count I need to subtract the count of rows that have more then 50 rows with a different grouping: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, wafer_id HAVING count(*) >= 50 As you can see, the second query has wafer_id, but the first query does not. Currently I am doing this in python, and it's slow. In my current implementation I have one query, and it selects the columns (i.e. doesn't just count), and I have added wafer_id: SELECT target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id, wafer_id FROM cst_rollup Then I go through the result set (which can be over 200k rows) and I count the number of rows with matching (target_name_id, ep, wafer_id). Then I go through the rows again and regroup them without wafer_id, but skipping the rows that have more then 50 rows for that row's (target_name_id, ep, wafer_id). Is this clear to everyone what I am trying to do? >>> >>> >>> If I've understand this correctly, the resultset you wish to aggregate on >>> is >>> ... >>> >>> select target_name_id, ep, wafer_id >>> from cst_rollup a >>> left join ( -- exclude rows for which wafer_id count >= 50 >>>select name_id, ep, wafer, count(*) n >>>from cst_rollup >>>group by target_name_id, ep, wafer_id >>>having n >= 50 >>> ) b using ( target_name_id, ep, wafer_id ) >>> where b.target_name is null ; >>> >>> If that's so, you could assemble that resultset in a temp table then run >>> the >>> desired aggregate query on it, or you could aggregate on it directly as a >>> subquery. >> >> That query gives: >> >> ERROR 1137 (HY000): Can't reopen table: 'a' > > > So, it's a temporary table, and you'll need to make that not so. Yes, cst_rollup is a temp table. The underlying table is millions of rows (with 300 columns) so for efficiency a subset of the rows and columns are selected into the temp table based on some user input. It's just the rows in the temp table that are of interest for the current report. I was able to get this working with a second temp table: CREATE TEMPORARY TABLE rollup_exclude SELECT target_name_id, ep, wafer_id, count(*) n FROM cst_rollup GROUP BY target_name_id, ep, wafer_id HAVING n >= 50 And then: SELECT count(*) FROM cst_rollup LEFT JOIN( SELECT target_name_id, ep, wafer_id FROM rollup_exclude) b USING (target_name_id, ep, wafer_id) WHERE b.target_name_id IS NULL GROUP by target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id And the rowcount from that query gave me what I needed. Thanks very much for the help Peter, you gave me a push toward the right path. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: help with query to count rows while excluding certain rows
On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley wrote: > On 12/31/2015 0:51, Larry Martell wrote: >> >> I need to count the number of rows in a table that are grouped by a >> list of columns, but I also need to exclude rows that have more then >> some count when grouped by a different set of columns. Conceptually, >> this is not hard, but I am having trouble doing this efficiently. >> >> My first counting query would be this: >> >> SELECT count(*) >> FROM cst_rollup >> GROUP BY target_name_id, ep, roiname, recipe_process, >> recipe_product, recipe_layer, f_tag_bottom, >> measname, recipe_id >> >> But from this count I need to subtract the count of rows that have >> more then 50 rows with a different grouping: >> >> SELECT count(*) >> FROM cst_rollup >> GROUP BY target_name_id, ep, wafer_id >> HAVING count(*) >= 50 >> >> As you can see, the second query has wafer_id, but the first query does >> not. >> >> Currently I am doing this in python, and it's slow. In my current >> implementation I have one query, and it selects the columns (i.e. >> doesn't just count), and I have added wafer_id: >> >> SELECT target_name_id, ep, roiname, recipe_process, >> recipe_product, recipe_layer, f_tag_bottom, >> measname, recipe_id, wafer_id >> FROM cst_rollup >> >> Then I go through the result set (which can be over 200k rows) and I >> count the number of rows with matching (target_name_id, ep, wafer_id). >> Then I go through the rows again and regroup them without wafer_id, >> but skipping the rows that have more then 50 rows for that row's >> (target_name_id, ep, wafer_id). >> >> Is this clear to everyone what I am trying to do? > > > If I've understand this correctly, the resultset you wish to aggregate on is > ... > > select target_name_id, ep, wafer_id > from cst_rollup a > left join ( -- exclude rows for which wafer_id count >= 50 > select name_id, ep, wafer, count(*) n > from cst_rollup > group by target_name_id, ep, wafer_id > having n >= 50 > ) b using ( target_name_id, ep, wafer_id ) > where b.target_name is null ; > > If that's so, you could assemble that resultset in a temp table then run the > desired aggregate query on it, or you could aggregate on it directly as a > subquery. That query gives: ERROR 1137 (HY000): Can't reopen table: 'a' -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: help with query to count rows while excluding certain rows
On 12/31/2015 0:51, Larry Martell wrote: I need to count the number of rows in a table that are grouped by a list of columns, but I also need to exclude rows that have more then some count when grouped by a different set of columns. Conceptually, this is not hard, but I am having trouble doing this efficiently. My first counting query would be this: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id But from this count I need to subtract the count of rows that have more then 50 rows with a different grouping: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, wafer_id HAVING count(*) >= 50 As you can see, the second query has wafer_id, but the first query does not. Currently I am doing this in python, and it's slow. In my current implementation I have one query, and it selects the columns (i.e. doesn't just count), and I have added wafer_id: SELECT target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id, wafer_id FROM cst_rollup Then I go through the result set (which can be over 200k rows) and I count the number of rows with matching (target_name_id, ep, wafer_id). Then I go through the rows again and regroup them without wafer_id, but skipping the rows that have more then 50 rows for that row's (target_name_id, ep, wafer_id). Is this clear to everyone what I am trying to do? If I've understand this correctly, the resultset you wish to aggregate on is ... select target_name_id, ep, wafer_id from cst_rollup a left join ( -- exclude rows for which wafer_id count >= 50 select name_id, ep, wafer, count(*) n from cst_rollup group by target_name_id, ep, wafer_id having n >= 50 ) b using ( target_name_id, ep, wafer_id ) where b.target_name is null ; If that's so, you could assemble that resultset in a temp table then run the desired aggregate query on it, or you could aggregate on it directly as a subquery. PB - I'd like to do this all in sql with count because then I do not have to actually return and parse the data in python. Can anyone think of a way to do this in sql in a way that will be more efficient then my current implementation? Thanks! -Larry -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
help with query to count rows while excluding certain rows
I need to count the number of rows in a table that are grouped by a list of columns, but I also need to exclude rows that have more then some count when grouped by a different set of columns. Conceptually, this is not hard, but I am having trouble doing this efficiently. My first counting query would be this: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id But from this count I need to subtract the count of rows that have more then 50 rows with a different grouping: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, wafer_id HAVING count(*) >= 50 As you can see, the second query has wafer_id, but the first query does not. Currently I am doing this in python, and it's slow. In my current implementation I have one query, and it selects the columns (i.e. doesn't just count), and I have added wafer_id: SELECT target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id, wafer_id FROM cst_rollup Then I go through the result set (which can be over 200k rows) and I count the number of rows with matching (target_name_id, ep, wafer_id). Then I go through the rows again and regroup them without wafer_id, but skipping the rows that have more then 50 rows for that row's (target_name_id, ep, wafer_id). Is this clear to everyone what I am trying to do? I'd like to do this all in sql with count because then I do not have to actually return and parse the data in python. Can anyone think of a way to do this in sql in a way that will be more efficient then my current implementation? Thanks! -Larry -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql