Re: help with query to count rows while excluding certain rows

2016-01-02 Thread Larry Martell
On Fri, Jan 1, 2016 at 9:31 PM, Peter Brawley
 wrote:
> On 1/1/2016 19:24, Larry Martell wrote:
>>
>> On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley
>>  wrote:
>>>
>>> On 12/31/2015 0:51, Larry Martell wrote:

 I need to count the number of rows in a table that are grouped by a
 list of columns, but I also need to exclude rows that have more then
 some count when grouped by a different set of columns. Conceptually,
 this is not hard, but I am having trouble doing this efficiently.

 My first counting query would be this:

 SELECT count(*)
 FROM cst_rollup
 GROUP BY target_name_id, ep, roiname, recipe_process,
 recipe_product, recipe_layer, f_tag_bottom,
 measname, recipe_id

 But from this count I need to subtract the count of rows that have
 more then 50 rows with a different grouping:

 SELECT count(*)
 FROM cst_rollup
 GROUP BY target_name_id, ep, wafer_id
 HAVING count(*) >= 50

 As you can see, the second query has wafer_id, but the first query does
 not.

 Currently I am doing this in python, and it's slow. In my current
 implementation I have one query, and it selects the columns (i.e.
 doesn't just count), and I have added wafer_id:

 SELECT target_name_id, ep, roiname, recipe_process,
 recipe_product, recipe_layer, f_tag_bottom,
 measname, recipe_id, wafer_id
 FROM cst_rollup

 Then I go through the result set (which can be over 200k rows) and I
 count the number of rows with matching (target_name_id, ep, wafer_id).
 Then I go through the rows again and regroup them without wafer_id,
 but skipping the rows that have more then 50 rows for that row's
 (target_name_id, ep, wafer_id).

 Is this clear to everyone what I am trying to do?
>>>
>>>
>>> If I've understand this correctly, the resultset you wish to aggregate on
>>> is
>>> ...
>>>
>>> select target_name_id, ep, wafer_id
>>> from cst_rollup a
>>> left join (   -- exclude rows for which wafer_id count >= 50
>>>select name_id, ep, wafer, count(*) n
>>>from cst_rollup
>>>group by target_name_id, ep, wafer_id
>>>having n >= 50
>>> ) b using ( target_name_id, ep, wafer_id )
>>> where b.target_name is null ;
>>>
>>> If that's so, you could assemble that resultset in a temp table then run
>>> the
>>> desired aggregate query on it, or you could aggregate on it directly as a
>>> subquery.
>>
>> That query gives:
>>
>> ERROR 1137 (HY000): Can't reopen table: 'a'
>
>
> So, it's a temporary table, and you'll need to make that not so.

Yes, cst_rollup is a temp table. The underlying table is millions of
rows (with 300 columns) so for efficiency a subset of the rows and
columns are selected into the temp table based on some user input.
It's just the rows in the temp table that are of interest for the
current report.

I was able to get this working with a second temp table:

CREATE TEMPORARY TABLE rollup_exclude
SELECT target_name_id, ep, wafer_id, count(*) n
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING n >= 50

And then:

SELECT count(*)
FROM cst_rollup
LEFT JOIN(
SELECT target_name_id, ep, wafer_id
FROM rollup_exclude) b
USING (target_name_id, ep, wafer_id)
WHERE b.target_name_id IS NULL
GROUP by target_name_id, ep, roiname, recipe_process, recipe_product,
recipe_layer, f_tag_bottom, measname, recipe_id

And the rowcount from that query gave me what I needed.

Thanks very much for the help Peter, you gave me a push toward the right path.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with query to count rows while excluding certain rows

2016-01-01 Thread Larry Martell
On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley
 wrote:
> On 12/31/2015 0:51, Larry Martell wrote:
>>
>> I need to count the number of rows in a table that are grouped by a
>> list of columns, but I also need to exclude rows that have more then
>> some count when grouped by a different set of columns. Conceptually,
>> this is not hard, but I am having trouble doing this efficiently.
>>
>> My first counting query would be this:
>>
>> SELECT count(*)
>> FROM cst_rollup
>> GROUP BY target_name_id, ep, roiname, recipe_process,
>> recipe_product, recipe_layer, f_tag_bottom,
>> measname, recipe_id
>>
>> But from this count I need to subtract the count of rows that have
>> more then 50 rows with a different grouping:
>>
>> SELECT count(*)
>> FROM cst_rollup
>> GROUP BY target_name_id, ep, wafer_id
>> HAVING count(*) >= 50
>>
>> As you can see, the second query has wafer_id, but the first query does
>> not.
>>
>> Currently I am doing this in python, and it's slow. In my current
>> implementation I have one query, and it selects the columns (i.e.
>> doesn't just count), and I have added wafer_id:
>>
>> SELECT target_name_id, ep, roiname, recipe_process,
>> recipe_product, recipe_layer, f_tag_bottom,
>> measname, recipe_id, wafer_id
>> FROM cst_rollup
>>
>> Then I go through the result set (which can be over 200k rows) and I
>> count the number of rows with matching (target_name_id, ep, wafer_id).
>> Then I go through the rows again and regroup them without wafer_id,
>> but skipping the rows that have more then 50 rows for that row's
>> (target_name_id, ep, wafer_id).
>>
>> Is this clear to everyone what I am trying to do?
>
>
> If I've understand this correctly, the resultset you wish to aggregate on is
> ...
>
> select target_name_id, ep, wafer_id
> from cst_rollup a
> left join (   -- exclude rows for which wafer_id count >= 50
>   select name_id, ep, wafer, count(*) n
>   from cst_rollup
>   group by target_name_id, ep, wafer_id
>   having n >= 50
> ) b using ( target_name_id, ep, wafer_id )
> where b.target_name is null ;
>
> If that's so, you could assemble that resultset in a temp table then run the
> desired aggregate query on it, or you could aggregate on it directly as a
> subquery.

That query gives:

ERROR 1137 (HY000): Can't reopen table: 'a'

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with query to count rows while excluding certain rows

2016-01-01 Thread Peter Brawley

On 12/31/2015 0:51, Larry Martell wrote:

I need to count the number of rows in a table that are grouped by a
list of columns, but I also need to exclude rows that have more then
some count when grouped by a different set of columns. Conceptually,
this is not hard, but I am having trouble doing this efficiently.

My first counting query would be this:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id

But from this count I need to subtract the count of rows that have
more then 50 rows with a different grouping:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) >= 50

As you can see, the second query has wafer_id, but the first query does not.

Currently I am doing this in python, and it's slow. In my current
implementation I have one query, and it selects the columns (i.e.
doesn't just count), and I have added wafer_id:

SELECT target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id, wafer_id
FROM cst_rollup

Then I go through the result set (which can be over 200k rows) and I
count the number of rows with matching (target_name_id, ep, wafer_id).
Then I go through the rows again and regroup them without wafer_id,
but skipping the rows that have more then 50 rows for that row's
(target_name_id, ep, wafer_id).

Is this clear to everyone what I am trying to do?


If I've understand this correctly, the resultset you wish to aggregate 
on is ...


select target_name_id, ep, wafer_id
from cst_rollup a
left join (   -- exclude rows for which wafer_id count >= 50
  select name_id, ep, wafer, count(*) n
  from cst_rollup
  group by target_name_id, ep, wafer_id
  having n >= 50
) b using ( target_name_id, ep, wafer_id )
where b.target_name is null ;

If that's so, you could assemble that resultset in a temp table then run 
the desired aggregate query on it, or you could aggregate on it directly 
as a subquery.


PB

-


I'd like to do this all in sql with count because then I do not have
to actually return and parse the data in python.

Can anyone think of a way to do this in sql in a way that will be more
efficient then my current implementation?


Thanks!
-Larry




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



help with query to count rows while excluding certain rows

2015-12-30 Thread Larry Martell
I need to count the number of rows in a table that are grouped by a
list of columns, but I also need to exclude rows that have more then
some count when grouped by a different set of columns. Conceptually,
this is not hard, but I am having trouble doing this efficiently.

My first counting query would be this:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id

But from this count I need to subtract the count of rows that have
more then 50 rows with a different grouping:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) >= 50

As you can see, the second query has wafer_id, but the first query does not.

Currently I am doing this in python, and it's slow. In my current
implementation I have one query, and it selects the columns (i.e.
doesn't just count), and I have added wafer_id:

SELECT target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id, wafer_id
FROM cst_rollup

Then I go through the result set (which can be over 200k rows) and I
count the number of rows with matching (target_name_id, ep, wafer_id).
Then I go through the rows again and regroup them without wafer_id,
but skipping the rows that have more then 50 rows for that row's
(target_name_id, ep, wafer_id).

Is this clear to everyone what I am trying to do?

I'd like to do this all in sql with count because then I do not have
to actually return and parse the data in python.

Can anyone think of a way to do this in sql in a way that will be more
efficient then my current implementation?


Thanks!
-Larry

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql