Hi Neil,

On 11/22/2012 7:14 PM, h...@tbbs.net wrote:
2012/11/22 14:30 +0000, Neil Tompkins >>>>
I'm struggling with what I think is a basic select but can't think how to
do it : My data is

id,type

1000,5
1001,5
1002,2
1001,2
1003,2
1005,2
1006,1

 From this I what to get a distinct list of id where the type equals 2 and 5

Any ideas ?
<<<<<<<<
This ugly one, which generalizes:

select id,group_concat(type) AS tl from the_table group by id having 
find_in_set('2',tl) and find_in_set('5',tl)

Ugly becaus it involves so much converting between number & string.

For full generality one would indeed write
GROUP_CONCAT(type ORDER BY type)
and pass my "tl" and a string, say '1,2', to a procedure that using 
SUBSTRING_INDEX taking the strings for arrays ensures that all found in the first string 
is also in the second string. There are times when I wish SQL had arrays.

The fun part of solving this is to remember that SQL is a set-oriented language. For each element in the set, none of them can be both 2 and 5 at the same time. So, you have to build two sets and check to see which rows are in both.

One pattern works if you need to aggregate for just a few terms

SELECT a.id
from (select distinct id from mytable where type=2) a
INNER JOIN (select distinct id from mytable where type=5) b
  on a.id=b.id

However, this gets numerically very expensive with more than a few JOINS to the pattern. Also, there is no index on either of the temporary results (a or b) so this is a full Cartesian product of both tables. That means that although it gives you a correct answer, it will not scale to 100000's of rows (or more) in either set.


So, here is a way to assemble the same result that uses much less resources. Remember, each row you want is a member of a set.

CREATE TEMPORARY TABLE tmpList (
  id int
, type int
, PRIMARY KEY (id,type)
)

INSERT IGNORE tmpList
SELECT id,type
FROM mytable
WHERE type in (2,5)

SELECT id, count(type) hits
FROM tmplist
GROUP BY id
HAVING hits=2

DROP TEMPORARY TABLE tmpList

Can you see why this works?

I created an indexed subset of rows that match either value (2 or 5) but only keep one example of each. I accomplished that by the combination of PRIMARY KEY and INSERT IGNORE. Then I counted many type values each ID value represented in the subset. If I looked for 2 terms and I ended up with hits=2, then I know that those ID values matched on both terms.

You can expand on this pattern to also do partial (M of N search terms) or best-fit determinations.

I hope this was the kind of help you were looking for.

Regards,
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to