Hi Neil,
On 11/22/2012 7:14 PM, h...@tbbs.net wrote:
2012/11/22 14:30 +0000, Neil Tompkins >>>>
I'm struggling with what I think is a basic select but can't think how to
do it : My data is
id,type
1000,5
1001,5
1002,2
1001,2
1003,2
1005,2
1006,1
From this I what to get a distinct list of id where the type equals 2 and 5
Any ideas ?
<<<<<<<<
This ugly one, which generalizes:
select id,group_concat(type) AS tl from the_table group by id having
find_in_set('2',tl) and find_in_set('5',tl)
Ugly becaus it involves so much converting between number & string.
For full generality one would indeed write
GROUP_CONCAT(type ORDER BY type)
and pass my "tl" and a string, say '1,2', to a procedure that using
SUBSTRING_INDEX taking the strings for arrays ensures that all found in the first string
is also in the second string. There are times when I wish SQL had arrays.
The fun part of solving this is to remember that SQL is a set-oriented
language. For each element in the set, none of them can be both 2 and 5
at the same time. So, you have to build two sets and check to see which
rows are in both.
One pattern works if you need to aggregate for just a few terms
SELECT a.id
from (select distinct id from mytable where type=2) a
INNER JOIN (select distinct id from mytable where type=5) b
on a.id=b.id
However, this gets numerically very expensive with more than a few JOINS
to the pattern. Also, there is no index on either of the temporary
results (a or b) so this is a full Cartesian product of both tables.
That means that although it gives you a correct answer, it will not
scale to 100000's of rows (or more) in either set.
So, here is a way to assemble the same result that uses much less
resources. Remember, each row you want is a member of a set.
CREATE TEMPORARY TABLE tmpList (
id int
, type int
, PRIMARY KEY (id,type)
)
INSERT IGNORE tmpList
SELECT id,type
FROM mytable
WHERE type in (2,5)
SELECT id, count(type) hits
FROM tmplist
GROUP BY id
HAVING hits=2
DROP TEMPORARY TABLE tmpList
Can you see why this works?
I created an indexed subset of rows that match either value (2 or 5) but
only keep one example of each. I accomplished that by the combination of
PRIMARY KEY and INSERT IGNORE. Then I counted many type values each ID
value represented in the subset. If I looked for 2 terms and I ended up
with hits=2, then I know that those ID values matched on both terms.
You can expand on this pattern to also do partial (M of N search terms)
or best-fit determinations.
I hope this was the kind of help you were looking for.
Regards,
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql