RE: How do I select all rows of table that have some rows in another table (AND, not OR)
> -Original Message- > From: shawn green [mailto:shawn.l.gr...@oracle.com] > Sent: Tuesday, June 11, 2013 2:16 PM > To: mysql@lists.mysql.com > Subject: Re: How do I select all rows of table that have some rows in > another table (AND, not OR) > > Hello Daevid, > > On 6/11/2013 3:59 PM, Daevid Vincent wrote: > > I am trying to implement a filter so that a user could select various > genres > > they want "in" or "out". Perhaps they like 'action' and 'car chases' but > > don't like 'foreign' and 'drifting' (or whatever...) > > > > So I want something sort of like this, however IN() is using an "OR" > > comparison when I need it to be an "AND" > > > > SELECT DISTINCT > > s.`scene_id` AS `id`, > > GROUP_CONCAT(sg.`genre_id`) > > FROM > > `dvds` AS d > > JOIN `scenes_list` AS s > > ON s.`dvd_id` = d.`dvd_id` > > JOIN `scenes_genres` AS sg > > ON sg.`scene_id` = s.`scene_id` > > AND sg.`genre_id` IN (10,38) > > AND sg.`genre_id` NOT IN (22,61) > > GROUP BY s.`scene_id`; > > > > This is giving me way way too many rows returned. > > > > For example, I would expect this scene_id to be in the result set: > > > > SELECT * FROM scenes_genres WHERE scene_id = 17; > > > > scene_id genre_id > > -- > >17 1 > >17 3 > >17 10 <-- > >17 19 > >17 38 <-- > >17 53 > >17 58 > >17 59 > > > > And this scene ID to NOT be in the result set: > > > > SELECT * FROM scenes_genres WHERE scene_id = 11; > > > > scene_id genre_id > > -- > >11 1 > >11 10 <-- > >11 19 > >11 31 > >11 32 > ><-- but does not have 38 > >11 59 > > > > I've tried various subselect ideas, but of course this fails b/c genre_id > > can't be multiple things at one time (AND) > > > > JOIN `scenes_genres` AS sg > > ON sg.`scene_id` = s.`scene_id` > > AND sg.`genre_id` IN ( > >SELECT `genre_id` FROM `scenes_genres` > > WHERE `genre_id` = 10 > >AND `genre_id` = 38 > >AND `genre_id` <> 22 > >AND `genre_id` <> 61 > > ) > > > > And straight up like this failure too... > > > >JOIN `scenes_genres` AS sg > > ON sg.`scene_id` = s.`scene_id` > > AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38) > > AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61) > > > > So I'm sort of out of ideas at this point and hoping someone has a way to > do > > this. > > > > Also, just for S&G this is how we are currently implementing it, but we > feel > > the REGEXP is killing our queries and while "clever" is a bit hacky and > > nullifies any indexes we have on the genres column as it requires a > > file_sort table scan to compare substrings basically... > > > > SELECT * FROM scene_all_genres WHERE scene_id = 17; > > > > scene_id genres > > > >17 1|3|10|19|38|53|58|59| > > > > SELECT * FROM scene_all_genres WHERE scene_id = 11; > > > > scene_id genres > > --- > >11 1|10|19|31|32|59| > > > > SELECT DISTINCT > > s.`scene_id` AS `id`, > > sg.`genres` > > FROM > > `scene_all_genres` AS sg, > > `dvds` AS d, > > `scenes_list` AS s > > WHEREdvd_id` = d.`dvd_id` > > AND sg.`scene_id` = s.`scene_id` > > AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]' > > AND sg.`genres` NOT REGEXP > '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))' > > > > ; > > > > http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp > > > > [[:<:]], [[:>:]] > > > > These markers stand for word boundaries. They match the beginning and end > of > > words, respectively. A word is a sequence of word characters that is not > > preceded by or followed by word characters. A word character is an > > alphanumeric character in the alnum class or an underscore (_). > > > > To me it looks like you want to know how to match N for N when looking > for subset of properties. It's a multi-step process using plain SQL but > it's fast: > > 1) Build a list of scenes containing the genres you want to see > > CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) > SELECT scene_id, genre_id > FROM scenes_genres > WHERE genre_id` IN (10,38) <-- in this case you have 2 terms to mach > > 2) From that list, determine which of those scenes also contain unwanted > genres and remove them. > > DELETE tmpList > FROM tmpList t > INNER JOIN scenes_genres sg >on sg.scene_id = t.scene_id >and sg.genre_id IN (22,61) > > # at this point, tmpList contains all scenes that have any of the > desired genres but none of the unwanted ones. > > 3)
Re: How do I select all rows of table that have some rows in another table (AND, not OR)
Hello Daevid, On 6/11/2013 3:59 PM, Daevid Vincent wrote: I am trying to implement a filter so that a user could select various genres they want "in" or "out". Perhaps they like 'action' and 'car chases' but don't like 'foreign' and 'drifting' (or whatever...) So I want something sort of like this, however IN() is using an "OR" comparison when I need it to be an "AND" SELECT DISTINCT s.`scene_id` AS `id`, GROUP_CONCAT(sg.`genre_id`) FROM `dvds` AS d JOIN `scenes_list` AS s ON s.`dvd_id` = d.`dvd_id` JOIN `scenes_genres` AS sg ON sg.`scene_id` = s.`scene_id` AND sg.`genre_id` IN (10,38) AND sg.`genre_id` NOT IN (22,61) GROUP BY s.`scene_id`; This is giving me way way too many rows returned. For example, I would expect this scene_id to be in the result set: SELECT * FROM scenes_genres WHERE scene_id = 17; scene_id genre_id -- 17 1 17 3 17 10 <-- 17 19 17 38 <-- 17 53 17 58 17 59 And this scene ID to NOT be in the result set: SELECT * FROM scenes_genres WHERE scene_id = 11; scene_id genre_id -- 11 1 11 10 <-- 11 19 11 31 11 32 <-- but does not have 38 11 59 I've tried various subselect ideas, but of course this fails b/c genre_id can't be multiple things at one time (AND) JOIN `scenes_genres` AS sg ON sg.`scene_id` = s.`scene_id` AND sg.`genre_id` IN ( SELECT `genre_id` FROM `scenes_genres` WHERE `genre_id` = 10 AND `genre_id` = 38 AND `genre_id` <> 22 AND `genre_id` <> 61 ) And straight up like this failure too... JOIN `scenes_genres` AS sg ON sg.`scene_id` = s.`scene_id` AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38) AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61) So I'm sort of out of ideas at this point and hoping someone has a way to do this. Also, just for S&G this is how we are currently implementing it, but we feel the REGEXP is killing our queries and while "clever" is a bit hacky and nullifies any indexes we have on the genres column as it requires a file_sort table scan to compare substrings basically... SELECT * FROM scene_all_genres WHERE scene_id = 17; scene_id genres 17 1|3|10|19|38|53|58|59| SELECT * FROM scene_all_genres WHERE scene_id = 11; scene_id genres --- 11 1|10|19|31|32|59| SELECT DISTINCT s.`scene_id` AS `id`, sg.`genres` FROM `scene_all_genres` AS sg, `dvds` AS d, `scenes_list` AS s WHEREdvd_id` = d.`dvd_id` AND sg.`scene_id` = s.`scene_id` AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]' AND sg.`genres` NOT REGEXP '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))' ; http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp [[:<:]], [[:>:]] These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_). To me it looks like you want to know how to match N for N when looking for subset of properties. It's a multi-step process using plain SQL but it's fast: 1) Build a list of scenes containing the genres you want to see CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) SELECT scene_id, genre_id FROM scenes_genres WHERE genre_id` IN (10,38) <-- in this case you have 2 terms to mach 2) From that list, determine which of those scenes also contain unwanted genres and remove them. DELETE tmpList FROM tmpList t INNER JOIN scenes_genres sg on sg.scene_id = t.scene_id and sg.genre_id IN (22,61) # at this point, tmpList contains all scenes that have any of the desired genres but none of the unwanted ones. 3) Check to see if any scene has all N matches. SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY scene_id HAVING matches = 2 /* ^--- this is the number of terms you are trying to match. */ Of course you can modify the last query to eliminate the HAVING clause and pick the top 5 matching scenes (even if they are partial matches) like this SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY scene_id ORDER BY matches DESC LIMIT 5 Let us know if this is not what you wanted to do. Note to the rest of the list: Yes, we could have done most of this with self-joins or subqueries, an EXISTS, and a NOT EXISTS inside a single statement but I think the execution time would have been miserable. I also think
How do I select all rows of table that have some rows in another table (AND, not OR)
I am trying to implement a filter so that a user could select various genres they want "in" or "out". Perhaps they like 'action' and 'car chases' but don't like 'foreign' and 'drifting' (or whatever...) So I want something sort of like this, however IN() is using an "OR" comparison when I need it to be an "AND" SELECT DISTINCT s.`scene_id` AS `id`, GROUP_CONCAT(sg.`genre_id`) FROM `dvds` AS d JOIN `scenes_list` AS s ON s.`dvd_id` = d.`dvd_id` JOIN `scenes_genres` AS sg ON sg.`scene_id` = s.`scene_id` AND sg.`genre_id` IN (10,38) AND sg.`genre_id` NOT IN (22,61) GROUP BY s.`scene_id`; This is giving me way way too many rows returned. For example, I would expect this scene_id to be in the result set: SELECT * FROM scenes_genres WHERE scene_id = 17; scene_id genre_id -- 17 1 17 3 17 10 <-- 17 19 17 38 <-- 17 53 17 58 17 59 And this scene ID to NOT be in the result set: SELECT * FROM scenes_genres WHERE scene_id = 11; scene_id genre_id -- 11 1 11 10 <-- 11 19 11 31 11 32 <-- but does not have 38 11 59 I've tried various subselect ideas, but of course this fails b/c genre_id can't be multiple things at one time (AND) JOIN `scenes_genres` AS sg ON sg.`scene_id` = s.`scene_id` AND sg.`genre_id` IN ( SELECT `genre_id` FROM `scenes_genres` WHERE `genre_id` = 10 AND `genre_id` = 38 AND `genre_id` <> 22 AND `genre_id` <> 61 ) And straight up like this failure too... JOIN `scenes_genres` AS sg ON sg.`scene_id` = s.`scene_id` AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38) AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61) So I'm sort of out of ideas at this point and hoping someone has a way to do this. Also, just for S&G this is how we are currently implementing it, but we feel the REGEXP is killing our queries and while "clever" is a bit hacky and nullifies any indexes we have on the genres column as it requires a file_sort table scan to compare substrings basically... SELECT * FROM scene_all_genres WHERE scene_id = 17; scene_id genres 17 1|3|10|19|38|53|58|59| SELECT * FROM scene_all_genres WHERE scene_id = 11; scene_id genres --- 11 1|10|19|31|32|59| SELECT DISTINCT s.`scene_id` AS `id`, sg.`genres` FROM `scene_all_genres` AS sg, `dvds` AS d, `scenes_list` AS s WHEREdvd_id` = d.`dvd_id` AND sg.`scene_id` = s.`scene_id` AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]' AND sg.`genres` NOT REGEXP '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))' ; http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp [[:<:]], [[:>:]] These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_). -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql