Hello Daevid,
On 6/11/2013 3:59 PM, Daevid Vincent wrote:
I am trying to implement a filter so that a user could select various genres
they want "in" or "out". Perhaps they like 'action' and 'car chases' but
don't like 'foreign' and 'drifting' (or whatever...)
So I want something sort of like this, however IN() is using an "OR"
comparison when I need it to be an "AND"
SELECT DISTINCT
s.`scene_id` AS `id`,
GROUP_CONCAT(sg.`genre_id`)
FROM
`dvds` AS d
JOIN `scenes_list` AS s
ON s.`dvd_id` = d.`dvd_id`
JOIN `scenes_genres` AS sg
ON sg.`scene_id` = s.`scene_id`
AND sg.`genre_id` IN (10,38)
AND sg.`genre_id` NOT IN (22,61)
GROUP BY s.`scene_id`;
This is giving me way way too many rows returned.
For example, I would expect this scene_id to be in the result set:
SELECT * FROM scenes_genres WHERE scene_id = 17;
scene_id genre_id
-------- ----------
17 1
17 3
17 10 <--
17 19
17 38 <--
17 53
17 58
17 59
And this scene ID to NOT be in the result set:
SELECT * FROM scenes_genres WHERE scene_id = 11;
scene_id genre_id
-------- ----------
11 1
11 10 <--
11 19
11 31
11 32
<-- but does not have 38
11 59
I've tried various subselect ideas, but of course this fails b/c genre_id
can't be multiple things at one time (AND)
JOIN `scenes_genres` AS sg
ON sg.`scene_id` = s.`scene_id`
AND sg.`genre_id` IN (
SELECT `genre_id` FROM `scenes_genres`
WHERE `genre_id` = 10
AND `genre_id` = 38
AND `genre_id` <> 22
AND `genre_id` <> 61
)
And straight up like this failure too...
JOIN `scenes_genres` AS sg
ON sg.`scene_id` = s.`scene_id`
AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)
So I'm sort of out of ideas at this point and hoping someone has a way to do
this.
Also, just for S&G this is how we are currently implementing it, but we feel
the REGEXP is killing our queries and while "clever" is a bit hacky and
nullifies any indexes we have on the genres column as it requires a
file_sort table scan to compare substrings basically...
SELECT * FROM scene_all_genres WHERE scene_id = 17;
scene_id genres
-------- ------------------------
17 1|3|10|19|38|53|58|59|
SELECT * FROM scene_all_genres WHERE scene_id = 11;
scene_id genres
-------- -------------------
11 1|10|19|31|32|59|
SELECT DISTINCT
s.`scene_id` AS `id`,
sg.`genres`
FROM
`scene_all_genres` AS sg,
`dvds` AS d,
`scenes_list` AS s
WHERE dvd_id` = d.`dvd_id`
AND sg.`scene_id` = s.`scene_id`
AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
AND sg.`genres` NOT REGEXP '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'
;
http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of
words, respectively. A word is a sequence of word characters that is not
preceded by or followed by word characters. A word character is an
alphanumeric character in the alnum class or an underscore (_).
To me it looks like you want to know how to match N for N when looking
for subset of properties. It's a multi-step process using plain SQL but
it's fast:
1) Build a list of scenes containing the genres you want to see
CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id))
SELECT scene_id, genre_id
FROM scenes_genres
WHERE genre_id` IN (10,38) <-- in this case you have 2 terms to mach
2) From that list, determine which of those scenes also contain unwanted
genres and remove them.
DELETE tmpList
FROM tmpList t
INNER JOIN scenes_genres sg
on sg.scene_id = t.scene_id
and sg.genre_id IN (22,61)
# at this point, tmpList contains all scenes that have any of the
desired genres but none of the unwanted ones.
3) Check to see if any scene has all N matches.
SELECT scene_id, count(genre_id) as matches
FROM tmpList
GROUP BY scene_id
HAVING matches = 2
/* ^--- this is the number of terms you are trying to match. */
Of course you can modify the last query to eliminate the HAVING clause
and pick the top 5 matching scenes (even if they are partial matches)
like this
SELECT scene_id, count(genre_id) as matches
FROM tmpList
GROUP BY scene_id
ORDER BY matches DESC
LIMIT 5
Let us know if this is not what you wanted to do.
Note to the rest of the list:
Yes, we could have done most of this with self-joins or subqueries, an
EXISTS, and a NOT EXISTS inside a single statement but I think the
execution time would have been miserable. I also think that that
approach also would not have allowed us to evaluate a partial match
(like 5 out of 7 target genres), only complete matches would have been
returned.
I am still very interested in seeing alternative solutions :)
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql