Hello Daevid,

On 6/11/2013 3:59 PM, Daevid Vincent wrote:
I am trying to implement a filter so that a user could select various genres
they want "in" or "out". Perhaps they like 'action' and 'car chases' but
don't like 'foreign' and 'drifting' (or whatever...)

So I want something sort of like this, however IN() is using an "OR"
comparison when I need it to be an "AND"

SELECT DISTINCT
         s.`scene_id` AS `id`,
         GROUP_CONCAT(sg.`genre_id`)
FROM
         `dvds` AS d
         JOIN `scenes_list` AS s
                 ON s.`dvd_id` = d.`dvd_id`
         JOIN `scenes_genres` AS sg
                 ON sg.`scene_id` = s.`scene_id`
                 AND sg.`genre_id` IN (10,38)
                 AND sg.`genre_id` NOT IN (22,61)
GROUP BY s.`scene_id`;

This is giving me way way too many rows returned.

For example, I would expect this scene_id to be in the result set:

SELECT * FROM scenes_genres WHERE scene_id = 17;

scene_id  genre_id
--------  ----------
       17           1
       17           3
       17          10 <--
       17          19
       17          38 <--
       17          53
       17          58
       17          59

And this scene ID to NOT be in the result set:

SELECT * FROM scenes_genres WHERE scene_id = 11;

scene_id  genre_id
--------  ----------
       11           1
       11          10 <--
       11          19
       11          31
       11          32
                       <-- but does not have 38
       11          59

I've tried various subselect ideas, but of course this fails b/c genre_id
can't be multiple things at one time (AND)

    JOIN `scenes_genres` AS sg
      ON sg.`scene_id` = s.`scene_id`
         AND sg.`genre_id` IN (
           SELECT `genre_id` FROM `scenes_genres`
            WHERE `genre_id` = 10
                   AND `genre_id` = 38
                   AND `genre_id` <> 22
                   AND `genre_id` <> 61
        )

And straight up like this failure too...

   JOIN `scenes_genres` AS sg
     ON sg.`scene_id` = s.`scene_id`
        AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
        AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)

So I'm sort of out of ideas at this point and hoping someone has a way to do
this.

Also, just for S&G this is how we are currently implementing it, but we feel
the REGEXP is killing our queries and while "clever" is a bit hacky and
nullifies any indexes we have on the genres column as it requires a
file_sort table scan to compare substrings basically...

SELECT * FROM scene_all_genres WHERE scene_id = 17;

scene_id  genres
--------  ------------------------
       17  1|3|10|19|38|53|58|59|

SELECT * FROM scene_all_genres WHERE scene_id = 11;

scene_id  genres
--------  -------------------
       11  1|10|19|31|32|59|

SELECT DISTINCT
         s.`scene_id` AS `id`,
         sg.`genres`
FROM
         `scene_all_genres` AS sg,
         `dvds` AS d,
         `scenes_list` AS s
WHERE    dvd_id` = d.`dvd_id`
         AND sg.`scene_id` = s.`scene_id`
         AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
         AND sg.`genres` NOT REGEXP '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'

;

http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of
words, respectively. A word is a sequence of word characters that is not
preceded by or followed by word characters. A word character is an
alphanumeric character in the alnum class or an underscore (_).


To me it looks like you want to know how to match N for N when looking for subset of properties. It's a multi-step process using plain SQL but it's fast:

1) Build a list of scenes containing the genres you want to see

CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id))
SELECT scene_id, genre_id
FROM scenes_genres
WHERE genre_id` IN (10,38)  <-- in this case you have 2 terms to mach

2) From that list, determine which of those scenes also contain unwanted genres and remove them.

DELETE tmpList
FROM tmpList t
INNER JOIN scenes_genres sg
  on sg.scene_id = t.scene_id
  and sg.genre_id IN (22,61)

# at this point, tmpList contains all scenes that have any of the desired genres but none of the unwanted ones.

3) Check to see if any scene has all N matches.

SELECT scene_id, count(genre_id) as matches
FROM tmpList
GROUP BY scene_id
HAVING matches = 2
/* ^--- this is the number of terms you are trying to match. */


Of course you can modify the last query to eliminate the HAVING clause and pick the top 5 matching scenes (even if they are partial matches) like this

SELECT scene_id, count(genre_id) as matches
FROM tmpList
GROUP BY scene_id
ORDER BY matches DESC
LIMIT 5

Let us know if this is not what you wanted to do.



Note to the rest of the list:

Yes, we could have done most of this with self-joins or subqueries, an EXISTS, and a NOT EXISTS inside a single statement but I think the execution time would have been miserable. I also think that that approach also would not have allowed us to evaluate a partial match (like 5 out of 7 target genres), only complete matches would have been returned.

I am still very interested in seeing alternative solutions :)

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to