"I'm the ORIGINAL Rick James, B...."  (And, I'm still alive.)  LOL

If you are using PHP, you might want to stop at 31 bits per INT/SET.  PHP seems 
not to yet be in the 64-bit world.

> -----Original Message-----
> From: Daevid Vincent [mailto:dae...@daevid.com]
> Sent: Wednesday, June 12, 2013 11:26 AM
> To: mysql@lists.mysql.com
> Subject: RE: How do I select all rows of table that have some rows in
> another table (AND, not OR)
> 
> This idea is so fancy pants and clever I *wish* it could have worked for
> me.
> I checked and we actually have 65 genres currently (with more to come I'm
> sure) LOL *FML*. I'm almost ashamed I didn't think of this. I <3 me some
> bitmasks and this solution is so elegant. It's unfortunate there isn't a
> way to use more than 64-bits natively.
> 
> You're RICK JAMES Bitch! :-p   (please tell me you know the Dave
> Chappelles
> skit I'm referring to)
> 
> > -----Original Message-----
> > From: Rick James [mailto:rja...@yahoo-inc.com]
> > Sent: Wednesday, June 12, 2013 9:39 AM
> > To: Daevid Vincent; mysql@lists.mysql.com
> > Cc: 'shawn green'
> > Subject: RE: How do I select all rows of table that have some rows in
> > another table (AND, not OR)
> >
> > Thinking out of the box... (And posting my reply at the 'wrong' end of
> > the email.)...
> >
> > Are there fewer than 64 genres?  Use a SET or BIGINT UNSIGNED.
> >
> > > > >                  AND sg.`genre_id` IN (10,38)
> > > > >                  AND sg.`genre_id` NOT IN (22,61)
> >
> > -->
> >     AND  genre & ((1<<10) | (1<<38)) != 0
> >     AND  genre & ((1<<22) | (1<<61))  = 0
> >
> > This would avoid having that extra table, and save a lot of space.
> >
> > If you have more than 64 genres, then
> > Plan A: clump them into some kind of application grouping and use
> > multiple INTs/SETs.
> > Plan B: do mod & div arithmetic to compute which genre field to tackle.
> >
> > For B, something like:
> >     AND  (genre1 & (1<<0)) + (genre3 & (1<<8)) != 0
> >     AND  (genre2 & (1<<2)) + (genre6 & (1<<1))  = 0 (That's assuming
> > 10 bits per genre# field.  I would use 32 and INT
> > UNSIGNED.)
> >
> >
> >
> > > -----Original Message-----
> > > From: Daevid Vincent [mailto:dae...@daevid.com]
> > > Sent: Tuesday, June 11, 2013 4:17 PM
> > > To: mysql@lists.mysql.com
> > > Cc: 'shawn green'
> > > Subject: RE: How do I select all rows of table that have some rows
> > > in another table (AND, not OR)
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: shawn green [mailto:shawn.l.gr...@oracle.com]
> > > > Sent: Tuesday, June 11, 2013 2:16 PM
> > > > To: mysql@lists.mysql.com
> > > > Subject: Re: How do I select all rows of table that have some rows
> > > > in another table (AND, not OR)
> > > >
> > > > Hello Daevid,
> > > >
> > > > On 6/11/2013 3:59 PM, Daevid Vincent wrote:
> > > > > I am trying to implement a filter so that a user could select
> > > > > various
> > > > genres
> > > > > they want "in" or "out". Perhaps they like 'action' and 'car
> chases'
> > > > > but don't like 'foreign' and 'drifting' (or whatever...)
> > > > >
> > > > > So I want something sort of like this, however IN() is using an
> "OR"
> > > > > comparison when I need it to be an "AND"
> > > > >
> > > > > SELECT DISTINCT
> > > > >          s.`scene_id` AS `id`,
> > > > >          GROUP_CONCAT(sg.`genre_id`) FROM
> > > > >          `dvds` AS d
> > > > >          JOIN `scenes_list` AS s
> > > > >                  ON s.`dvd_id` = d.`dvd_id`
> > > > >          JOIN `scenes_genres` AS sg
> > > > >                  ON sg.`scene_id` = s.`scene_id`
> > > > >                  AND sg.`genre_id` IN (10,38)
> > > > >                  AND sg.`genre_id` NOT IN (22,61) GROUP BY
> > > > > s.`scene_id`;
> > > > >
> > > > > This is giving me way way too many rows returned.
> > > > >
> > > > > For example, I would expect this scene_id to be in the result set:
> > > > >
> > > > > SELECT * FROM scenes_genres WHERE scene_id = 17;
> > > > >
> > > > > scene_id  genre_id
> > > > > --------  ----------
> > > > >        17           1
> > > > >        17           3
> > > > >        17          10 <--
> > > > >        17          19
> > > > >        17          38 <--
> > > > >        17          53
> > > > >        17          58
> > > > >        17          59
> > > > >
> > > > > And this scene ID to NOT be in the result set:
> > > > >
> > > > > SELECT * FROM scenes_genres WHERE scene_id = 11;
> > > > >
> > > > > scene_id  genre_id
> > > > > --------  ----------
> > > > >        11           1
> > > > >        11          10 <--
> > > > >        11          19
> > > > >        11          31
> > > > >        11          32
> > > > >                      <-- but does not have 38
> > > > >        11          59
> > > > >
> > > > > I've tried various subselect ideas, but of course this fails b/c
> > > genre_id
> > > > > can't be multiple things at one time (AND)
> > > > >
> > > > >     JOIN `scenes_genres` AS sg
> > > > >       ON sg.`scene_id` = s.`scene_id`
> > > > >          AND sg.`genre_id` IN (
> > > > >            SELECT `genre_id` FROM `scenes_genres`
> > > > >             WHERE `genre_id` = 10
> > > > >                    AND `genre_id` = 38
> > > > >                    AND `genre_id` <> 22
> > > > >                    AND `genre_id` <> 61
> > > > >         )
> > > > >
> > > > > And straight up like this failure too...
> > > > >
> > > > >    JOIN `scenes_genres` AS sg
> > > > >      ON sg.`scene_id` = s.`scene_id`
> > > > >         AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
> > > > >         AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)
> > > > >
> > > > > So I'm sort of out of ideas at this point and hoping someone has
> > > > > a way
> > > to
> > > > do
> > > > > this.
> > > > >
> > > > > Also, just for S&G this is how we are currently implementing it,
> > > > > but we
> > > > feel
> > > > > the REGEXP is killing our queries and while "clever" is a bit
> > > > > hacky and nullifies any indexes we have on the genres column as
> > > > > it requires a file_sort table scan to compare substrings
> basically...
> > > > >
> > > > > SELECT * FROM scene_all_genres WHERE scene_id = 17;
> > > > >
> > > > > scene_id  genres
> > > > > --------  ------------------------
> > > > >        17  1|3|10|19|38|53|58|59|
> > > > >
> > > > > SELECT * FROM scene_all_genres WHERE scene_id = 11;
> > > > >
> > > > > scene_id  genres
> > > > > --------  -------------------
> > > > >        11  1|10|19|31|32|59|
> > > > >
> > > > > SELECT DISTINCT
> > > > >          s.`scene_id` AS `id`,
> > > > >          sg.`genres`
> > > > > FROM
> > > > >          `scene_all_genres` AS sg,
> > > > >          `dvds` AS d,
> > > > >          `scenes_list` AS s
> > > > > WHERE          dvd_id` = d.`dvd_id`
> > > > >          AND sg.`scene_id` = s.`scene_id`
> > > > >          AND sg.`genres` REGEXP
> '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
> > > > >          AND sg.`genres` NOT REGEXP
> > > > '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'
> > > > >
> > > > > ;
> > > > >
> > > > > http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_rege
> > > > > xp
> > > > >
> > > > > [[:<:]], [[:>:]]
> > > > >
> > > > > These markers stand for word boundaries. They match the
> > > > > beginning and
> > > end
> > > > of
> > > > > words, respectively. A word is a sequence of word characters
> > > > > that is not preceded by or followed by word characters. A word
> > > > > character is an alphanumeric character in the alnum class or an
> underscore (_).
> > > > >
> > > >
> > > > To me it looks like you want to know how to match N for N when
> > > > looking for subset of properties. It's a multi-step process using
> > > > plain SQL but it's fast:
> > > >
> > > > 1) Build a list of scenes containing the genres you want to see
> > > >
> > > > CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) SELECT
> > > > scene_id, genre_id FROM scenes_genres WHERE genre_id` IN (10,38)
> > > > <-- in this case you have 2 terms to mach
> > > >
> > > > 2) From that list, determine which of those scenes also contain
> > > > unwanted genres and remove them.
> > > >
> > > > DELETE tmpList
> > > > FROM tmpList t
> > > > INNER JOIN scenes_genres sg
> > > >    on sg.scene_id = t.scene_id
> > > >    and sg.genre_id IN (22,61)
> > > >
> > > > # at this point, tmpList contains all scenes that have any of the
> > > > desired genres but none of the unwanted ones.
> > > >
> > > > 3) Check to see if any scene has all N matches.
> > > >
> > > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY
> > > > scene_id HAVING matches = 2
> > > > /* ^--- this is the number of terms you are trying to match. */
> > > >
> > > >
> > > > Of course you can modify the last query to eliminate the HAVING
> > > > clause and pick the top 5 matching scenes (even if they are
> > > > partial matches) like this
> > > >
> > > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY
> > > > scene_id ORDER BY matches DESC LIMIT 5
> > > >
> > > > Let us know if this is not what you wanted to do.
> > > >
> > > >
> > > >
> > > > Note to the rest of the list:
> > > >
> > > > Yes, we could have done most of this with self-joins or
> > > > subqueries, an EXISTS, and a NOT EXISTS inside a single statement
> > > > but I think the execution time would have been miserable. I also
> > > > think that that approach also would not have allowed us to
> > > > evaluate a partial match (like 5 out of 7 target genres), only
> > > > complete matches would have been returned.
> > > >
> > > > I am still very interested in seeing alternative solutions :)
> > > >
> > > > --
> > > > Shawn Green
> > > > MySQL Principal Technical Support Engineer Oracle USA, Inc. -
> > > > Hardware and Software, Engineered to Work Together.
> > > > Office: Blountville, TN
> > >
> > > Shawn, thank you for taking the time to reply. I wasn't expecting
> > > the solution to be so much "work" with multiple statements like
> > > that. I was thinking it could be done in one (or two, as in split
> > > out a portion of
> it
> > > in
> > > PHP and re-insert it to the original SQL to avoid a JOIN or
> something).
> > > Part
> > > of the issue is that we use PHP to generate the $sql string by
> > > appending bits and pieces depending on the search criteria thereby
> > > keeping the 'path'
> > > through the SQL statement simple and relatively linear. To implement
> this
> > > would require significant re-writing and/or special cases where we
> > > could introduce errors or omissions in the future. The frustrating
> > > part is
> that
> > > the REGEXP query we use now only takes about 2 seconds on my DEV VM
> (same
> > > database as PROD), however when the RDBMS is loaded it then takes up
> > > to
> 30
> > > seconds so in theory it's not even that inefficient given the #
> > > rows. We do use memcached for the results, but since there are so
> > > many combinations
> a
> > > user could choose, our hit ratio is not so great and therefore the
> > > cache isn't doing us much good and this is why the RDBMS can get
> > > loaded up easily.
> > >
> > > How can an "OR" be so simple using IN() but "AND" be so overly
> complex?
> > > Seems that mysql should have another function for ALL() that works
> > > just like
> > > IN() to handle this kind of scenario.
> > >
> > >
> > >
> > >
> > > --
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe:    http://lists.mysql.com/mysql
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to