RE: How do I select all rows of table that have some rows in another table (AND, not OR)

2013-06-11 Thread Daevid Vincent


> -Original Message-
> From: shawn green [mailto:shawn.l.gr...@oracle.com]
> Sent: Tuesday, June 11, 2013 2:16 PM
> To: mysql@lists.mysql.com
> Subject: Re: How do I select all rows of table that have some rows in
> another table (AND, not OR)
> 
> Hello Daevid,
> 
> On 6/11/2013 3:59 PM, Daevid Vincent wrote:
> > I am trying to implement a filter so that a user could select various
> genres
> > they want "in" or "out". Perhaps they like 'action' and 'car chases' but
> > don't like 'foreign' and 'drifting' (or whatever...)
> >
> > So I want something sort of like this, however IN() is using an "OR"
> > comparison when I need it to be an "AND"
> >
> > SELECT DISTINCT
> >  s.`scene_id` AS `id`,
> >  GROUP_CONCAT(sg.`genre_id`)
> > FROM
> >  `dvds` AS d
> >  JOIN `scenes_list` AS s
> >  ON s.`dvd_id` = d.`dvd_id`
> >  JOIN `scenes_genres` AS sg
> >  ON sg.`scene_id` = s.`scene_id`
> >  AND sg.`genre_id` IN (10,38)
> >  AND sg.`genre_id` NOT IN (22,61)
> > GROUP BY s.`scene_id`;
> >
> > This is giving me way way too many rows returned.
> >
> > For example, I would expect this scene_id to be in the result set:
> >
> > SELECT * FROM scenes_genres WHERE scene_id = 17;
> >
> > scene_id  genre_id
> >   --
> >17   1
> >17   3
> >17  10 <--
> >17  19
> >17  38 <--
> >17  53
> >17  58
> >17  59
> >
> > And this scene ID to NOT be in the result set:
> >
> > SELECT * FROM scenes_genres WHERE scene_id = 11;
> >
> > scene_id  genre_id
> >   --
> >11   1
> >11  10 <--
> >11  19
> >11  31
> >11  32
> ><-- but does not have 38
> >11  59
> >
> > I've tried various subselect ideas, but of course this fails b/c
genre_id
> > can't be multiple things at one time (AND)
> >
> > JOIN `scenes_genres` AS sg
> >   ON sg.`scene_id` = s.`scene_id`
> >  AND sg.`genre_id` IN (
> >SELECT `genre_id` FROM `scenes_genres`
> > WHERE `genre_id` = 10
> >AND `genre_id` = 38
> >AND `genre_id` <> 22
> >AND `genre_id` <> 61
> > )
> >
> > And straight up like this failure too...
> >
> >JOIN `scenes_genres` AS sg
> >  ON sg.`scene_id` = s.`scene_id`
> > AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
> > AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)
> >
> > So I'm sort of out of ideas at this point and hoping someone has a way
to
> do
> > this.
> >
> > Also, just for S&G this is how we are currently implementing it, but we
> feel
> > the REGEXP is killing our queries and while "clever" is a bit hacky and
> > nullifies any indexes we have on the genres column as it requires a
> > file_sort table scan to compare substrings basically...
> >
> > SELECT * FROM scene_all_genres WHERE scene_id = 17;
> >
> > scene_id  genres
> >   
> >17  1|3|10|19|38|53|58|59|
> >
> > SELECT * FROM scene_all_genres WHERE scene_id = 11;
> >
> > scene_id  genres
> >   ---
> >11  1|10|19|31|32|59|
> >
> > SELECT DISTINCT
> >  s.`scene_id` AS `id`,
> >  sg.`genres`
> > FROM
> >  `scene_all_genres` AS sg,
> >  `dvds` AS d,
> >  `scenes_list` AS s
> > WHEREdvd_id` = d.`dvd_id`
> >  AND sg.`scene_id` = s.`scene_id`
> >  AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
> >  AND sg.`genres` NOT REGEXP
> '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'
> >
> > ;
> >
> > http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
> >
> > [[:<:]], [[:>:]]
> >
> > These markers stand for word boundaries. They match the beginning and
end
> of
> > words, respectively. A word is a sequence of word characters that is not
> > preceded by or followed by word characters. A word character is an
> > alphanumeric character in the alnum class or an underscore (_).
> >
> 
> To me it looks like you want to know how to match N for N when looking
> for subset of properties. It's a multi-step process using plain SQL but
> it's fast:
> 
> 1) Build a list of scenes containing the genres you want to see
> 
> CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id))
> SELECT scene_id, genre_id
> FROM scenes_genres
> WHERE genre_id` IN (10,38)  <-- in this case you have 2 terms to mach
> 
> 2) From that list, determine which of those scenes also contain unwanted
> genres and remove them.
> 
> DELETE tmpList
> FROM tmpList t
> INNER JOIN scenes_genres sg
>on sg.scene_id = t.scene_id
>and sg.genre_id IN (22,61)
> 
> # at this point, tmpList contains all scenes that have any of the
> desired genres but none of the unwanted ones.
> 
> 3)

Re: How do I select all rows of table that have some rows in another table (AND, not OR)

2013-06-11 Thread shawn green

Hello Daevid,

On 6/11/2013 3:59 PM, Daevid Vincent wrote:

I am trying to implement a filter so that a user could select various genres
they want "in" or "out". Perhaps they like 'action' and 'car chases' but
don't like 'foreign' and 'drifting' (or whatever...)

So I want something sort of like this, however IN() is using an "OR"
comparison when I need it to be an "AND"

SELECT DISTINCT
 s.`scene_id` AS `id`,
 GROUP_CONCAT(sg.`genre_id`)
FROM
 `dvds` AS d
 JOIN `scenes_list` AS s
 ON s.`dvd_id` = d.`dvd_id`
 JOIN `scenes_genres` AS sg
 ON sg.`scene_id` = s.`scene_id`
 AND sg.`genre_id` IN (10,38)
 AND sg.`genre_id` NOT IN (22,61)
GROUP BY s.`scene_id`;

This is giving me way way too many rows returned.

For example, I would expect this scene_id to be in the result set:

SELECT * FROM scenes_genres WHERE scene_id = 17;

scene_id  genre_id
  --
   17   1
   17   3
   17  10 <--
   17  19
   17  38 <--
   17  53
   17  58
   17  59

And this scene ID to NOT be in the result set:

SELECT * FROM scenes_genres WHERE scene_id = 11;

scene_id  genre_id
  --
   11   1
   11  10 <--
   11  19
   11  31
   11  32
   <-- but does not have 38
   11  59

I've tried various subselect ideas, but of course this fails b/c genre_id
can't be multiple things at one time (AND)

JOIN `scenes_genres` AS sg
  ON sg.`scene_id` = s.`scene_id`
 AND sg.`genre_id` IN (
   SELECT `genre_id` FROM `scenes_genres`
WHERE `genre_id` = 10
   AND `genre_id` = 38
   AND `genre_id` <> 22
   AND `genre_id` <> 61
)

And straight up like this failure too...

   JOIN `scenes_genres` AS sg
 ON sg.`scene_id` = s.`scene_id`
AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)

So I'm sort of out of ideas at this point and hoping someone has a way to do
this.

Also, just for S&G this is how we are currently implementing it, but we feel
the REGEXP is killing our queries and while "clever" is a bit hacky and
nullifies any indexes we have on the genres column as it requires a
file_sort table scan to compare substrings basically...

SELECT * FROM scene_all_genres WHERE scene_id = 17;

scene_id  genres
  
   17  1|3|10|19|38|53|58|59|

SELECT * FROM scene_all_genres WHERE scene_id = 11;

scene_id  genres
  ---
   11  1|10|19|31|32|59|

SELECT DISTINCT
 s.`scene_id` AS `id`,
 sg.`genres`
FROM
 `scene_all_genres` AS sg,
 `dvds` AS d,
 `scenes_list` AS s
WHEREdvd_id` = d.`dvd_id`
 AND sg.`scene_id` = s.`scene_id`
 AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
 AND sg.`genres` NOT REGEXP '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'

;

http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of
words, respectively. A word is a sequence of word characters that is not
preceded by or followed by word characters. A word character is an
alphanumeric character in the alnum class or an underscore (_).



To me it looks like you want to know how to match N for N when looking 
for subset of properties. It's a multi-step process using plain SQL but 
it's fast:


1) Build a list of scenes containing the genres you want to see

CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id))
SELECT scene_id, genre_id
FROM scenes_genres
WHERE genre_id` IN (10,38)  <-- in this case you have 2 terms to mach

2) From that list, determine which of those scenes also contain unwanted 
genres and remove them.


DELETE tmpList
FROM tmpList t
INNER JOIN scenes_genres sg
  on sg.scene_id = t.scene_id
  and sg.genre_id IN (22,61)

# at this point, tmpList contains all scenes that have any of the 
desired genres but none of the unwanted ones.


3) Check to see if any scene has all N matches.

SELECT scene_id, count(genre_id) as matches
FROM tmpList
GROUP BY scene_id
HAVING matches = 2
/* ^--- this is the number of terms you are trying to match. */


Of course you can modify the last query to eliminate the HAVING clause 
and pick the top 5 matching scenes (even if they are partial matches) 
like this


SELECT scene_id, count(genre_id) as matches
FROM tmpList
GROUP BY scene_id
ORDER BY matches DESC
LIMIT 5

Let us know if this is not what you wanted to do.



Note to the rest of the list:

Yes, we could have done most of this with self-joins or subqueries, an 
EXISTS, and a NOT EXISTS inside a single statement but I think the 
execution time would have been miserable. I also think

How do I select all rows of table that have some rows in another table (AND, not OR)

2013-06-11 Thread Daevid Vincent
I am trying to implement a filter so that a user could select various genres
they want "in" or "out". Perhaps they like 'action' and 'car chases' but
don't like 'foreign' and 'drifting' (or whatever...)

So I want something sort of like this, however IN() is using an "OR"
comparison when I need it to be an "AND"

SELECT DISTINCT 
s.`scene_id` AS `id`,
GROUP_CONCAT(sg.`genre_id`)
FROM
`dvds` AS d 
JOIN `scenes_list` AS s 
ON s.`dvd_id` = d.`dvd_id` 
JOIN `scenes_genres` AS sg 
ON sg.`scene_id` = s.`scene_id` 
AND sg.`genre_id` IN (10,38) 
AND sg.`genre_id` NOT IN (22,61)
GROUP BY s.`scene_id`;

This is giving me way way too many rows returned. 

For example, I would expect this scene_id to be in the result set:

SELECT * FROM scenes_genres WHERE scene_id = 17;

scene_id  genre_id  
  --
  17   1
  17   3
  17  10 <--
  17  19
  17  38 <--
  17  53
  17  58
  17  59

And this scene ID to NOT be in the result set:

SELECT * FROM scenes_genres WHERE scene_id = 11;

scene_id  genre_id  
  --
  11   1
  11  10 <--  
  11  19
  11  31
  11  32
   <-- but does not have 38
  11  59

I've tried various subselect ideas, but of course this fails b/c genre_id
can't be multiple things at one time (AND)

   JOIN `scenes_genres` AS sg 
 ON sg.`scene_id` = s.`scene_id` 
AND sg.`genre_id` IN (
  SELECT `genre_id` FROM `scenes_genres` 
   WHERE `genre_id` = 10 
  AND `genre_id` = 38
  AND `genre_id` <> 22 
  AND `genre_id` <> 61
   )

And straight up like this failure too...

  JOIN `scenes_genres` AS sg 
ON sg.`scene_id` = s.`scene_id` 
   AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38) 
   AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)

So I'm sort of out of ideas at this point and hoping someone has a way to do
this.

Also, just for S&G this is how we are currently implementing it, but we feel
the REGEXP is killing our queries and while "clever" is a bit hacky and
nullifies any indexes we have on the genres column as it requires a
file_sort table scan to compare substrings basically...

SELECT * FROM scene_all_genres WHERE scene_id = 17;

scene_id  genres  
  
  17  1|3|10|19|38|53|58|59|  

SELECT * FROM scene_all_genres WHERE scene_id = 11;

scene_id  genres 
  ---
  11  1|10|19|31|32|59|  

SELECT DISTINCT 
s.`scene_id` AS `id`,
sg.`genres` 
FROM
`scene_all_genres` AS sg,
`dvds` AS d,
`scenes_list` AS s 
WHEREdvd_id` = d.`dvd_id` 
AND sg.`scene_id` = s.`scene_id` 
AND sg.`genres` REGEXP '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]' 
AND sg.`genres` NOT REGEXP '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'

;

http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of
words, respectively. A word is a sequence of word characters that is not
preceded by or followed by word characters. A word character is an
alphanumeric character in the alnum class or an underscore (_).



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql