>>>> 2013/06/11 12:59 -0700, Daevid Vincent >>>>
Also, just for S&G this is how we are currently implementing it, but we feel
the REGEXP is killing our queries and while "clever" is a bit hacky and
nullifies any indexes we have on the genres column as it requires a
file_sort table scan to compare substrings basically...

SELECT * FROM scene_all_genres WHERE scene_id = 17;

scene_id  genres                  
--------  ------------------------
      17  1|3|10|19|38|53|58|59|  

SELECT * FROM scene_all_genres WHERE scene_id = 11;

scene_id  genres             
--------  -------------------
      11  1|10|19|31|32|59|  
<<<<<<<<
Except that, it seems to me, it somehow reflects the reality of assigning 
attributes to the scenes (movies?) that you catalog. In a way, it looks very 
much like using a bitstring wherin each place stands for one attribute. If you 
then have also a bitstring for each user s likes and one for rows (peeves), 
telling howmany 1s are at the same place for the "genres" and liking (bit-AND, 
MySQL "&" followed by telling the number of 1s), and same for the "genres" and 
the row or peeve yields a number howmany match for liking, and how many match 
for becoming peeved. If the liking is enough greater than the becoming peeved, 
the scene and the user match.

Unhappily, although this, using bitstring for set of attributes to match, is an 
old and well understood topic, MySQL s support for bitstrings is poor, limited 
to integers (as C is so limited)--that is, to 64 bits. If you have more, you 
have to use more "words". There is, furthermore, no function for telling 
howmany 1s (or 0s) there are in an integer.

Now, if, in a more perfect world, MySQL had bitstring, and, furthermore, MySQL 
s SET were mapped onto bitstring, where it belongs, you could not only use bit 
operations (MySQL s & | ^), but also name the bits as you like.

The problem with writing one s own bit-telling function is, of course, time, 
and hiding useful information from the optimizer. In any case, here is a 
function for it, using an old well worn trick that depends on binary arithmetic:

delimiter ?
create function bittell(B INTEGER) RETURNS INTEGER
DETERMINISTIC
NO SQL
COMMENT 'Howmany 1s in argument?'
begin
declare E integer;
SET E = 0;
WHILE B <> 0 DO
        set B = (B-1) & B, E = E + 1;
end WHILE;
RETURN E;
end ?
delimiter ;

If you stick with the character-string set, with a slight change in 
representation you can use a simpler-looking pattern--not more efficient, if 
MySQL s implementation is good, but of easier reading: separate the decimal 
numerals with a character that is neither a decimal digit nor a REGEXP 
operator, and bound the whole string with it--comma or semicolon (among others) 
are good.
       ',1,10,19,31,32,59,' REGEXP ',10,+.*,38,' is 0
       ',1,10,19,31,32,59,' REGEXP ',10,+.*,32,' is 1
(See also FIND_IN_SET.)

Somewhere I read that for lack of support bitstring has been withdrawn from the 
SQL standard. This is such an obvious use; why is it not supported?


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to