Re: MySQL and set complements
Sure can: show create table activities; CREATE TABLE `activities` ( `id` int(11) NOT NULL AUTO_INCREMENT, `user_id` int(11) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `country_iso` varchar(2) DEFAULT NULL, `tags` varchar(255) DEFAULT NULL, `postcode` int(11) DEFAULT NULL, `activity` varchar(100) DEFAULT NULL, `page_id` int(11) DEFAULT NULL, `donation_frequency` varchar(100) DEFAULT NULL, `email_id` int(11) DEFAULT NULL, `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `user_idx` (`user_id`), KEY `email_idx` (`email_id`), KEY `activity_idx` (`activity`) ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1 And the explain: ++-+---+---+-+--+-+++-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---+---+-+--+-+++-+ | 1 | SIMPLE | u | index | NULL| id_idx | 5 | NULL | 972064 | Using index | | 1 | SIMPLE | a | ref | user_idx,email_idx,activity_idx | user_idx | 5 | getup.u.id | 20 | Using where | ++-+---+---+-+--+-+++-+ Cheers, Leonardo Borges www.leonardoborges.com On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers wrote: > Can you post show create table for activity and explain output of the > problem query? > > On Jul 7, 2011 8:51 PM, "Leonardo Borges" > wrote: > > Hello everyone, > > I have an increasingly popular web application running on top of mysql and > due to its popularity, I'm running into performance issues. After carefully > examining database indexes and tuning queries I was able to pin down the > slowest part of the system. > > The app's got a user segmentation tool that allows you to filter users > based > on a range of criteria from which the slowest is: "Select all users that > did > not receive the email of id 100" > > To answer this question we turn to the activities table, which is basically > a denormalized log of actions taken by the user in this format: > user_id | activity| email_id | ... > 10 | email_sent | 100| ... > 10 | subscribed | NULL | ... > 10 | email_open | 100| ... > > > Given this table and the question above, the usual way of finding out all > users who did not receive this email is through the use of a left outer > join, such as: > > select u.id > from users u > left outer join activities a > on u.id = a.user_id > and a.activity = 'email_sent' > and a.email_id = 100 > where a.user_id is null > > That's all fine for medium-ish tables. However our current activities > table > has over 13 million rows, slowing the hell out of this left outer join, > taking about 52 seconds in my machine. > > What this query is trying to do is to get the relative complement of set > A(users) to B(activities). As far as I know mysql doesn't support set > subtraction, thus the reason for these queries being slow. > > Based on that I've setup a test database on Postgresql, which supports this > very set operation and rewrote the query to look like this: > > select u.id > from users u > except > select a.user_id > from activities a > where a.activity = 'email_sent' > and a.email_id = 100; > > The fact that postgresql knows how to subtract sets brought this query down > to only 4 seconds. > > My question then is: since this is a somewhat common query in our system, > are there any workarounds I could use in mysql to improve things? > > I did find one myself, but it's a bit convoluted and might not perform well > under load, but the following sql script gives me similar performance in > mysql: > > create temporary table email_sent_100 > select a.user_id > from user_activity_events a > where a.activity = 'email_sent' > > > and a.email_id = 100; > > create index user_id_idx on email_sent_100(user_id); //this could > potentially bring the runtime down in the case of a larg temp table. > > select count(u.id) > from users u > left outer join email_sent_100 s > on u.id = s.user_id > and s.user_id is null; > > A lot more lines and a lot more complex, but does the job in this example. > > I'd appreciate your thoughts. > > Cheers, > Leonardo Borges > www.leonardoborges.com > >
Re: MySQL and set complements
Can you post show create table for activity and explain output of the problem query? On Jul 7, 2011 8:51 PM, "Leonardo Borges" wrote: Hello everyone, I have an increasingly popular web application running on top of mysql and due to its popularity, I'm running into performance issues. After carefully examining database indexes and tuning queries I was able to pin down the slowest part of the system. The app's got a user segmentation tool that allows you to filter users based on a range of criteria from which the slowest is: "Select all users that did not receive the email of id 100" To answer this question we turn to the activities table, which is basically a denormalized log of actions taken by the user in this format: user_id | activity| email_id | ... 10 | email_sent | 100| ... 10 | subscribed | NULL | ... 10 | email_open | 100| ... Given this table and the question above, the usual way of finding out all users who did not receive this email is through the use of a left outer join, such as: select u.id from users u left outer join activities a on u.id = a.user_id and a.activity = 'email_sent' and a.email_id = 100 where a.user_id is null That's all fine for medium-ish tables. However our current activities table has over 13 million rows, slowing the hell out of this left outer join, taking about 52 seconds in my machine. What this query is trying to do is to get the relative complement of set A(users) to B(activities). As far as I know mysql doesn't support set subtraction, thus the reason for these queries being slow. Based on that I've setup a test database on Postgresql, which supports this very set operation and rewrote the query to look like this: select u.id from users u except select a.user_id from activities a where a.activity = 'email_sent' and a.email_id = 100; The fact that postgresql knows how to subtract sets brought this query down to only 4 seconds. My question then is: since this is a somewhat common query in our system, are there any workarounds I could use in mysql to improve things? I did find one myself, but it's a bit convoluted and might not perform well under load, but the following sql script gives me similar performance in mysql: create temporary table email_sent_100 select a.user_id from user_activity_events a where a.activity = 'email_sent' and a.email_id = 100; create index user_id_idx on email_sent_100(user_id); //this could potentially bring the runtime down in the case of a larg temp table. select count(u.id) from users u left outer join email_sent_100 s on u.id = s.user_id and s.user_id is null; A lot more lines and a lot more complex, but does the job in this example. I'd appreciate your thoughts. Cheers, Leonardo Borges www.leonardoborges.com
MySQL and set complements
Hello everyone, I have an increasingly popular web application running on top of mysql and due to its popularity, I'm running into performance issues. After carefully examining database indexes and tuning queries I was able to pin down the slowest part of the system. The app's got a user segmentation tool that allows you to filter users based on a range of criteria from which the slowest is: "Select all users that did not receive the email of id 100" To answer this question we turn to the activities table, which is basically a denormalized log of actions taken by the user in this format: user_id | activity| email_id | ... 10 | email_sent | 100| ... 10 | subscribed | NULL | ... 10 | email_open | 100| ... Given this table and the question above, the usual way of finding out all users who did not receive this email is through the use of a left outer join, such as: select u.id from users u left outer join activities a on u.id = a.user_id and a.activity = 'email_sent' and a.email_id = 100 where a.user_id is null That's all fine for medium-ish tables. However our current activities table has over 13 million rows, slowing the hell out of this left outer join, taking about 52 seconds in my machine. What this query is trying to do is to get the relative complement of set A(users) to B(activities). As far as I know mysql doesn't support set subtraction, thus the reason for these queries being slow. Based on that I've setup a test database on Postgresql, which supports this very set operation and rewrote the query to look like this: select u.id from users u except select a.user_id from activities a where a.activity = 'email_sent' and a.email_id = 100; The fact that postgresql knows how to subtract sets brought this query down to only 4 seconds. My question then is: since this is a somewhat common query in our system, are there any workarounds I could use in mysql to improve things? I did find one myself, but it's a bit convoluted and might not perform well under load, but the following sql script gives me similar performance in mysql: create temporary table email_sent_100 select a.user_id from user_activity_events a where a.activity = 'email_sent' and a.email_id = 100; create index user_id_idx on email_sent_100(user_id); //this could potentially bring the runtime down in the case of a larg temp table. select count(u.id) from users u left outer join email_sent_100 s on u.id = s.user_id and s.user_id is null; A lot more lines and a lot more complex, but does the job in this example. I'd appreciate your thoughts. Cheers, Leonardo Borges www.leonardoborges.com
Re: dumb question?
2011/07/06 23:56 -0700, XL Cordemans Goede morgen, and thank you for your suggestion. I am actually wondering if the difference between lasso 8.1 and 8.6 is so big that traditional lasso code can not be used when connecting w/ MySQL ? You mentioned "... This can be done in the server configuration, so no alterations are necessary ...". This mode is set in "my.cnf" (under Windows "my.ini"), found in one of a variety of standard places, in the variable "sql-mode", say sql-mode="ANSI,STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" . The mode in question is "ANSI". There is always a copy of this file in the directory into which the program MySQL was installed, but that is not the first place where it is sought. Under Linux, say, "/etc/my.cnf" takes precedence. Quote from help: On Windows, MySQL programs read startup options from the following files, in the specified order (top items are used first). File NamePurpose WINDIR\my.ini, WINDIR\my.cnf Global options C:\my.ini, C:\my.cnf Global options INSTALLDIR\my.ini, INSTALLDIR\my.cnf Global options On Unix, Linux and Mac OS X, MySQL programs read startup options from the following files, in the specified order (top items are used first). File NamePurpose /etc/my.cnf Global options /etc/mysql/my.cnfGlobal options SYSCONFDIR/my.cnfGlobal options $MYSQL_HOME/my.cnf Server-specific options defaults-extra-file The file specified with --defaults-extra-file=path, if any ~/.my.cnfUser-specific options ~ represents the current user's home directory (the value of $HOME). SYSCONFDIR represents the directory specified with the SYSCONFDIR option to CMake when MySQL was built. By default, this is the etc directory located under the compiled-in installation directory. MYSQL_HOME is an environment variable containing the path to the directory in which the server-specific my.cnf file resides. If MYSQL_HOME is not set and you start the server using the mysqld_safe program, mysqld_safe attempts to set MYSQL_HOME as follows: Let BASEDIR and DATADIR represent the path names of the MySQL base directory and data directory, respectively. If there is a my.cnf file in DATADIR but not in BASEDIR, mysqld_safe sets MYSQL_HOME to DATADIR. Otherwise, if MYSQL_HOME is not set and there is no my.cnf file in DATADIR, mysqld_safe sets MYSQL_HOME to BASEDIR. In MySQL 5.5, use of DATADIR as the location for my.cnf is deprecated. Typically, DATADIR is /usr/local/mysql/data for a binary installation or /usr/local/var for a source installation. Note that this is the data directory location that was specified at configuration time, not the one specified with the --datadir option when mysqld starts. Use of --datadir at runtime has no effect on where the server looks for option files, because it looks for them before processing any options. MySQL looks for option files in the order just described and reads any that exist. If an option file that you want to use does not exist, create it with a plain text editor. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org