Sure can: show create table activities;
CREATE TABLE `activities` ( `id` int(11) NOT NULL AUTO_INCREMENT, `user_id` int(11) DEFAULT NULL, `email` varchar(100) DEFAULT NULL, `country_iso` varchar(2) DEFAULT NULL, `tags` varchar(255) DEFAULT NULL, `postcode` int(11) DEFAULT NULL, `activity` varchar(100) DEFAULT NULL, `page_id` int(11) DEFAULT NULL, `donation_frequency` varchar(100) DEFAULT NULL, `email_id` int(11) DEFAULT NULL, `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `user_idx` (`user_id`), KEY `email_idx` (`email_id`), KEY `activity_idx` (`activity`) ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1 And the explain: +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ | 1 | SIMPLE | u | index | NULL | id_idx | 5 | NULL | 972064 | Using index | | 1 | SIMPLE | a | ref | user_idx,email_idx,activity_idx | user_idx | 5 | getup.u.id | 20 | Using where | +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ Cheers, Leonardo Borges www.leonardoborges.com On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <joh...@pixelated.net>wrote: > Can you post show create table for activity and explain output of the > problem query? > > On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges...@gmail.com> > wrote: > > Hello everyone, > > I have an increasingly popular web application running on top of mysql and > due to its popularity, I'm running into performance issues. After carefully > examining database indexes and tuning queries I was able to pin down the > slowest part of the system. > > The app's got a user segmentation tool that allows you to filter users > based > on a range of criteria from which the slowest is: "Select all users that > did > not receive the email of id 100" > > To answer this question we turn to the activities table, which is basically > a denormalized log of actions taken by the user in this format: > user_id | activity | email_id | ... > 10 | email_sent | 100 | ... > 10 | subscribed | NULL | ... > 10 | email_open | 100 | ... > > > Given this table and the question above, the usual way of finding out all > users who did not receive this email is through the use of a left outer > join, such as: > > select u.id > from users u > left outer join activities a > on u.id = a.user_id > and a.activity = 'email_sent' > and a.email_id = 100 > where a.user_id is null > > That's all fine for medium-ish tables. However our current activities > table > has over 13 million rows, slowing the hell out of this left outer join, > taking about 52 seconds in my machine. > > What this query is trying to do is to get the relative complement of set > A(users) to B(activities). As far as I know mysql doesn't support set > subtraction, thus the reason for these queries being slow. > > Based on that I've setup a test database on Postgresql, which supports this > very set operation and rewrote the query to look like this: > > select u.id > from users u > except > select a.user_id > from activities a > where a.activity = 'email_sent' > and a.email_id = 100; > > The fact that postgresql knows how to subtract sets brought this query down > to only 4 seconds. > > My question then is: since this is a somewhat common query in our system, > are there any workarounds I could use in mysql to improve things? > > I did find one myself, but it's a bit convoluted and might not perform well > under load, but the following sql script gives me similar performance in > mysql: > > create temporary table email_sent_100 > select a.user_id > from user_activity_events a > where a.activity = 'email_sent' > > > and a.email_id = 100; > > create index user_id_idx on email_sent_100(user_id); //this could > potentially bring the runtime down in the case of a larg temp table. > > select count(u.id) > from users u > left outer join email_sent_100 s > on u.id = s.user_id > and s.user_id is null; > > A lot more lines and a lot more complex, but does the job in this example. > > I'd appreciate your thoughts. > > Cheers, > Leonardo Borges > www.leonardoborges.com > >