Leonardo, I think a new compound key on email_id and activity in the activities table may help.
I'm not sure if this will help or not, Its hard to test w/o having a large data set to test against. On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges <leonardoborges...@gmail.com > wrote: > Sure can: > > show create table activities; > > CREATE TABLE `activities` ( > `id` int(11) NOT NULL AUTO_INCREMENT, > `user_id` int(11) DEFAULT NULL, > `email` varchar(100) DEFAULT NULL, > `country_iso` varchar(2) DEFAULT NULL, > `tags` varchar(255) DEFAULT NULL, > `postcode` int(11) DEFAULT NULL, > `activity` varchar(100) DEFAULT NULL, > `page_id` int(11) DEFAULT NULL, > `donation_frequency` varchar(100) DEFAULT NULL, > `email_id` int(11) DEFAULT NULL, > `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE > CURRENT_TIMESTAMP, > PRIMARY KEY (`id`), > KEY `user_idx` (`user_id`), > KEY `email_idx` (`email_id`), > KEY `activity_idx` (`activity`) > ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1 > > > And the explain: > > > +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ > | id | select_type | table | type | possible_keys | key > | key_len | ref | rows | Extra | > > +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ > | 1 | SIMPLE | u | index | NULL | > id_idx | 5 | NULL | 972064 | Using index | > | 1 | SIMPLE | a | ref | user_idx,email_idx,activity_idx | > user_idx | 5 | getup.u.id | 20 | Using where | > > +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ > > > Cheers, > Leonardo Borges > www.leonardoborges.com > > > On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <joh...@pixelated.net>wrote: > >> Can you post show create table for activity and explain output of the >> problem query? >> >> On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges...@gmail.com> >> wrote: >> >> Hello everyone, >> >> I have an increasingly popular web application running on top of mysql and >> due to its popularity, I'm running into performance issues. After >> carefully >> examining database indexes and tuning queries I was able to pin down the >> slowest part of the system. >> >> The app's got a user segmentation tool that allows you to filter users >> based >> on a range of criteria from which the slowest is: "Select all users that >> did >> not receive the email of id 100" >> >> To answer this question we turn to the activities table, which is >> basically >> a denormalized log of actions taken by the user in this format: >> user_id | activity | email_id | ... >> 10 | email_sent | 100 | ... >> 10 | subscribed | NULL | ... >> 10 | email_open | 100 | ... >> >> >> Given this table and the question above, the usual way of finding out all >> users who did not receive this email is through the use of a left outer >> join, such as: >> >> select u.id >> from users u >> left outer join activities a >> on u.id = a.user_id >> and a.activity = 'email_sent' >> and a.email_id = 100 >> where a.user_id is null >> >> That's all fine for medium-ish tables. However our current activities >> table >> has over 13 million rows, slowing the hell out of this left outer join, >> taking about 52 seconds in my machine. >> >> What this query is trying to do is to get the relative complement of set >> A(users) to B(activities). As far as I know mysql doesn't support set >> subtraction, thus the reason for these queries being slow. >> >> Based on that I've setup a test database on Postgresql, which supports >> this >> very set operation and rewrote the query to look like this: >> >> select u.id >> from users u >> except >> select a.user_id >> from activities a >> where a.activity = 'email_sent' >> and a.email_id = 100; >> >> The fact that postgresql knows how to subtract sets brought this query >> down >> to only 4 seconds. >> >> My question then is: since this is a somewhat common query in our system, >> are there any workarounds I could use in mysql to improve things? >> >> I did find one myself, but it's a bit convoluted and might not perform >> well >> under load, but the following sql script gives me similar performance in >> mysql: >> >> create temporary table email_sent_100 >> select a.user_id >> from user_activity_events a >> where a.activity = 'email_sent' >> >> >> and a.email_id = 100; >> >> create index user_id_idx on email_sent_100(user_id); //this could >> potentially bring the runtime down in the case of a larg temp table. >> >> select count(u.id) >> from users u >> left outer join email_sent_100 s >> on u.id = s.user_id >> and s.user_id is null; >> >> A lot more lines and a lot more complex, but does the job in this example. >> >> I'd appreciate your thoughts. >> >> Cheers, >> Leonardo Borges >> www.leonardoborges.com >> >> > -- ----------------------------- Johnny Withers 601.209.4985 joh...@pixelated.net