What did the explain output look like after the new index?
On Fri, Jul 8, 2011 at 8:53 AM, Leonardo Borges <leonardoborges...@gmail.com > wrote: > Hi Johnny, > > I just gave that a try but it didn't help as I suspected. > > I still believe the problem is in mysql not being able to handle set > subtractions. Therefore, it has to perform the work harder to return the > rows that represent a "no match" with NULL values in place so they can then > be filtered by the WHERE clause. > > > This type of query seems to be a corner case in mysql one should be aware > about when working with large datasets. > > Cheers, > Leonardo Borges > www.leonardoborges.com > > > On Fri, Jul 8, 2011 at 11:18 PM, Johnny Withers <joh...@pixelated.net>wrote: > >> Leonardo, >> >> I think a new compound key on email_id and activity in the activities >> table may help. >> >> I'm not sure if this will help or not, Its hard to test w/o having a large >> data set to test against. >> >> >> On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges < >> leonardoborges...@gmail.com> wrote: >> >>> Sure can: >>> >>> show create table activities; >>> >>> CREATE TABLE `activities` ( >>> `id` int(11) NOT NULL AUTO_INCREMENT, >>> `user_id` int(11) DEFAULT NULL, >>> `email` varchar(100) DEFAULT NULL, >>> `country_iso` varchar(2) DEFAULT NULL, >>> `tags` varchar(255) DEFAULT NULL, >>> `postcode` int(11) DEFAULT NULL, >>> `activity` varchar(100) DEFAULT NULL, >>> `page_id` int(11) DEFAULT NULL, >>> `donation_frequency` varchar(100) DEFAULT NULL, >>> `email_id` int(11) DEFAULT NULL, >>> `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE >>> CURRENT_TIMESTAMP, >>> PRIMARY KEY (`id`), >>> KEY `user_idx` (`user_id`), >>> KEY `email_idx` (`email_id`), >>> KEY `activity_idx` (`activity`) >>> ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1 >>> >>> >>> And the explain: >>> >>> >>> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ >>> | id | select_type | table | type | possible_keys | >>> key | key_len | ref | rows | Extra | >>> >>> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ >>> | 1 | SIMPLE | u | index | NULL | >>> id_idx | 5 | NULL | 972064 | Using index | >>> | 1 | SIMPLE | a | ref | user_idx,email_idx,activity_idx | >>> user_idx | 5 | getup.u.id | 20 | Using where | >>> >>> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+ >>> >>> >>> Cheers, >>> Leonardo Borges >>> www.leonardoborges.com >>> >>> >>> On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <joh...@pixelated.net>wrote: >>> >>>> Can you post show create table for activity and explain output of the >>>> problem query? >>>> >>>> On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges...@gmail.com> >>>> wrote: >>>> >>>> Hello everyone, >>>> >>>> I have an increasingly popular web application running on top of mysql >>>> and >>>> due to its popularity, I'm running into performance issues. After >>>> carefully >>>> examining database indexes and tuning queries I was able to pin down the >>>> slowest part of the system. >>>> >>>> The app's got a user segmentation tool that allows you to filter users >>>> based >>>> on a range of criteria from which the slowest is: "Select all users that >>>> did >>>> not receive the email of id 100" >>>> >>>> To answer this question we turn to the activities table, which is >>>> basically >>>> a denormalized log of actions taken by the user in this format: >>>> user_id | activity | email_id | ... >>>> 10 | email_sent | 100 | ... >>>> 10 | subscribed | NULL | ... >>>> 10 | email_open | 100 | ... >>>> >>>> >>>> Given this table and the question above, the usual way of finding out >>>> all >>>> users who did not receive this email is through the use of a left outer >>>> join, such as: >>>> >>>> select u.id >>>> from users u >>>> left outer join activities a >>>> on u.id = a.user_id >>>> and a.activity = 'email_sent' >>>> and a.email_id = 100 >>>> where a.user_id is null >>>> >>>> That's all fine for medium-ish tables. However our current activities >>>> table >>>> has over 13 million rows, slowing the hell out of this left outer join, >>>> taking about 52 seconds in my machine. >>>> >>>> What this query is trying to do is to get the relative complement of set >>>> A(users) to B(activities). As far as I know mysql doesn't support set >>>> subtraction, thus the reason for these queries being slow. >>>> >>>> Based on that I've setup a test database on Postgresql, which supports >>>> this >>>> very set operation and rewrote the query to look like this: >>>> >>>> select u.id >>>> from users u >>>> except >>>> select a.user_id >>>> from activities a >>>> where a.activity = 'email_sent' >>>> and a.email_id = 100; >>>> >>>> The fact that postgresql knows how to subtract sets brought this query >>>> down >>>> to only 4 seconds. >>>> >>>> My question then is: since this is a somewhat common query in our >>>> system, >>>> are there any workarounds I could use in mysql to improve things? >>>> >>>> I did find one myself, but it's a bit convoluted and might not perform >>>> well >>>> under load, but the following sql script gives me similar performance in >>>> mysql: >>>> >>>> create temporary table email_sent_100 >>>> select a.user_id >>>> from user_activity_events a >>>> where a.activity = 'email_sent' >>>> >>>> >>>> and a.email_id = 100; >>>> >>>> create index user_id_idx on email_sent_100(user_id); //this could >>>> potentially bring the runtime down in the case of a larg temp table. >>>> >>>> select count(u.id) >>>> from users u >>>> left outer join email_sent_100 s >>>> on u.id = s.user_id >>>> and s.user_id is null; >>>> >>>> A lot more lines and a lot more complex, but does the job in this >>>> example. >>>> >>>> I'd appreciate your thoughts. >>>> >>>> Cheers, >>>> Leonardo Borges >>>> www.leonardoborges.com >>>> >>>> >>> >> >> >> -- >> ----------------------------- >> Johnny Withers >> 601.209.4985 >> joh...@pixelated.net >> > > -- ----------------------------- Johnny Withers 601.209.4985 joh...@pixelated.net