Same as before, but with the new index listed in the possible keys:

+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
| id | select_type | table | type  | possible_keys                                | key      | key_len | ref        | rows   | Extra       |
+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
|  1 | SIMPLE      | u     | index | NULL                                         | id_idx   | 5       | NULL       | 972064 | Using index |
|  1 | SIMPLE      | a     | ref   | user_idx,email_idx,activity_idx,compound_idx | user_idx | 5       | getup.u.id |     20 | Using where |
+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
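[Another anti-join form worth comparing here is NOT EXISTS, which asks the same question without materializing NULL-padded rows. A minimal sketch using Python's sqlite3 with invented sample rows (the table and column names follow the thread; MySQL's optimizer may of course plan the two forms differently, so EXPLAIN both):]

```python
import sqlite3

# Tiny in-memory stand-in for the schema in this thread (rows are invented).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE activities (user_id INTEGER, activity TEXT, email_id INTEGER);
    INSERT INTO users (id) VALUES (10), (11), (12);
    INSERT INTO activities VALUES
        (10, 'email_sent', 100),
        (10, 'email_open', 100),
        (11, 'subscribed', NULL);
""")

# The LEFT JOIN ... IS NULL anti-join discussed in the thread.
left_join = con.execute("""
    SELECT u.id FROM users u
    LEFT OUTER JOIN activities a
      ON u.id = a.user_id AND a.activity = 'email_sent' AND a.email_id = 100
    WHERE a.user_id IS NULL
""").fetchall()

# The same question phrased as NOT EXISTS.
not_exists = con.execute("""
    SELECT u.id FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM activities a
        WHERE a.user_id = u.id
          AND a.activity = 'email_sent'
          AND a.email_id = 100
    )
""").fetchall()

# Both return users 11 and 12, who have no email_sent/100 row.
print(sorted(left_join), sorted(not_exists))
```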
On Sat, Jul 9, 2011 at 12:00 AM, Johnny Withers <joh...@pixelated.net> wrote:

> What did the explain output look like after the new index?
>
> On Fri, Jul 8, 2011 at 8:53 AM, Leonardo Borges <leonardoborges...@gmail.com> wrote:
>
>> Hi Johnny,
>>
>> I just gave that a try but, as I suspected, it didn't help.
>>
>> I still believe the problem is that MySQL can't perform set subtraction directly. It therefore has to work harder to produce the rows that represent a "no match", padded with NULL values, so they can then be filtered out by the WHERE clause.
>>
>> This type of query seems to be a corner case in MySQL that one should be aware of when working with large datasets.
>>
>> Cheers,
>> Leonardo Borges
>> www.leonardoborges.com
>>
>> On Fri, Jul 8, 2011 at 11:18 PM, Johnny Withers <joh...@pixelated.net> wrote:
>>
>>> Leonardo,
>>>
>>> I think a new compound key on email_id and activity in the activities table may help.
>>>
>>> I'm not sure whether it will or not; it's hard to test without a large data set to test against.
>>> On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges <leonardoborges...@gmail.com> wrote:
>>>
>>>> Sure can:
>>>>
>>>> show create table activities;
>>>>
>>>> CREATE TABLE `activities` (
>>>>   `id` int(11) NOT NULL AUTO_INCREMENT,
>>>>   `user_id` int(11) DEFAULT NULL,
>>>>   `email` varchar(100) DEFAULT NULL,
>>>>   `country_iso` varchar(2) DEFAULT NULL,
>>>>   `tags` varchar(255) DEFAULT NULL,
>>>>   `postcode` int(11) DEFAULT NULL,
>>>>   `activity` varchar(100) DEFAULT NULL,
>>>>   `page_id` int(11) DEFAULT NULL,
>>>>   `donation_frequency` varchar(100) DEFAULT NULL,
>>>>   `email_id` int(11) DEFAULT NULL,
>>>>   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
>>>>   PRIMARY KEY (`id`),
>>>>   KEY `user_idx` (`user_id`),
>>>>   KEY `email_idx` (`email_id`),
>>>>   KEY `activity_idx` (`activity`)
>>>> ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1
>>>>
>>>> And the explain:
>>>>
>>>> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
>>>> | id | select_type | table | type  | possible_keys                   | key      | key_len | ref        | rows   | Extra       |
>>>> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
>>>> |  1 | SIMPLE      | u     | index | NULL                            | id_idx   | 5       | NULL       | 972064 | Using index |
>>>> |  1 | SIMPLE      | a     | ref   | user_idx,email_idx,activity_idx | user_idx | 5       | getup.u.id |     20 | Using where |
>>>> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
>>>>
>>>> Cheers,
>>>> Leonardo Borges
>>>> www.leonardoborges.com
>>>>
>>>> On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <joh...@pixelated.net> wrote:
>>>>
>>>>> Can you post show create table for activity and explain output of the problem query?
>>>>> On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges...@gmail.com> wrote:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I have an increasingly popular web application running on top of MySQL and, due to its popularity, I'm running into performance issues. After carefully examining database indexes and tuning queries, I was able to pin down the slowest part of the system.
>>>>>
>>>>> The app has a user segmentation tool that lets you filter users based on a range of criteria, of which the slowest is: "Select all users that did not receive the email of id 100".
>>>>>
>>>>> To answer this question we turn to the activities table, which is basically a denormalized log of actions taken by the user, in this format:
>>>>>
>>>>> user_id | activity   | email_id | ...
>>>>> 10      | email_sent | 100      | ...
>>>>> 10      | subscribed | NULL     | ...
>>>>> 10      | email_open | 100      | ...
>>>>>
>>>>> Given this table and the question above, the usual way of finding all users who did not receive this email is a left outer join, such as:
>>>>>
>>>>> select u.id
>>>>> from users u
>>>>> left outer join activities a
>>>>>   on u.id = a.user_id
>>>>>   and a.activity = 'email_sent'
>>>>>   and a.email_id = 100
>>>>> where a.user_id is null
>>>>>
>>>>> That's all fine for medium-ish tables. However, our current activities table has over 13 million rows, which slows this left outer join to a crawl: it takes about 52 seconds on my machine.
>>>>>
>>>>> What this query is really computing is a set difference: the set of users (A) minus the users appearing in matching activities rows (B). As far as I know, MySQL doesn't support set subtraction, which is why these queries are slow.
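[A minimal, runnable reproduction of the anti-join quoted above, using Python's sqlite3. The rows mirror the sample log in the email, plus one invented user (11) who never received email 100:]

```python
import sqlite3

# In-memory stand-in for the users/activities schema from the email.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE activities (user_id INTEGER, activity TEXT, email_id INTEGER);
    INSERT INTO users (id) VALUES (10), (11);
    INSERT INTO activities VALUES
        (10, 'email_sent', 100),
        (10, 'subscribed', NULL),
        (10, 'email_open', 100);
""")

# LEFT JOIN anti-join: users with no matching email_sent/100 row
# survive the join as NULL-padded rows and pass the WHERE filter.
rows = con.execute("""
    SELECT u.id
    FROM users u
    LEFT OUTER JOIN activities a
      ON u.id = a.user_id
     AND a.activity = 'email_sent'
     AND a.email_id = 100
    WHERE a.user_id IS NULL
""").fetchall()

print(rows)  # only user 11 did not receive email 100
```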
>>>>> Based on that, I set up a test database on PostgreSQL, which supports this very set operation, and rewrote the query to look like this:
>>>>>
>>>>> select u.id
>>>>> from users u
>>>>> except
>>>>> select a.user_id
>>>>> from activities a
>>>>> where a.activity = 'email_sent'
>>>>> and a.email_id = 100;
>>>>>
>>>>> The fact that PostgreSQL knows how to subtract sets brought this query down to only 4 seconds.
>>>>>
>>>>> My question then is: since this is a somewhat common query in our system, are there any workarounds I could use in MySQL to improve things?
>>>>>
>>>>> I did find one myself, but it's a bit convoluted and might not perform well under load. The following SQL script gives me similar performance in MySQL:
>>>>>
>>>>> create temporary table email_sent_100
>>>>> select a.user_id
>>>>> from user_activity_events a
>>>>> where a.activity = 'email_sent'
>>>>> and a.email_id = 100;
>>>>>
>>>>> create index user_id_idx on email_sent_100(user_id); -- this could potentially bring the runtime down in the case of a large temp table
>>>>>
>>>>> select count(u.id)
>>>>> from users u
>>>>> left outer join email_sent_100 s
>>>>>   on u.id = s.user_id
>>>>> where s.user_id is null;
>>>>>
>>>>> It's a lot more lines and a lot more complex, but it does the job in this example.
>>>>>
>>>>> I'd appreciate your thoughts.
>>>>>
>>>>> Cheers,
>>>>> Leonardo Borges
>>>>> www.leonardoborges.com

-- 
-----------------------------
Johnny Withers
601.209.4985
joh...@pixelated.net
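[SQLite also supports EXCEPT, so the PostgreSQL rewrite and the temp-table workaround from this email can be compared side by side. A sketch with invented data (timings on a toy table obviously won't carry over); note the anti-join's `IS NULL` test belongs in the WHERE clause, not the ON clause:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE activities (user_id INTEGER, activity TEXT, email_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO activities VALUES (1, 'email_sent', 100), (3, 'email_sent', 100);
""")

# Set subtraction, as in the PostgreSQL version of the query.
except_rows = con.execute("""
    SELECT u.id FROM users u
    EXCEPT
    SELECT a.user_id FROM activities a
    WHERE a.activity = 'email_sent' AND a.email_id = 100
""").fetchall()

# The temp-table workaround: materialize the "received" set, index it,
# then anti-join against the (much smaller) temp table.
con.executescript("""
    CREATE TEMPORARY TABLE email_sent_100 AS
        SELECT a.user_id FROM activities a
        WHERE a.activity = 'email_sent' AND a.email_id = 100;
    CREATE INDEX user_id_idx ON email_sent_100(user_id);
""")
temp_rows = con.execute("""
    SELECT u.id FROM users u
    LEFT OUTER JOIN email_sent_100 s ON u.id = s.user_id
    WHERE s.user_id IS NULL
""").fetchall()

# Both strategies leave only user 2, who never received email 100.
print(except_rows, temp_rows)
```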