Sure can:

show create table activities;

CREATE TABLE `activities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `email` varchar(100) DEFAULT NULL,
  `country_iso` varchar(2) DEFAULT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `postcode` int(11) DEFAULT NULL,
  `activity` varchar(100) DEFAULT NULL,
  `page_id` int(11) DEFAULT NULL,
  `donation_frequency` varchar(100) DEFAULT NULL,
  `email_id` int(11) DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_idx` (`user_id`),
  KEY `email_idx` (`email_id`),
  KEY `activity_idx` (`activity`)
) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1


And the explain:

+----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
| id | select_type | table | type  | possible_keys                   | key
   | key_len | ref        | rows   | Extra       |
+----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
|  1 | SIMPLE      | u     | index | NULL                            |
id_idx   | 5       | NULL       | 972064 | Using index |
|  1 | SIMPLE      | a     | ref   | user_idx,email_idx,activity_idx |
user_idx | 5       | getup.u.id |     20 | Using where |
+----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+


Cheers,
Leonardo Borges
www.leonardoborges.com


On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <joh...@pixelated.net>wrote:

> Can you post show create table for activity and explain output of the
> problem query?
>
> On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges...@gmail.com>
> wrote:
>
> Hello everyone,
>
> I have an increasingly popular web application running on top of mysql and
> due to its popularity, I'm running into performance issues. After carefully
> examining database indexes and tuning queries I was able to pin down the
> slowest part of the system.
>
> The app's got a user segmentation tool that allows you to filter users
> based
> on a range of criteria from which the slowest is: "Select all users that
> did
> not receive the email of id 100"
>
> To answer this question we turn to the activities table, which is basically
> a denormalized log of actions taken by the user in this format:
> user_id  | activity        | email_id | ...
> 10         | email_sent   | 100        | ...
> 10         | subscribed   | NULL     | ...
> 10         | email_open  | 100        | ...
>
>
> Given this table and the question above, the usual way of finding out all
> users who did not receive this email is through the use of a left outer
> join, such as:
>
> select u.id
> from users u
> left outer join activities a
> on u.id = a.user_id
> and a.activity = 'email_sent'
> and a.email_id = 100
> where a.user_id is null
>
> That's all fine  for medium-ish tables. However our current activities
> table
> has over 13 million rows, slowing the hell out of this left outer join,
> taking about 52 seconds in my machine.
>
> What this query is trying to do is to get the relative complement of set
> A(users) to B(activities). As far as I know mysql doesn't support set
> subtraction, thus the reason for these queries being slow.
>
> Based on that I've setup a test database on Postgresql, which supports this
> very set operation and rewrote the query to look like this:
>
> select u.id
> from users u
> except
> select a.user_id
> from activities a
> where a.activity = 'email_sent'
> and a.email_id = 100;
>
> The fact that postgresql knows how to subtract sets brought this query down
> to only 4 seconds.
>
> My question then is: since this is a somewhat common query in our system,
> are there any workarounds I could use in mysql to improve things?
>
> I did find one myself, but it's a bit convoluted and might not perform well
> under load, but the following sql script gives me similar performance in
> mysql:
>
> create temporary table email_sent_100
> select a.user_id
> from user_activity_events a
> where a.activity = 'email_sent'
>
>
> and a.email_id = 100;
>
> create index user_id_idx on email_sent_100(user_id);   //this could
> potentially bring the runtime down in the case of a larg temp table.
>
> select count(u.id)
> from users u
> left outer join email_sent_100 s
>   on u.id = s.user_id
>   and s.user_id is null;
>
> A lot more lines and a lot more complex, but does the job in this example.
>
> I'd appreciate your thoughts.
>
> Cheers,
> Leonardo Borges
> www.leonardoborges.com
>
>

Reply via email to