Just now realized I answered to Mike only.... oops.

So posting it again... forcing the use of the use_id index didn't really
improve things, unfortunately.


Cheers,
Leonardo Borges
www.leonardoborges.com


On Sat, Jul 9, 2011 at 7:24 AM, mos <mo...@fastmail.fm> wrote:

> Leonardo,
>    What happens when you use "force index(user_id)" ?
>
> See 
> http://dev.mysql.com/doc/**refman/5.1/en/index-hints.html<http://dev.mysql.com/doc/refman/5.1/en/index-hints.html>
>
> Mike
>
> At 09:19 AM 7/8/2011, you wrote:
>
>> Same as before, but with the new index listed in the possible keys:
>>
>> +----+-------------+-------+--**-----+------------------------**
>> ----------------------+-------**---+---------+------------+---**
>> -----+-------------+
>> | id | select_type | table | type  | possible_keys
>>     | key      | key_len | ref        | rows   | Extra       |
>> +----+-------------+-------+--**-----+------------------------**
>> ----------------------+-------**---+---------+------------+---**
>> -----+-------------+
>> |  1 | SIMPLE      | u     | index | NULL
>>      | id_idx   | 5       | NULL       | 972064 | Using index |
>> |  1 | SIMPLE      | a     | ref   |
>> user_idx,email_idx,activity_**idx,compound_idx | user_idx | 5       |
>> getup.u.id |     20 | Using where |
>> +----+-------------+-------+--**-----+------------------------**
>> ----------------------+-------**---+---------+------------+---**
>> -----+-------------+
>>
>>
>>
>> On Sat, Jul 9, 2011 at 12:00 AM, Johnny Withers <joh...@pixelated.net
>> >wrote:
>>
>> > What did the explain output look like after the new index?
>> >
>> >
>> > On Fri, Jul 8, 2011 at 8:53 AM, Leonardo Borges <
>> > leonardoborges...@gmail.com> wrote:
>> >
>> >> Hi Johnny,
>> >>
>> >> I just gave that a try but it didn't help as I suspected.
>> >>
>> >> I still believe the problem is in mysql not being able to handle set
>> >> subtractions. Therefore, it has to perform the work harder to return
>> the
>> >> rows that represent a "no match" with NULL values in place so they can
>> then
>> >> be filtered by the WHERE clause.
>> >>
>> >>
>> >> This type of query seems to be a corner case in mysql one should be
>> aware
>> >> about when working with large datasets.
>> >>
>> >> Cheers,
>> >> Leonardo Borges
>> >> www.leonardoborges.com
>> >>
>> >>
>> >> On Fri, Jul 8, 2011 at 11:18 PM, Johnny Withers <joh...@pixelated.net
>> >wrote:
>> >>
>> >>> Leonardo,
>> >>>
>> >>> I think a new compound key on email_id and activity in the activities
>> >>> table may help.
>> >>>
>> >>> I'm not sure if this will help or not, Its hard to test w/o having a
>> >>> large data set to test against.
>> >>>
>> >>>
>> >>> On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges <
>> >>> leonardoborges...@gmail.com> wrote:
>> >>>
>> >>>> Sure can:
>> >>>>
>> >>>> show create table activities;
>> >>>>
>> >>>> CREATE TABLE `activities` (
>> >>>>   `id` int(11) NOT NULL AUTO_INCREMENT,
>> >>>>   `user_id` int(11) DEFAULT NULL,
>> >>>>   `email` varchar(100) DEFAULT NULL,
>> >>>>   `country_iso` varchar(2) DEFAULT NULL,
>> >>>>   `tags` varchar(255) DEFAULT NULL,
>> >>>>   `postcode` int(11) DEFAULT NULL,
>> >>>>   `activity` varchar(100) DEFAULT NULL,
>> >>>>   `page_id` int(11) DEFAULT NULL,
>> >>>>   `donation_frequency` varchar(100) DEFAULT NULL,
>> >>>>   `email_id` int(11) DEFAULT NULL,
>> >>>>   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
>> >>>> CURRENT_TIMESTAMP,
>> >>>>   PRIMARY KEY (`id`),
>> >>>>   KEY `user_idx` (`user_id`),
>> >>>>   KEY `email_idx` (`email_id`),
>> >>>>   KEY `activity_idx` (`activity`)
>> >>>> ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1
>> >>>>
>> >>>>
>> >>>> And the explain:
>> >>>>
>> >>>>
>> >>>> +----+-------------+-------+--**-----+------------------------**
>> ---------+----------+---------**+------------+--------+-------**------+
>> >>>> | id | select_type | table | type  | possible_keys
>> |
>> >>>> key      | key_len | ref        | rows   | Extra       |
>> >>>>
>> >>>> +----+-------------+-------+--**-----+------------------------**
>> ---------+----------+---------**+------------+--------+-------**------+
>> >>>> |  1 | SIMPLE      | u     | index | NULL
>>  |
>> >>>> id_idx   | 5       | NULL       | 972064 | Using index |
>> >>>> |  1 | SIMPLE      | a     | ref   | user_idx,email_idx,activity_**idx
>> |
>> >>>> user_idx | 5       | getup.u.id |     20 | Using where |
>> >>>>
>> >>>> +----+-------------+-------+--**-----+------------------------**
>> ---------+----------+---------**+------------+--------+-------**------+
>> >>>>
>> >>>>
>> >>>> Cheers,
>> >>>> Leonardo Borges
>> >>>> www.leonardoborges.com
>> >>>>
>> >>>>
>> >>>> On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <
>> joh...@pixelated.net>wrote:
>> >>>>
>> >>>>> Can you post show create table for activity and explain output of
>> the
>> >>>>> problem query?
>> >>>>>
>> >>>>> On Jul 7, 2011 8:51 PM, "Leonardo Borges" <
>> leonardoborges...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> Hello everyone,
>> >>>>>
>> >>>>> I have an increasingly popular web application running on top of
>> mysql
>> >>>>> and
>> >>>>> due to its popularity, I'm running into performance issues. After
>> >>>>> carefully
>> >>>>> examining database indexes and tuning queries I was able to pin down
>> >>>>> the
>> >>>>> slowest part of the system.
>> >>>>>
>> >>>>> The app's got a user segmentation tool that allows you to filter
>> users
>> >>>>> based
>> >>>>> on a range of criteria from which the slowest is: "Select all users
>> >>>>> that did
>> >>>>> not receive the email of id 100"
>> >>>>>
>> >>>>> To answer this question we turn to the activities table, which is
>> >>>>> basically
>> >>>>> a denormalized log of actions taken by the user in this format:
>> >>>>> user_id  | activity        | email_id | ...
>> >>>>> 10         | email_sent   | 100        | ...
>> >>>>> 10         | subscribed   | NULL     | ...
>> >>>>> 10         | email_open  | 100        | ...
>> >>>>>
>> >>>>>
>> >>>>> Given this table and the question above, the usual way of finding
>> out
>> >>>>> all
>> >>>>> users who did not receive this email is through the use of a left
>> outer
>> >>>>> join, such as:
>> >>>>>
>> >>>>> select u.id
>> >>>>> from users u
>> >>>>> left outer join activities a
>> >>>>> on u.id = a.user_id
>> >>>>> and a.activity = 'email_sent'
>> >>>>> and a.email_id = 100
>> >>>>> where a.user_id is null
>> >>>>>
>> >>>>> That's all fine  for medium-ish tables. However our current
>> activities
>> >>>>> table
>> >>>>> has over 13 million rows, slowing the hell out of this left outer
>> join,
>> >>>>> taking about 52 seconds in my machine.
>> >>>>>
>> >>>>> What this query is trying to do is to get the relative complement of
>> >>>>> set
>> >>>>> A(users) to B(activities). As far as I know mysql doesn't support
>> set
>> >>>>> subtraction, thus the reason for these queries being slow.
>> >>>>>
>> >>>>> Based on that I've setup a test database on Postgresql, which
>> supports
>> >>>>> this
>> >>>>> very set operation and rewrote the query to look like this:
>> >>>>>
>> >>>>> select u.id
>> >>>>> from users u
>> >>>>> except
>> >>>>> select a.user_id
>> >>>>> from activities a
>> >>>>> where a.activity = 'email_sent'
>> >>>>> and a.email_id = 100;
>> >>>>>
>> >>>>> The fact that postgresql knows how to subtract sets brought this
>> query
>> >>>>> down
>> >>>>> to only 4 seconds.
>> >>>>>
>> >>>>> My question then is: since this is a somewhat common query in our
>> >>>>> system,
>> >>>>> are there any workarounds I could use in mysql to improve things?
>> >>>>>
>> >>>>> I did find one myself, but it's a bit convoluted and might not
>> perform
>> >>>>> well
>> >>>>> under load, but the following sql script gives me similar
>> performance
>> >>>>> in
>> >>>>> mysql:
>> >>>>>
>> >>>>> create temporary table email_sent_100
>> >>>>> select a.user_id
>> >>>>> from user_activity_events a
>> >>>>> where a.activity = 'email_sent'
>> >>>>>
>> >>>>>
>> >>>>> and a.email_id = 100;
>> >>>>>
>> >>>>> create index user_id_idx on email_sent_100(user_id);   //this could
>> >>>>> potentially bring the runtime down in the case of a larg temp table.
>> >>>>>
>> >>>>> select count(u.id)
>> >>>>> from users u
>> >>>>> left outer join email_sent_100 s
>> >>>>>   on u.id = s.user_id
>> >>>>>   and s.user_id is null;
>> >>>>>
>> >>>>> A lot more lines and a lot more complex, but does the job in this
>> >>>>> example.
>> >>>>>
>> >>>>> I'd appreciate your thoughts.
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Leonardo Borges
>> >>>>> www.leonardoborges.com
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> -----------------------------
>> >>> Johnny Withers
>> >>> 601.209.4985
>> >>> joh...@pixelated.net
>> >>>
>> >>
>> >>
>> >
>> >
>> > --
>> > -----------------------------
>> > Johnny Withers
>> > 601.209.4985
>> > joh...@pixelated.net
>> >
>>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?**
> unsub=leonardoborges.rj@gmail.**com<http://lists.mysql.com/mysql?unsub=leonardoborges...@gmail.com>
>
>

Reply via email to