Just now realized I answered to Mike only.... oops. So posting it again... forcing the use of the use_id index didn't really improve things, unfortunately.
Cheers, Leonardo Borges www.leonardoborges.com On Sat, Jul 9, 2011 at 7:24 AM, mos <mo...@fastmail.fm> wrote: > Leonardo, > What happens when you use "force index(user_id)" ? > > See > http://dev.mysql.com/doc/**refman/5.1/en/index-hints.html<http://dev.mysql.com/doc/refman/5.1/en/index-hints.html> > > Mike > > At 09:19 AM 7/8/2011, you wrote: > >> Same as before, but with the new index listed in the possible keys: >> >> +----+-------------+-------+--**-----+------------------------** >> ----------------------+-------**---+---------+------------+---** >> -----+-------------+ >> | id | select_type | table | type | possible_keys >> | key | key_len | ref | rows | Extra | >> +----+-------------+-------+--**-----+------------------------** >> ----------------------+-------**---+---------+------------+---** >> -----+-------------+ >> | 1 | SIMPLE | u | index | NULL >> | id_idx | 5 | NULL | 972064 | Using index | >> | 1 | SIMPLE | a | ref | >> user_idx,email_idx,activity_**idx,compound_idx | user_idx | 5 | >> getup.u.id | 20 | Using where | >> +----+-------------+-------+--**-----+------------------------** >> ----------------------+-------**---+---------+------------+---** >> -----+-------------+ >> >> >> >> On Sat, Jul 9, 2011 at 12:00 AM, Johnny Withers <joh...@pixelated.net >> >wrote: >> >> > What did the explain output look like after the new index? >> > >> > >> > On Fri, Jul 8, 2011 at 8:53 AM, Leonardo Borges < >> > leonardoborges...@gmail.com> wrote: >> > >> >> Hi Johnny, >> >> >> >> I just gave that a try but it didn't help as I suspected. >> >> >> >> I still believe the problem is in mysql not being able to handle set >> >> subtractions. Therefore, it has to perform the work harder to return >> the >> >> rows that represent a "no match" with NULL values in place so they can >> then >> >> be filtered by the WHERE clause. >> >> >> >> >> >> This type of query seems to be a corner case in mysql one should be >> aware >> >> about when working with large datasets. >> >> >> >> Cheers, >> >> Leonardo Borges >> >> www.leonardoborges.com >> >> >> >> >> >> On Fri, Jul 8, 2011 at 11:18 PM, Johnny Withers <joh...@pixelated.net >> >wrote: >> >> >> >>> Leonardo, >> >>> >> >>> I think a new compound key on email_id and activity in the activities >> >>> table may help. >> >>> >> >>> I'm not sure if this will help or not, Its hard to test w/o having a >> >>> large data set to test against. >> >>> >> >>> >> >>> On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges < >> >>> leonardoborges...@gmail.com> wrote: >> >>> >> >>>> Sure can: >> >>>> >> >>>> show create table activities; >> >>>> >> >>>> CREATE TABLE `activities` ( >> >>>> `id` int(11) NOT NULL AUTO_INCREMENT, >> >>>> `user_id` int(11) DEFAULT NULL, >> >>>> `email` varchar(100) DEFAULT NULL, >> >>>> `country_iso` varchar(2) DEFAULT NULL, >> >>>> `tags` varchar(255) DEFAULT NULL, >> >>>> `postcode` int(11) DEFAULT NULL, >> >>>> `activity` varchar(100) DEFAULT NULL, >> >>>> `page_id` int(11) DEFAULT NULL, >> >>>> `donation_frequency` varchar(100) DEFAULT NULL, >> >>>> `email_id` int(11) DEFAULT NULL, >> >>>> `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE >> >>>> CURRENT_TIMESTAMP, >> >>>> PRIMARY KEY (`id`), >> >>>> KEY `user_idx` (`user_id`), >> >>>> KEY `email_idx` (`email_id`), >> >>>> KEY `activity_idx` (`activity`) >> >>>> ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1 >> >>>> >> >>>> >> >>>> And the explain: >> >>>> >> >>>> >> >>>> +----+-------------+-------+--**-----+------------------------** >> ---------+----------+---------**+------------+--------+-------**------+ >> >>>> | id | select_type | table | type | possible_keys >> | >> >>>> key | key_len | ref | rows | Extra | >> >>>> >> >>>> +----+-------------+-------+--**-----+------------------------** >> ---------+----------+---------**+------------+--------+-------**------+ >> >>>> | 1 | SIMPLE | u | index | NULL >> | >> >>>> id_idx | 5 | NULL | 972064 | Using index | >> >>>> | 1 | SIMPLE | a | ref | user_idx,email_idx,activity_**idx >> | >> >>>> user_idx | 5 | getup.u.id | 20 | Using where | >> >>>> >> >>>> +----+-------------+-------+--**-----+------------------------** >> ---------+----------+---------**+------------+--------+-------**------+ >> >>>> >> >>>> >> >>>> Cheers, >> >>>> Leonardo Borges >> >>>> www.leonardoborges.com >> >>>> >> >>>> >> >>>> On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers < >> joh...@pixelated.net>wrote: >> >>>> >> >>>>> Can you post show create table for activity and explain output of >> the >> >>>>> problem query? >> >>>>> >> >>>>> On Jul 7, 2011 8:51 PM, "Leonardo Borges" < >> leonardoborges...@gmail.com> >> >>>>> wrote: >> >>>>> >> >>>>> Hello everyone, >> >>>>> >> >>>>> I have an increasingly popular web application running on top of >> mysql >> >>>>> and >> >>>>> due to its popularity, I'm running into performance issues. After >> >>>>> carefully >> >>>>> examining database indexes and tuning queries I was able to pin down >> >>>>> the >> >>>>> slowest part of the system. >> >>>>> >> >>>>> The app's got a user segmentation tool that allows you to filter >> users >> >>>>> based >> >>>>> on a range of criteria from which the slowest is: "Select all users >> >>>>> that did >> >>>>> not receive the email of id 100" >> >>>>> >> >>>>> To answer this question we turn to the activities table, which is >> >>>>> basically >> >>>>> a denormalized log of actions taken by the user in this format: >> >>>>> user_id | activity | email_id | ... >> >>>>> 10 | email_sent | 100 | ... >> >>>>> 10 | subscribed | NULL | ... >> >>>>> 10 | email_open | 100 | ... >> >>>>> >> >>>>> >> >>>>> Given this table and the question above, the usual way of finding >> out >> >>>>> all >> >>>>> users who did not receive this email is through the use of a left >> outer >> >>>>> join, such as: >> >>>>> >> >>>>> select u.id >> >>>>> from users u >> >>>>> left outer join activities a >> >>>>> on u.id = a.user_id >> >>>>> and a.activity = 'email_sent' >> >>>>> and a.email_id = 100 >> >>>>> where a.user_id is null >> >>>>> >> >>>>> That's all fine for medium-ish tables. However our current >> activities >> >>>>> table >> >>>>> has over 13 million rows, slowing the hell out of this left outer >> join, >> >>>>> taking about 52 seconds in my machine. >> >>>>> >> >>>>> What this query is trying to do is to get the relative complement of >> >>>>> set >> >>>>> A(users) to B(activities). As far as I know mysql doesn't support >> set >> >>>>> subtraction, thus the reason for these queries being slow. >> >>>>> >> >>>>> Based on that I've setup a test database on Postgresql, which >> supports >> >>>>> this >> >>>>> very set operation and rewrote the query to look like this: >> >>>>> >> >>>>> select u.id >> >>>>> from users u >> >>>>> except >> >>>>> select a.user_id >> >>>>> from activities a >> >>>>> where a.activity = 'email_sent' >> >>>>> and a.email_id = 100; >> >>>>> >> >>>>> The fact that postgresql knows how to subtract sets brought this >> query >> >>>>> down >> >>>>> to only 4 seconds. >> >>>>> >> >>>>> My question then is: since this is a somewhat common query in our >> >>>>> system, >> >>>>> are there any workarounds I could use in mysql to improve things? >> >>>>> >> >>>>> I did find one myself, but it's a bit convoluted and might not >> perform >> >>>>> well >> >>>>> under load, but the following sql script gives me similar >> performance >> >>>>> in >> >>>>> mysql: >> >>>>> >> >>>>> create temporary table email_sent_100 >> >>>>> select a.user_id >> >>>>> from user_activity_events a >> >>>>> where a.activity = 'email_sent' >> >>>>> >> >>>>> >> >>>>> and a.email_id = 100; >> >>>>> >> >>>>> create index user_id_idx on email_sent_100(user_id); //this could >> >>>>> potentially bring the runtime down in the case of a larg temp table. >> >>>>> >> >>>>> select count(u.id) >> >>>>> from users u >> >>>>> left outer join email_sent_100 s >> >>>>> on u.id = s.user_id >> >>>>> and s.user_id is null; >> >>>>> >> >>>>> A lot more lines and a lot more complex, but does the job in this >> >>>>> example. >> >>>>> >> >>>>> I'd appreciate your thoughts. >> >>>>> >> >>>>> Cheers, >> >>>>> Leonardo Borges >> >>>>> www.leonardoborges.com >> >>>>> >> >>>>> >> >>>> >> >>> >> >>> >> >>> -- >> >>> ----------------------------- >> >>> Johnny Withers >> >>> 601.209.4985 >> >>> joh...@pixelated.net >> >>> >> >> >> >> >> > >> > >> > -- >> > ----------------------------- >> > Johnny Withers >> > 601.209.4985 >> > joh...@pixelated.net >> > >> > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql?** > unsub=leonardoborges.rj@gmail.**com<http://lists.mysql.com/mysql?unsub=leonardoborges...@gmail.com> > >