Re: MySQL and set complements

2011-07-07 Thread Leonardo Borges
Sure can:

show create table activities;

CREATE TABLE `activities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `email` varchar(100) DEFAULT NULL,
  `country_iso` varchar(2) DEFAULT NULL,
  `tags` varchar(255) DEFAULT NULL,
  `postcode` int(11) DEFAULT NULL,
  `activity` varchar(100) DEFAULT NULL,
  `page_id` int(11) DEFAULT NULL,
  `donation_frequency` varchar(100) DEFAULT NULL,
  `email_id` int(11) DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_idx` (`user_id`),
  KEY `email_idx` (`email_id`),
  KEY `activity_idx` (`activity`)
) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1


And the explain:

++-+---+---+-+--+-+++-+
| id | select_type | table | type  | possible_keys   | key
   | key_len | ref| rows   | Extra   |
++-+---+---+-+--+-+++-+
|  1 | SIMPLE  | u | index | NULL|
id_idx   | 5   | NULL   | 972064 | Using index |
|  1 | SIMPLE  | a | ref   | user_idx,email_idx,activity_idx |
user_idx | 5   | getup.u.id | 20 | Using where |
++-+---+---+-+--+-+++-+


Cheers,
Leonardo Borges
www.leonardoborges.com


On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers wrote:

> Can you post show create table for activity and explain output of the
> problem query?
>
> On Jul 7, 2011 8:51 PM, "Leonardo Borges" 
> wrote:
>
> Hello everyone,
>
> I have an increasingly popular web application running on top of mysql and
> due to its popularity, I'm running into performance issues. After carefully
> examining database indexes and tuning queries I was able to pin down the
> slowest part of the system.
>
> The app's got a user segmentation tool that allows you to filter users
> based
> on a range of criteria from which the slowest is: "Select all users that
> did
> not receive the email of id 100"
>
> To answer this question we turn to the activities table, which is basically
> a denormalized log of actions taken by the user in this format:
> user_id  | activity| email_id | ...
> 10 | email_sent   | 100| ...
> 10 | subscribed   | NULL | ...
> 10 | email_open  | 100| ...
>
>
> Given this table and the question above, the usual way of finding out all
> users who did not receive this email is through the use of a left outer
> join, such as:
>
> select u.id
> from users u
> left outer join activities a
> on u.id = a.user_id
> and a.activity = 'email_sent'
> and a.email_id = 100
> where a.user_id is null
>
> That's all fine  for medium-ish tables. However our current activities
> table
> has over 13 million rows, slowing the hell out of this left outer join,
> taking about 52 seconds in my machine.
>
> What this query is trying to do is to get the relative complement of set
> A(users) to B(activities). As far as I know mysql doesn't support set
> subtraction, thus the reason for these queries being slow.
>
> Based on that I've setup a test database on Postgresql, which supports this
> very set operation and rewrote the query to look like this:
>
> select u.id
> from users u
> except
> select a.user_id
> from activities a
> where a.activity = 'email_sent'
> and a.email_id = 100;
>
> The fact that postgresql knows how to subtract sets brought this query down
> to only 4 seconds.
>
> My question then is: since this is a somewhat common query in our system,
> are there any workarounds I could use in mysql to improve things?
>
> I did find one myself, but it's a bit convoluted and might not perform well
> under load, but the following sql script gives me similar performance in
> mysql:
>
> create temporary table email_sent_100
> select a.user_id
> from user_activity_events a
> where a.activity = 'email_sent'
>
>
> and a.email_id = 100;
>
> create index user_id_idx on email_sent_100(user_id);   //this could
> potentially bring the runtime down in the case of a larg temp table.
>
> select count(u.id)
> from users u
> left outer join email_sent_100 s
>   on u.id = s.user_id
>   and s.user_id is null;
>
> A lot more lines and a lot more complex, but does the job in this example.
>
> I'd appreciate your thoughts.
>
> Cheers,
> Leonardo Borges
> www.leonardoborges.com
>
>


Re: MySQL and set complements

2011-07-07 Thread Johnny Withers
Can you post show create table for activity and explain output of the
problem query?

On Jul 7, 2011 8:51 PM, "Leonardo Borges" 
wrote:

Hello everyone,

I have an increasingly popular web application running on top of mysql and
due to its popularity, I'm running into performance issues. After carefully
examining database indexes and tuning queries I was able to pin down the
slowest part of the system.

The app's got a user segmentation tool that allows you to filter users based
on a range of criteria from which the slowest is: "Select all users that did
not receive the email of id 100"

To answer this question we turn to the activities table, which is basically
a denormalized log of actions taken by the user in this format:
user_id  | activity| email_id | ...
10 | email_sent   | 100| ...
10 | subscribed   | NULL | ...
10 | email_open  | 100| ...


Given this table and the question above, the usual way of finding out all
users who did not receive this email is through the use of a left outer
join, such as:

select u.id
from users u
left outer join activities a
on u.id = a.user_id
and a.activity = 'email_sent'
and a.email_id = 100
where a.user_id is null

That's all fine  for medium-ish tables. However our current activities table
has over 13 million rows, slowing the hell out of this left outer join,
taking about 52 seconds in my machine.

What this query is trying to do is to get the relative complement of set
A(users) to B(activities). As far as I know mysql doesn't support set
subtraction, thus the reason for these queries being slow.

Based on that I've setup a test database on Postgresql, which supports this
very set operation and rewrote the query to look like this:

select u.id
from users u
except
select a.user_id
from activities a
where a.activity = 'email_sent'
and a.email_id = 100;

The fact that postgresql knows how to subtract sets brought this query down
to only 4 seconds.

My question then is: since this is a somewhat common query in our system,
are there any workarounds I could use in mysql to improve things?

I did find one myself, but it's a bit convoluted and might not perform well
under load, but the following sql script gives me similar performance in
mysql:

create temporary table email_sent_100
select a.user_id
from user_activity_events a
where a.activity = 'email_sent'


and a.email_id = 100;

create index user_id_idx on email_sent_100(user_id);   //this could
potentially bring the runtime down in the case of a larg temp table.

select count(u.id)
from users u
left outer join email_sent_100 s
  on u.id = s.user_id
  and s.user_id is null;

A lot more lines and a lot more complex, but does the job in this example.

I'd appreciate your thoughts.

Cheers,
Leonardo Borges
www.leonardoborges.com


MySQL and set complements

2011-07-07 Thread Leonardo Borges
Hello everyone,

I have an increasingly popular web application running on top of mysql and
due to its popularity, I'm running into performance issues. After carefully
examining database indexes and tuning queries I was able to pin down the
slowest part of the system.

The app's got a user segmentation tool that allows you to filter users based
on a range of criteria from which the slowest is: "Select all users that did
not receive the email of id 100"

To answer this question we turn to the activities table, which is basically
a denormalized log of actions taken by the user in this format:
user_id  | activity| email_id | ...
10 | email_sent   | 100| ...
10 | subscribed   | NULL | ...
10 | email_open  | 100| ...


Given this table and the question above, the usual way of finding out all
users who did not receive this email is through the use of a left outer
join, such as:

select u.id
from users u
left outer join activities a
on u.id = a.user_id
and a.activity = 'email_sent'
and a.email_id = 100
where a.user_id is null

That's all fine  for medium-ish tables. However our current activities table
has over 13 million rows, slowing the hell out of this left outer join,
taking about 52 seconds in my machine.

What this query is trying to do is to get the relative complement of set
A(users) to B(activities). As far as I know mysql doesn't support set
subtraction, thus the reason for these queries being slow.

Based on that I've setup a test database on Postgresql, which supports this
very set operation and rewrote the query to look like this:

select u.id
from users u
except
select a.user_id
from activities a
where a.activity = 'email_sent'
and a.email_id = 100;

The fact that postgresql knows how to subtract sets brought this query down
to only 4 seconds.

My question then is: since this is a somewhat common query in our system,
are there any workarounds I could use in mysql to improve things?

I did find one myself, but it's a bit convoluted and might not perform well
under load, but the following sql script gives me similar performance in
mysql:

create temporary table email_sent_100
select a.user_id
from user_activity_events a
where a.activity = 'email_sent'


and a.email_id = 100;

create index user_id_idx on email_sent_100(user_id);   //this could
potentially bring the runtime down in the case of a larg temp table.

select count(u.id)
from users u
left outer join email_sent_100 s
   on u.id = s.user_id
   and s.user_id is null;

A lot more lines and a lot more complex, but does the job in this example.

I'd appreciate your thoughts.

Cheers,
Leonardo Borges
www.leonardoborges.com


Re: dumb question?

2011-07-07 Thread Hal�sz S�ndor
 2011/07/06 23:56 -0700, XL Cordemans 
Goede morgen, and thank you for your suggestion. I am actually wondering if the 
difference between lasso 8.1 and 8.6 is so big that traditional lasso code can 
not be used when connecting w/ MySQL ? You mentioned "... This can be done in 
the server configuration, so no alterations are necessary ...". 

This mode is set in "my.cnf" (under Windows "my.ini"), found in one of a 
variety of standard places, in the variable "sql-mode", say

sql-mode="ANSI,STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

. The mode in question is "ANSI". There is always a copy of this file in the 
directory into which the program MySQL was installed, but that is not the first 
place where it is sought. Under Linux, say, "/etc/my.cnf" takes precedence.

Quote from help:

On Windows, MySQL programs read startup options from the following files, in 
the specified order (top items are used first). 

File NamePurpose
WINDIR\my.ini, WINDIR\my.cnf Global options
C:\my.ini, C:\my.cnf Global options
INSTALLDIR\my.ini, INSTALLDIR\my.cnf Global options


On Unix, Linux and Mac OS X, MySQL programs read startup options from the 
following files, in the specified order (top items are used first). 

File NamePurpose
/etc/my.cnf  Global options
/etc/mysql/my.cnfGlobal options
SYSCONFDIR/my.cnfGlobal options
$MYSQL_HOME/my.cnf   Server-specific options
defaults-extra-file  The file specified with 
--defaults-extra-file=path, if any
~/.my.cnfUser-specific options

~ represents the current user's home directory (the value of $HOME). 

SYSCONFDIR represents the directory specified with the SYSCONFDIR option to 
CMake when MySQL was built. By default, this is the etc directory located under 
the compiled-in installation directory. 

MYSQL_HOME is an environment variable containing the path to the directory in 
which the server-specific my.cnf file resides. If MYSQL_HOME is not set and you 
start the server using the mysqld_safe program, mysqld_safe attempts to set 
MYSQL_HOME as follows: 

Let BASEDIR and DATADIR represent the path names of the MySQL base directory 
and data directory, respectively. 

If there is a my.cnf file in DATADIR but not in BASEDIR, mysqld_safe sets 
MYSQL_HOME to DATADIR. 

Otherwise, if MYSQL_HOME is not set and there is no my.cnf file in DATADIR, 
mysqld_safe sets MYSQL_HOME to BASEDIR. 

In MySQL 5.5, use of DATADIR as the location for my.cnf is deprecated. 

Typically, DATADIR is /usr/local/mysql/data for a binary installation or 
/usr/local/var for a source installation. Note that this is the data directory 
location that was specified at configuration time, not the one specified with 
the --datadir option when mysqld starts. Use of --datadir at runtime has no 
effect on where the server looks for option files, because it looks for them 
before processing any options. 

MySQL looks for option files in the order just described and reads any that 
exist. If an option file that you want to use does not exist, create it with a 
plain text editor.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org