Re: merge table: speed benefits?

2005-12-25 Thread Tom Cunningham
Thanks for the reply Brad, two questions about your situation:

(1) The speedup only applies because of concurrent queries, right?
With sequential queries you would expect no speedup?

(2) If you have 180 underlying tables, then each query to the merge
table has to do 180 separate index lookups, right?

Tom.


On 24 Dec 2005 02:08:12 -0000, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Tom Cunningham writes:
> >(a) The docs say that spreading the underlying tables across different
> >disks can make queries faster. I don't quite understand how this will
> >work in a normal query: if I do a SUM(amount) over the entire table,
> >will it be quicker if the table is spread across different disks? I
> >don't see how mysql can avoid doing this sequentially.
>
> Tom,
>  Multiple disks definitely help, since in my case they allow me
> to have N times the available read speed and N times the available
> number of IOPS (IOs/sec).  In my case I have a 61GB merge table that
> is based upon 180 separate MyISAM tables.  This table contains 487M
> records.  The kinds of queries I'm doing, I would not be able to
> handle cleanly without merge tables on 4.1.15.
>
>  Since trying to back this up would be a nightmare, I rely upon
> a standby master, and 3 query slaves for this data.  This way things
> work reasonably, and I have some hope of surviving some kind of
> hardware failure.  This is just one small portion of the substantial
> data set I'm responsible for over at Technorati.
>   Brad Eacker ([EMAIL PROTECTED])
>[EMAIL PROTECTED]
>




MERGE tables considered harmful for data warehouse fact tables

2005-12-22 Thread Tom Cunningham
The script to prove it follows.

Splitting a million-row fact table into a 5-part merge table makes
3-second queries take 9 seconds.

The basic problem is this: fact tables are generally queried by
unique combinations of dimension keys, and though often only one
underlying table actually holds the matching rows, MySQL doesn't know
this, so every single underlying table's index is probed for each
dimension-combination.

Practical alternatives: (1) leave your fact table as one big table; (2)
split the table up, but put logic in the client app so it knows which
underlying table to address; (3) use a *union* of queries instead of a
merge table, so MySQL can look at each underlying table one at a time
(see the sketch below).
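
To make alternative (3) concrete, here's a sketch (not benchmarked)
using the five-way f1..f5 split created in the script below: aggregate
each underlying table separately, then re-combine the partial sums in
an outer query. This works for SUM; AVG would need SUM and COUNT
carried through.

SELECT attribute_1, SUM(part_sum) AS sum_fact_1
  FROM (SELECT attribute_1, SUM(fact_1) AS part_sum
          FROM f1 JOIN dimension_1 ON dimension_1.key_1=f1.key_1
         WHERE attribute_1 LIKE 'ff%'
         GROUP BY attribute_1
        UNION ALL
        SELECT attribute_1, SUM(fact_1)
          FROM f2 JOIN dimension_1 ON dimension_1.key_1=f2.key_1
         WHERE attribute_1 LIKE 'ff%'
         GROUP BY attribute_1
        -- ... and the same again for f3, f4, f5 ...
       ) AS parts
 GROUP BY attribute_1;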

If anyone has ideas for other alternatives, or improvements on the
script, please tell me.

Thanks.

Tom.



#

# MERGE FACT TABLE TEST - [EMAIL PROTECTED]


   USE sandbox;

# big_table: a one-table MERGE copy of mysql.help_keyword, used below
# as a row generator for the test data
CREATE TABLE big_table LIKE mysql.help_keyword;
 ALTER TABLE big_table ENGINE=MERGE UNION=(mysql.help_keyword);

  DROP TABLE IF EXISTS dimension_1;
CREATE TABLE dimension_1 (
   key_1   INT PRIMARY KEY NOT NULL,
   attribute_1 VARCHAR(255) NOT NULL,
   INDEX attribute_1 (attribute_1(10))
);
   SET @A:=1;
INSERT INTO dimension_1
SELECT @A:=@A+1,
   SHA(RAND())
  FROM big_table b1, big_table b2, big_table b3
 LIMIT 100000;

  DROP TABLE IF EXISTS dimension_2;
CREATE TABLE dimension_2 (
   key_2   INT PRIMARY KEY NOT NULL,
   attribute_2 VARCHAR(255) NOT NULL,
   INDEX attribute_2 (attribute_2(10))
);
   SET @A:=1;
INSERT INTO dimension_2
SELECT @A:=@A+1,
   SHA(RAND())
  FROM big_table b1, big_table b2, big_table b3
 LIMIT 100000;


  DROP TABLE IF EXISTS facts;
CREATE TABLE facts (
   key_1 INT UNSIGNED,
   key_2 INT UNSIGNED,
   fact_1 INT UNSIGNED,
   fact_2 INT UNSIGNED,
   PRIMARY KEY key_1_key_2 (key_1,key_2)
);
INSERT IGNORE INTO facts
SELECT CEIL(RAND()*100000),
   CEIL(RAND()*100000),
   CEIL(RAND()*100),
   CEIL(RAND()*100)
  FROM big_table b1, big_table b2, big_table b3
 LIMIT 1000000;



## Typical query: (takes about 1.5 seconds)
SELECT SQL_NO_CACHE attribute_1, sum(fact_1), avg(fact_2)
  FROM facts
  JOIN dimension_1 ON dimension_1.key_1=facts.key_1
  JOIN dimension_2 ON dimension_2.key_2=facts.key_2
 WHERE attribute_1 LIKE 'ff%'
   AND attribute_2 LIKE 'a3%'
 GROUP BY attribute_1
 LIMIT 1;


## Variation 1: Unionised fact table;

DROP TABLES f1, f2, f3, f4, f5;
CREATE TABLE f1 LIKE facts; INSERT INTO f1 SELECT * FROM facts WHERE
key_1 BETWEEN 0 AND 20000;
CREATE TABLE f2 LIKE facts; INSERT INTO f2 SELECT * FROM facts WHERE
key_1 BETWEEN 20001 AND 40000;
CREATE TABLE f3 LIKE facts; INSERT INTO f3 SELECT * FROM facts WHERE
key_1 BETWEEN 40001 AND 60000;
CREATE TABLE f4 LIKE facts; INSERT INTO f4 SELECT * FROM facts WHERE
key_1 BETWEEN 60001 AND 80000;
CREATE TABLE f5 LIKE facts; INSERT INTO f5 SELECT * FROM facts WHERE
key_1 BETWEEN 80001 AND 100000;
RENAME TABLE facts TO facts_old;
CREATE TABLE facts LIKE facts_old;
ALTER TABLE facts ENGINE=MERGE UNION=(f1,f2,f3,f4,f5);

# (now try the same query as above again)




Re: merge table: speed benefits?

2005-12-21 Thread Tom Cunningham
OK, thanks for your help. My summary:

(1) spreading merged tables across disks will only help concurrent
queries (in my data-warehouse application I'm doing things serially).

(2) there's no efficiency gain in the way a merge table splits the
indexes into smaller files -- if anything, it will be less efficient.
*Unless* you bypass the merge table and query the underlying table
directly, which would require some client-side logic.

(3) MySQL 5.1 will have data partitioning, but that's at least 6 months away?


So: my original problem was deciding whether to split into
month-chunks or week-chunks. I think I will choose month-chunks: it
makes data management easier, and there are no performance gains to be
expected from going with week-chunks.
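
For the record, the month-chunk layout would look roughly like this
(table names hypothetical):

CREATE TABLE facts_2005_11 LIKE facts_template;
CREATE TABLE facts_2005_12 LIKE facts_template;
-- load each month into its own table, then:
CREATE TABLE facts LIKE facts_template;
ALTER TABLE facts ENGINE=MERGE UNION=(facts_2005_11, facts_2005_12);
-- rolling in a new month later is just another ALTER:
ALTER TABLE facts UNION=(facts_2005_11, facts_2005_12, facts_2006_01);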

Thanks.

Tom.




merge table: speed benefits?

2005-12-20 Thread Tom Cunningham
Hi all.

I'm working on splitting a 5G myisam fact table into separate parts,
and putting a merge table on top.

This will definitely help with table management, but I am hoping that a
merge table might also help me with query speed:

(a) The docs say that spreading the underlying tables across different
disks can make queries faster. I don't quite understand how this will
work in a normal query: if I do a SUM(amount) over the entire table,
will it be quicker if the table is spread across different disks? I
don't see how mysql can avoid doing this sequentially.

(b) I also thought that if I do a series of queries, via the merge
table, which all go to a single underlying table, then that would be
quicker than the same queries against the original monster table,
because the index and data caches would only be caching a single
table. Is this true? On reflection, I think the index and data caches
only cache *parts* of tables, not entire tables, so it should make no
difference.

(c) Finally: in theory the optimizer could use two indexes: index A
to decide which underlying tables to look at, then index B within those
tables to find the rows that meet another condition. This would be an
advantage over a non-merged table. But I don't think MySQL can do
this, can it?

So should I expect any speed benefits to splitting my fact table?

Tom.




sudden mysqldump errors

2005-08-04 Thread Tom Cunningham
Hi list,

our overnight backup scripts have suddenly been getting a lot of
mysqldump errors like:

mysqldump: Error 2013: Lost connection to MySQL server during query
when dumping table `cs_email_things` at row: 512234

I've checked the obvious things, and no luck:
 * disk space is OK
 * nothing in the mysqld log
 * nothing in /var/log/messages/
 * CHECK TABLES doesn't complain about the tables

In fact, now that I think of it, it's happened on two different
servers, though mysqldump always runs on the same machine.

Any ideas?

Thanks.

Tom.




cross-reference of field names in query

2005-06-20 Thread Tom Cunningham
Here's something I've been curious about:

Referring to one field from another field in a 'select' query:

SELECT data*2 AS intermediate_result,
  intermediate_result + 1 AS final_result
FROM t;

I know that MySQL 4.1 can't handle it, but:

(a) Is it part of standard SQL? (I have an example from a Joe Celko
book that seems to do it.)

(b) If yes, then why not MySQL? If no, is there a simple reason why
it's not sensible?
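
For what it's worth, the usual workaround I know of is a derived table,
which 4.1 does support (assuming a table t with a column named data):

SELECT intermediate_result,
       intermediate_result + 1 AS final_result
  FROM (SELECT data*2 AS intermediate_result FROM t) AS sub;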

Tom.




Re: resolving ambiguous column name in subquery

2005-04-04 Thread Tom Cunningham
I take it all back. I see now why this behaviour is desirable, or at
least standard.

E.g., see:

https://aurora.vcu.edu/db2help/db2s0/c2corr.htm

On Apr 4, 2005 2:40 PM, Tom Cunningham <[EMAIL PROTECTED]> wrote:
> Hi, the following seems wrong to me. Not sure whether this is a bug.
> 
> In short: a column-name in a subquery can refer to a table *outside*
> of the subquery (fair enough), but if the column-name is ambiguous
> between *inside* and *outside*, the parser assumes that it refers to
> the *inside* context.
> 
> I would have thought it better to reject the query where it is ambiguous.
> 
> Tom.
> 
> 
> 
> drop table t1;
> create table t1 (c1 int);
> 
> drop table t2;
> create table t2 (c2 int);
> 
> delete from t1 where c1 not in (select c1 from t2);
>   --> works; slightly confusing, but reasonable
> --
> 
> drop table t1;
> create table t1 (c int);
> 
> drop table t2;
> create table t2 (c int);
> 
> delete from t1 where c not in (select c from t2);
> --> this query works, but resolves the ambiguity in the *opposite*
> way to above: the subquery column now refers to the table inside the subquery.
>




resolving ambiguous column name in subquery

2005-04-04 Thread Tom Cunningham
Hi, the following seems wrong to me. Not sure whether this is a bug.

In short: a column-name in a subquery can refer to a table *outside*
of the subquery (fair enough), but if the column-name is ambiguous
between *inside* and *outside*, the parser assumes that it refers to
the *inside* context.

I would have thought it better to reject the query where it is ambiguous.

Tom.



drop table t1;
create table t1 (c1 int);

drop table t2;
create table t2 (c2 int);

delete from t1 where c1 not in (select c1 from t2);
  --> works; slightly confusing, but reasonable
--

drop table t1;
create table t1 (c int);

drop table t2;
create table t2 (c int);

delete from t1 where c not in (select c from t2);
   --> this query works, but resolves the ambiguity in the *opposite*
way to above: the subquery column now refers to the table inside the subquery.
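
For what it's worth, qualifying the columns explicitly removes the
ambiguity in either case, so it's probably the safer habit:

delete from t1 where t1.c not in (select t2.c from t2);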




Re: INSERT .. SELECT ... ON DUPLICATE KEY UPDATE

2005-02-27 Thread Tom Cunningham
Thanks Shawn, Sergei.

I'll get onto the new version as soon as I can.

Tom.




INSERT .. SELECT ... ON DUPLICATE KEY UPDATE

2005-02-24 Thread Tom Cunningham
It appears you can't combine an INSERT ... SELECT with an ON DUPLICATE KEY UPDATE.

I would find it very useful if you *could* do this. I know it would
complicate the syntax for deciding what to do when you hit a duplicate
key; one option would be: update all the columns that are *not*
involved in duplicate-key clashes.

This would help in situations like the following:

--> You have a table with some columns that are unique, & other
columns which are descriptive of those unique combinations. You also
have an auto-increment key.
--> You have a lot of data you want to insert into this table: add
new entries where that combination of unique keys hasn't been seen
before, & otherwise update the non-unique columns.

Possibilities:
1. INSERT IGNORE: doesn't update the non-unique columns.
2. REPLACE: screws up the auto-increment columns.
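
For illustration, the statement I'd like to be able to write would look
like this (staging & dimension table names hypothetical; as I
understand it, VALUES() in the UPDATE clause needs 4.1.1+, and
combining it with INSERT ... SELECT needs a newer version still):

INSERT INTO dimension (uniq_a, uniq_b, description)
SELECT s.uniq_a, s.uniq_b, s.description
  FROM staging s
ON DUPLICATE KEY UPDATE description = VALUES(description);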

Any other ideas?

Note: the situation described is one you find when implementing
data-warehouse so-called "slowly changing dimensions".

Tom.




Re: passing environment variable to an SQL script

2005-02-04 Thread Tom Cunningham
I found an answer:

mysql -e "SET @VAR:=1234; SOURCE xxx.sql"
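
...and xxx.sql can then read @VAR directly. E.g. (hypothetical script
contents):

-- xxx.sql
SELECT * FROM orders WHERE customer_id = @VAR;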




passing environment variable to an SQL script

2005-02-01 Thread Tom Cunningham
Hi everybody.

I do a lot of maintenance things on the DB by piping SQL files to my MySql DB.

What would be very useful is a way of passing very simple *arguments*
to these scripts. The only way I can think of uses a nasty trick:

> mysql --set-variable=wait_timeout=1234 < file.sql

and file.sql contains the line:
> SELECT @VAR:=@@wait_timeout;

This lets me smuggle a variable into my SQL script. Has anyone got a
cleaner way?

Tom.




data-warehouse ETL with MySQL

2005-01-14 Thread Tom Cunningham
Hi everybody.

This is just a note to tell people what I'm doing, hoping I might get
some comments.

In short: I have been using MySQL SQL scripts to do data-warehouse ETL
for about 6 months. It's gone well, but perhaps getting a little
creaky now.

The general format is this:
 1. Add any new records to my dimensions, if not already there. Done
with a "LEFT JOIN dimension ON (..dimension attributes..) WHERE
dimension.pk IS NULL"

 2. Insert the facts, using the dimensional keys.

 3. Update the fact records' other columns.

The predictable problem I'm having is the lack of abstraction in SQL,
so I have to repeat things which I don't want to repeat.

I'm looking at an alternative, which may cut down repetition, and I'm
curious if anyone else has thought or written about this sort of
stuff.

The alternative:
 1. Put *all* the ETL data into a flat table first.

 2. Have a unique index on *all* the rows in my slowly-changing
dimension. So new dimension records are simply added with an "INSERT
IGNORE" from the flat table.

 3. Insert the facts from the flat table using NATURAL JOINs with the
dimension tables.
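
A minimal sketch of that flow, with hypothetical names (flat staging
table "flat"; dimension "dim" with a unique key over all its attribute
columns; fact table "facts"):

-- step 2: new dimension rows appear for free:
INSERT IGNORE INTO dim (attr_a, attr_b)
SELECT DISTINCT attr_a, attr_b FROM flat;

-- step 3: facts pick up the dimension key via the natural join
-- (which joins on the common columns attr_a & attr_b):
INSERT INTO facts (dim_key, amount)
SELECT dim.dim_key, flat.amount
  FROM flat NATURAL JOIN dim;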

This second approach has advantages & disadvantages:

GOOD: Less mentioning of column names, so the script is shorter & more
robust to change.

GOOD: The last advantage is particularly strong for what I'd call
"attribute-combination" dimensions, where the dimension has no natural
key.

BAD: All dimension columns must be treated as slowly-changing (because
of the unique key and natural join) - though that doesn't seem a
serious problem now.


I have some sample scripts fleshing these ideas out more, if anyone's
interested (tom, at, videoisland, dot, com).




not recognizing an index (possible_keys)

2004-11-12 Thread Tom Cunningham
I had something funny happen to me this morning: I did an EXPLAIN, and
it couldn't find an index that was there. Then I did a DESCRIBE. Then
I did an EXPLAIN again, and it *could* find the index.

Is this expected behaviour? (4.0.22-standard)

1st EXPLAIN:
+-------+------+--------------------+-------------+---------+-------+---------+-----------------------------------------------+
| table | type | possible_keys      | key         | key_len | ref   | rows    | Extra                                         |
+-------+------+--------------------+-------------+---------+-------+---------+-----------------------------------------------+
| c     | ref  | PRIMARY,status     | status      |       1 | const |   44394 | Using where; Using temporary; Using filesort  |
| s     | ref  | customer_id,status | customer_id |       8 | c.id  |       1 | Using where                                   |
| crl   | ALL  | NULL               | NULL        |    NULL | NULL  | 1668865 |                                               |
+-------+------+--------------------+-------------+---------+-------+---------+-----------------------------------------------+


Then I did a DESC on "crl", and a SHOW CREATE TABLE on "crl".

Then I did another EXPLAIN on the query, and got (note: now there are
possible_keys on "crl"):

+-------+------+------------------------+------------------------+---------+------------+-------+-----------------------------------------------+
| table | type | possible_keys          | key                    | key_len | ref        | rows  | Extra                                         |
+-------+------+------------------------+------------------------+---------+------------+-------+-----------------------------------------------+
| c     | ref  | PRIMARY,status         | status                 |       1 | const      | 44394 | Using where; Using temporary; Using filesort  |
| s     | ref  | customer_id,status     | customer_id            |       8 | c.id       |     1 | Using where                                   |
| crl   | ref  | idx_customer_id_status | idx_customer_id_status |      11 | c.id,const |    17 |                                               |
+-------+------+------------------------+------------------------+---------+------------+-------+-----------------------------------------------+




size of indexes with prefixes

2004-11-02 Thread Tom Cunningham
Hi - I've been trying different prefixes on my index & I'm getting
funny results. In short:

   >> an index on a column-prefix is bigger than an index on the whole column <<

using "show table status" I get total index size as:
12147712 - without index 
13239296 - with full index
13455360 - with index on first 5 chars
13157376 - with index on first 3 chars

 - i.e., the prefix-index is *bigger* than the full index - does this
make sense?

The column is a varchar(255).
About 2/3 of the entries are empty-string, & the other 1/3 have avg
length 8-chars, std 3.5.

Would making the empty-strings null make any difference to the size of
the index?
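
(If anyone wants to test that, the change I mean is along these lines -
table & index names hypothetical, assuming the column is currently NOT
NULL:)

ALTER TABLE t MODIFY email VARCHAR(255) NULL;
UPDATE t SET email = NULL WHERE email = '';
-- rebuild the prefix index, then compare the sizes again:
ALTER TABLE t DROP INDEX email_prefix, ADD INDEX email_prefix (email(5));
SHOW TABLE STATUS LIKE 't';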


Thanks in advance everybody. (Using 4.0.20)

P.S. - there's no simple way of getting the size of individual
indexes, is there? Is it a direct function of the cardinality shown
when you do "SHOW INDEX FROM [table]"?

Tom.




Re: "Data Truncated" warning about join, not insert?

2004-10-27 Thread Tom Cunningham
Sorry, I forgot to include the table defs.

OK, I've got it entirely reproducible now, without giving away any
sensitive data.

No, the column lengths are the same. But doing this has shown me that
it's something to do with the partial key on the 'email' column in the
'user' table. If you remove it, the warning stops happening. This is
not what I would have expected - does it make sense?

---

  DROP TABLE IF EXISTS customers;
CREATE TABLE `customers` (
  `id` bigint(20) NOT NULL auto_increment,
  `email` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

drop table if exists user;
CREATE TABLE `user` (
  `user_id` int(11) NOT NULL auto_increment,
  `email` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`user_id`),
  KEY `user_id` (`user_id`),
  KEY `email` (`email`(15))
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into user (user_id, email)
values 
(5122 , '[EMAIL PROTECTED]' ),
(5143 , '[EMAIL PROTECTED]');

INSERT INTO customers (id)
SELECT null
  FROM user
  LEFT JOIN user dup_user
ON dup_user.user_id>user.user_id
   AND user.email=dup_user.email
LIMIT 1;

show warnings;




"Data Truncated" warning about join, not insert?

2004-10-26 Thread Tom Cunningham
Hi Everybody.

I have a funny problem: I'm getting thousands of "Data Truncated"
warnings when I insert a single row of data into a table, using an
INSERT...SELECT.

The funny thing is, the warning is for a column I'm not even
inserting. It's only a column that I'm doing a join on.

The query is this:


INSERT INTO customers (id)
SELECT null
  FROM statistics.user
  LEFT JOIN statistics.user dup_user 
 ON dup_user.domain_id=user.domain_id
   AND dup_user.user_id>user.user_id 
   AND user.email=dup_user.email
 LIMIT 1;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'email' at row 1  |
+---------+------+---------------------------------------------+



(MySQL on a SuSE 9 box, tested with both 4.0.18-standard and 4.1.2-alpha.)




Re: Partial Row Reads?

2004-10-25 Thread Tom Cunningham
I think what Harald is saying is this (& if he's not, then I'm saying it):

You could have an alternative table structure like the following - it
should make queries much quicker:


create table raddata_2004_10_ONE (
 granID   integer not null,
 scanID   tinyint unsigned not null,
 fpID     tinyint not null,
 c1       float not null,
 ...
 c500     float not null
)

create table raddata_2004_10_TWO (
 granID   integer not null,
 scanID   tinyint unsigned not null,
 fpID     tinyint not null,
 c501     float not null,
 ...
 c1000    float not null
)

But maybe with a more logical way of partitioning the columns among
the different tables.

Or another option, which perhaps you've already thought of: a table
like the following. It should make indexing much worse, but querying
much better.

create table raddata_2004_10_narrow (
 granID   integer not null,
 scanID   tinyint unsigned not null,
 fpID     tinyint not null,
 cID      smallint not null,
 cValue   float not null,
  primary key (granID, scanID, fpID, cID),
  index (granID, cID)
)

OLD QUERY:
Select c1 from raddata_2004_10 where granID between 147568 and 150000

NEW QUERY:
Select cValue from raddata_2004_10_narrow where granID between 147568
and 150000 and cID=1;
(should be v. fast)

---> Incidentally, I have a question: when you have a table like this
with a primary key over a lot of columns, is there any performance
benefit to adding a new primary key as an auto-increment column, &
keeping the old primary key as a unique index? I thought maybe there
might be some addressing / hashing issues that would work out quicker?
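
Concretely, the variant I'm asking about would be something like (name
hypothetical):

create table raddata_2004_10_surrogate (
 id       integer unsigned not null auto_increment,
 granID   integer not null,
 scanID   tinyint unsigned not null,
 fpID     tinyint not null,
 cID      smallint not null,
 cValue   float not null,
  primary key (id),
  unique key (granID, scanID, fpID, cID)
)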

Tom.




Large amounts of SQL

2004-07-22 Thread Tom Cunningham
Hi, this is just a general question: how do other people organise a
large amount of SQL?

I have accumulated a lot of SQL files, used for some
data-mining type stuff, & things are getting a bit messy.

I do my best with:
(a) putting user variables at the start of files for anything configurable
(b) putting repeated stuff in separate files & calling with SOURCE
(c) using nice formatting, where you have a line of spaces running down 
the middle:
http://www.hk8.org/old_web/oracle/prog2/ch03_02.htm

I wonder does anyone else have any good tips?
I was thinking there might be a good pre-processor you could write to 
make SQL more manageable and less repetitive - perhaps as a MysqlCC 
plugin. Anyone heard of anything along these lines?

Tom.


User variables & optimization of IF()

2004-07-07 Thread Tom Cunningham
Hi everybody.
I generally try to parametrize out my queries as much as possible with 
user-variables. So, say you have a general query for all users:

---
SELECT ...
FROM users
---
It's nice to do this:

SET @USER_ID:= NULL;
SELECT ...
FROM users
WHERE IF(@USER_ID IS NOT NULL,user_id=@USER_ID,1)
---
This *works*, but when you set @USER_ID to a number, the 4.1 optimizer 
doesn't optimize it very well: it does a table scan instead of using 
the index on user_id.

Has anyone got a good alternative that will do less table-scans?
Or, if a developer is listening, can you give us details on how the 
optimizer deals with IF()?
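
One alternative I can think of on 4.1 is a server-side prepared
statement, so the optimizer sees a concrete value - a sketch, not
benchmarked (substitute the real column list for the '...'):

SET @USER_ID := 42;
PREPARE stmt FROM 'SELECT ... FROM users WHERE user_id = ?';
EXECUTE stmt USING @USER_ID;
DEALLOCATE PREPARE stmt;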

Thanks.
Tom.



replication corruption because of max_allowed_packet

2004-05-21 Thread Tom Cunningham
Hi everybody,
Just a note about a problem I found - to see whether it's worth reporting 
as a bug, or suggesting that someone mention it in the manual.

In short: *when replication fails because 'max_allowed_packet' is too 
small, that failure can corrupt the relay logs*. So even when you fix 
max_allowed_packet, you get another replication error.
I got into this problem, and I've found a couple of mentions of other 
people having this problem on Google.

The fix is this: (1) stop the slave (make a note of Exec_Master_Log_Pos); 
(2) set max_allowed_packet to be larger; (3) do a CHANGE MASTER to re-read 
the master's logs from the Exec_Master_Log_Pos position; (4) start the slave.
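
In SQL, the recovery looks roughly like this (the file name & position
are whatever SHOW SLAVE STATUS reported as Relay_Master_Log_File /
Exec_Master_Log_Pos at the failure point - the values here are made up):

STOP SLAVE;
SET GLOBAL max_allowed_packet = 16777216;
CHANGE MASTER TO MASTER_LOG_FILE='master-bin.042', MASTER_LOG_POS=123456;
START SLAVE;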

Tom.


replication of temporary tables

2004-05-12 Thread Tom Cunningham
Hi,

In short: what is the standard MySQL handling of temporary tables, 
*especially* when you've got a slave filter on to only replicate certain 
tables?

My replication stopped over the weekend, because a query was executed 
which created a temporary table, then used that table to update one of 
the non-temporary tables which I was replicating.

Do I need to add all possible temporary tables to my slave-side 
replication filter? (i.e., replicate-do-table=temp1, temp2, ...) (Even 
when I did so, it didn't seem to work, but I'll have another go.)

Tom.



Re: sql prob

2004-01-30 Thread Tom Cunningham
Try this:

CREATE TEMPORARY TABLE max_entry
 SELECT NoPeople, MAX(Date_entry) AS Date_entry
   FROM A
  GROUP BY NoPeople;

SELECT NoPeople FROM A NATURAL JOIN max_entry
 WHERE Date_departure IS NULL;

> Hi all :)
> 
> I have 1 relation A.
> 
> Structure of A:
> NoPeople
> Date_entry
> Date_departure
> 
> and key is (NoPeople + Date_entry)
> 
> exemple of records:
> 
> NoPeople Entry Departure
> P1 E1 D1
> P1 E2 D2
> P1 E3 NULL
> 
> P2 E1 NULL
> 
> P3 E1 D1
> 
> P4 E1 NULL
> P4 E2 NULL
> P4 E3 D2
> 
> E3>E2>E1 and D2>D1. P1,P2,P3 and P4 are different persons.
> 
> I would like to query all the people whose date_entry is their MAX date 
> entry AND that have their date departure to NULL. In this exemple, P1 
> and P2 are ok. P4 isn't ok because max(Date_entry) is E3 and 
> Date_departure - when Date_entry is E3 - isn't NULL.
> 
> I tried some queries with 'group by' but i can't make it working.
> Any hint "sql masters"  :)
> 
> fab





Re: Indexes to speed up a duplicates query.

2003-07-27 Thread Tom Cunningham
Here's the full query I'm trying to do:

explain select 
 users.status, 
 users.oid as oid, 
 users.type as type, 
 users.exclusive as exclusive, 
 userse.o_initials as exclusive_initials, 
 users.name_processed as name_processed, 
 users.o_company as o_company, 
 unix_timestamp(users.created) as created, 
 count(distinct notes.oid) as notes_num, 
 modifiers.o_initials as modified_by_initials, 
 duplicateusers.oid as duplicate, 
 duplicateusers.name_processed as duplicate_name, 
 duplicateusers.o_company as duplicate_company, 
 unix_timestamp(duplicateusers.created) as duplicate_created, 
 duplicateusers.modified as duplicate_modified, 
 count(distinct duplicatenotes.oid) as duplicate_num_notes 
FROM users 
 left join users duplicateusers on (duplicateusers.created < users.created)

 left join users userse on users.exclusive=userse.oid 
 left join notes on notes.parent=users.oid 
 left join users modifiers on users.modified_by=modifiers.oid 
 left join notes duplicatenotes on duplicatenotes.parent=duplicateusers.oid

where (users.status='O') 
 and (((1=1 and users.type='USER'))) 
 and (soundex(users.o_last_name)=soundex(duplicateusers.o_last_name) 
   or (users.o_email_address=duplicateusers.o_email_address))
group by oid

---
Plus a supplementary question: whether I have 'limit 1' or 'limit 100'
doesn't seem to make any difference to the speed. I thought it must be
my 'order by' clause, but I got rid of that & there was no change.


Tom Cunningham.




Indexes to speed up a duplicates query.

2003-07-27 Thread Tom Cunningham
OK, say you're trying to find all the non-exact duplicates in a table,
and want to do it with a single query, so that the check can be part of
a user interface.

Then naturally you're going to have a slow query, because it has to compare
every row with every other row.

You ought to be able to cut out exactly half of the records in the second table,
by just saying:

 SELECT * FROM users
 LEFT JOIN users dups ON users.created < dups.created