RE: remove temporary table from SELECT query

2007-08-09 Thread Andrew Armstrong
The temporary table ends up on disk when MySQL does not have enough memory
allocated to hold the temporary result in memory, so it has to create the
temporary table on disk instead.

Try increasing the memory buffer size or eliminating more rows from the
query.
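
For example (a sketch only; I'm assuming the limits in play here are
tmp_table_size and max_heap_table_size, which cap in-memory temporary
tables, and the sizes below are just placeholders):

SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- The in-memory limit is the smaller of the two, so raise both:
SET GLOBAL tmp_table_size      = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;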

-Original Message-
From: Mike Zupan [mailto:[EMAIL PROTECTED] 
Sent: Friday, 10 August 2007 4:52 AM
To: mysql@lists.mysql.com
Subject: remove temporary table from SELECT query

I have been pulling my hair out over a temporary table being created in the
following query

SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS entryid,title FROM
friends_test INNER JOIN entries ON friendLink=userid AND userLink=2 order by
entryid

If I change userLink=2 to friendLink=2 it is fine and very fast. If I leave
it as is, the query takes around 2 seconds.



+----+-------------+--------------+------+---------------------+----------+---------+-----------------------------------+------+---------------------------------+
| id | select_type | table        | type | possible_keys       | key      | key_len | ref                               | rows | Extra                           |
+----+-------------+--------------+------+---------------------+----------+---------+-----------------------------------+------+---------------------------------+
|  1 | SIMPLE      | friends_test | ref  | userLink,friendLink | userLink | 3       | const                             |  458 | Using temporary; Using filesort |
|  1 | SIMPLE      | entries      | ref  | userid              | userid   | 4       | photoblog.friends_test.friendLink |   11 | Using where                     |
+----+-------------+--------------+------+---------------------+----------+---------+-----------------------------------+------+---------------------------------+

The above is an explain of the bad query


Here are the table definitions for the friends_test and entries tables


CREATE TABLE `friends_test` (
  `friendID` mediumint(8) NOT NULL auto_increment,
  `userLink` mediumint(8) unsigned NOT NULL,
  `friendLink` mediumint(8) unsigned NOT NULL,
  `status` tinyint(1) NOT NULL default '1',
  PRIMARY KEY  (`friendID`),
  KEY `userLink` (`userLink`),
  KEY `friendLink` (`friendLink`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=74971 ;


CREATE TABLE `entries` (
  `entryid` mediumint(10) unsigned NOT NULL auto_increment,
  `userid` mediumint(8) unsigned default NULL,
  `title` varchar(255) character set utf8 collate utf8_unicode_ci default
NULL,
  `photos` text,
  `sizes` mediumtext NOT NULL,
  `text` text character set utf8 collate utf8_unicode_ci,
  `category` int(6) unsigned default NULL,
  `created` int(10) unsigned default NULL,
  `ts` int(10) unsigned default '0',
  `modified` int(10) unsigned default NULL,
  `date` date NOT NULL default '0000-00-00',
  `comments` smallint(3) unsigned NOT NULL default '1',
  `views` mediumint(8) NOT NULL default '0',
  `dir` varchar(10) NOT NULL default 'photos',
  `server` varchar(20) NOT NULL default 'i1.photoblog.com',
  `notes` longtext character set utf8 collate utf8_unicode_ci NOT NULL,
  `titles` text character set utf8 collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`entryid`),
  KEY `userid` (`userid`),
  KEY `date` (`date`),
  KEY `created` (`created`),
  KEY `ts` (`ts`),
  FULLTEXT KEY `title` (`title`,`text`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=204306 ;



Any help or pointers would be a big help.
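
One thing that may be worth testing (a sketch only, not a verified fix; the
index name is invented): a composite index covering both friends_test
columns the query touches, compared with EXPLAIN. Note that the temporary
table most likely comes from ORDER BY entryid naming a column of the second
table in the join, so the index on its own may not remove it.

-- Covers the userLink = 2 lookup plus the friendLink join column:
ALTER TABLE friends_test ADD INDEX userLink_friendLink (userLink, friendLink);

EXPLAIN
SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS entryid, title
FROM friends_test
INNER JOIN entries ON friendLink = userid AND userLink = 2
ORDER BY entryid;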





RE: How can I delete a block of random rows?

2007-08-03 Thread Andrew Armstrong
Are you sure you want to delete random rows, or do you (if you have
sequential IDs) just want to delete every n'th row?

DELETE FROM `table` WHERE id MOD 5 = 0;

That deletes every 5th row from the table, assuming sequential IDs with no
missing numbers.

Something like that anyway.
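
If the goal really is an exact number of random rows rather than a fraction
or every n'th id, here is a sketch of another option (the table name is just
a placeholder): MySQL's single-table DELETE also accepts ORDER BY and LIMIT.

DELETE FROM `table` ORDER BY RAND() LIMIT 1000;
-- Note: this sorts the whole table by RAND() first, so it can be slow on
-- large tables.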

-Original Message-
From: Baron Schwartz [mailto:[EMAIL PROTECTED] 
Sent: Saturday, 4 August 2007 12:41 PM
To: Daevid Vincent
Cc: 'MySQL General'
Subject: Re: How can I delete a block of random rows?

Daevid Vincent wrote:
 I have a SQL dump that I load in for testing with lots of interesting
 data.
  
 I want to now pare that down to a smaller subset; however, I can't just
 delete from a range of IDs, as the data is sequential.
  
 I want to delete say 1000 random rows from the table.

What fraction of the rows do you want to delete? 1%?

DELETE FROM TABLE WHERE RAND() < .01;




RE: Is MySQL Embedded the solution?

2007-07-28 Thread Andrew Armstrong
You may want to consider SQLite if you have not seen it already.

http://www.sqlite.org/

- Andrew

-Original Message-
From: Car Toper [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 29 July 2007 7:10 AM
To: mysql@lists.mysql.com
Subject: Is MySQL Embedded the solution?

I am starting to do the R&D on the next phase of development of my
program. I am looking for a database, and I am wondering if MySQL
Embedded is the solution.

The database needs to be embedded in the sense that when my
application starts, so does the database, and when my application
shuts down, so does the database.  The key feature I need is for the
physical database file to be treated like any other data file.

Think of my software as job based.  The user fires up my software,
creates a new job, does his/her thing and exits.   All the data,
including the database file, for each job needs to exist in the job
directory/folder so that later the user can copy the job folder to
other locations.

I know file-based databases like MS Access are designed this way and I
know databases like MS SQL are NOT.  The MS SQL physical files cannot
easily be moved, or at least they couldn't the last time I checked ;)

Will MySQL Embedded give me the ability I need to create a new
physical database file for every job and load up older database jobs
once the files are moved?

Cartoper




Using index for group-by: Not working?

2007-07-28 Thread Andrew Armstrong
Hi,

 

I have the following query:

 

SELECT c2, c3, c4, Count(DISTINCT c5)

FROM table1 

WHERE c1 IN (1, 2, 3...)

GROUP BY c2, c3, c4

order by null

 

Yet the best I can get it to show (under Extra) is: Using where; Using
filesort.

 

I have read up on:
http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html and
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html and
http://dev.mysql.com/doc/refman/5.0/en/explain.html yet cannot get this
index to be used.

 

I am running MySQL '5.1.17-beta-community-nt-debug'

 

There are over 600,000 rows in table1 for my testing.

 

I have tried placing indexes on idx(c2, c3, c4, c5), idx(c1, c2, c3, c4,
c5) and idx(c5, c1, c2, c3, c4) with no result.

 

Is there a reason I cannot get this query to use an index for grouping?

 

Cheers,

Andrew



Alternative to subquery to perform distinct aggregation in query

2007-07-28 Thread Andrew Armstrong
Hi,

 

I have a query at the moment like this:

 

SELECT SQL_NO_CACHE STRAIGHT_JOIN t1.col1, t1.col2, t2.col1, ...
MAX(t1.col6)...

(

SELECT Count(DISTINCT col1)

FROM table3 t3

WHERE t3.col1 = t1.col1 AND t3.col2 = t1.col2 AND t3.col1 IN
(139903,140244,140583,140924,141265,141601,141942)

) AS uniquecount

FROM table1 t1 

INNER JOIN table2 t2 ON t1.col6 = t2.id

WHERE t2.id IN (139903,140244,140583,140924,141265,141601,141942)

GROUP BY t1.col1, t1.col2, t1.col3

ORDER BY NULL

 

Basically, you can tell that there's a main table with information that is
aggregated, and then another table with matching rows that also need to be
aggregated, per row of the first table.

 

This appears very slow. I've tried running the queries separately and they
appear to be performing better on their own (as somewhat expected).

 

Does anyone have any ideas on how to optimize the above query? Otherwise I
think I will just go with the dual-query approach, as it is gentler on the
database server too.
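
One alternative worth sketching (the table and column names are the
placeholders above, and the elided "..." columns are left out): fold the
correlated subquery into a derived table that is aggregated once and then
joined, instead of being re-run for every output row.

SELECT SQL_NO_CACHE t1.col1, t1.col2, t2.col1,
       MAX(t1.col6),
       COALESCE(dc.uniquecount, 0) AS uniquecount
FROM table1 t1
INNER JOIN table2 t2 ON t1.col6 = t2.id
LEFT JOIN (
    SELECT col1, col2, COUNT(DISTINCT col1) AS uniquecount
    FROM table3
    WHERE col1 IN (139903,140244,140583,140924,141265,141601,141942)
    GROUP BY col1, col2
) AS dc ON dc.col1 = t1.col1 AND dc.col2 = t1.col2
WHERE t2.id IN (139903,140244,140583,140924,141265,141601,141942)
GROUP BY t1.col1, t1.col2, t1.col3
ORDER BY NULL;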

 

Cheers,

Andrew



RE: Using index for group-by: Not working?

2007-07-28 Thread Andrew Armstrong
Hey Terry,

1) I've tried placing the Count/Distinct bit first - no change. Could you
elaborate on your bottom up parsing? I am not aware of this myself (and
have not read this anywhere). Would be interested if you could elaborate or
provide more info.

2) When a GROUP BY is performed, sorting is done as well (usually because
it's free). ORDER BY NULL tells MySQL not to bother doing this. I've seen
'order by null' (as suggested elsewhere) avoid 'Using filesort' under the
Extra column when the sort isn't needed.
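
For example (same table and columns as the query above; the Extra output
will vary with the available indexes):

EXPLAIN SELECT c2, c3, c4, COUNT(DISTINCT c5)
FROM table1 GROUP BY c2, c3, c4;
-- typically "Using temporary; Using filesort" (implicit sort on the grouped columns)

EXPLAIN SELECT c2, c3, c4, COUNT(DISTINCT c5)
FROM table1 GROUP BY c2, c3, c4 ORDER BY NULL;
-- typically just "Using temporary"; the implicit sort is skipped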

Thanks for the response.

-Original Message-
From: Terry Mehlman [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 29 July 2007 1:18 PM
To: Andrew Armstrong
Subject: Re: Using index for group-by: Not working?

just a shot in the dark, but i would suggest two changes to your query.

1) put the count (distinct c5) first rather than last.  as i'm sure
you know the parse happens from the bottom up.  so, the indexes you
are placing on c2, c3, and c4 aren't doing you any good if you put the
distinct clause in their way.

2) order by null?  take that out and see if your performance improves.

just a couple of thoughts.



On 7/28/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
 Hi,



 I have the following query:



 SELECT c2, c3, c4, Count(DISTINCT c5)

 FROM table1

 WHERE c1 IN (1, 2, 3...)

 GROUP BY c2, c3, c4

 order by null



 Yet I can only get it at best to show (under extra): Using where, using
 filesort.



 I have read up on:
 http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html and
 http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html and
 http://dev.mysql.com/doc/refman/5.0/en/explain.html yet cannot get this
 index to be used.



 I am running MySQL '5.1.17-beta-community-nt-debug'



 There are over 600,000 rows in table1 for my testing.



 I have tried placing indexes on: idx(c2, c3, c4, c5) and idx(c1, c2, c3,
c4,
 c5) and indx(c5, c1, c2, c3, c4) with no result.



 Is there a reason I cannot get this query to use an index for grouping?



 Cheers,

 Andrew




-- 
That which Voldemort does not value, he takes no trouble to
comprehend.  Of house-elves and children's tales, of love, loyalty,
and innocence, Voldemort knows and understands nothing.  Nothing.
That they all have a power beyond his own, a power beyond the reach of
any magic, is a truth he has never grasped.

- JK Rowling






RE: Using index for group-by: Not working?

2007-07-28 Thread Andrew Armstrong
It's just occurred to me that the IN clause is not a constant.

This probably throws out any chance of using an index for group by?
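
For reference, a sketch of a shape that the 5.0/5.1 manual does document for
loose index scan (it swaps the aggregate, so this is an illustration only,
not a drop-in replacement; the index name is invented): the GROUP BY columns
must form a leftmost prefix of one index, and the only aggregates allowed
are MIN()/MAX() on the column that immediately follows them in that index,
so COUNT(DISTINCT c5) and the IN list on c1 both look like blockers here.

ALTER TABLE table1 ADD INDEX idx_c2_c3_c4_c5 (c2, c3, c4, c5);

EXPLAIN
SELECT c2, c3, c4, MAX(c5)
FROM table1
GROUP BY c2, c3, c4;
-- Extra would be expected to show "Using index for group-by" for this shape.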

Cheers

-Original Message-
From: Andrew Armstrong [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 29 July 2007 1:07 PM
To: mysql@lists.mysql.com
Subject: Using index for group-by: Not working?

Hi,

 

I have the following query:

 

SELECT c2, c3, c4, Count(DISTINCT c5)

FROM table1 

WHERE c1 IN (1, 2, 3...)

GROUP BY c2, c3, c4

order by null

 

Yet I can only get it at best to show (under extra): Using where, using
filesort.

 

I have read up on:
http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html and
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html and
http://dev.mysql.com/doc/refman/5.0/en/explain.html yet cannot get this
index to be used.

 

I am running MySQL '5.1.17-beta-community-nt-debug'

 

There are over 600,000 rows in table1 for my testing.

 

I have tried placing indexes on: idx(c2, c3, c4, c5) and idx(c1, c2, c3, c4,
c5) and indx(c5, c1, c2, c3, c4) with no result.

 

Is there a reason I cannot get this query to use an index for grouping?

 

Cheers,

Andrew






RE: Data Warehousing and MySQL vs PostgreSQL

2007-07-27 Thread Andrew Armstrong
Jochem,

Yeah, I'm aware of partition pruning - and would be partitioning in such a
way to ensure there is a small partition to be scanned/inserted to when
required.

I'm aware of timezone issues and we just use unix timestamps so there are no
problems here.

I'll look closer into MySQL's partitioning.

Cheers
- Andrew

-Original Message-
From: Jochem van Dieten [mailto:[EMAIL PROTECTED] 
Sent: Friday, 27 July 2007 6:44 PM
To: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong wrote:
 *   Table 1: 80,000,000 rows - 9.5 GB
 *   Table 2: 1,000,000,000 rows - 8.9 GB
 This is a generic star schema design for data warehousing.

 I have read that it is better if perhaps partitioning is implemented,
where
 new data is added to a partitioned table (eg, that represents a single
day)
 and then when those samples expire - simply drop the partition.

 I believe partitioning would solve issues with SELECT and INSERT
performance
 because the actual index tree size (and data in the table itself) would be
 reduced.

While partitioning will most likely alleviate your DML woes, partially
by breaking it up and partially by changing DML to DDL, we can not
make any reasonable statement about your SELECT performance since we
don't know what type of queries you will be running. If your queries
are going to cross all partitions partitioning is not going to help
you much, if your queries typically only touch one partition it will
help a lot.

 I am a bit hesitant however to go with PostgreSQL because the partitioning
 system seems a bit less easier to work with than MySQL (5.1's)
 implementation; as I would need to maintain my own master table for
clients
 to query, and I do not think partition pruning is in use at this time (eg,
 the analyser can ignore partitions that wont contain any matching data
based
 on the query being issued).

The PostgreSQL planner will prune every partition it can determine not
to have any matching data. Whether it can determine so depends on the
query and it can be a bit picky (e.g. if you pick the wrong datatype
for a timestamp it may not be a candidate for pruning because it has a
timezone, which is a bit unexpected if you are not used to timestamps
having a timezone at all).

Jochem




Data Warehousing and MySQL vs PostgreSQL

2007-07-26 Thread Andrew Armstrong
Hello,

 

I am seeking information on best practices with regards to Data Warehousing
and MySQL. I am considering moving to PostgreSQL.

 

I am currently using MySQL as the database of choice. I am now running into
performance issues with regards to large tables.

 

At the moment, I have the following general schema:

 

*   Table 1: 80,000,000 rows - 9.5 GB
*   Table 2: 1,000,000,000 rows - 8.9 GB

 

And a few other misc tables (Period table to hold time values, which point
to rows in Table 1 and Table 2).

 

This is a generic star schema design for data warehousing.

 

Information is added to this warehouse (DW) every five minutes.

 

Information is deleted from this DW as well, after every five minutes.

 

The data being recorded is time sensitive. As data ages, it may be deleted.
Groups of samples are aggregated into a summary/aggregation sample prior to
being deleted.

 

I believe this is starting to cause fragmentation issues (please correct me
if I am wrong) where old, deleted row positions get re-used by new rows on
disk.

This is possibly causing issues when aggregation needs to take place (eg,
condense the last six samples of five minute intervals to a thirty minute
sample).

 

Would this possibly be causing overhead because the samples may be scattered
around the disk instead of being in a line, due to old, deleted row
positions (further back in the database table) now being used for new data?
Is this a concern I should have?

 

The data in these tables could also be segmented/partitioned into their own
individual tables by about a factor of six. For example, five minute samples
are kept for 48 hours or so and then expire. 30 minute aggregation samples
are kept for longer, etc. I believe performance can be improved by
separating these different samples for one.

 

At the moment with MySQL, when DELETE'ing from the table (in batches of,
say, 10,000 rows every few seconds via a LIMIT clause on the DELETE query),
rows are removed very slowly.

Additionally, INSERT's are sometimes very slow.

 

I believe this is just due to the sheer amount of data involved in this
single table.

 

I have read that it is better if perhaps partitioning is implemented, where
new data is added to a partitioned table (eg, that represents a single day)
and then when those samples expire - simply drop the partition.
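
A rough sketch of that approach (table, column names and boundary values are
all invented), using MySQL 5.1 RANGE partitioning over a unix-timestamp
column so that expiring a day becomes DDL rather than a large DELETE:

CREATE TABLE samples (
  sample_ts INT UNSIGNED NOT NULL,       -- unix timestamp of the sample
  metric_id MEDIUMINT UNSIGNED NOT NULL,
  value     DOUBLE NOT NULL,
  KEY metric_ts (metric_id, sample_ts)
) ENGINE=MyISAM
PARTITION BY RANGE (sample_ts) (
  PARTITION p20070726 VALUES LESS THAN (1185494400),  -- placeholder epoch boundaries
  PARTITION p20070727 VALUES LESS THAN (1185580800),
  PARTITION p20070728 VALUES LESS THAN (1185667200)
);

-- Expired samples go away as a partition drop instead of row-by-row deletes:
ALTER TABLE samples DROP PARTITION p20070726;

-- Pruning can be checked per query:
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM samples
WHERE sample_ts >= 1185580800 AND sample_ts < 1185667200;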

 

I believe partitioning would solve issues with SELECT and INSERT performance
because the actual index tree size (and data in the table itself) would be
reduced. Additionally, expired samples can simply have their owning
table/partition dropped. This would (as I see it) eliminate some performance
issues with new rows being inserted, and aggregation of existing rows would
only need to check a smaller subset of the overall table due to looking at
only a handful of partitions instead of them all. 

 

Can anyone provide any insight on whether they have experienced these
performance issues before? Keep in mind these issues are on MySQL, and I am
wondering whether moving to PostgreSQL may be able to assist me.

 

Concurrent DELETEs and INSERTs every five minutes also hurt clients trying
to retrieve statistics, as the tables run on MyISAM (table locks are needed
for exclusive writes). From what I can tell, PostgreSQL does not require
table locking for concurrent READ/INSERT/DELETE statements.

 

Should I be worried about such a large table? My main concern is that it's
so slow because the index tree needs to be updated with each insert (and the
tree is no doubt quite big with so many rows already).

Will partitioning help here? I believe it would assist somewhat.

 

I am a bit hesitant, however, to go with PostgreSQL because its partitioning
system seems a bit less easy to work with than MySQL 5.1's implementation; I
would need to maintain my own master table for clients to query, and I do
not think partition pruning is in use at this time (i.e. where the planner
can ignore partitions that won't contain any matching data based on the
query being issued).

 

Any comments or assistance will be appreciated.

 

Regards,

Andrew



RE: Data Warehousing and MySQL vs PostgreSQL

2007-07-26 Thread Andrew Armstrong
Hey,

Table 1 has 29 columns. Row length median is 78. Row size median is 126
bytes. Fixed table format.

Table 2 has 6 columns. Row length median is 22. Row size median is 96 bytes.
Fixed table format.

There are three indexes on each of these tables.

Table 1 has an index on 4 columns, and another two indexes on one column
each.

Table 2 has an index on 5 columns, and another two indexes on one column
each. 

Regards,
Andrew


-Original Message-
From: Ow Mun Heng [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 26 July 2007 6:45 PM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On Thu, 2007-07-26 at 18:37 +1000, Andrew Armstrong wrote:
 Hello,
 
  
 
 I am seeking information on best practices with regards to Data
Warehousing
 and MySQL. I am considering moving to PostgreSQL.
 * Table 1: 80,000,000 rows - 9.5 GB
 * Table 2: 1,000,000,000 rows - 8.9 GB


Just curious, how many columns are there in each table? as the database
seems small to me




RE: Data Warehousing and MySQL vs PostgreSQL

2007-07-26 Thread Andrew Armstrong
Do you have a suggestion as to how this should be implemented?

Data is aggregated over time and summary rows are created.

-Original Message-
From: Wallace Reis [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 26 July 2007 8:43 PM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
 [snip]
 Information is deleted from this DW as well, after every five minutes.
 The data being recorded is time sensitive. As data ages, it may be
deleted.
 Groups of samples are aggregated into a summary/aggregation sample prior
to
 being deleted.
 I believe this is starting to cause fragmentation issues (please correct
me
 if I am wrong) where old, deleted row positions get re-used by new rows on
 disk.

There's something wrong. A DW is a non-volatile database, meaning that
data in the database is never overwritten or deleted; once committed,
the data is static and read-only, retained for future reporting.

-- 
wallace reis/wreis
Núcleo de Biologia Computacional e
Gestão de Informações Biotecnológicas/LABBI




RE: Data Warehousing and MySQL vs PostgreSQL

2007-07-26 Thread Andrew Armstrong
I've already chosen granularity for samples.

5 minute samples for example can expire after two days.

Aggregated/summarized rows of this data (30 minute sample for example, which
is the aggregation of the past 30 minutes worth of 5 minute samples in a
given window) expire after a week, etc.

I'm more concerned as to why inserts begin to slow down so much due to the
large table size.

-Original Message-
From: Wallace Reis [mailto:[EMAIL PROTECTED] 
Sent: Friday, 27 July 2007 1:02 AM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
 Do you have a suggestion to how this should be implemented?
 Data is aggregated over time and summary rows are created.

I think you didn't design your DW correctly.
It should have just one very large table (the fact table).
Data should never be deleted. What if your clients want to query data about
a 'five minute sample' after it has already expired?
You should decide on the data's granularity. And if you want to aggregate
the data, do a roll-up, or create materialized views for these aggregates.
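
MySQL has no built-in materialized views, so a rough stand-in (a sketch
only; the names reuse the hypothetical samples table from the partitioning
sketch earlier in this thread) is a summary table refreshed periodically
with INSERT ... SELECT:

CREATE TABLE samples_30min (
  sample_ts INT UNSIGNED NOT NULL,       -- start of the 30-minute bucket
  metric_id MEDIUMINT UNSIGNED NOT NULL,
  value     DOUBLE NOT NULL,
  KEY metric_ts (metric_id, sample_ts)
) ENGINE=MyISAM;

INSERT INTO samples_30min (sample_ts, metric_id, value)
SELECT sample_ts - (sample_ts % 1800) AS bucket_ts,  -- align to 30-minute buckets
       metric_id,
       AVG(value)                                    -- or whatever rollup applies
FROM samples
WHERE sample_ts >= 1185580800 AND sample_ts < 1185582600  -- one 30-minute window
GROUP BY bucket_ts, metric_id;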

-- 
wallace reis/wreis
Núcleo de Biologia Computacional e
Gestão de Informações Biotecnológicas/LABBI




RE: Data Warehousing and MySQL vs PostgreSQL

2007-07-26 Thread Andrew Armstrong
Yep, thanks.

I've been looking at MySQL's partitioning in 5.1 which seems the best
choice. It's not GA yet, but I may consider using it as it appears to be
working fine at the moment.

-Original Message-
From: Brent Baisley [mailto:[EMAIL PROTECTED] 
Sent: Friday, 27 July 2007 10:23 AM
To: Andrew Armstrong
Cc: 'Wallace Reis'; mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

Wallace is right, Data Warehousing shouldn't delete any data. MySQL  
isn't as robust as say, Oracle, for partitioning so you need to fudge  
things a little. I think partitioning is the way to go and you should  
use MERGE tables to handle your partitions. Really what you are  
looking to do is create a 48 hour view, or whatever time frame you  
want. You can retain all the data, just contain a subset of it in a  
MERGE table.
If you break out your tables into certain intervals, you can modify  
what tables are contained in a MERGE table on the fly. It's  
instantaneous since you are really only modifying a table  
description, not the physical structure. You can either reference the  
tables directly or reference the set through the MERGE table.

There are certain gotchas with MERGE tables you need to be aware of  
(read the manual).
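
A sketch of that layout (all names invented): identical MyISAM tables per
interval, with a MERGE table presenting the current window; sliding the
window only rewrites the MERGE definition, not the data.

CREATE TABLE samples_20070727 (
  sample_ts INT UNSIGNED NOT NULL,
  metric_id MEDIUMINT UNSIGNED NOT NULL,
  value     DOUBLE NOT NULL,
  KEY metric_ts (metric_id, sample_ts)
) ENGINE=MyISAM;

CREATE TABLE samples_20070728 LIKE samples_20070727;

CREATE TABLE samples_window (
  sample_ts INT UNSIGNED NOT NULL,
  metric_id MEDIUMINT UNSIGNED NOT NULL,
  value     DOUBLE NOT NULL,
  KEY metric_ts (metric_id, sample_ts)
) ENGINE=MERGE UNION=(samples_20070727, samples_20070728) INSERT_METHOD=LAST;

-- Drop the oldest table from the 48-hour view:
ALTER TABLE samples_window UNION=(samples_20070728);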

An additional option is to use a feeder table that would be InnoDB.  
Your main tables would be MyISAM. Every 5 minutes you would grab the  
old data from the InnoDB table and insert it into the MyISAM/MERGE  
table setup. Then delete the records from the InnoDB table, which  
would only ever contain at most 10 minutes worth of data so adding  
and deleting should be quick.
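
A sketch of the feeder pattern (names invented, continuing the example
above): new rows land in InnoDB, and every five minutes anything older than
the cutoff is copied into the MyISAM/MERGE setup and then removed from the
feeder.

CREATE TABLE samples_feeder (
  sample_ts INT UNSIGNED NOT NULL,
  metric_id MEDIUMINT UNSIGNED NOT NULL,
  value     DOUBLE NOT NULL,
  KEY ts (sample_ts)
) ENGINE=InnoDB;

-- Run every five minutes (e.g. from cron):
SET @cutoff = UNIX_TIMESTAMP() - 300;

INSERT INTO samples_20070728 (sample_ts, metric_id, value)
SELECT sample_ts, metric_id, value
FROM samples_feeder
WHERE sample_ts < @cutoff;

DELETE FROM samples_feeder
WHERE sample_ts < @cutoff;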

I've used both techniques and others to manage high insert, large  
tables. Although I only reached about 500 million records for a 1  
month time period.

On Jul 26, 2007, at 6:17 PM, Andrew Armstrong wrote:

 I've already chosen granularity for samples.

 5 minute samples for example can expire after two days.

 Aggregated/summarized rows of this data (30 minute sample for  
 example, which
 is the aggregation of the past 30 minutes worth of 5 minute samples  
 in a
 given window) expire after a week, etc.

 I'm more concerned as to why inserts begin to slow down so much due  
 to the
 large table size.

 -Original Message-
 From: Wallace Reis [mailto:[EMAIL PROTECTED]
 Sent: Friday, 27 July 2007 1:02 AM
 To: Andrew Armstrong
 Cc: mysql@lists.mysql.com
 Subject: Re: Data Warehousing and MySQL vs PostgreSQL

 On 7/26/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
 Do you have a suggestion to how this should be implemented?
 Data is aggregated over time and summary rows are created.

 I think that you didnt design correctly your DW.
 It should have just one very larger table (the fact table).
 Data should never be deleted. If your client want to query data about
 'five minutes sample' when they are already expired?
 You should decide the data's granularity. And if you want to agregate
 them, do roll up.  Or you can create materialized views for these
 aggregates.

 -- 
 wallace reis/wreis
 Núcleo de Biologia Computacional e
 Gestão de Informações Biotecnológicas/LABBI



