RE: remove temporary table from SELECT query
It goes to a temporary table on disk when the temporary result set does not fit in the memory MySQL has allocated for it: the server builds the temporary table in memory first and converts it to an on-disk table once it outgrows the smaller of tmp_table_size and max_heap_table_size. Try increasing those buffer sizes or eliminating more rows from the query.

-----Original Message-----
From: Mike Zupan [mailto:[EMAIL PROTECTED]]
Sent: Friday, 10 August 2007 4:52 AM
To: mysql@lists.mysql.com
Subject: remove temporary table from SELECT query

I have been pulling my hair out over a temporary table being created in the following query:

  SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS entryid, title
  FROM friends_test
  INNER JOIN entries ON friendLink = userid AND userLink = 2
  ORDER BY entryid

If I change userLink=2 to friendLink=2 it is fine and it's very fast. If I leave it as is, the query takes around 2 seconds.

Here is the EXPLAIN output of the bad query:

+----+-------------+--------------+------+---------------------+----------+---------+-----------------------------------+------+---------------------------------+
| id | select_type | table        | type | possible_keys       | key      | key_len | ref                               | rows | Extra                           |
+----+-------------+--------------+------+---------------------+----------+---------+-----------------------------------+------+---------------------------------+
|  1 | SIMPLE      | friends_test | ref  | userLink,friendLink | userLink | 3       | const                             |  458 | Using temporary; Using filesort |
|  1 | SIMPLE      | entries      | ref  | userid              | userid   | 4       | photoblog.friends_test.friendLink |   11 | Using where                     |
+----+-------------+--------------+------+---------------------+----------+---------+-----------------------------------+------+---------------------------------+

Here are the definitions of the friends_test and entries tables:

  CREATE TABLE `friends_test` (
    `friendID` mediumint(8) NOT NULL auto_increment,
    `userLink` mediumint(8) unsigned NOT NULL,
    `friendLink` mediumint(8) unsigned NOT NULL,
    `status` tinyint(1) NOT NULL default '1',
    PRIMARY KEY (`friendID`),
    KEY `userLink` (`userLink`),
    KEY `friendLink` (`friendLink`)
  ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=74971;

  CREATE TABLE `entries` (
    `entryid` mediumint(10) unsigned NOT NULL auto_increment,
    `userid` mediumint(8) unsigned default NULL,
    `title` varchar(255) character set utf8 collate utf8_unicode_ci default NULL,
    `photos` text,
    `sizes` mediumtext NOT NULL,
    `text` text character set utf8 collate utf8_unicode_ci,
    `category` int(6) unsigned default NULL,
    `created` int(10) unsigned default NULL,
    `ts` int(10) unsigned default '0',
    `modified` int(10) unsigned default NULL,
    `date` date NOT NULL default '0000-00-00',
    `comments` smallint(3) unsigned NOT NULL default '1',
    `views` mediumint(8) NOT NULL default '0',
    `dir` varchar(10) NOT NULL default 'photos',
    `server` varchar(20) NOT NULL default 'i1.photoblog.com',
    `notes` longtext character set utf8 collate utf8_unicode_ci NOT NULL,
    `titles` text character set utf8 collate utf8_unicode_ci NOT NULL,
    PRIMARY KEY (`entryid`),
    KEY `userid` (`userid`),
    KEY `date` (`date`),
    KEY `created` (`created`),
    KEY `ts` (`ts`),
    FULLTEXT KEY `title` (`title`,`text`)
  ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=204306;

Any help or pointers would be a BIG help.
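For reference, the in-memory limit mentioned above is governed by two variables, and MySQL converts an in-memory temporary table to an on-disk one when it exceeds the smaller of them. A minimal sketch for inspecting and raising them per session (the 64 MB figure is only an example, not a recommendation):

  -- Check the current in-memory temporary-table limits
  SHOW VARIABLES LIKE 'tmp_table_size';
  SHOW VARIABLES LIKE 'max_heap_table_size';

  -- Raise both for this session; the effective in-memory limit is
  -- the smaller of the two values
  SET SESSION tmp_table_size      = 64 * 1024 * 1024;
  SET SESSION max_heap_table_size = 64 * 1024 * 1024;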
RE: How can I delete a block of random rows?
Are you sure you want to delete random rows, or do you (if you have sequential IDs) just want to delete every n'th row?

  DELETE FROM table WHERE id MOD 5 = 0

That deletes every 5th row from the table, assuming sequential IDs with no missing numbers. Something like that, anyway.

-----Original Message-----
From: Baron Schwartz [mailto:[EMAIL PROTECTED]]
Sent: Saturday, 4 August 2007 12:41 PM
To: Daevid Vincent
Cc: 'MySQL General'
Subject: Re: How can I delete a block of random rows?

Daevid Vincent wrote:
> I have a SQL dump that I load in for testing with lots of interesting data. I want to now pare that down to a smaller subset; however, I can't just delete from a range of IDs, as the data is sequential. I want to delete say 1000 random rows from the table.

What fraction of the rows do you want to delete? 1%?

  DELETE FROM TABLE WHERE RAND() < 0.01;
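If the goal really is exactly 1000 random rows rather than a random 1% fraction, a single-statement variant along these lines should also work in MySQL (untested sketch; the table name is a placeholder, and ORDER BY RAND() sorts the whole table, so it can be slow on large ones):

  -- Delete exactly 1000 rows chosen at random
  DELETE FROM table1
  ORDER BY RAND()
  LIMIT 1000;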
RE: Is MySQL Embedded the solution?
You may want to consider SQLite if you have not seen it already: http://www.sqlite.org/

- Andrew

-----Original Message-----
From: Car Toper [mailto:[EMAIL PROTECTED]]
Sent: Sunday, 29 July 2007 7:10 AM
To: mysql@lists.mysql.com
Subject: Is MySQL Embedded the solution?

I am starting the R&D on the next phase of development of my program. I am looking for a database and am wondering if MySQL Embedded is the solution.

The database needs to be embedded in the sense that when my application starts, so does the database; when mine shuts down, so does the database. The real key feature I need is for the physical database file to be treated like any other data file. Think of my software as job based: the user fires up my software, creates a new job, does his/her thing and exits. All the data for each job, including the database file, needs to live in the job directory/folder so that the user can later copy the job folder to other locations.

I know file-based databases like MS Access are designed this way, and I know databases like MS SQL are NOT: the MS SQL physical files cannot easily be moved, or at least they couldn't the last time I checked ;)

Will MySQL Embedded give me the ability to create a new physical database file for every job and to load up older job databases once the files are moved?

Cartoper
Using index for group-by: Not working?
Hi,

I have the following query:

  SELECT c2, c3, c4, COUNT(DISTINCT c5)
  FROM table1
  WHERE c1 IN (1, 2, 3...)
  GROUP BY c2, c3, c4
  ORDER BY NULL

Yet the best I can get it to show (under Extra) is: Using where; Using filesort.

I have read up on:
http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html
http://dev.mysql.com/doc/refman/5.0/en/explain.html
yet cannot get this index to be used.

I am running MySQL '5.1.17-beta-community-nt-debug'. There are over 600,000 rows in table1 for my testing.

I have tried placing indexes on:
idx(c2, c3, c4, c5)
idx(c1, c2, c3, c4, c5)
idx(c5, c1, c2, c3, c4)
with no result.

Is there a reason I cannot get this query to use an index for grouping?

Cheers,
Andrew
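One data point worth noting: per the loose-index-scan page cited above, in MySQL 5.0/5.1 only MIN() and MAX() qualify as aggregates for "Using index for group-by", so the COUNT(DISTINCT c5) alone may rule the optimization out. For comparison in EXPLAIN, a variant that can qualify (assuming the idx(c2, c3, c4, c5) index, and dropping the range condition on c1, which is another documented blocker):

  -- With idx(c2, c3, c4, c5) this form can show
  -- "Using index for group-by" in the Extra column
  SELECT c2, c3, c4, MIN(c5), MAX(c5)
  FROM table1
  GROUP BY c2, c3, c4;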
Alternative to subquery to perform distinct aggregation in query
Hi,

I have a query at the moment like this:

  SELECT SQL_NO_CACHE STRAIGHT_JOIN t1.col1, t1.col2, t2.col1, ... MAX(t1.col6)...
  (
      SELECT COUNT(DISTINCT col1)
      FROM table3 t3
      WHERE t3.col1 = t1.col1
        AND t3.col2 = t1.col2
        AND t3.col1 IN (139903,140244,140583,140924,141265,141601,141942)
  ) AS uniquecount
  FROM table1 t1
  INNER JOIN table2 t2 ON t1.col6 = t2.id
  WHERE t2.id IN (139903,140244,140583,140924,141265,141601,141942)
  GROUP BY t1.col1, t1.col2, t1.col3
  ORDER BY NULL

Basically, you can tell that there's a main table whose information is aggregated, and then another table whose matching rows are aggregated too, per row of the first table.

This appears very slow. I've tried running the queries separately and they appear to perform better on their own (as somewhat expected).

Does anyone have any ideas on how to optimize the above query? Otherwise I think I will just go with the dual-query approach, as it is also gentler on the database server.

Cheers,
Andrew
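One possible middle ground between the correlated subquery and two separate client-side queries is to run the inner aggregate once as a derived table and join it on the correlation keys. A rough, untested sketch using the column names above (whether COUNT(DISTINCT col1) grouped by col1 is really the intended measure is left as in the original):

  SELECT SQL_NO_CACHE t1.col1, t1.col2, t2.col1,
         MAX(t1.col6),
         d.uniquecount
  FROM table1 t1
  INNER JOIN table2 t2 ON t1.col6 = t2.id
  LEFT JOIN (
      -- computed once, instead of once per outer group
      SELECT col1, col2, COUNT(DISTINCT col1) AS uniquecount
      FROM table3
      WHERE col1 IN (139903,140244,140583,140924,141265,141601,141942)
      GROUP BY col1, col2
  ) AS d ON d.col1 = t1.col1 AND d.col2 = t1.col2
  WHERE t2.id IN (139903,140244,140583,140924,141265,141601,141942)
  GROUP BY t1.col1, t1.col2, t1.col3
  ORDER BY NULL;

One behavioural difference to watch for: groups with no matching rows in table3 come back as NULL rather than 0, so wrap d.uniquecount in COALESCE(d.uniquecount, 0) if that matters.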
RE: Using index for group-by: Not working?
Hey Terry,

1) I've tried placing the COUNT/DISTINCT bit first - no change. Could you elaborate on your bottom-up parsing? I am not aware of this myself (and have not read it anywhere). I would be interested if you could elaborate or provide more info.

2) When a GROUP BY is performed, sorting is usually done as well (because it comes almost for free). ORDER BY NULL tells MySQL not to bother doing this. I've seen 'ORDER BY NULL' (as suggested elsewhere) avoid 'Using filesort' under the Extra column when the sort is not needed.

Thanks for the response.

-----Original Message-----
From: Terry Mehlman [mailto:[EMAIL PROTECTED]]
Sent: Sunday, 29 July 2007 1:18 PM
To: Andrew Armstrong
Subject: Re: Using index for group-by: Not working?

Just a shot in the dark, but I would suggest two changes to your query.

1) Put the COUNT(DISTINCT c5) first rather than last. As I'm sure you know, the parse happens from the bottom up, so the indexes you are placing on c2, c3 and c4 aren't doing you any good if you put the DISTINCT clause in their way.

2) ORDER BY NULL? Take that out and see if your performance improves.

Just a couple of thoughts.

On 7/28/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
> Hi, I have the following query:
>
>   SELECT c2, c3, c4, COUNT(DISTINCT c5)
>   FROM table1
>   WHERE c1 IN (1, 2, 3...)
>   GROUP BY c2, c3, c4
>   ORDER BY NULL
>
> Yet the best I can get it to show (under Extra) is: Using where; Using filesort. [...] Is there a reason I cannot get this query to use an index for grouping?

--
That which Voldemort does not value, he takes no trouble to comprehend. Of house-elves and children's tales, of love, loyalty, and innocence, Voldemort knows and understands nothing. Nothing. That they all have a power beyond his own, a power beyond the reach of any magic, is a truth he has never grasped. - JK Rowling
RE: Using index for group-by: Not working?
It's just occurred to me that the IN clause is not a constant. This probably rules out any chance of using an index for GROUP BY?

Cheers

-----Original Message-----
From: Andrew Armstrong [mailto:[EMAIL PROTECTED]]
Sent: Sunday, 29 July 2007 1:07 PM
To: mysql@lists.mysql.com
Subject: Using index for group-by: Not working?

> Hi, I have the following query:
>
>   SELECT c2, c3, c4, COUNT(DISTINCT c5)
>   FROM table1
>   WHERE c1 IN (1, 2, 3...)
>   GROUP BY c2, c3, c4
>   ORDER BY NULL
>
> Yet the best I can get it to show (under Extra) is: Using where; Using filesort. [...] Is there a reason I cannot get this query to use an index for grouping?
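An easy way to test that theory is to collapse the IN list to a single constant and compare the EXPLAIN output: with idx(c1, c2, c3, c4, c5) in place, if the Extra column changes, the multi-valued range condition on c1 is the blocker. A diagnostic sketch only:

  -- Same query with an equality instead of IN (...)
  EXPLAIN
  SELECT c2, c3, c4, COUNT(DISTINCT c5)
  FROM table1
  WHERE c1 = 1
  GROUP BY c2, c3, c4
  ORDER BY NULL;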
RE: Data Warehousing and MySQL vs PostgreSQL
Jochem,

Yeah, I'm aware of partition pruning, and I would be partitioning in such a way as to ensure there is a small partition to scan/insert into when required. I'm also aware of the timezone issues; we just use unix timestamps, so there are no problems there. I'll look closer into MySQL's partitioning.

Cheers
- Andrew

-----Original Message-----
From: Jochem van Dieten [mailto:[EMAIL PROTECTED]]
Sent: Friday, 27 July 2007 6:44 PM
To: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong wrote:
> * Table 1: 80,000,000 rows - 9.5 GB
> * Table 2: 1,000,000,000 rows - 8.9 GB
> This is a generic star schema design for data warehousing. I have read that it is better if perhaps partitioning is implemented, where new data is added to a partitioned table (eg, one that represents a single day) and then, when those samples expire, the partition is simply dropped. I believe partitioning would solve issues with SELECT and INSERT performance because the actual index tree size (and the data in the table itself) would be reduced.

While partitioning will most likely alleviate your DML woes, partially by breaking the work up and partially by changing DML into DDL, we cannot make any reasonable statement about your SELECT performance, since we don't know what type of queries you will be running. If your queries cross all partitions, partitioning is not going to help you much; if they typically touch only one partition, it will help a lot.

> I am a bit hesitant however to go with PostgreSQL, because its partitioning system seems a bit less easy to work with than MySQL (5.1's) implementation; I would need to maintain my own master table for clients to query, and I do not think partition pruning is in use at this time (eg, the analyser cannot ignore partitions that won't contain any matching data based on the query being issued).

The PostgreSQL planner will prune every partition it can determine not to have any matching data. Whether it can determine so depends on the query, and it can be a bit picky (e.g. if you pick the wrong datatype for a timestamp it may not be a candidate for pruning because it has a timezone, which is a bit unexpected if you are not used to timestamps having a timezone at all).

Jochem
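For what it's worth, MySQL 5.1 can show whether pruning kicks in for a given query via EXPLAIN PARTITIONS. A small sketch with hypothetical names (a samples table range-partitioned on a unix-timestamp column sample_ts):

  -- The "partitions" column of the output lists only the
  -- partitions the optimizer will actually touch
  EXPLAIN PARTITIONS
  SELECT COUNT(*)
  FROM samples
  WHERE sample_ts >= UNIX_TIMESTAMP('2007-07-26 00:00:00')
    AND sample_ts <  UNIX_TIMESTAMP('2007-07-27 00:00:00');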
Data Warehousing and MySQL vs PostgreSQL
Hello,

I am seeking information on best practices with regards to data warehousing and MySQL, as I am considering moving to PostgreSQL.

I am currently using MySQL as the database of choice and am now running into performance issues with large tables. At the moment, I have the following general schema:

* Table 1: 80,000,000 rows - 9.5 GB
* Table 2: 1,000,000,000 rows - 8.9 GB

plus a few other misc tables (e.g. a Period table holding time values, which point to rows in Table 1 and Table 2). This is a generic star schema design for data warehousing.

Information is added to this warehouse (DW) every five minutes, and information is deleted from it after every five minutes as well. The data being recorded is time sensitive: as data ages, it may be deleted. Groups of samples are aggregated into a summary/aggregation sample prior to being deleted.

I believe this is starting to cause fragmentation issues (please correct me if I am wrong), where old, deleted row positions get re-used by new rows on disk. This possibly causes problems when aggregation needs to take place (eg, condensing the last six five-minute samples into a thirty-minute sample). Could this be causing overhead because the samples may be scattered around the disk instead of lying in a line, due to old, deleted row positions (further back in the table) now being used for new data? Is this a concern I should have?

The data in these tables could also be segmented/partitioned into their own individual tables, by about a factor of six. For example, five-minute samples are kept for 48 hours or so and then expire; 30-minute aggregation samples are kept for longer, etc. I believe performance can be improved by separating these different samples, for one.

At the moment with MySQL, DELETEs from the table (in batches of say 10,000 rows every few seconds, due to a LIMIT clause on the DELETE query) remove rows very slowly. Additionally, INSERTs are sometimes very slow. I believe this is simply due to the sheer amount of data in this single table.

I have read that it is better if partitioning is implemented, where new data is added to a partitioned table (eg, one that represents a single day) and then, when those samples expire, the partition is simply dropped. I believe partitioning would solve the SELECT and INSERT performance issues because the actual index tree size (and the data in the table itself) would be reduced. Additionally, expired samples could simply have their owning table/partition dropped. As I see it, this would eliminate some performance issues with new rows being inserted, and aggregation of existing rows would only need to check a smaller subset of the overall table, looking at only a handful of partitions instead of them all.

Can anyone provide any insight on whether they have experienced these performance issues before? Keep in mind these issues are on MySQL, and I am wondering whether moving to PostgreSQL may be able to assist me.

Concurrent DELETEs and INSERTs every five minutes also hurt clients trying to retrieve statistics, as the tables run on MyISAM (table locks are needed for exclusive writes). From what I can tell, PostgreSQL does not require table locking for concurrent SELECT/INSERT/DELETE statements.

Should I be worried about such a large table? My main concern is that it is so slow because the index tree needs to be updated with each insert (and the tree is no doubt quite big with so many rows already). Will partitioning help here? I believe it would assist somewhat.
I am a bit hesitant, however, to go with PostgreSQL, because its partitioning system seems a bit less easy to work with than MySQL (5.1's) implementation; I would need to maintain my own master table for clients to query, and I do not think partition pruning is in use at this time (eg, the analyser cannot ignore partitions that won't contain any matching data based on the query being issued).

Any comments or assistance will be appreciated.

Regards,
Andrew
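To make the per-day partitioning idea concrete, here is a minimal MySQL 5.1 sketch with hypothetical table and column names; the point is that expiring a day of five-minute samples becomes a metadata operation (DROP PARTITION) rather than a slow batched DELETE:

  CREATE TABLE samples (
      sample_ts INT UNSIGNED NOT NULL,       -- unix timestamp of the sample
      device_id MEDIUMINT UNSIGNED NOT NULL,
      value     DOUBLE NOT NULL,
      KEY (device_id, sample_ts)
  ) ENGINE=MyISAM
  PARTITION BY RANGE (sample_ts) (
      PARTITION p20070726 VALUES LESS THAN (UNIX_TIMESTAMP('2007-07-27 00:00:00')),
      PARTITION p20070727 VALUES LESS THAN (UNIX_TIMESTAMP('2007-07-28 00:00:00')),
      PARTITION pmax      VALUES LESS THAN MAXVALUE
  );

  -- Expire the oldest day in one cheap operation
  ALTER TABLE samples DROP PARTITION p20070726;

New days would then be added with ALTER TABLE ... REORGANIZE PARTITION on pmax (or, if the catch-all pmax is omitted, with a plain ADD PARTITION).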
RE: Data Warehousing and MySQL vs PostgreSQL
Hey,

Table 1 has 29 columns. Median row length is 78; median row size is 126 bytes. Fixed table format.
Table 2 has 6 columns. Median row length is 22; median row size is 96 bytes. Fixed table format.

There are three indexes on each of these tables. Table 1 has an index on 4 columns, plus another two indexes on one column each. Table 2 has an index on 5 columns, plus another two indexes on one column each.

Regards,
Andrew

-----Original Message-----
From: Ow Mun Heng [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 26 July 2007 6:45 PM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On Thu, 2007-07-26 at 18:37 +1000, Andrew Armstrong wrote:
> Hello, I am seeking information on best practices with regards to Data Warehousing and MySQL. I am considering moving to PostgreSQL.
> * Table 1: 80,000,000 rows - 9.5 GB
> * Table 2: 1,000,000,000 rows - 8.9 GB

Just curious, how many columns are there in each table? The database seems small to me.
RE: Data Warehousing and MySQL vs PostgreSQL
Do you have a suggestion as to how this should be implemented? Data is aggregated over time and summary rows are created.

-----Original Message-----
From: Wallace Reis [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 26 July 2007 8:43 PM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
<snip>
> Information is deleted from this DW as well, after every five minutes. The data being recorded is time sensitive. As data ages, it may be deleted. Groups of samples are aggregated into a summary/aggregation sample prior to being deleted. I believe this is starting to cause fragmentation issues (please correct me if I am wrong) where old, deleted row positions get re-used by new rows on disk.

There's something wrong: a DW is a non-volatile database, meaning that data in the database is never over-written or deleted; once committed, the data is static and read-only, retained for future reporting.

--
wallace reis/wreis
Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas/LABBI
RE: Data Warehousing and MySQL vs PostgreSQL
I've already chosen the granularity for samples. Five-minute samples, for example, can expire after two days; aggregated/summarized rows of this data (a 30-minute sample, for example, which is the aggregation of the past 30 minutes' worth of five-minute samples in a given window) expire after a week, etc.

I'm more concerned as to why inserts begin to slow down so much as the table grows large.

-----Original Message-----
From: Wallace Reis [mailto:[EMAIL PROTECTED]]
Sent: Friday, 27 July 2007 1:02 AM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong [EMAIL PROTECTED] wrote:
> Do you have a suggestion as to how this should be implemented? Data is aggregated over time and summary rows are created.

I think that you didn't design your DW correctly. It should have just one very large table (the fact table), and data should never be deleted. What if your clients want to query data about a 'five minute sample' after it has already expired? You should decide on the data's granularity, and if you want to aggregate, do roll-ups. Or you can create materialized views for these aggregates.

--
wallace reis/wreis
Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas/LABBI
RE: Data Warehousing and MySQL vs PostgreSQL
Yep, thanks. I've been looking at MySQL's partitioning in 5.1, which seems the best choice. It's not GA yet, but I may consider using it, as it appears to be working fine at the moment.

-----Original Message-----
From: Brent Baisley [mailto:[EMAIL PROTECTED]]
Sent: Friday, 27 July 2007 10:23 AM
To: Andrew Armstrong
Cc: 'Wallace Reis'; mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

Wallace is right, data warehousing shouldn't delete any data. MySQL isn't as robust as, say, Oracle for partitioning, so you need to fudge things a little. I think partitioning is the way to go, and you should use MERGE tables to handle your partitions.

Really what you are looking to do is create a 48-hour view, or whatever time frame you want. You can retain all the data and just contain a subset of it in a MERGE table. If you break your tables out into certain intervals, you can modify which tables are contained in a MERGE table on the fly. It's instantaneous, since you are really only modifying a table description, not the physical structure. You can either reference the tables directly or reference the set through the MERGE table. There are certain gotchas with MERGE tables you need to be aware of (read the manual).

An additional option is to use a feeder table that would be InnoDB, while your main tables would be MyISAM. Every 5 minutes you would grab the old data from the InnoDB table and insert it into the MyISAM/MERGE table setup, then delete the records from the InnoDB table, which would only ever contain at most 10 minutes worth of data, so adding and deleting should be quick.

I've used both techniques, and others, to manage high-insert, large tables, although I only reached about 500 million records for a 1-month time period.

On Jul 26, 2007, at 6:17 PM, Andrew Armstrong wrote:
> I've already chosen the granularity for samples. Five-minute samples, for example, can expire after two days; aggregated/summarized rows of this data (a 30-minute sample, for example, which is the aggregation of the past 30 minutes' worth of five-minute samples in a given window) expire after a week, etc. I'm more concerned as to why inserts begin to slow down so much as the table grows large.
> [...]
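A small sketch of the MERGE-table technique Brent describes, again with hypothetical names: per-day MyISAM tables plus a MERGE table acting as the sliding 48-hour view. Redefining the UNION list only rewrites the table definition, so it is effectively instantaneous:

  -- Identical per-day MyISAM tables (structure abbreviated)
  CREATE TABLE samples_20070726 (
      sample_ts INT UNSIGNED NOT NULL,
      value     DOUBLE NOT NULL,
      KEY (sample_ts)
  ) ENGINE=MyISAM;
  CREATE TABLE samples_20070727 LIKE samples_20070726;
  CREATE TABLE samples_20070728 LIKE samples_20070726;

  -- The 48-hour "view"; new rows go to the last table in the list
  CREATE TABLE samples_recent (
      sample_ts INT UNSIGNED NOT NULL,
      value     DOUBLE NOT NULL,
      KEY (sample_ts)
  ) ENGINE=MERGE UNION=(samples_20070726, samples_20070727)
    INSERT_METHOD=LAST;

  -- Slide the window: drop yesterday from the view, pick up today
  ALTER TABLE samples_recent UNION=(samples_20070727, samples_20070728);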