Re: optimization help
That's quite a query. You may not be able to optimize it well with those nested selects. You may want to think about restructuring your query a little, perhaps joining pieces of data in whatever programming language you're using on the front end. You have MySQL doing a lot of work and perhaps transferring a lot of data. If some of those selects are pulling data that is redundant across many rows, it may be more efficient to join them on the front end using arrays or something similar.

Another alternative would be to use a different table type, such as a MERGE table. That allows you to query multiple tables as one, which is something you had asked about. You need to be aware of its limitations, though, such as unique indexes not being enforced across the member tables. So if you want to use auto_increment, you need to set the starting value when you create a new table before adding it to the merge setup.

On Jun 27, 2007, at 12:16 PM, Dave G wrote:

I think I can do that: I don't have any other indexes, just the keys.

mysql> show create table data__ProcessedDataFrames;

CREATE TABLE `data__ProcessedDataFrames` (
  `processed_id` int(10) unsigned NOT NULL auto_increment,
  `top_level_product_name` varchar(255) default NULL,
  `test_id` int(10) unsigned default NULL,
  `payload_time` double default NULL,
  `universal_time` double default NULL,
  `processed_data` mediumblob,
  PRIMARY KEY (`processed_id`),
  KEY `test_id` (`test_id`),
  KEY `payload_time` (`payload_time`),
  KEY `top_level_product_name` (`top_level_product_name`)
) ENGINE=MyISAM AUTO_INCREMENT=1392568 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

As for the amount of data, I expect it to grow without bound, really. Our setup: we have several boxes we are running tests from, where MySQL is the storage engine on each local box, and then we will have a main storage area for all relevant tests. Based on past data, the table's size will be pushing 4G as it is (when we port the data), and I expect at least that much more over the life of this software; but since the payloads I will be getting the data from have not been developed yet, I can't be entirely sure.

One of the reasons I was asking whether breaking it up into several tables would be a good idea is that it would make it easier for me to merge the data from the different testers into the main data repository. Otherwise I will have to figure out a good way of redoing the test_id for each test that is stored in the main repository.

Slow queries will be a little hard to show without giving a full evaluation of my system, so I'll simplify it a little. I'm doing several joins to get the right parameters to query this table in a stored procedure, but when it comes down to it, the query on this table is the big one, and I can modify my other joins; just making the query on this table fast is my concern.

Example query:

select payload_time,HEX(processed_data) from data__ProcessedDataFrames where test_id=18 AND top_level_product_name="DataProduct" AND payload_time > 11808.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that with how much data I will eventually have, even scanning over the KEYS will take a long time.
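To make the MERGE suggestion above concrete, here is a minimal sketch. The table names, column subset, and AUTO_INCREMENT offset are illustrative, not taken from Dave's actual schema:

```sql
-- Identically-defined MyISAM member tables, e.g. one per test run
CREATE TABLE frames_test17 (
  processed_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  payload_time DOUBLE,
  processed_data MEDIUMBLOB,
  PRIMARY KEY (processed_id),
  KEY payload_time (payload_time)
) ENGINE=MyISAM;

CREATE TABLE frames_test18 LIKE frames_test17;

-- MERGE does not enforce uniqueness across members, so stagger each new
-- member's AUTO_INCREMENT start before adding it to the merge setup:
ALTER TABLE frames_test18 AUTO_INCREMENT = 2000000;

-- The MERGE table presents all members as one queryable table
CREATE TABLE frames_all (
  processed_id INT UNSIGNED NOT NULL,
  payload_time DOUBLE,
  processed_data MEDIUMBLOB,
  KEY processed_id (processed_id),
  KEY payload_time (payload_time)
) ENGINE=MERGE UNION=(frames_test17, frames_test18) INSERT_METHOD=LAST;
```

Note that the MERGE table's column and index definitions must match the member tables, and that adding a new tester's table is just an ALTER of the UNION list, which is what makes this attractive for the merge-into-main-repository problem described above.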
Re: optimization help
On 6/27/07, Dave G <[EMAIL PROTECTED]> wrote:

select payload_time,HEX(processed_data) from data__ProcessedDataFrames where test_id=18 AND top_level_product_name="DataProduct" AND payload_time > 11808.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that with how much data I will eventually have, even scanning over the KEYS will take a long time.

Hi Dave,

In the case above, you want to be sure that everything involved in the query is indexed or a key (probably the same thing). To give an example, take "test_id=18": if that column isn't indexed, the query will be an O(N) scan over all records in the table rather than an O(log N) retrieval based on a BTREE or similar. As a first step, be sure that everything involved in a typical query is indexed. For joins, the related columns should also be indexed.

Dave A.
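As a quick way to act on the advice above, EXPLAIN shows whether the query actually uses an index. This is a sketch against the schema from this thread; the exact output columns vary by MySQL version:

```sql
EXPLAIN SELECT payload_time, HEX(processed_data)
FROM data__ProcessedDataFrames
WHERE test_id = 18
  AND top_level_product_name = 'DataProduct'
  AND payload_time > 11808.74704
  AND payload_time < 1180564096.24967;
-- If the "key" column comes back NULL and "type" is ALL, MySQL is doing
-- the O(N) full-table scan described above; an index on the offending
-- column turns it into the O(log N) BTREE lookup.
```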
Re: optimization help
I do, but I don't see any way around that with the data I have.

Dave G.

> Good Afternoon David
>
> Sounds as if you have a number of non-unique indices (even possibly FTS!)
> slowing down queries; this should help you concentrate on the slower
> indices:
>
> mysql> select TABLE_NAME,COLUMN_NAME,INDEX_NAME from
> INFORMATION_SCHEMA.STATISTICS where NON_UNIQUE=1;
>
> Anyone else?
> Martin
>
> This email message and any files transmitted with it contain confidential
> information intended only for the person(s) to whom this email message is
> addressed. If you have received this email message in error, please notify
> the sender immediately by telephone or email and destroy the original
> message without making a copy. Thank you.
>
> - Original Message -
> From: "Dave G" <[EMAIL PROTECTED]>
> To:
> Sent: Wednesday, June 27, 2007 11:32 AM
> Subject: optimization help
>
>> I have a table in my database (currently) that grows to be huge (and I
>> need to keep the data). I'm in a redesign phase and I'm trying to do it
>> right. So here are the relevant details:
>>
>> The table has several keys involved:
>>
>> mysql> desc data__ProcessedDataFrames;
>> +------------------------+------------------+------+-----+---------+----------------+
>> | Field                  | Type             | Null | Key | Default | Extra          |
>> +------------------------+------------------+------+-----+---------+----------------+
>> | processed_id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
>> | top_level_product_name | varchar(255)     | YES  | MUL | NULL    |                |
>> | test_id                | int(10) unsigned | YES  | MUL | NULL    |                |
>> | p_time                 | double           | YES  | MUL | NULL    |                |
>> | processed_data         | mediumblob       | YES  |     | NULL    |                |
>> +------------------------+------------------+------+-----+---------+----------------+
>> 6 rows in set (0.00 sec)
>>
>> This is the table that contains the data I'm interested in currently.
>> Queries on this table when it gets large are slow as molasses. I'm
>> thinking about making a new table for anything with a different test_id.
>> Any opinions as to whether this is good or bad?
>>
>> Before you make fun of me for my questions, I'm a bit new to database
>> programming.
>>
>> If it is better design to break it into smaller tables (for speed anyway),
>> then I would need to know how to query over multiple tables as though it
>> were one table. Join will do this, but that takes forever (unless of
>> course I may be doing this wrong), so that's not a good option. I need to
>> be able to query over multiple test_ids, which will be multiple tables,
>> for a specific top_level_product_name, within some time range (using
>> p_time).
>>
>> Any help would be appreciated. I will happily give more information if
>> you need to offer an educated opinion.
>>
>> Thanks
>>
>> David Godsey

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: optimization help
I think I can do that: I don't have any other indexes, just the keys.

mysql> show create table data__ProcessedDataFrames;

CREATE TABLE `data__ProcessedDataFrames` (
  `processed_id` int(10) unsigned NOT NULL auto_increment,
  `top_level_product_name` varchar(255) default NULL,
  `test_id` int(10) unsigned default NULL,
  `payload_time` double default NULL,
  `universal_time` double default NULL,
  `processed_data` mediumblob,
  PRIMARY KEY (`processed_id`),
  KEY `test_id` (`test_id`),
  KEY `payload_time` (`payload_time`),
  KEY `top_level_product_name` (`top_level_product_name`)
) ENGINE=MyISAM AUTO_INCREMENT=1392568 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

As for the amount of data, I expect it to grow without bound, really. Our setup: we have several boxes we are running tests from, where MySQL is the storage engine on each local box, and then we will have a main storage area for all relevant tests. Based on past data, the table's size will be pushing 4G as it is (when we port the data), and I expect at least that much more over the life of this software; but since the payloads I will be getting the data from have not been developed yet, I can't be entirely sure.

One of the reasons I was asking whether breaking it up into several tables would be a good idea is that it would make it easier for me to merge the data from the different testers into the main data repository. Otherwise I will have to figure out a good way of redoing the test_id for each test that is stored in the main repository.

Slow queries will be a little hard to show without giving a full evaluation of my system, so I'll simplify it a little. I'm doing several joins to get the right parameters to query this table in a stored procedure, but when it comes down to it, the query on this table is the big one, and I can modify my other joins; just making the query on this table fast is my concern.

Example query:

select payload_time,HEX(processed_data) from data__ProcessedDataFrames where test_id=18 AND top_level_product_name="DataProduct" AND payload_time > 11808.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that with how much data I will eventually have, even scanning over the KEYS will take a long time.

Thanks

Dave G.

BTW: here's the giant query that I use.

SELECT E.product_id, product_name, D.top_level_product_name, processed_id,
  product_offset, product_length, version_id, byte_order,
  ROUND(R.payload_time,6) as payload_time,
  SUBSTR(BINARY(processed_data), FLOOR(product_offset/8)+1,
         CEIL(product_length/8)) as substring,
  (SELECT HEX(substring)) as raw_data,
  (SELECT toString(
    substring,
    round(char_length(raw_data)/2,0),
    data_type,
    (SELECT attribute_value FROM config__DataProductAttributes
     WHERE attribute_name = 'FormatString'
       AND config__DataProductAttributes.product_id = E.product_id),
    product_offset % 8,
    (product_length + (product_offset % 8)) % 8,
    product_length,
    byte_orde
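One thing worth testing against the example query above (my suggestion, not something already in the schema): with separate single-column keys, MySQL of this era typically picks just one of them per table. A composite index matching the WHERE clause lets it resolve both equality columns and then range-scan payload_time within one index:

```sql
ALTER TABLE data__ProcessedDataFrames
  ADD INDEX test_product_time (test_id, top_level_product_name, payload_time);
```

The column order matters: equality columns first, the range column last, because only a leftmost prefix of a composite index can be used to narrow the search.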
Re: optimization help
Good Afternoon David

Sounds as if you have a number of non-unique indices (even possibly FTS!) slowing down queries; this should help you concentrate on the slower indices:

mysql> select TABLE_NAME,COLUMN_NAME,INDEX_NAME from INFORMATION_SCHEMA.STATISTICS where NON_UNIQUE=1;

Anyone else?
Martin

- Original Message -
From: "Dave G" <[EMAIL PROTECTED]>
To:
Sent: Wednesday, June 27, 2007 11:32 AM
Subject: optimization help

I have a table in my database (currently) that grows to be huge (and I need to keep the data). I'm in a redesign phase and I'm trying to do it right. So here are the relevant details:

The table has several keys involved:

mysql> desc data__ProcessedDataFrames;
+------------------------+------------------+------+-----+---------+----------------+
| Field                  | Type             | Null | Key | Default | Extra          |
+------------------------+------------------+------+-----+---------+----------------+
| processed_id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| top_level_product_name | varchar(255)     | YES  | MUL | NULL    |                |
| test_id                | int(10) unsigned | YES  | MUL | NULL    |                |
| p_time                 | double           | YES  | MUL | NULL    |                |
| processed_data         | mediumblob       | YES  |     | NULL    |                |
+------------------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

This is the table that contains the data I'm interested in currently. Queries on this table when it gets large are slow as molasses. I'm thinking about making a new table for anything with a different test_id. Any opinions as to whether this is good or bad?

Before you make fun of me for my questions, I'm a bit new to database programming. If it is better design to break it into smaller tables (for speed anyway), then I would need to know how to query over multiple tables as though it were one table. Join will do this, but that takes forever (unless of course I may be doing this wrong), so that's not a good option. I need to be able to query over multiple test_ids, which will be multiple tables, for a specific top_level_product_name, within some time range (using p_time).

Any help would be appreciated. I will happily give more information if you need to offer an educated opinion.

Thanks

David Godsey
Re: optimization help
On 6/27/07, Dave G <[EMAIL PROTECTED]> wrote:

Queries on this table when it gets large are slow as molasses. I'm thinking about making a new table for anything with a different test_id. Any opinions as to whether this is good or bad?

Hi Dave G.,

We need to know: a) how large the table might grow, b) the queries you'll be executing on the table, and c) the anticipated size of the result sets from those queries. Generally speaking, you are shooting for O(log N) on the queries, and you need to be sure that the queries you are issuing, combined with the database design, will allow that. So I need to know the specific queries.

Dave A.
Re: Optimization help
On Jan 12, 2004, at 08:09 AM, Douglas Sims wrote:

I think... you don't have an index on the "Incident" field itself, just on (Date, Incident, Type, Task), which means that it concatenates those fields and orders the result; thus this may be virtually useless if you're looking for a specific incident within a large date range. Since your query has a specific incident number, indexing that field would probably help a lot.

Thanks for the help, Douglas. That was the ticket. We're back under a second for queries now.

Mike Schienle, Custom Visuals
http://www.customvisuals.com/
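For the archives, the fix described above amounts to something like the following (the index name is illustrative, not from the thread):

```sql
ALTER TABLE DeltaPAF ADD INDEX idx_incident (Incident);

-- Confirm the optimizer now uses it for the problem query:
EXPLAIN SELECT Budget FROM DeltaPAF
WHERE Date >= '2003-12-11' AND Date <= '2004-01-11' AND Incident = 98996144;
```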
Re: Optimization help
I think... you don't have an index on the "Incident" field itself, just on (Date, Incident, Type, Task), which means that it concatenates those fields and orders the result; thus this may be virtually useless if you're looking for a specific incident within a large date range. Since your query has a specific incident number, indexing that field would probably help a lot.

Do a

SHOW INDEXES FROM DeltaPAF;

to see the indexes that are actually there, or

EXPLAIN SELECT Date FROM DeltaPAF WHERE Date>="2003-12-11" AND Date<="2004-01-11" AND Incident=98996144;

to see which indexes MySQL is really using.

For example, in the table below, there are really only two indexes: the primary-key index and the name index. The Seq_in_index column shows the fields that are included in each index, but the ones that aren't listed first will be much harder to find. Like a telephone directory, which is ordered by lastname, firstname: both fields are indexed, but they are in the same index, so finding a specific firstname still means a full table scan. Good luck!
mysql> describe test1;
+--------+-------------+------+-----+------------+-------+
| Field  | Type        | Null | Key | Default    | Extra |
+--------+-------------+------+-----+------------+-------+
| name   | varchar(20) | YES  | MUL | NULL       |       |
| mydate | date        |      | PRI | 0000-00-00 |       |
| number | int(10)     |      | PRI | 0          |       |
+--------+-------------+------+-----+------------+-------+
3 rows in set (0.00 sec)

mysql> show indexes from test1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test1 |          0 | PRIMARY  |            1 | mydate      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| test1 |          0 | PRIMARY  |            2 | number      | A         |           0 |     NULL | NULL   |      | BTREE      |         |
| test1 |          1 | name     |            1 | name        | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| test1 |          1 | name     |            2 | mydate      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| test1 |          1 | name     |            3 | number      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.15 sec)

Mike Schienle wrote:

Hi all - I have a speed problem that I don't understand. I've been pretty active with DBs for a few years, but I'm no expert, so let me know if I'm missing the obvious. I have Paul DuBois' MySQL book (New Riders edition) and Descartes and Bunce's Programming DBI book on my desk, so feel free to reference something there if that will help.

Here's the table I'm working from and its structure:

CREATE TABLE DeltaPAF (
  Date     DATE NOT NULL,
  Type     VARCHAR(4) NOT NULL,
  Incident INT UNSIGNED NOT NULL,
  Mgr      VARCHAR(4) NOT NULL,
  Site     VARCHAR(40) NOT NULL,
  Task     ENUM('Proposed', 'Approved', 'Completed', 'Invoiced',
                'Expired', 'Rejected', 'Cancelled') NOT NULL,
  Webpage  MEDIUMTEXT NOT NULL,
  Budget   DECIMAL(12, 2) DEFAULT 0.00,
  PRIMARY KEY (Date, Incident, Type, Task),
  INDEX (Type, Mgr, Site)
);

I have about 125,000 records in the table and it's running on an older 400 MHz MacOS X 10.2.8 system. The MySQL version is 3.23.52. The following query comes back with 210 records in about 0.6 seconds.
mysql> SELECT Date FROM DeltaPAF WHERE Date>="2003-12-11"
    -> AND Date<="2004-01-11" AND Incident=98996144;

However, this query comes back with 210 records in a little over 2 minutes.

mysql> SELECT Budget FROM DeltaPAF WHERE Date>="2003-12-11"
    -> AND Date<="2004-01-11" AND Incident=98996144;

Can someone clue me in on how I might get the SELECT Budget query to return in a similar time to the SELECT Date query? I tried adding an index for Budget, knowing it shouldn't help, and it didn't. FWIW, the Webpage fields average about 5K characters, but can be as much as 40K.

Mike Schienle, Custom Visuals
http://www.customvisuals.com/
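A plausible reading of the timing gap (my interpretation, not confirmed in the thread): the Date query can be answered entirely from the (Date, Incident, Type, Task) primary-key index, so no data rows are touched, while the Budget query must fetch each full row, and the rows here are wide because of the MEDIUMTEXT Webpage column averaging about 5K. Besides the single-column index on Incident that solved it, a composite index putting the equality column first would also serve this query shape:

```sql
-- Hypothetical index, named for illustration: equality on Incident first,
-- then the Date range is scanned within that incident's index entries.
ALTER TABLE DeltaPAF ADD INDEX idx_incident_date (Incident, Date);
```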