Re: optimization help

2007-06-27 Thread David T. Ashley

On 6/27/07, Dave G [EMAIL PROTECTED] wrote:


Queries on this table when it gets large are slow as molasses.  I'm
thinking about making a new table for anything with a different test_id
... any opinions as to whether this is good or bad?



Hi Dave G.,

We need to know:

a) How large the table might grow to,

b) The queries you'll be executing on the table, and

c) The anticipated size of the result sets (from the queries).

Generally speaking, you are shooting for O(log N) on the queries.  You need
to be sure that the queries you are issuing combined with the database
design will allow that.

So, I need to know the specific queries.

Dave A.


Re: optimization help

2007-06-27 Thread Martin Gainty

Good Afternoon David

sounds as if you have a number of non-unique indices (even possibly FTS!)
slowing down queries ... this should help you concentrate on the slower indices

mysql> select TABLE_NAME,COLUMN_NAME,INDEX_NAME from INFORMATION_SCHEMA.STATISTICS
       where NON_UNIQUE=1;


Anyone else?
Martin--
This email message and any files transmitted with it contain confidential
information intended only for the person(s) to whom this email message is
addressed.  If you have received this email message in error, please notify
the sender immediately by telephone or email and destroy the original
message without making a copy.  Thank you.

- Original Message - 
From: Dave G [EMAIL PROTECTED]

To: mysql@lists.mysql.com
Sent: Wednesday, June 27, 2007 11:32 AM
Subject: optimization help



I have a table in my database (currently) that grows to be huge (and I
need to keep the data).  I'm in a redesign phase and I'm trying to do it
right.  So here are the relevant details:

The table has several keys involved:

mysql> desc data__ProcessedDataFrames;
+------------------------+------------------+------+-----+---------+----------------+
| Field                  | Type             | Null | Key | Default | Extra          |
+------------------------+------------------+------+-----+---------+----------------+
| processed_id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| top_level_product_name | varchar(255)     | YES  | MUL | NULL    |                |
| test_id                | int(10) unsigned | YES  | MUL | NULL    |                |
| p_time                 | double           | YES  | MUL | NULL    |                |
| processed_data         | mediumblob       | YES  |     | NULL    |                |
+------------------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

This is the table that contains the data I'm interested in currently.
Queries on this table when it gets large are slow as molasses.  I'm
thinking about making a new table for anything with a different test_id
... any opinions as to whether this is good or bad?

Before you make fun of me for my questions, I'm a bit new to database
programming.

If it is better design to break it into smaller tables (for speed anyway),
then I would need to know how to query over multiple tables as though they
were one table.  A join will do this, but that takes forever (unless of
course I'm doing it wrong), so that's not a good option.  I need to be
able to query over multiple test_ids, which will be multiple tables, for a
specific top_level_product_name, within some time range (using p_time).

Any help would be appreciated.  I will happily give more information if
you need to offer an educated opinion.

Thanks

David Godsey





Re: optimization help

2007-06-27 Thread Dave G
I think I can do that:

I don't have any other indexes, just the keys.

mysql> show create table data__ProcessedDataFrames;

| data__ProcessedDataFrames | CREATE TABLE `data__ProcessedDataFrames` (
  `processed_id` int(10) unsigned NOT NULL auto_increment,
  `top_level_product_name` varchar(255) default NULL,
  `test_id` int(10) unsigned default NULL,
  `payload_time` double default NULL,
  `universal_time` double default NULL,
  `processed_data` mediumblob,
  PRIMARY KEY  (`processed_id`),
  KEY `test_id` (`test_id`),
  KEY `payload_time` (`payload_time`),
  KEY `top_level_product_name` (`top_level_product_name`)
) ENGINE=MyISAM AUTO_INCREMENT=1392568 DEFAULT CHARSET=latin1 |

1 row in set (0.00 sec)

mysql>

As for the amount of data I expect to get ... infinite, really.

Our setup: we have several boxes we are running tests from, where MySQL is
the storage engine on each local box, then we will have a main storage
area for all relevant tests.  Based on past data, the table's size
will be pushing 4G as it is (when we port the data), and I expect at least
that much more over the life of this software, but since the payloads I
will be getting the data from have not been developed yet, I can't be
entirely sure.  One of the reasons I was inquiring as to whether breaking
it up into several tables would be a good idea is because it would make it
easier for me to merge the data from the different testers into the main
data repository that way.  Otherwise I will have to figure out a good way
of redoing the test_id in each test that is stored in the main repository.
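
(For what it's worth, here's a rough, untested sketch of the kind of
test_id re-keying I mean when pulling one tester's rows into the central
repository; the tester1 schema name and the MAX()-based offset are just
placeholders for illustration:)

  -- pick an offset beyond the ids already in the central table
  SET @offset := (SELECT IFNULL(MAX(test_id), 0) FROM data__ProcessedDataFrames);

  -- copy one tester's frames, shifting test_id; processed_id is re-assigned
  INSERT INTO data__ProcessedDataFrames
         (top_level_product_name, test_id, payload_time, universal_time, processed_data)
  SELECT top_level_product_name, test_id + @offset, payload_time, universal_time, processed_data
  FROM   tester1.data__ProcessedDataFrames;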

Slow queries will be a little hard to show without giving a full
evaluation of my system, so I'll simplify it a little.  I'm doing several
joins to get the right parameters to query this table in a stored
procedure ... but when it comes down to it, the query on this table is
the big one, and I can modify my other joins; making the query on this
table fast is my main concern.  Example query:

select payload_time,HEX(processed_data) from data__ProcessedDataFrames
where test_id=18 AND top_level_product_name='DataProduct' AND payload_time
> 11808.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that, with how much data I will eventually have,
even scanning over the KEYS will take a long time.

Thanks

Dave G.

BTW: here's the giant query that I use.
   SELECT
      E.product_id,
      product_name,
      D.top_level_product_name,
      processed_id,
      product_offset,
      product_length,
      version_id,
      byte_order,
      ROUND(R.payload_time,6) as payload_time,
      SUBSTR(
         BINARY(processed_data),
         FLOOR(product_offset/8)+1,
         CEIL(product_length/8)) as substring,
      (SELECT HEX(substring)) as raw_data,
      (SELECT toString (
         substring,
         round(char_length(raw_data)/2,0),
         data_type,
         (SELECT attribute_value FROM config__DataProductAttributes
          WHERE attribute_name = 'FormatString' AND
                config__DataProductAttributes.product_id = E.product_id),
         product_offset % 8,
         (product_length + (product_offset % 8)) % 8,
         product_length,
         byte_order,

Re: optimization help

2007-06-27 Thread Dave G
I do, but I don't see any way around that with the data I have.

Dave G.

 Good Afternoon David

 sounds as if you have a number of non-unique indices (even possibly FTS!)
 slowing down queries ... this should help you concentrate on the slower
 indices
 mysql>
 select TABLE_NAME,COLUMN_NAME,INDEX_NAME from
 INFORMATION_SCHEMA.STATISTICS
 where NON_UNIQUE=1;

 Anyone else?
 Martin--
 This email message and any files transmitted with it contain confidential
 information intended only for the person(s) to whom this email message is
 addressed.  If you have received this email message in error, please
 notify
 the sender immediately by telephone or email and destroy the original
 message without making a copy.  Thank you.

 - Original Message -
 From: Dave G [EMAIL PROTECTED]
 To: mysql@lists.mysql.com
 Sent: Wednesday, June 27, 2007 11:32 AM
 Subject: optimization help


I have a table in my database (currently) that grows to be huge (and I
 need to keep the data).  I'm in a redesign phase and I'm trying to do it
 right.  So here are the relevant details:

 The table has several keys involved:

 mysql> desc data__ProcessedDataFrames;
 +------------------------+------------------+------+-----+---------+----------------+
 | Field                  | Type             | Null | Key | Default | Extra          |
 +------------------------+------------------+------+-----+---------+----------------+
 | processed_id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
 | top_level_product_name | varchar(255)     | YES  | MUL | NULL    |                |
 | test_id                | int(10) unsigned | YES  | MUL | NULL    |                |
 | p_time                 | double           | YES  | MUL | NULL    |                |
 | processed_data         | mediumblob       | YES  |     | NULL    |                |
 +------------------------+------------------+------+-----+---------+----------------+
 6 rows in set (0.00 sec)

 This is the table that contains the data I'm interested in currently.
 Queries on this table when it gets large are slow as molasses.  I'm
 thinking about making a new table for anything with a different test_id
 ... any opinions as to whether this is good or bad?

 Before you make fun of me for my questions, I'm a bit new to database
 programming.

 If it is better design to break it into smaller tables (for speed anyway),
 then I would need to know how to query over multiple tables as though they
 were one table.  A join will do this, but that takes forever (unless of
 course I'm doing it wrong), so that's not a good option.  I need to be
 able to query over multiple test_ids, which will be multiple tables, for a
 specific top_level_product_name, within some time range (using p_time).

 Any help would be appreciated.  I will happily give more information if
 you need to offer an educated opinion.

 Thanks

 David Godsey











Re: optimization help

2007-06-27 Thread David T. Ashley

On 6/27/07, Dave G [EMAIL PROTECTED] wrote:


select payload_time,HEX(processed_data) from data__ProcessedDataFrames
where test_id=18 AND top_level_product_name='DataProduct' AND payload_time
> 11808.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that, with how much data I will eventually have,
even scanning over the KEYS will take a long time.



Hi Dave,

In the case above, you want to be sure that everything involved in the query
is indexed or a key (probably the same thing).

To give an example, test_id=18 ... if that isn't indexed, it will be an
O(N) scan over all records in a table rather than an O(log N) retrieval
based on some BTREE or similar.  As a first step, be sure that everything
involved in a typical query is indexed.

For joins, the related columns should also be indexed.
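
For instance (just a sketch, not something from your setup; the index name
and the column order are my assumptions, so verify with EXPLAIN), a
composite index covering your example query could look like:

  ALTER TABLE data__ProcessedDataFrames
    ADD INDEX idx_test_product_time (test_id, top_level_product_name, payload_time);

  EXPLAIN SELECT payload_time, HEX(processed_data)
  FROM data__ProcessedDataFrames
  WHERE test_id=18 AND top_level_product_name='DataProduct'
    AND payload_time > 11808.74704 AND payload_time < 1180564096.24967;

With equality tests on test_id and top_level_product_name and a range on
payload_time, that column order lets MySQL use all three parts of the index.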

Dave A.


Re: optimization help

2007-06-27 Thread Brent Baisley
That's quite a query. You may not be able to optimize it well with
those nested selects. You may want to think about changing your query
around a little, perhaps joining pieces of data using whatever
programming language you're using on the front end. You have MySQL
doing a lot of work and perhaps transferring a lot of data.
If some of those selects are pulling data that is redundant across
many rows, it may be more efficient to join them on the front end
using arrays or something similar.


Another alternative would be to use a different table type like a
MERGE table. That allows you to query multiple tables as one, which
is something you had asked about. You need to be aware of its
limitations, like unique indexes not being enforced across tables. So
if you want to use auto increment, you need to set the value when you
create a new table to add it to the merge setup.
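
A rough sketch of what a MERGE setup might look like for this table (the
table names here are made up, and each underlying table must be MyISAM
with an identical definition):

  -- one identically-structured MyISAM table per test run (names are placeholders)
  CREATE TABLE frames_test_18 LIKE data__ProcessedDataFrames;
  CREATE TABLE frames_test_19 LIKE data__ProcessedDataFrames;

  -- the MERGE table repeats the column definitions, but uses a plain KEY on
  -- processed_id, since uniqueness can't be enforced across the union
  CREATE TABLE data__AllFrames (
    `processed_id` int(10) unsigned NOT NULL auto_increment,
    `top_level_product_name` varchar(255) default NULL,
    `test_id` int(10) unsigned default NULL,
    `payload_time` double default NULL,
    `universal_time` double default NULL,
    `processed_data` mediumblob,
    KEY (`processed_id`),
    KEY `test_id` (`test_id`),
    KEY `payload_time` (`payload_time`),
    KEY `top_level_product_name` (`top_level_product_name`)
  ) ENGINE=MERGE UNION=(frames_test_18, frames_test_19) INSERT_METHOD=LAST;

Queries against data__AllFrames then see the rows of all the underlying
tables as if they were one table.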



On Jun 27, 2007, at 12:16 PM, Dave G wrote:


I think I can do that:

I don't have any other indexes, just the keys.

mysql> show create table data__ProcessedDataFrames;

| data__ProcessedDataFrames | CREATE TABLE `data__ProcessedDataFrames` (
  `processed_id` int(10) unsigned NOT NULL auto_increment,
  `top_level_product_name` varchar(255) default NULL,
  `test_id` int(10) unsigned default NULL,
  `payload_time` double default NULL,
  `universal_time` double default NULL,
  `processed_data` mediumblob,
  PRIMARY KEY  (`processed_id`),
  KEY `test_id` (`test_id`),
  KEY `payload_time` (`payload_time`),
  KEY `top_level_product_name` (`top_level_product_name`)
) ENGINE=MyISAM AUTO_INCREMENT=1392568 DEFAULT CHARSET=latin1 |

1 row in set (0.00 sec)

mysql>

As for the amount of data I expect to get ... infinite, really.

Our setup: we have several boxes we are running tests from, where MySQL is
the storage engine on each local box, then we will have a main storage
area for all relevant tests.  Based on past data, the table's size
will be pushing 4G as it is (when we port the data), and I expect at least
that much more over the life of this software, but since the payloads I
will be getting the data from have not been developed yet, I can't be
entirely sure.  One of the reasons I was inquiring as to whether breaking
it up into several tables would be a good idea is because it would make it
easier for me to merge the data from the different testers into the main
data repository that way.  Otherwise I will have to figure out a good way
of redoing the test_id in each test that is stored in the main repository.

Slow queries will be a little hard to show without giving a full
evaluation of my system, so I'll simplify it a little.  I'm doing several
joins to get the right parameters to query this table in a stored
procedure ... but when it comes down to it, the query on this table is
the big one, and I can modify my other joins; making the query on this
table fast is my main concern.  Example query:

select payload_time,HEX(processed_data) from data__ProcessedDataFrames
where test_id=18 AND top_level_product_name='DataProduct' AND payload_time
> 11808.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that, with how much

Re: Optimization help

2004-01-12 Thread Douglas Sims
I think... you don't have an index on the Incident field itself, just 
on (Date, Incident, Type, Task) which means that it concatenates those 
fields and orders the result - thus this may be virtually useless if 
you're looking for a specific incident within a large date range.  Since 
your query has a specific incident number,  indexing that field would 
probably help a lot.

Do a SHOW INDEXES FROM DeltaPAF;

To see the indexes that are actually there.

or

EXPLAIN SELECT Date FROM DeltaPAF WHERE Date>='2003-12-11' AND
Date<='2004-01-11' AND Incident=98996144;

to see which indexes MySQL is really using.
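
If Incident turns out not to have its own index, adding one is a single
statement (sketched here; the index name is arbitrary):

  ALTER TABLE DeltaPAF ADD INDEX idx_incident (Incident);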

For example, in the table below, there are really only two indexes, the 
one primary key index and the second name index.  The Seq_in_index 
column shows the fields that are included in the index but the ones that 
aren't listed first will be much harder to find.  Like a telephone 
directory, which is ordered by lastname, firstname - both fields are 
indexed but they are in the same index, so finding a specific firstname 
still means a full table scan.

Good luck!

mysql> describe test1;
++-+--+-++---+
| Field  | Type| Null | Key | Default| Extra |
++-+--+-++---+
| name   | varchar(20) | YES  | MUL | NULL   |   |
| mydate | date|  | PRI | -00-00 |   |
| number | int(10) |  | PRI | 0  |   |
++-+--+-++---+
3 rows in set (0.00 sec)
mysql> show indexes from test1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| test1 |          0 | PRIMARY  |            1 | mydate      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| test1 |          0 | PRIMARY  |            2 | number      | A         |           0 |     NULL | NULL   |      | BTREE      |         |
| test1 |          1 | name     |            1 | name        | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| test1 |          1 | name     |            2 | mydate      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| test1 |          1 | name     |            3 | number      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.15 sec)



Mike Schienle wrote:

Hi all -

I have a speed problem that I don't understand. I've been pretty 
active with DB's for a few years, but I'm no expert, so let me know if 
I'm missing the obvious. I have Paul DuBois' MySQL book (New Riders 
edition) and Descartes and Bunce's Programming DBI book on my desk, so 
feel free to reference something there if that will help.

Here's the table I'm working from and its structure:
CREATE TABLE DeltaPAF (
  Date  DATE NOT NULL,
  Type  VARCHAR(4) NOT NULL,
  Incident  INT UNSIGNED NOT NULL,
  Mgr   VARCHAR(4) NOT NULL,
  Site  VARCHAR(40) NOT NULL,
  Task  ENUM('Proposed', 'Approved', 'Completed', 'Invoiced',
 'Expired', 'Rejected', 'Cancelled') NOT NULL,
  Webpage   MEDIUMTEXT NOT NULL,
  BudgetDECIMAL(12, 2) DEFAULT 0.00,
  PRIMARY KEY (Date, Incident, Type, Task),
  INDEX (Type, Mgr, Site)
);
I have about 125,000 records in the table and it's running on an older 
400 MHz MacOS X 10.2.8 system. The MySQL version is 3.23.52.

The following query comes back with 210 records in about 0.6 seconds.
mysql> SELECT Date FROM DeltaPAF WHERE Date>='2003-12-11'
    -> AND Date<='2004-01-11' AND Incident=98996144;
However, this query comes back with 210 records in a little over 2
minutes.
mysql> SELECT Budget FROM DeltaPAF WHERE Date>='2003-12-11'
    -> AND Date<='2004-01-11' AND Incident=98996144;

Can someone clue me in how I might get the SELECT Budget query to 
return in a similar time to the SELECT Date query? I tried adding an 
index for Budget, knowing it shouldn't help, and it didn't. FWIW, the 
Webpage fields average about 5K characters, but can be as much as 40K.

Mike Schienle, Custom Visuals
http://www.customvisuals.com/





Re: Optimization help

2004-01-12 Thread Mike Schienle
On Jan 12, 2004, at 08:09 AM, Douglas Sims wrote:

I think... you don't have an index on the Incident field itself, 
just on (Date, Incident, Type, Task) which means that it concatenates 
those fields and orders the result - thus this may be virtually 
useless if you're looking for a specific incident within a large date 
range.  Since your query has a specific incident number,  indexing 
that field would probably help a lot.
Thanks for the help, Douglas. That was the ticket. We're back under a 
second for queries now.

Mike Schienle, Custom Visuals
http://www.customvisuals.com/