I have a situation where a query using indexing takes about 2x longer
than non-indexed to complete. I'd greatly appreciate any suggestions as
to how to improve the results or, if not possible, an explanation for
the results (ex. My data represents a worst-case scenario for MySQL
indexing algorithm.) 


Table Details----------

-Schema:

timestamp   CHAR(14) not null,
f2   CHAR(2)  not null,
category   CHAR(12) not null,
f4   CHAR(10) not null, 
f5   CHAR(13) not null,
f6   CHAR(2) not null,
f7   CHAR(5)

-timestamp is of the form yyyymmddhhmmss

-Table comprised of 3.5 million records of production data.

-timestamp has 1 of 86,400 unique values
-category has 1 of ~9,000 unique values

-Table is loaded using INFILE, then the indexing timestamp and category
fields are indexed separately. 

-The raw data has been pre-sorted on the timestamp field in ascending
order. 


Test Query-----------

select timestamp,f4 from TABLE_NAME where
timestamp between '19981117000000' and '19981117010000' 
and category = 'ABC'

Returns 12,685 records in 58 seconds non-indexed and 114 seconds indexed
averaged over ten iterations of the test query.


Operating Environment------
Perl 5.004_04

-MySQL Host:
Running tests on:    SunOS 5.6 sun4u
Server version:      MySQL 3.21.33b
SPARCserver 1000
Main Memory 128 MB
Virtual Memory is 461 MB

-MySQL Variables:
| join_buffer        | 131072 
| key_buffer         | 1048568
| max_allowed_packet |
65536                                                                     
| max_connections    |
90                                                                        
| max_join_size      |
4294967295                                                                
| max_sort_length    |
1024                                                                      
| net_buffer_length  | 8192        
| record_buffer      | 131072 
| skip_locking       | 0
| sort_buffer        |
2097144                                                                   
| table_cache        |
64                                                                        
| thread_stack       |
65536                                                                     
| tmp_table_size     | 1048576 

-All testing was run after hours on a load free client and mysql server
machines. Network traffic was negligible.

-When in production, the table will be loaded only once per day during
off peak hours (no other inserts or updates after bulk load). I can
expect ~50 queries / day resembling with each query resembling the test
query.


Actions Taken / Results------

-Verified that indexing was correct and actively being used by using
command line mysql and EXPLAIN. Also verified results by querying
against a smaller (100,000) record table which expectedly resulted in
the indexed query returning faster than the non-indexed query.

mysql> explain select timestamp,f4 from PILC where timestamp between
'19981117000000' and '19981117010000' and category = 'FRAME_CHK';

| table | type | possible_keys  | key    | key_len | ref       | rows |
Extra                     |
| PILC  | ref  | timendx,catendx | abcndx |      12 | FRAME_CHK |   11 |
range used on key timendx |


-From command line mysql:
indexed: 12658 rows in set (1 min 19.66 sec)
non-indexed: 12658 rows in set (2 min 5.91 sec)


-Modified MySQL parameters as indicated in Chapter 10 of documentation:

Run #1: record_buffer 131072    
Query returns in 54.6 secs (averaged over 10 trials)

Run #2: record_buffer 1044480 
Query returns in 70.8 secs (averaged over 10 trials)

Run#3:  -O key_buffer=16M -O table_cache=128 -O sort_buffer=4M -O
record_buffer=1M 
Query returns in 72.7 secs (averaged over 10 trials)


-Ran test-ATIS benchmark script (note: these scripts do not use
indexing)
No test measurement varied by more than 5%.

Run 1:
ATIS table test

Creating tables
Time for create_table (28):  3 secs ( 0.00 usr  0.01 sys =  0.01 cpu)

Inserting data
Time to insert (9768): 32 secs ( 2.03 usr  2.16 sys =  4.19 cpu)

Retrieving data
Time for select_simple_join (500): 18 secs ( 2.34 usr  1.89 sys =  4.23
cpu)
Time for select_join (100): 21 secs ( 1.85 usr  1.58 sys =  3.43 cpu)
Time for select_distinct (800): 173 secs ( 7.96 usr  5.98 sys = 13.94
cpu)
Time for select_group (2400): 134 secs ( 5.26 usr  2.75 sys =  8.01 cpu)

Removing tables
Time to drop_table (28):  3 secs ( 0.00 usr  0.01 sys =  0.01 cpu)
Total time: 384 secs (19.45 usr 14.38 sys = 33.83 cpu)
 

Run 2:
ATIS table test

Creating tables
Time for create_table (28):  3 secs ( 0.01 usr  0.02 sys =  0.03 cpu)

Inserting data
Time to insert (9768): 31 secs ( 1.88 usr  2.59 sys =  4.47 cpu)

Retrieving data
Time for select_simple_join (500): 17 secs ( 2.47 usr  1.87 sys =  4.34
cpu)
Time for select_join (100): 22 secs ( 2.05 usr  1.44 sys =  3.49 cpu)
Time for select_distinct (800): 172 secs ( 8.47 usr  6.55 sys = 15.02
cpu)
Time for select_group (2400): 133 secs ( 5.40 usr  2.85 sys =  8.25 cpu)

Removing tables
Time to drop_table (28):  2 secs ( 0.01 usr  0.00 sys =  0.01 cpu)
Total time: 380 secs (20.31 usr 15.33 sys = 35.64 cpu)


Run 3:
ATIS table test

Creating tables
Time for create_table (28):  3 secs ( 0.00 usr  0.04 sys =  0.04 cpu)

Inserting data
Time to insert (9768): 30 secs ( 2.19 usr  2.65 sys =  4.84 cpu)

Retrieving data
Time for select_simple_join (500): 17 secs ( 2.30 usr  1.99 sys =  4.29
cpu)
Time for select_join (100): 23 secs ( 1.93 usr  1.71 sys =  3.64 cpu)
Time for select_distinct (800): 178 secs ( 8.04 usr  7.05 sys = 15.09
cpu)
Time for select_group (2400): 135 secs ( 4.75 usr  3.09 sys =  7.84 cpu)

Removing tables
Time to drop_table (28):  3 secs ( 0.00 usr  0.00 sys =  0.00 cpu)
Total time: 389 secs (19.24 usr 16.53 sys = 35.77 cpu)


-Changed timestamp from CHAR to DATETIME (thought:  more efficient type
= better performance WRONG! though it did reduce table space usage by
11%)
Query returned in 332 secs. non-indexed vs. 162 secs. indexed.


Notes--------

-I tried the same test using a very popular commercial database (VPCDB)
with minimal tuning (I'm not a DBA just yet) using the same client,
host, perl, DBI, environment, etc. and received the same results (ie.
index query slower than non-indexed query for large table), though MySQL
was faster :-)  Those stats:
MySQL: Non-indexed 58 vs. Indexed 114 secs
VPCDB: Non-Indexed 118 vs. Indexed 167 secs

-My application requires speed (it will be web-based) but not
transaction processing (though robust backup/recovery (hot?) is
important).

-Not sure if executable was complied statically / mit-threads / etc.
since compile log file missing.



Thanks for your advice/suggestions, Frank V.

Big thanks to TCX and Monty in particular for MySQL!


-- 

Frank Vetesi            Email: [EMAIL PROTECTED]
Lockheed Martin                 
Advanced Technology Labs        Voice: 609-338-3923
1 Federal Street - A&E 3W       Fax:   609-338-4122     
Camden, NJ 08102                MARCALL: 8*794-3923
-----------------------------------------------------------
Send a mail to [EMAIL PROTECTED] with
unsubscribe mysql [EMAIL PROTECTED]
in the body of the message to unsubscribe from this list.

Reply via email to