Hi all. Im looking for some help to speed up some queries I have to run. Ill try to
briefly describe the setup, but mostly try let my code explain itself.
There are three tables described below from a database that describes the execution of
a Java program. class_loads records details of each class. methods records details of
each method. Each method is associated with a record from class_loads via the class_id
field. Finally method_executions contains info about method executions by mapping
method_ids (which can be used to join with methods table) to event_numbers which
represent time.
mysql> describe class_loads;
+--------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| event_number | bigint(20) unsigned | | PRI | 0 | |
| class_id | int(10) unsigned | | PRI | 0 | |
| class_name | varchar(255) | | MUL | | |
| source_name | varchar(255) | | | | |
+--------------+---------------------+------+-----+---------+-------+
5 rows in set (0.08 sec)
mysql> describe methods;
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| method_id | int(10) unsigned | | PRI | 0 | |
| class_id | int(10) unsigned | | MUL | 0 | |
| method_name | varchar(255) | | | | |
| method_signature | varchar(255) | | | | |
+------------------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> describe method_executions;
+--------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| event_number | bigint(20) unsigned | | PRI | 0 | |
| method_id | int(10) unsigned | | | 0 | |
+--------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
The objective is to compute what I would call an execution frequency matrix (time
intervals as rows, methods as columns), i.e. for some given event_number/time interval
[x, y), I need to determine how often each method with a name conforming to a
specified regex is executed within that interval.
Heres what I have done. Being a novice Im sure its painfully ineffecient. The
bottleneck is indicated below. To give an idea of the table sizes, in a fairly small
example we have 199, 2423 and 2434194 rows in class_loads, methods and
method_executions respectively.
CREATE TEMPORARY TABLE temp_class_loads (
class_id INT NOT NULL,
class_name VARCHAR(255) NOT NULL,
source_name VARCHAR(255) NOT NULL)
CREATE TEMPORARY TABLE temp_methods (
method_id INT NOT NULL,
source_name VARCHAR(255) NOT NULL,
class_name VARCHAR(255) NOT NULL,
method_name VARCHAR(255) NOT NULL,
method_signature VARCHAR(255) NOT NULL)
LOCK TABLES class_loads READ, methods READ, method_entries READ
INSERT INTO temp_class_loads (
SELECT class_id, class_name, source_name
FROM class_loads
WHERE class_name LIKE "com.%")
INSERT INTO temp_methods (
SELECT method_id, source_name, class_name, method_name, method_signature
FROM temp_class_loads, methods
WHERE temp_class_loads.class_id = methods.class_id
GROUP BY method_id)
{do the following statements for various [x, y) event_number ranges - The third one is
the bottleneck!}
CREATE TEMPORARY TABLE interval_method_freqs (
method_id INT NOT NULL,
count INT NOT NULL)
INSERT INTO interval_method_freqs (
SELECT method_id, COUNT(*)
FROM method_executions
WHERE method_executions.event_number BETWEEN x AND y
GROUP BY method_id)
SELECT SUM(count)
FROM temp_methods LEFT JOIN interval_method_freqs USING(method_id)
GROUP BY source_name, class_name, method_name, method_signature
DROP TABLE interval_method_freqs
{loop}
UNLOCK TABLES
DROP TABLE temp_methods
DROP TABLE temp_class_loads
If you got this far thanks for making the effort! I look forward to reading suggested
improvements.
Regards,
A.