I think I can do that:
I don't have any other indexes, just the keys.
mysql> show create table data__ProcessedDataFrames;

CREATE TABLE `data__ProcessedDataFrames` (
  `processed_id` int(10) unsigned NOT NULL auto_increment,
  `top_level_product_name` varchar(255) default NULL,
  `test_id` int(10) unsigned default NULL,
  `payload_time` double default NULL,
  `universal_time` double default NULL,
  `processed_data` mediumblob,
  PRIMARY KEY (`processed_id`),
  KEY `test_id` (`test_id`),
  KEY `payload_time` (`payload_time`),
  KEY `top_level_product_name` (`top_level_product_name`)
) ENGINE=MyISAM AUTO_INCREMENT=1392568 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)
As for the amount of data I expect to get ... effectively unbounded.
Our setup: we have several boxes we run tests from, with MySQL as the
storage engine on each local box, and then a main storage area for all
relevant tests. Based on past data, the table will be pushing 4G as it
is (when we port the data), and I expect at least that much more over
the life of this software, but since the payloads I will be getting the
data from have not been developed yet, I can't be entirely sure. One of
the reasons I asked whether breaking it up into several tables would be
a good idea is that it would make it easier to merge the data from the
different testers into the main data repository. Otherwise I will have
to figure out a good way of renumbering the test_id for each test that
is stored in the main repository.
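A minimal sketch of how that renumbering could work during a merge (the
main_repo and local schema names and the test__Tests registry table are
assumptions for illustration, not part of the existing schema):

-- Reserve a new test_id in a central registry table (hypothetical).
INSERT INTO main_repo.test__Tests (source_box, source_test_id)
VALUES ('tester01', 18);
SET @new_test_id = LAST_INSERT_ID();

-- Copy the frames across, substituting the new test_id and letting
-- processed_id renumber itself via auto_increment.
INSERT INTO main_repo.data__ProcessedDataFrames
  (top_level_product_name, test_id, payload_time, universal_time, processed_data)
SELECT top_level_product_name, @new_test_id, payload_time, universal_time, processed_data
FROM local.data__ProcessedDataFrames
WHERE test_id = 18;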
Slow queries will be a little hard to show without giving a full
evaluation of my system, so I'll simplify it a little. I'm doing several
joins in a stored procedure to get the right parameters to query this
table, but when it comes down to it, the query on this table is the big
one, and I can modify my other joins; making the query on this table
fast is my main concern. Example query:

select payload_time, HEX(processed_data)
from data__ProcessedDataFrames
where test_id=18 AND top_level_product_name="DataProduct"
  AND payload_time > 118080000.74704 AND payload_time < 1180564096.24967;

What I'm concerned about is that, with how much data I will eventually
have, even scanning over the keys will take a long time.
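One thing that might help with that, as a sketch only (the index name
and column order are a guess and would need testing against real data):
a composite index matching the WHERE clause, so the range on
payload_time is resolved within a single index instead of the three
separate single-column keys.

-- Hypothetical composite index covering the example query's WHERE clause.
ALTER TABLE data__ProcessedDataFrames
  ADD KEY test_product_time (test_id, top_level_product_name, payload_time);

-- Then check which index the optimizer actually picks:
EXPLAIN SELECT payload_time, HEX(processed_data)
FROM data__ProcessedDataFrames
WHERE test_id = 18
  AND top_level_product_name = 'DataProduct'
  AND payload_time > 118080000.74704
  AND payload_time < 1180564096.24967;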
Thanks
Dave G.
BTW: here's the giant query that I use.
SELECT
    E.product_id,
    product_name,
    D.top_level_product_name,
    processed_id,
    product_offset,
    product_length,
    version_id,
    byte_order,
    ROUND(R.payload_time,6) as payload_time,
    SUBSTR(BINARY(processed_data),
           FLOOR(product_offset/8)+1,
           CEIL(product_length/8)) as substring,
    (SELECT HEX(substring)) as raw_data,
    (SELECT toString(
        substring,
        round(char_length(raw_data)/2,0),
        data_type,
        (SELECT attribute_value FROM config__DataProductAttributes
         WHERE attribute_name = 'FormatString'
           AND config__DataProductAttributes.product_id = E.product_id),
        product_offset % 8,
        (product_length + (product_offset % 8)) % 8,
        product_length,
        byte_order,
        (SELECT attribute_value FROM config__DataProductAttributes
         WHERE attribute_name = 'ConvParams'
           AND config__DataProductAttributes.product_id = E.product_id))) as converted_data,
    (SELECT enum_name FROM config__DataProductEnumConversions
     WHERE product_id = E.product_id
       AND enum_value = converted_data) as enumerated_data,
    (SELECT metricTest(converted_data,
        (SELECT xmlTestMetric FROM test__TestMetrics
         WHERE product_id = E.product_id))) as test_metric,
    data_type,
    R.test_id
FROM display__DataProducts_in_Element AS E INNER JOIN
     (config__DataProducts AS D INNER JOIN
      (data__ProcessedDataFrames AS R INNER JOIN
       (SELECT E.element_id, R.test_id, R.top_level_product_name,
               max(round(payload_time,6)) as payload_time
        FROM display__DataProducts_in_Element E,
             config__DataProducts D,
             data__ProcessedDataFrames R
        WHERE payload_time <= ptime
          AND test_id like str_tid
          AND D.top_level_product_name = R.top_level_product_name
          AND E.product_id = D.product_id
          AND E.element_id = id
        GROUP BY D.top_level_product_name) AS S
       ON R.test_id = S.test_id
          AND R.top_level_product_name = S.top_level_product_name
          AND R.payload_time = S.payload_time)
      ON D.top_level_product_name = R.top_level_product_name)
     ON E.product_id = D.product_id
WHERE E.element_id = id
GROUP BY product_id;
toString() is a UDF I wrote for data conversions. The config__ tables
are used to interpret the blob in the data__ProcessedDataFrames table.
Thanks again for the help.
Relevant data might include the actual schema for your tables (DESC
does not inform us about what indexes you have or what table types you
are using) and samples of the queries which are running slowly. Any
estimates you may have worked up about how much data you expect this
thing to hold would also be useful.
- michael dykman
On 6/27/07, Dave G <[EMAIL PROTECTED]> wrote:
I have a table in my database (currently) that grows to be huge
(and I
need to keep the data). I'm in a redesign phase and I'm trying
to do it
right. So here are the relevant details:
The table has several keys involved:
mysql> desc data__ProcessedDataFrames;
+-------------------------+------------------+------+-----+---------+----------------+
| Field                   | Type             | Null | Key | Default | Extra          |
+-------------------------+------------------+------+-----+---------+----------------+
| processed_id            | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| top_level_product_name  | varchar(255)     | YES  | MUL | NULL    |                |
| test_id                 | int(10) unsigned | YES  | MUL | NULL    |                |
| payload_time            | double           | YES  | MUL | NULL    |                |
| universal_time          | double           | YES  |     | NULL    |                |
| processed_data          | mediumblob       | YES  |     | NULL    |                |
+-------------------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
This is the table that contains the data I'm interested in currently.
Queries on this table when it gets large are slow as molasses. I'm
thinking about making a new table for anything with a different test_id
.... any opinions as to whether this is good or bad?
Before you make fun of me for my questions, I'm a bit new to database
programming.
If it is a better design to break it into smaller tables (for speed
anyway), then I would need to know how to query over multiple tables as
though they were one table. A join will do this, but that takes forever
(unless of course I am doing this wrong), so that's not a good option.
I need to be able to query over multiple test_ids, which will be
multiple tables, for a specific top_level_product_name, within some
time range (using p_time).
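For reference, a minimal sketch of one way to do that with MyISAM: a
MERGE table over identically-structured per-test tables. The per-test
table names below are hypothetical, and the member tables must all have
the same columns and keys as the MERGE definition.

CREATE TABLE data__ProcessedDataFrames_all (
  processed_id int(10) unsigned NOT NULL,
  top_level_product_name varchar(255) default NULL,
  test_id int(10) unsigned default NULL,
  payload_time double default NULL,
  universal_time double default NULL,
  processed_data mediumblob,
  KEY test_id (test_id),
  KEY payload_time (payload_time),
  KEY top_level_product_name (top_level_product_name)
) ENGINE=MERGE UNION=(data__ProcessedDataFrames_t18, data__ProcessedDataFrames_t19)
  INSERT_METHOD=LAST DEFAULT CHARSET=latin1;

-- A query against the MERGE table fans out to the underlying tables:
SELECT payload_time, HEX(processed_data)
FROM data__ProcessedDataFrames_all
WHERE test_id = 18
  AND top_level_product_name = 'DataProduct'
  AND payload_time > 118080000.74704
  AND payload_time < 1180564096.24967;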
Any help would be appreciated. I will happily give more information if
you need it to offer an educated opinion.
Thanks
David Godsey
--
- michael dykman
- [EMAIL PROTECTED]
- All models are wrong. Some models are useful.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?
[EMAIL PROTECTED]