Robert Crowell wrote:
I have data from two different sources that I would like to store in a mysql
database. Dataset_1 has pieces of information A, B, and C (let's say first,
middle, and last name) and Dataset_2 has pieces of information A, B, C, and D
(first, middle, last names and a DOUBLE value).
Dataset_1 has 9.5 million records, while Dataset_2 has only 0.5 million records.
I am interested in keeping all of this information, and would like to store
Dataset_1 and Dataset_2 in the same table.
However, if the table was created with columns A, B, C, and D, most of the
entries in the D column will be NULL. Is this considered 'good form'? Even
though these entries are all NULL, they will still consume the disk space that
a DOUBLE will, correct? Will this impact the speed of querying this table?
Would it be better to create one table to store A, B, C and a second table
solely to store D?
I'd say I am most concerned with performance, then storage space, and finally
with good form.
Thanks for your time.
-Rob Crowell
To my understanding, there will be a performance hit IF you need to
index column D - the many NULL values in the column will need to be
stored in the index, and an index on 9 mil DOUBLE's may be hard for your
server to keep in the key_buffer. If you do need an index on D, then I
would suggest creating an INT field to join the two tables on (ie, an
auto_increment field or such), then modifying the tables to this format:
table_1: A, B, C, I (9.5mil rows)
table_2: I, D (0.5mil rows)
In this way, an index on D will not be wasteful, and a join will be
simple. Also, storing ~10 mil INT's at 4bytes each ( 9.5 in one table,
0.5 in the other) will probably take up less disk space than the 1.5mil
duplicate varchar columns in your current table.
Best regards,
Devananda vdv
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]