Robert Crowell wrote:
I have data from two different sources that I would like to store in a mysql
database.  Dataset_1 has pieces of information A, B, and C (let's say first,
middle, and last name) and Dataset_2 has pieces of information A, B, C, and D
(first, middle, last names and a DOUBLE value).

Dataset_1 has 9.5 million records, while Dataset_2 has only 0.5 million records.
 I am interested in keeping all of this information, and would like to store
Dataset_1 and Dataset_2 in the same table.

However, if the table was created with columns A, B, C, and D, most of the
entries in the D column will be NULL.  Is this considered 'good form'?  Even
though these entries are all NULL, they will still consume the disk space that
a DOUBLE will, correct?  Will this impact the speed of querying this table?

Would it be better to create one table to store A, B, C and a second table
solely to store D?

I'd say I am most concerned with performance, then storage space, and finally
with good form.

Thanks for your time.

-Rob Crowell


To my understanding, there will be a performance hit IF you need to index column D - the many NULL values in the column will need to be stored in the index, and an index on 9 mil DOUBLE's may be hard for your server to keep in the key_buffer. If you do need an index on D, then I would suggest creating an INT field to join the two tables on (ie, an auto_increment field or such), then modifying the tables to this format:
table_1: A, B, C, I (9.5mil rows)
table_2: I, D (0.5mil rows)
In this way, an index on D will not be wasteful, and a join will be simple. Also, storing ~10 mil INT's at 4bytes each ( 9.5 in one table, 0.5 in the other) will probably take up less disk space than the 1.5mil duplicate varchar columns in your current table.

Best regards,
Devananda vdv

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to