Well, here's the semi-relational way to do it:

CREATE TABLE `MyStrings` (
`ID` int NOT NULL AUTO_INCREMENT ,
`String` varchar(100) NOT NULL DEFAULT '' ,
UNIQUE `StringIndex`(`String`)
PRIMARY KEY ('ID') )
TYPE=MYISAM

CREATE TABLE `MyIntegers` (
`ID` int NOT NULL AUTO_INCREMENT ,
`StringID` int NOT NULL DEFAULT '' ,
`Integer` int NOT NULL DEFAULT '' ,
INDEX `StringIDIndex`(`StringID`),
PRIMARY KEY (`ID`) )
TYPE=MYISAM

MyStrings contains your unique strings with a unique index on those strings. MyIntegers is linked to MyStrings via the StringID foreign key. Lookups go something like this:

SELECT ID FROM MyStrings WHERE String='something'

This uses the unique index on String which means it should be quite fast.

SELECT Integer FROM MyIntegers WHERE StringID=[ID From Previous Select Goes Here]

This should return one or more integers associated with that key.

This way will pretty much give you good speed as your entries grow, even beyond 400k. However, it may not be nessicary to do this. You can probably keep your original table design, and just index on the strings. It's not like 400,000 indexed strings are going to give MySQL a hard time, unless you have poor hardware. Still, I'd recommend considering the above example as a model for your solution.

----- Original Message ----- From: "Tommy Svensson (InfoGrafix)" <[EMAIL PROTECTED]>
To: <mysql@lists.mysql.com>
Sent: Tuesday, June 07, 2005 1:49 PM
Subject: What's the optimal db design choice for my 400 000 entries?


Hi all you mysql gurus,

I have 400 000 unique strings where each and every one of these strings are associated with 1 - 50 (appr.) integer values.

Now, pretty simple for you guys I guess, but how will I design my
database to make a search interface against this data as rapid as possible?

My first guess for a table whould be

  TABLE list
string     int
=======    =====
string1    234
string1    6323
string1    343
string2    313
string2    9055
...
string434  5445
string434  12
...


But I come to a grinding halt when I realize this table will be 400 000 rows big, times the sum over the number of associated integers for each unique string... let's say each unique string always have 10 associated integers. Then we have a table of 400 000 x 10 = 4 000 000 rows. Is this really the best approach?

A search would appropriately look like this

Search: string434

and produce the following output

Result: 5445 12.

I guess the sql could look like this:

SELECT int FROM list WHERE string = 'string434'

What would be the optimal select query if I would like to return ints for strings that begin with say 'str'?

And what if I wanted to search for strings where I only know the mid part or the end of the string? How then would a performance stable sql command look?

If the above design suggestion is the optimal one, then I guess the internal database workings do not contain duplicates of the string values as they do in the table...? I'm kinda hoping a binary tree is built or something similar. Though I read somewhere that in order for the db to optimize and create btrees and the like, some data must be indexed... but how? I can't very well make the string column 'primary' since it isn't...?

Very, very, very interested in hearing what you pros have to say! Thx a lot for this forum!

/Tommy


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to