Re: What's the optimal db design choice for my 400 000 entries?

Jeremiah Gowdy Wed, 08 Jun 2005 07:12:08 -0700

Well, here's the semi-relational way to do it:

CREATE TABLE `MyStrings` (
`ID` int NOT NULL AUTO_INCREMENT ,
`String` varchar(100) NOT NULL DEFAULT '' ,
UNIQUE `StringIndex`(`String`)
PRIMARY KEY ('ID') )
TYPE=MYISAM


CREATE TABLE `MyIntegers` (
`ID` int NOT NULL AUTO_INCREMENT ,
`StringID` int NOT NULL DEFAULT '' ,
`Integer` int NOT NULL DEFAULT '' ,
INDEX `StringIDIndex`(`StringID`),
PRIMARY KEY (`ID`) )
TYPE=MYISAM

MyStrings contains your unique strings with a unique index on those strings.MyIntegers is linked to MyStrings via the StringID foreign key. Lookups gosomething like this:


SELECT ID FROM MyStrings WHERE String='something'

This uses the unique index on String which means it should be quite fast.

SELECT Integer FROM MyIntegers WHERE StringID=[ID From Previous Select GoesHere]


This should return one or more integers associated with that key.

This way will pretty much give you good speed as your entries grow, evenbeyond 400k. However, it may not be nessicary to do this. You can probablykeep your original table design, and just index on the strings. It's notlike 400,000 indexed strings are going to give MySQL a hard time, unless youhave poor hardware. Still, I'd recommend considering the above example as amodel for your solution.

----- Original Message -----From: "Tommy Svensson (InfoGrafix)" <[EMAIL PROTECTED]>

To: <mysql@lists.mysql.com>
Sent: Tuesday, June 07, 2005 1:49 PM
Subject: What's the optimal db design choice for my 400 000 entries?

Hi all you mysql gurus,
I have 400 000 unique strings where each and every one of these stringsare associated with 1 - 50 (appr.) integer values.
Now, pretty simple for you guys I guess, but how will I design my
database to make a search interface against this data as rapid aspossible?
My first guess for a table whould be

  TABLE list
string     int
=======    =====
string1    234
string1    6323
string1    343
string2    313
string2    9055
...
string434  5445
string434  12
...
But I come to a grinding halt when I realize this table will be 400 000rows big, times the sum over the number of associated integers for eachunique string... let's say each unique string always have 10 associatedintegers. Then we have a table of 400 000 x 10 = 4 000 000 rows. Is thisreally the best approach?
A search would appropriately look like this

Search: string434

and produce the following output

Result: 5445 12.

I guess the sql could look like this:

SELECT int FROM list WHERE string = 'string434'
What would be the optimal select query if I would like to return ints forstrings that begin with say 'str'?
And what if I wanted to search for strings where I only know the mid partor the end of the string? How then would a performance stable sql commandlook?
If the above design suggestion is the optimal one, then I guess theinternal database workings do not contain duplicates of the string valuesas they do in the table...? I'm kinda hoping a binary tree is built orsomething similar. Though I read somewhere that in order for the db tooptimize and create btrees and the like, some data must be indexed... buthow? I can't very well make the string column 'primary' since it isn't...?
Very, very, very interested in hearing what you pros have to say! Thx alot for this forum!
/Tommy


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: What's the optimal db design choice for my 400 000 entries?

Reply via email to