Hello.
If you think that it is possible to have several records with the same post and title, then, in my opinion, your table should have two indexes for foreign key constraints and one AUTO_INCREMENT field for primary key. mysql> show create table titles_posts\G; *************************** 1. row *************************** Table: titles_posts Create Table: CREATE TABLE `titles_posts` ( `id` int(10) unsigned NOT NULL auto_increment, `tid` int(10) unsigned default NULL, `pid` int(10) unsigned default NULL, PRIMARY KEY (`id`), KEY `tid` (`tid`), KEY `pid` (`pid`), CONSTRAINT `titles_posts_ibfk_1` FOREIGN KEY (`tid`) REFERENCES `titles` (`tid`), CONSTRAINT `titles_posts_ibfk_2` FOREIGN KEY (`pid`) REFERENCES `postings` (`pid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 mysql> show create table titles\G *************************** 1. row *************************** Table: titles Create Table: CREATE TABLE `titles` ( `tid` int(10) unsigned NOT NULL auto_increment, `title` varchar(256) default NULL, PRIMARY KEY (`tid`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 mysql> show create table postings\G *************************** 1. row *************************** Table: postings Create Table: CREATE TABLE `postings` ( `pid` int(10) unsigned NOT NULL auto_increment, `post` varchar(65500) default NULL, PRIMARY KEY (`pid`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 The performance penalty of updating an auto_increment index is low. If you are sure that it is impossible to have two records with the same post and title, you should make a choice - what is more important for you: speed or data integrity. I usually prefer the data integrity, so, I'd probably create a primary key on tid and pid: mysql> show create table titles_posts\G; *************************** 1. row *************************** Table: titles_posts Create Table: CREATE TABLE `titles_posts` ( `tid` int(10) unsigned NOT NULL default '0', `pid` int(10) unsigned NOT NULL default '0', PRIMARY KEY (`tid`,`pid`), KEY `pid` (`pid`), CONSTRAINT `titles_posts_ibfk_1` FOREIGN KEY (`tid`) REFERENCES `titles` (`tid`), CONSTRAINT `titles_posts_ibfk_2` FOREIGN KEY (`pid`) REFERENCES `postings` (`pid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 > what is the performance like when I only know the fkJobTitle and I >want to > find all job postings for that title. Is it a linear search? If >not, why not? If fkJobTitle key has high cardinality the search will be quite fast. Use the benchmarks to make a final desision. > So what SQL statements would I use to look up the job title, insert it >if it > does not exist and return the integer PK. I could use SELECT and >if that At first, you should add the record to the titles table. For this task, I'd perform a simple INSERT and later obtain the value of tid by selecting LAST_INSERT_ID(): mysql> insert into titles set title="title1"; Query OK, 1 row affected (0,00 sec) mysql> select last_insert_id(); +------------------+ | last_insert_id() | +------------------+ | 6 | +------------------+ mysql> select * from titles order by tid; +-----+--------+ | tid | title | +-----+--------+ | 1 | yo | | 3 | yo1 | | 4 | yo2 | | 5 | title | | 6 | title1 | +-----+--------+ In case of 'duplicate key' error, I get the value of tid using SELECT. "Siegfried Heintze" <[EMAIL PROTECTED]> wrote: > I have 100K job posting records and 40K job title records. There is a M:M > relationship here. I expect these tables to grow rapidly. > > What is the best way to design a junction or link table? Do I need to create > a primary key? > > My thought was no primary key, just two indices on each foreign key (FK). > > Some folks feel you should always have a primary key. However, if I make > fkJobTitle the primary key, can it still have duplicates? > > Some folks say I should have a composite primary key consisting of > fkJobTitle and fkJobPosting. This does buy me uniqueness. However, what is > the performance like when I only know the fkJobTitle and I want to find all > job postings for that title. Is it a linear search? If not, why not? I don't > know the fkJobPosting to exploit the primary key. > > Some folks say I say I should have a separate auto increment integer > separate from fkJobPosting and fkJobTitle. Then I could make the "id" field > (what is your favorite naming convention for this field?) the primary key > and index fkJobTitle and fkJobPosting separately. But this means every time > I insert into the junction table, I have to update three index structures. > Is this a problem? > > What is your opinion? > > Lastly, I have learned that MySQL has a rich set of extensions for the SQL > syntax. Let's suppose I have several hundred jobs which may or may not have > been previously inserted into my data base already where each job contains a > posting and a title. For each one I have to look it up, insert it if it does > not already exist in the database, and, return the integer PK. > > So what SQL statements would I use to look up the job title, insert it if it > does not exist and return the integer PK. I could use SELECT and if that > fails, INSERT or vice versa. But a previous poster informed me there are > other statements like REPLACE or INSERT ... ON DUPLICATE... and maybe there > are others. (Apparently REPLACE will INSERT if it is not already there). I > was using REPLACE and now that I am using V5 I can use INSERT... ON > DUPLICATE. Which would be best? > > Thanks, > Siegfried > > -- For technical support contracts, goto https://order.mysql.com/?ref=ensita This email is sponsored by Ensita.NET http://www.ensita.net/ __ ___ ___ ____ __ / |/ /_ __/ __/ __ \/ / Gleb Paharenko / /|_/ / // /\ \/ /_/ / /__ [EMAIL PROTECTED] /_/ /_/\_, /___/\___\_\___/ MySQL AB / Ensita.NET <___/ www.mysql.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]