"Siegfried Heintze" <[EMAIL PROTECTED]> wrote on 07/24/2005 11:35:36 AM:
> I have a large number of job titles (40K). Each job title has multiple
> keywords making a one-to-many parent-child relationship.
>
> If I join job title with company name, address, company url, company city,
> job name, job location, job url (etc...) I have a mighty wide result set
> that will be repeated for each keyword.
>
> What I have done in the past (in a different, much smaller, application) is
> perform a join of everything except the keyword and store everything in a
> hashmap.
>
> Then I iterate thru each wide row in the hashmap and perform a separate
> SELECT statement foreach row in this hashmap to fetch the multiple keywords.
>
> Whew! That would be a lot of RAM (and paging) for this application.
>
> Are there any other more efficient approaches?
>
> Thanks,
> Siegfried

There are two major classes of efficiency when dealing with any RDBMS: time
efficiency (faster results) and space efficiency (stored data takes less room
on the disk). Which one are you worried about?

If it were me, I would start with all of the data normalized:

* a Companies table (name, address, url, city, etc.)
* a Job Titles table (a list of names)
* a Keywords table (a list of words used to describe Job Titles)
* a JobPostings table (relates Companies to Job Titles; should also be used
  to track things like date posted, date closed, salary offered, etc.)
* a Postings_Keywords table (matches a Posting to multiple Keywords)

I would only denormalize if testing showed a dramatic improvement in
performance by doing so.

I would also expect the Job Title to Keyword relationship to differ between
Companies. One company posting for a "Programmer" may want VB while another
wants PHP and Perl. By associating the Keywords with a Posting (and not just
the Job Title), you can make that list company-specific.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
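The normalized schema above can be sketched as follows. This is a minimal, illustrative example using SQLite from Python; all table and column names beyond those mentioned in the post (e.g. `company_id`, `posting_id`) are assumptions, not a prescribed design. The final query shows how a single JOIN through Postings_Keywords retrieves every posting's keywords at once, replacing the one-SELECT-per-row loop described in the question:

```python
# Sketch of the normalized schema described above (SQLite syntax;
# column names are illustrative assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Companies (
    company_id INTEGER PRIMARY KEY,
    name TEXT, address TEXT, url TEXT, city TEXT
);
CREATE TABLE JobTitles (
    title_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE Keywords (
    keyword_id INTEGER PRIMARY KEY,
    word TEXT
);
CREATE TABLE JobPostings (
    posting_id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES Companies(company_id),
    title_id   INTEGER REFERENCES JobTitles(title_id),
    date_posted TEXT, date_closed TEXT, salary_offered REAL
);
CREATE TABLE Postings_Keywords (
    posting_id INTEGER REFERENCES JobPostings(posting_id),
    keyword_id INTEGER REFERENCES Keywords(keyword_id),
    PRIMARY KEY (posting_id, keyword_id)
);
""")

# Sample data: two companies posting the same "Programmer" title
# with different, posting-specific keywords.
cur.execute("INSERT INTO Companies VALUES (1, 'Acme', '1 Main St', 'acme.example', 'Denver')")
cur.execute("INSERT INTO Companies VALUES (2, 'Globex', '2 Oak Ave', 'globex.example', 'Boston')")
cur.execute("INSERT INTO JobTitles VALUES (1, 'Programmer')")
cur.executemany("INSERT INTO Keywords VALUES (?, ?)",
                [(1, 'VB'), (2, 'PHP'), (3, 'Perl')])
cur.execute("INSERT INTO JobPostings VALUES (1, 1, 1, '2005-07-01', NULL, NULL)")
cur.execute("INSERT INTO JobPostings VALUES (2, 2, 1, '2005-07-02', NULL, NULL)")
cur.executemany("INSERT INTO Postings_Keywords VALUES (?, ?)",
                [(1, 1), (2, 2), (2, 3)])

# One query fetches all keywords for all postings, instead of a
# separate SELECT per row of a wide result set.
rows = cur.execute("""
SELECT c.name, jt.name, k.word
FROM JobPostings p
JOIN Companies c          ON c.company_id  = p.company_id
JOIN JobTitles jt         ON jt.title_id   = p.title_id
JOIN Postings_Keywords pk ON pk.posting_id = p.posting_id
JOIN Keywords k           ON k.keyword_id  = pk.keyword_id
ORDER BY p.posting_id, k.word
""").fetchall()
for row in rows:
    print(row)
```

If the repetition of the wide company columns per keyword is the worry, a variant is two queries instead of N+1: one narrow query for the postings, then a single query over Postings_Keywords joined to Keywords, grouped by posting_id in application code.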