Kevin, Thank you for the extensive reply, the information you provide makes perfect sense. I have read about normalization so was anticipating that choice #2 would probably be more appropriate.
In terms of the specifics of the project... This portion of the database would act as a sort of bibliography. It would hold the source information for data represented elsewhere in the database; however, there are no real constraints as to what type of source it can be. I'd like to define some standards such as "book," "article," "website," and even self-authored types, but in reality there are possibly 100s of different types of sources that the information can derive from, each with its own fields (with some admitted overlap). With that last admission in mind, would it make further sense to have a pre-built, but extensible table of fields and just do the linking from the standard fields table to the source type in a different table? In other words, the "required" table would just present foreign keys to the source types table and the fields table. sourceTypes: id=1, name="book", ...; id=2, name="article", ... sourceFields: id=1, name="Author", ... sourceRequireds: id=1, type=1, field=1; id=2, type=2, field=1, ... Thank you again for your help! ~Jeffrey Santos On Wed, Sep 17, 2008 at 7:19 AM, Kevin Hunter <[EMAIL PROTECTED]> wrote: > At 5:46am -0400 on Wed, 17 Sep 2008, Jeffrey Santos wrote: > > I'm developing an application > > that will require information from various sources. Since what > > might be considered required information about those sources will > > vary (occasionally dramatically), I'm unsure as to the correct > > design for the database tables. In other words I'm left with (I > > believe) two choices: > > > > 1. Create a source table that will contain, among other things, > > every possible field for a source type and sort out what is > > considered required information at the application level. > > > If possible, maybe a general "conventional wisdom" statement > > would greatly help my education on these matters! > > This is a common pitfall of noobs to data modeling. The idea is to try > to think of everything at the forefront, which is almost always an > impossible task, or to think of a minimal case to jump start the coding > process, and then add "model components" later as necessary. > > The second pitfall is keeping model logic in the application. This > violates two similar principles: Single-Point-of-Authority and > Don't-Repeat-Yourself. > > You are using a database so that it can maintain the structure and > integrity of your data. There is absolutely no other compelling reason > to use a DB. (If you didn't need integrity and structure, you'd use a > simpler and mucho faster flat file.) Let the DB do its job and be the > single-point-of-authority. The application should certainly do things > "the right way", putting data where it needs to go, but it should not be > the application's /responsibility/ to keep data integrity and structure. > If you rely on your application to maintain your structure, you presume > that your programmers are perfect and will think of every little detail. > (They're not and they won't. I promise.) > > As a general technology, databases receive a lot of work so that > applications developers don't have to sweat the little details. Like > making sure that every user in a table has a last name. Like making > sure the user typed a 0 instead of an O. Like defining constraints so > that developers don't have to make sure an account has enough money to > make a withdraw. All they need know is that it didn't go through. > > The other principle of DRY also holds true. At the point you have your > application try to maintain data constraints, you will inevitably have > lots of repeated or similar code to maintain similar data cases. That's > a different kind of nightmare. Similarly, with the DB, it's silly to > define multiple columns for similar data. That's spreadsheet think. > Like col_toyota, col_ford, col_chevy, col_lexus, col_buick. No. Make a > single column as a foreign key to another table. > > Some keywords to use with Google: > > normalize > normalization > foreign keys > foreign key constraints > innodb > > Wikipedia is often a good starting point. > > Once you've learned the concept, I can't stress enough that you should > normalize, normalize, normalize. > > > 2. Create a few source tables detailing the required information about > > each source type. The only way I can think of doing this is something > along > > the lines of: > > > and then pull only the appropriate data from the database at query time > > using JOINs. > > This is closer to a better approach. That said, be more specific about > the problem you're trying to solve and folks may be better able to > provide useful advices. > > Kevin >