On Wed, Sep 17, 2008 at 7:19 AM, Kevin Hunter <[EMAIL PROTECTED]> wrote: > At 5:46am -0400 on Wed, 17 Sep 2008, Jeffrey Santos wrote: >> I'm developing an application >> that will require information from various sources. Since what >> might be considered required information about those sources will >> vary (occasionally dramatically), I'm unsure as to the correct >> design for the database tables. In other words I'm left with (I >> believe) two choices: >> >> 1. Create a source table that will contain, among other things, >> every possible field for a source type and sort out what is >> considered required information at the application level. > >> If possible, maybe a general "conventional wisdom" statement >> would greatly help my education on these matters! > > This is a common pitfall of noobs to data modeling. The idea is to try > to think of everything at the forefront, which is almost always an > impossible task, or to think of a minimal case to jump start the coding > process, and then add "model components" later as necessary. > > The second pitfall is keeping model logic in the application. This > violates two similar principles: Single-Point-of-Authority and > Don't-Repeat-Yourself. > > You are using a database so that it can maintain the structure and > integrity of your data. There is absolutely no other compelling reason > to use a DB. (If you didn't need integrity and structure, you'd use a > simpler and mucho faster flat file.) Let the DB do its job and be the > single-point-of-authority. The application should certainly do things > "the right way", putting data where it needs to go, but it should not be > the application's /responsibility/ to keep data integrity and structure. > If you rely on your application to maintain your structure, you presume > that your programmers are perfect and will think of every little detail. > (They're not and they won't. I promise.) > > As a general technology, databases receive a lot of work so that > applications developers don't have to sweat the little details. Like > making sure that every user in a table has a last name. Like making > sure the user typed a 0 instead of an O. Like defining constraints so > that developers don't have to make sure an account has enough money to > make a withdraw. All they need know is that it didn't go through. > > The other principle of DRY also holds true. At the point you have your > application try to maintain data constraints, you will inevitably have > lots of repeated or similar code to maintain similar data cases. That's > a different kind of nightmare. Similarly, with the DB, it's silly to > define multiple columns for similar data. That's spreadsheet think. > Like col_toyota, col_ford, col_chevy, col_lexus, col_buick. No. Make a > single column as a foreign key to another table. > > Some keywords to use with Google: > > normalize > normalization > foreign keys > foreign key constraints > innodb > > Wikipedia is often a good starting point. > > Once you've learned the concept, I can't stress enough that you should > normalize, normalize, normalize. > >> 2. Create a few source tables detailing the required information about >> each source type. The only way I can think of doing this is something along >> the lines of: > >> and then pull only the appropriate data from the database at query time >> using JOINs. > > This is closer to a better approach. That said, be more specific about > the problem you're trying to solve and folks may be better able to > provide useful advices. > > Kevin >
In cases like these, I go "vertical" instead of "horizontal". Meaning don't use columns, use records. Using columns will force you to modify the table structure every time a new data element comes in, which will probably be often in this case. Think of fields as labels for your data. If you create a table with 2 fields (label, value), you can have unlimited "fields" and add new ones at will because they are just new records with a different label. Depending on your data, you may want to have multiple "value" fields for different data types (i.e. number, text, date). It's harder to program, but it makes things very flexible. Indexing every item is very feasible since you are not creating an index for every field, just label+value. The table will grow quickly as far as number of records, but MySQL handles millions of records without a problem. Hope that helps Brent Baisley -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]