Re: Relational Databasing on busy webserver (Benchmark of Enum?!)
On 09/23/2008 02:42 PM, Ben A.H. wrote: I figured that was what you meant... I guess my table didn't work (see above message...don't ya' love plaintext :-O)... Has anyone ever tried to benchmark the difference between utilizing ENUMs vs. traditional relational databasing? I would think ENUM is ideal for items I specified at the beginning of this thread, items I would think would be part of MANY (if not MOST) databases (state, country, gender, industry, occupation, referredFrom, ethnicity, position)... In my case, it would allow me to eliminate 15+ tables... I'm just wondering why database ENUMS aren't used more often... (what's the catch) Just thought I'd jump in with some terms here: When you're saying 'traditional relational databasing' is kind of misleading. You're probably still relational in some sense of the term even using a ton of ENUMs. What you're talking about is fully normalized form. Take a look here: http://en.wikipedia.org/wiki/Database_normalization ..to answer your question, normalization isn't done for speed. In fact, I'd hazard a guess that fully normalized databases are almost always slower than if the designed makes some shortcuts. What they provide is a fully structured way to organize your data. If you're just storing data, and you know there's only a limited number of ways you're going to pull data out of the set, then I'd make as many shortcuts as I could to provide speed. If you are using the dataset for analysis, and you're not sure how the users are going to extract data, then you may want to go further down the normalization road. As with anything normalization can reach a point of dubious effect, but it serves a very important purpose. By normalizing your data, and not using things like ENUMs, you're maintaining an accurate data structure which then can be arbitrarily used. But yeah, it's not necessarily fast, probably the opposite. Hope that helps. -Micah -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Relational Databasing on busy webserver (Benchmark of Enum?!)
I figured that was what you meant... I guess my table didn't work (see above message...don't ya' love plaintext :-O)... Has anyone ever tried to benchmark the difference between utilizing ENUMs vs. traditional relational databasing? I would think ENUM is ideal for items I specified at the beginning of this thread, items I would think would be part of MANY (if not MOST) databases (state, country, gender, industry, occupation, referredFrom, ethnicity, position)... In my case, it would allow me to eliminate 15+ tables... I'm just wondering why database ENUMS aren't used more often... (what's the catch) ""Olexandr Melnyk"" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Tue, Sep 23, 2008 at 6:13 PM, Olexandr Melnyk <[EMAIL PROTECTED]> > wrote: > >> Plus, if the same query is run very often and table is almost static, >> chances are high that the result will be in query cache. >> > > Just realized that I haven't mentioned that this sentence is related to > storing states in the database, rather than in the application layer. > > -- > Sincerely yours, > Olexandr Melnyk > http://omelnyk.net/ > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Relational Databasing on busy webserver
Hello all, Thank-you for all of your help, I was really surprised by the speed & quality of responses. Below is a table I've created based on some reading I did following everyone's suggestions (I hope the table shows correctly)... I'm leaning towards the pure Relational as I like having all data in a singular, easily conceptualized & exported datastore but we may go with the ENUM approach if it provides a performance boost. Feel free to comment... Relational Enum Scripted Maintainability + (Simplest, known in industry, other programmers can easily understand/maintain) + (Almost as simple as Relational, known in industry, other programmers can easily understand/maintain) - (Will have 2 datastores, the SQL DB & XML or hard-coded values; NOT complicated, but not as intuitively maintainable) Scalability + (Easily Scalable, simplified by the fact there's 1 datastore. Adding a State as easy as adding record to db, no code changes needed!) + (Easily Scalable, simplified by the fact there's 1 datastore. Adding a State as easy as adding new ENUM to db, no code changes needed!) N/- (Not as Easily scalable, adding/removing field means changing code which always has risks; of course done right: we'd just add new code to an XML file) Code Complexity + (Super simple, easy to understand, GET DATA:connect, query, show; WRITE DATA validate, connect, write) + (Super simple, easy to understand, GET DATA:connect, query, show; WRITE DATA validate, connect, write) - (Alittle more involved GET DATA: connect, query, turn codes into words via code, showd WRITE DATA: similar to Relational/Enum Performance N or - N or - + Could potentially increase performance by taking load off SQL server. Portability (Using w/ 3rd party System or future external application) + Integrating w/ 3rd party SW or new applications simple. + (?) Integrating w/ 3rd party SW or new applications probably like Relational. Are ENUMS handled/returned via normal SELECT queries? (if so, just like relational) - Would have to write the decodify module for any outside system. Isn't intuitively portable. ThanX, Ben ""David Ashley"" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Tue, Sep 23, 2008 at 11:13 AM, Olexandr Melnyk <[EMAIL PROTECTED]> > wrote: > >> On 9/23/08, David Ashley <[EMAIL PROTECTED]> wrote: >>> >>> For example, for the 50 states, a lot of programmers would put this >>> logic >>> in >>> the web script and just store the two-letter postal code in the database >>> table (but with no separate table for "states"). The mapping from "MI" >>> to >>> "Michigan" would occur in the web scripts; as would the logic to know >>> that >>> "XQ" is not a state. >>> >> >> Keep in mind that, as you add more countries, maintaining in-scipt >> code/name mappings of state codes will become more and more cumbersome. >> Another problem will arise if you want to view user information, along >> with >> full country and state names, from a different programming language. >> >> Plus, if the same query is run very often and table is almost static, >> chances are high that the result will be in query cache. > > > The "different programming language" issue is one I hadn't thought about. > Good catch. > > I program in PHP only, and I run queries manually only if I need to debug > something hard. But I could see the issue you cited arising in a more > complex environment. I could even see a "mixed" web application where the > web presence is a mixture of PHP, Python, and Perl. > > Good catch. > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Relational Databasing on busy webserver
On Tue, Sep 23, 2008 at 6:13 PM, Olexandr Melnyk <[EMAIL PROTECTED]> wrote: > Plus, if the same query is run very often and table is almost static, > chances are high that the result will be in query cache. > Just realized that I haven't mentioned that this sentence is related to storing states in the database, rather than in the application layer. -- Sincerely yours, Olexandr Melnyk http://omelnyk.net/
Re: Relational Databasing on busy webserver
> It sounds like what you're looking for is an ENUM value: > > http://dev.mysql.com/doc/refman/5.0/en/enum.html > > Bear in mind when using this data-type that if you do want to add a new > value (such as a new state/country), you will have to perform an ALTER > TABLE statement, which can take some time to execute over a table > storing a lot of data. Using the relational method means you'd just have > to add one row to a table, which is significantly faster. Just to clarify: Wouldn't the ALTER TABLE though just have to be run once? and on the backend? I guess what I'm asking is: would this affect web-users in any way? > > Thanks > > On Mon, 2008-09-22 at 17:12 -0400, Ben A.H. wrote: >> Hello, >> >> We are setting up a relatively common web application which collects >> "user >> information"... Right off the bat our system will have over 200,000 USER >> RECORDS so having an efficient database & lookup scheme is critical. >> I am a programmer/developer with some education in databasing but my >> forte >> is programming hence this odd (I think) question... >> >> Obviously we'll have the standard USER table & a bunch of supporting >> tables. >> >> For items like STATE/PROVINCE standard database logic would dictate I >> setup: >> >> USER >>name >>email >>.etc... >>StateID (foreign key) >> >> >> STATE >>StateID >>StateName >> >> But I often wonder if there's any benefit in having a State table... >> >> For one, new States/Provinces are not that likely, removal of these >> entities >> is also unlikely (except maybe Quebec :-)) so the chances of having to >> make >> alternations to the State table near nil. It raises the question of >> whether >> or not a State Table is even necessary. >> Using a traditional SQL State table, I'd have to do an SQL query to >> populate >> User State/Province options every time the "New User Registration" form >> is >> shown - isn't this needless overhead?! >> Would my webforms not load faster if State/Province information was >> hard-coded as options for a combobox? Is this something people have >> experimented with? >> >> There are various other fields that I believe could be handled like this >> for >> a cumulative performance boost. For example: country, state/province, >> gender, industry, occupation, ethnicity, language are all options that >> aren't going to change that often. Then again, when we do have to display >> the users choice; for example if user.countryID=3 we'd have to have a way >> to >> turn that number 3 into the word "Canada" when we display the users >> profile... I'd probably do this via XML lookup. >> >> Has anyone experimented with the benefits/tradeoffs of such a scheme? As >> I've said, we are dealing with at least 200,000 user records, probably >> 300,000-400,000 in the next year. The User table contains at least 50 >> attributes, 15 of which are Foreign Keys that link to tables that will >> likely never change (the users choices for countryID, stateID, >> OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID, >> industryID, occupationID)... >> >> ThanX in advance >> Ben >> >> >> > -- > Ian Simpson > System Administrator > MyJobGroup > > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Relational Databasing on busy webserver
On Tue, Sep 23, 2008 at 11:13 AM, Olexandr Melnyk <[EMAIL PROTECTED]> wrote: > On 9/23/08, David Ashley <[EMAIL PROTECTED]> wrote: >> >> For example, for the 50 states, a lot of programmers would put this logic >> in >> the web script and just store the two-letter postal code in the database >> table (but with no separate table for "states"). The mapping from "MI" to >> "Michigan" would occur in the web scripts; as would the logic to know that >> "XQ" is not a state. >> > > Keep in mind that, as you add more countries, maintaining in-scipt > code/name mappings of state codes will become more and more cumbersome. > Another problem will arise if you want to view user information, along with > full country and state names, from a different programming language. > > Plus, if the same query is run very often and table is almost static, > chances are high that the result will be in query cache. The "different programming language" issue is one I hadn't thought about. Good catch. I program in PHP only, and I run queries manually only if I need to debug something hard. But I could see the issue you cited arising in a more complex environment. I could even see a "mixed" web application where the web presence is a mixture of PHP, Python, and Perl. Good catch.
Re: Relational Databasing on busy webserver
On 9/23/08, David Ashley <[EMAIL PROTECTED]> wrote: > > For example, for the 50 states, a lot of programmers would put this logic > in > the web script and just store the two-letter postal code in the database > table (but with no separate table for "states"). The mapping from "MI" to > "Michigan" would occur in the web scripts; as would the logic to know that > "XQ" is not a state. > Keep in mind that, as you add more countries, maintaining in-scipt code/name mappings of state codes will become more and more cumbersome. Another problem will arise if you want to view user information, along with full country and state names, from a different programming language. Plus, if the same query is run very often and table is almost static, chances are high that the result will be in query cache. -- Sincerely yours, Olexandr Melnyk <>< http://omelnyk.net/
Re: Relational Databasing on busy webserver
On Mon, Sep 22, 2008 at 5:12 PM, Ben A.H. <[EMAIL PROTECTED]> wrote: > > There are various other fields that I believe could be handled like this > for > a cumulative performance boost. For example: country, state/province, > gender, industry, occupation, ethnicity, language are all options that > aren't going to change that often. Then again, when we do have to display > the users choice; for example if user.countryID=3 we'd have to have a way > to > turn that number 3 into the word "Canada" when we display the users > profile... I'd probably do this via XML lookup. > > Has anyone experimented with the benefits/tradeoffs of such a scheme? As > I've said, we are dealing with at least 200,000 user records, probably > 300,000-400,000 in the next year. The User table contains at least 50 > attributes, 15 of which are Foreign Keys that link to tables that will > likely never change (the users choices for countryID, stateID, > OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID, > industryID, occupationID)... > Hi Ben, This is a very common issue in web database development. The issue is really whether the tables should exist (logically) in the database or in the web scripts. For example, for the 50 states, a lot of programmers would put this logic in the web script and just store the two-letter postal code in the database table (but with no separate table for "states"). The mapping from "MI" to "Michigan" would occur in the web scripts; as would the logic to know that "XQ" is not a state. This is done all the time. As long as the database is indexed properly for the queries you want to run, there is not usually a severe performance hit. Either approach is fine; but the web script approach is usually easier. I just have one piece of advice for you: Use something stored in the database that won't change when you update the fixed tables in the web scripts. For example, a 2-letter postal state code is fine; because when they add the 51st state they won't change any of the existing codes. Similarly, use "M", "F", etc. for sex. Don't store anything in the database that is tied to, for example, the integer index of a table entry in the web script or stored in some other way where when you modify the script you'll need to alter the database as well; otherwise you'll be writing a separate script to remap values of columns in the database so that you can use the new scripts. Dave.
Re: Relational Databasing on busy webserver
Hi Ben, It sounds like what you're looking for is an ENUM value: http://dev.mysql.com/doc/refman/5.0/en/enum.html Bear in mind when using this data-type that if you do want to add a new value (such as a new state/country), you will have to perform an ALTER TABLE statement, which can take some time to execute over a table storing a lot of data. Using the relational method means you'd just have to add one row to a table, which is significantly faster. Thanks On Mon, 2008-09-22 at 17:12 -0400, Ben A.H. wrote: > Hello, > > We are setting up a relatively common web application which collects "user > information"... Right off the bat our system will have over 200,000 USER > RECORDS so having an efficient database & lookup scheme is critical. > I am a programmer/developer with some education in databasing but my forte > is programming hence this odd (I think) question... > > Obviously we'll have the standard USER table & a bunch of supporting tables. > > For items like STATE/PROVINCE standard database logic would dictate I setup: > > USER >name >email >.etc... >StateID (foreign key) > > > STATE >StateID >StateName > > But I often wonder if there's any benefit in having a State table... > > For one, new States/Provinces are not that likely, removal of these entities > is also unlikely (except maybe Quebec :-)) so the chances of having to make > alternations to the State table near nil. It raises the question of whether > or not a State Table is even necessary. > Using a traditional SQL State table, I'd have to do an SQL query to populate > User State/Province options every time the "New User Registration" form is > shown - isn't this needless overhead?! > Would my webforms not load faster if State/Province information was > hard-coded as options for a combobox? Is this something people have > experimented with? > > There are various other fields that I believe could be handled like this for > a cumulative performance boost. For example: country, state/province, > gender, industry, occupation, ethnicity, language are all options that > aren't going to change that often. Then again, when we do have to display > the users choice; for example if user.countryID=3 we'd have to have a way to > turn that number 3 into the word "Canada" when we display the users > profile... I'd probably do this via XML lookup. > > Has anyone experimented with the benefits/tradeoffs of such a scheme? As > I've said, we are dealing with at least 200,000 user records, probably > 300,000-400,000 in the next year. The User table contains at least 50 > attributes, 15 of which are Foreign Keys that link to tables that will > likely never change (the users choices for countryID, stateID, > OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID, > industryID, occupationID)... > > ThanX in advance > Ben > > > -- Ian Simpson System Administrator MyJobGroup -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Relational Databasing on busy webserver
Hello, We are setting up a relatively common web application which collects "user information"... Right off the bat our system will have over 200,000 USER RECORDS so having an efficient database & lookup scheme is critical. I am a programmer/developer with some education in databasing but my forte is programming hence this odd (I think) question... Obviously we'll have the standard USER table & a bunch of supporting tables. For items like STATE/PROVINCE standard database logic would dictate I setup: USER name email .etc... StateID (foreign key) STATE StateID StateName But I often wonder if there's any benefit in having a State table... For one, new States/Provinces are not that likely, removal of these entities is also unlikely (except maybe Quebec :-)) so the chances of having to make alternations to the State table near nil. It raises the question of whether or not a State Table is even necessary. Using a traditional SQL State table, I'd have to do an SQL query to populate User State/Province options every time the "New User Registration" form is shown - isn't this needless overhead?! Would my webforms not load faster if State/Province information was hard-coded as options for a combobox? Is this something people have experimented with? There are various other fields that I believe could be handled like this for a cumulative performance boost. For example: country, state/province, gender, industry, occupation, ethnicity, language are all options that aren't going to change that often. Then again, when we do have to display the users choice; for example if user.countryID=3 we'd have to have a way to turn that number 3 into the word "Canada" when we display the users profile... I'd probably do this via XML lookup. Has anyone experimented with the benefits/tradeoffs of such a scheme? As I've said, we are dealing with at least 200,000 user records, probably 300,000-400,000 in the next year. The User table contains at least 50 attributes, 15 of which are Foreign Keys that link to tables that will likely never change (the users choices for countryID, stateID, OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID, industryID, occupationID)... ThanX in advance Ben -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]