Re: Relational Databasing on busy webserver (Benchmark of Enum?!)

2008-09-23 Thread Micah Stevens

On 09/23/2008 02:42 PM, Ben A.H. wrote:
I figured that was what you meant... I guess my table didn't work (see above 
message...don't ya' love plaintext :-O)...


Has anyone ever tried to benchmark the difference between utilizing ENUMs 
vs. traditional relational databasing? I would think ENUM is ideal for items 
I specified at the beginning of this thread, items I would think would be 
part of MANY (if not MOST) databases (state, country, gender, industry, 
occupation, referredFrom, ethnicity, position)... In my case, it would allow 
me to eliminate 15+ tables...


I'm just wondering why database ENUMS aren't used more often... (what's the 
catch)
  
Just thought I'd jump in with some terms here: When you're saying 
'traditional relational databasing' is kind of misleading. You're 
probably still relational in some sense of the term even using a ton of 
ENUMs. What you're talking about is fully normalized form. Take a look here:


http://en.wikipedia.org/wiki/Database_normalization

..to answer your question, normalization isn't done for speed. In fact, 
I'd hazard a guess that fully normalized databases are almost always 
slower than if the designed makes some shortcuts. What they provide is a 
fully structured way to organize your data.


If you're just storing data, and you know there's only a limited number 
of ways you're going to pull data out of the set, then I'd make as many 
shortcuts as I could to provide speed. If you are using the dataset for 
analysis, and you're not sure how the users are going to extract data, 
then you may want to go further down the normalization road.


As with anything normalization can reach a point of dubious effect, but 
it serves a very important purpose. By normalizing your data, and not 
using things like ENUMs, you're maintaining an accurate data structure 
which then can be arbitrarily used. But yeah, it's not necessarily fast, 
probably the opposite.


Hope that helps.
-Micah

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Relational Databasing on busy webserver (Benchmark of Enum?!)

2008-09-23 Thread Ben A.H.
I figured that was what you meant... I guess my table didn't work (see above 
message...don't ya' love plaintext :-O)...

Has anyone ever tried to benchmark the difference between utilizing ENUMs 
vs. traditional relational databasing? I would think ENUM is ideal for items 
I specified at the beginning of this thread, items I would think would be 
part of MANY (if not MOST) databases (state, country, gender, industry, 
occupation, referredFrom, ethnicity, position)... In my case, it would allow 
me to eliminate 15+ tables...

I'm just wondering why database ENUMS aren't used more often... (what's the 
catch)



""Olexandr Melnyk"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Tue, Sep 23, 2008 at 6:13 PM, Olexandr Melnyk <[EMAIL PROTECTED]> 
> wrote:
>
>> Plus, if the same query is run very often and table is almost static,
>> chances are high that the result will be in query cache.
>>
>
> Just realized that I haven't mentioned that this sentence is related to
> storing states in the database, rather than in the application layer.
>
> -- 
> Sincerely yours,
> Olexandr Melnyk
> http://omelnyk.net/
> 



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Relational Databasing on busy webserver

2008-09-23 Thread Ben A.H.
Hello all,

Thank-you for all of your help, I was really surprised by the speed & 
quality of responses. Below is a table I've created based on some reading I 
did following everyone's suggestions (I hope the table shows correctly)... 
I'm leaning towards the pure Relational as I like having all data in a 
singular, easily conceptualized & exported datastore but we may go with the 
ENUM approach if it provides a performance boost. Feel free to comment...


 Relational
 Enum
 Scripted

  Maintainability
 +

  (Simplest, known in industry, other programmers can easily 
understand/maintain)
 +

  (Almost as simple as Relational, known in industry, other programmers 
can easily understand/maintain)
 -

  (Will have 2 datastores, the SQL DB & XML or hard-coded values; NOT 
complicated, but not as intuitively maintainable)

  Scalability
 +

   (Easily Scalable, simplified by the fact there's 1 datastore. Adding 
a State as easy as adding record to db, no code changes needed!)
 +

   (Easily Scalable, simplified by the fact there's 1 datastore. Adding 
a State as easy as adding new ENUM to db, no code changes needed!)
 N/-

  (Not as Easily scalable, adding/removing field means changing code 
which always has risks; of course done right: we'd just add new code to an 
XML file)

  Code Complexity
 +

   (Super simple, easy to understand, GET DATA:connect, query, show;

  WRITE DATA validate, connect, write)
 +

   (Super simple, easy to understand, GET DATA:connect, query, show;

  WRITE DATA validate, connect, write)
 -

  (Alittle more involved

  GET DATA: connect, query, turn codes into words via code, showd

  WRITE DATA: similar to Relational/Enum

  Performance
 N or -
 N or -
 +

  Could potentially increase performance by taking load off SQL server.

  Portability (Using w/ 3rd party System or future external application)
 +

  Integrating w/ 3rd party SW or new applications simple.
 +  (?)

  Integrating w/ 3rd party SW or new applications probably like 
Relational. Are ENUMS handled/returned via normal SELECT queries? (if so, 
just like relational)
 -

  Would have to write the decodify module for any outside system. Isn't 
intuitively portable.



ThanX,

Ben


""David Ashley"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Tue, Sep 23, 2008 at 11:13 AM, Olexandr Melnyk <[EMAIL PROTECTED]> 
> wrote:
>
>>  On 9/23/08, David Ashley <[EMAIL PROTECTED]> wrote:
>>>
>>> For example, for the 50 states, a lot of programmers would put this 
>>> logic
>>> in
>>> the web script and just store the two-letter postal code in the database
>>> table (but with no separate table for "states").  The mapping from "MI" 
>>> to
>>> "Michigan" would occur in the web scripts; as would the logic to know 
>>> that
>>> "XQ" is not a state.
>>>
>>
>> Keep in mind that, as you add more countries, maintaining in-scipt
>> code/name mappings of state codes will become more and more cumbersome.
>> Another problem will arise if you want to view user information, along 
>> with
>> full country and state names, from a different programming language.
>>
>> Plus, if the same query is run very often and table is almost static,
>> chances are high that the result will be in query cache.
>
>
> The "different programming language" issue is one I hadn't thought about.
> Good catch.
>
> I program in PHP only, and I run queries manually only if I need to debug
> something hard.  But I could see the issue you cited arising in a more
> complex environment.  I could even see a "mixed" web application where the
> web presence is a mixture of PHP, Python, and Perl.
>
> Good catch.
> 



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Relational Databasing on busy webserver

2008-09-23 Thread Olexandr Melnyk
On Tue, Sep 23, 2008 at 6:13 PM, Olexandr Melnyk <[EMAIL PROTECTED]> wrote:

> Plus, if the same query is run very often and table is almost static,
> chances are high that the result will be in query cache.
>

Just realized that I haven't mentioned that this sentence is related to
storing states in the database, rather than in the application layer.

-- 
Sincerely yours,
Olexandr Melnyk
http://omelnyk.net/


Re: Relational Databasing on busy webserver

2008-09-23 Thread Ben A.H.
> It sounds like what you're looking for is an ENUM value:
>
> http://dev.mysql.com/doc/refman/5.0/en/enum.html
>
> Bear in mind when using this data-type that if you do want to add a new
> value (such as a new state/country), you will have to perform an ALTER
> TABLE statement, which can take some time to execute over a table
> storing a lot of data. Using the relational method means you'd just have
> to add one row to a table, which is significantly faster.

Just to clarify:
Wouldn't the ALTER TABLE though just have to be run once? and on the 
backend?
I guess what I'm asking is: would this affect web-users in any way?






>
> Thanks
>
> On Mon, 2008-09-22 at 17:12 -0400, Ben A.H. wrote:
>> Hello,
>>
>> We are setting up a relatively common web application which collects 
>> "user
>> information"... Right off the bat our system will have over 200,000 USER
>> RECORDS so having an efficient database & lookup scheme is critical.
>> I am a programmer/developer with some education in databasing but my 
>> forte
>> is programming hence this odd (I think) question...
>>
>> Obviously we'll have the standard USER table & a bunch of supporting 
>> tables.
>>
>> For items like STATE/PROVINCE standard database logic would dictate I 
>> setup:
>>
>> USER
>>name
>>email
>>.etc...
>>StateID  (foreign key)
>>
>>
>> STATE
>>StateID
>>StateName
>>
>> But I often wonder if there's any benefit in having a State table...
>>
>> For one, new States/Provinces are not that likely, removal of these 
>> entities
>> is also unlikely (except maybe Quebec :-)) so the chances of having to 
>> make
>> alternations to the State table near nil. It raises the question of 
>> whether
>> or not a State Table is even necessary.
>> Using a traditional SQL State table, I'd have to do an SQL query to 
>> populate
>> User State/Province options every time the "New User Registration" form 
>> is
>> shown - isn't this needless overhead?!
>> Would my webforms not load faster if State/Province information was
>> hard-coded as options for a combobox? Is this something people have
>> experimented with?
>>
>> There are various other fields that I believe could be handled like this 
>> for
>> a cumulative performance boost. For example: country, state/province,
>> gender, industry, occupation, ethnicity, language are all options that
>> aren't going to change that often. Then again, when we do have to display
>> the users choice; for example if user.countryID=3 we'd have to have a way 
>> to
>> turn that number 3 into the word "Canada" when we display the users
>> profile... I'd probably do this via XML lookup.
>>
>> Has anyone experimented with the benefits/tradeoffs of such a scheme? As
>> I've said, we are dealing with at least 200,000 user records, probably
>> 300,000-400,000 in the next year. The User table contains at least 50
>> attributes, 15 of which are Foreign Keys that link to tables that will
>> likely never change (the users choices for countryID, stateID,
>> OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID,
>> industryID, occupationID)...
>>
>> ThanX in advance
>> Ben
>>
>>
>>
> -- 
> Ian Simpson
> System Administrator
> MyJobGroup
>
> 



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Relational Databasing on busy webserver

2008-09-23 Thread David Ashley
On Tue, Sep 23, 2008 at 11:13 AM, Olexandr Melnyk <[EMAIL PROTECTED]> wrote:

>  On 9/23/08, David Ashley <[EMAIL PROTECTED]> wrote:
>>
>> For example, for the 50 states, a lot of programmers would put this logic
>> in
>> the web script and just store the two-letter postal code in the database
>> table (but with no separate table for "states").  The mapping from "MI" to
>> "Michigan" would occur in the web scripts; as would the logic to know that
>> "XQ" is not a state.
>>
>
> Keep in mind that, as you add more countries, maintaining in-scipt
> code/name mappings of state codes will become more and more cumbersome.
> Another problem will arise if you want to view user information, along with
> full country and state names, from a different programming language.
>
> Plus, if the same query is run very often and table is almost static,
> chances are high that the result will be in query cache.


The "different programming language" issue is one I hadn't thought about.
Good catch.

I program in PHP only, and I run queries manually only if I need to debug
something hard.  But I could see the issue you cited arising in a more
complex environment.  I could even see a "mixed" web application where the
web presence is a mixture of PHP, Python, and Perl.

Good catch.


Re: Relational Databasing on busy webserver

2008-09-23 Thread Olexandr Melnyk
On 9/23/08, David Ashley <[EMAIL PROTECTED]> wrote:
>
> For example, for the 50 states, a lot of programmers would put this logic
> in
> the web script and just store the two-letter postal code in the database
> table (but with no separate table for "states").  The mapping from "MI" to
> "Michigan" would occur in the web scripts; as would the logic to know that
> "XQ" is not a state.
>

Keep in mind that, as you add more countries, maintaining in-scipt code/name
mappings of state codes will become more and more cumbersome. Another
problem will arise if you want to view user information, along with full
country and state names, from a different programming language.

Plus, if the same query is run very often and table is almost static,
chances are high that the result will be in query cache.

-- 
Sincerely yours,
Olexandr Melnyk <><
http://omelnyk.net/


Re: Relational Databasing on busy webserver

2008-09-23 Thread David Ashley
On Mon, Sep 22, 2008 at 5:12 PM, Ben A.H. <[EMAIL PROTECTED]> wrote:

>
> There are various other fields that I believe could be handled like this
> for
> a cumulative performance boost. For example: country, state/province,
> gender, industry, occupation, ethnicity, language are all options that
> aren't going to change that often. Then again, when we do have to display
> the users choice; for example if user.countryID=3 we'd have to have a way
> to
> turn that number 3 into the word "Canada" when we display the users
> profile... I'd probably do this via XML lookup.
>
> Has anyone experimented with the benefits/tradeoffs of such a scheme? As
> I've said, we are dealing with at least 200,000 user records, probably
> 300,000-400,000 in the next year. The User table contains at least 50
> attributes, 15 of which are Foreign Keys that link to tables that will
> likely never change (the users choices for countryID, stateID,
> OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID,
> industryID, occupationID)...
>
Hi Ben,

This is a very common issue in web database development.  The issue is
really whether the tables should exist (logically) in the database or in the
web scripts.

For example, for the 50 states, a lot of programmers would put this logic in
the web script and just store the two-letter postal code in the database
table (but with no separate table for "states").  The mapping from "MI" to
"Michigan" would occur in the web scripts; as would the logic to know that
"XQ" is not a state.

This is done all the time.

As long as the database is indexed properly for the queries you want to run,
there is not usually a severe performance hit.  Either approach is fine; but
the web script approach is usually easier.

I just have one piece of advice for you:

Use something stored in the database that won't change when you update the
fixed tables in the web scripts.  For example, a 2-letter postal state code
is fine; because when they add the 51st state they won't change any of the
existing codes.  Similarly, use "M", "F", etc. for sex.  Don't store
anything in the database that is tied to, for example, the integer index of
a table entry in the web script or stored in some other way where when you
modify the script you'll need to alter the database as well; otherwise
you'll be writing a separate script to remap values of columns in the
database so that you can use the new scripts.

Dave.


Re: Relational Databasing on busy webserver

2008-09-23 Thread Ian Simpson
Hi Ben,

It sounds like what you're looking for is an ENUM value:

http://dev.mysql.com/doc/refman/5.0/en/enum.html

Bear in mind when using this data-type that if you do want to add a new
value (such as a new state/country), you will have to perform an ALTER
TABLE statement, which can take some time to execute over a table
storing a lot of data. Using the relational method means you'd just have
to add one row to a table, which is significantly faster.

Thanks 

On Mon, 2008-09-22 at 17:12 -0400, Ben A.H. wrote:
> Hello,
> 
> We are setting up a relatively common web application which collects "user 
> information"... Right off the bat our system will have over 200,000 USER 
> RECORDS so having an efficient database & lookup scheme is critical.
> I am a programmer/developer with some education in databasing but my forte 
> is programming hence this odd (I think) question...
> 
> Obviously we'll have the standard USER table & a bunch of supporting tables.
> 
> For items like STATE/PROVINCE standard database logic would dictate I setup:
> 
> USER
>name
>email
>.etc...
>StateID  (foreign key)
> 
> 
> STATE
>StateID
>StateName
> 
> But I often wonder if there's any benefit in having a State table...
> 
> For one, new States/Provinces are not that likely, removal of these entities 
> is also unlikely (except maybe Quebec :-)) so the chances of having to make 
> alternations to the State table near nil. It raises the question of whether 
> or not a State Table is even necessary.
> Using a traditional SQL State table, I'd have to do an SQL query to populate 
> User State/Province options every time the "New User Registration" form is 
> shown - isn't this needless overhead?!
> Would my webforms not load faster if State/Province information was 
> hard-coded as options for a combobox? Is this something people have 
> experimented with?
> 
> There are various other fields that I believe could be handled like this for 
> a cumulative performance boost. For example: country, state/province, 
> gender, industry, occupation, ethnicity, language are all options that 
> aren't going to change that often. Then again, when we do have to display 
> the users choice; for example if user.countryID=3 we'd have to have a way to 
> turn that number 3 into the word "Canada" when we display the users 
> profile... I'd probably do this via XML lookup.
> 
> Has anyone experimented with the benefits/tradeoffs of such a scheme? As 
> I've said, we are dealing with at least 200,000 user records, probably 
> 300,000-400,000 in the next year. The User table contains at least 50 
> attributes, 15 of which are Foreign Keys that link to tables that will 
> likely never change (the users choices for countryID, stateID, 
> OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID, 
> industryID, occupationID)...
> 
> ThanX in advance
> Ben
> 
> 
> 
-- 
Ian Simpson
System Administrator
MyJobGroup



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Relational Databasing on busy webserver

2008-09-22 Thread Ben A.H.
Hello,

We are setting up a relatively common web application which collects "user 
information"... Right off the bat our system will have over 200,000 USER 
RECORDS so having an efficient database & lookup scheme is critical.
I am a programmer/developer with some education in databasing but my forte 
is programming hence this odd (I think) question...

Obviously we'll have the standard USER table & a bunch of supporting tables.

For items like STATE/PROVINCE standard database logic would dictate I setup:

USER
   name
   email
   .etc...
   StateID  (foreign key)


STATE
   StateID
   StateName

But I often wonder if there's any benefit in having a State table...

For one, new States/Provinces are not that likely, removal of these entities 
is also unlikely (except maybe Quebec :-)) so the chances of having to make 
alternations to the State table near nil. It raises the question of whether 
or not a State Table is even necessary.
Using a traditional SQL State table, I'd have to do an SQL query to populate 
User State/Province options every time the "New User Registration" form is 
shown - isn't this needless overhead?!
Would my webforms not load faster if State/Province information was 
hard-coded as options for a combobox? Is this something people have 
experimented with?

There are various other fields that I believe could be handled like this for 
a cumulative performance boost. For example: country, state/province, 
gender, industry, occupation, ethnicity, language are all options that 
aren't going to change that often. Then again, when we do have to display 
the users choice; for example if user.countryID=3 we'd have to have a way to 
turn that number 3 into the word "Canada" when we display the users 
profile... I'd probably do this via XML lookup.

Has anyone experimented with the benefits/tradeoffs of such a scheme? As 
I've said, we are dealing with at least 200,000 user records, probably 
300,000-400,000 in the next year. The User table contains at least 50 
attributes, 15 of which are Foreign Keys that link to tables that will 
likely never change (the users choices for countryID, stateID, 
OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID, 
industryID, occupationID)...

ThanX in advance
Ben



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]