Re: Relational Databasing on busy webserver

2008-09-23 Thread David Ashley
On Mon, Sep 22, 2008 at 5:12 PM, Ben A.H. [EMAIL PROTECTED] wrote:


 There are various other fields that I believe could be handled like this
 for a cumulative performance boost. For example: country, state/province,
 gender, industry, occupation, ethnicity, language are all options that
 aren't going to change that often. Then again, when we do have to display
 the user's choice; for example, if user.countryID=3 we'd have to have a
 way to turn that number 3 into the word "Canada" when we display the
 user's profile... I'd probably do this via XML lookup.

 Has anyone experimented with the benefits/tradeoffs of such a scheme? As
 I've said, we are dealing with at least 200,000 user records, probably
 300,000-400,000 in the next year. The User table contains at least 50
 attributes, 15 of which are Foreign Keys that link to tables that will
 likely never change (the user's choices for countryID, stateID,
 OperatingSystemID, internet_connectionTypeID, internetUserFrequencyID,
 industryID, occupationID)...

Hi Ben,

This is a very common issue in web database development.  The issue is
really whether the tables should exist (logically) in the database or in the
web scripts.

For example, for the 50 states, a lot of programmers would put this logic in
the web script and just store the two-letter postal code in the database
table (but with no separate table for states).  The mapping from "MI" to
"Michigan" would occur in the web scripts, as would the logic to know that
"XQ" is not a state.

This is done all the time.

As long as the database is indexed properly for the queries you want to run,
there is not usually a severe performance hit.  Either approach is fine, but
the web script approach is usually easier.

I just have one piece of advice for you:

Use something stored in the database that won't change when you update the
fixed tables in the web scripts.  For example, a two-letter postal state
code is fine, because when they add the 51st state they won't change any of
the existing codes.  Similarly, use "M", "F", etc. for sex.  Don't store
anything in the database that is tied to, for example, the integer index of
a table entry in the web script, or stored in some other way where
modifying the script would force you to alter the database as well;
otherwise you'll end up writing a separate script to remap column values in
the database before you can use the new scripts.
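A sketch of that advice in Python (the table contents here are illustrative; any web scripting language works the same way): the database stores only the stable code, and the code-to-name table lives in the script.

```python
# The fixed lookup table lives in the web script.  Only the stable
# two-letter code is stored in the database, so adding a 51st state
# means adding one entry here without touching any stored rows.
STATES = {
    "MI": "Michigan",
    "OH": "Ohio",
    "CA": "California",
}

def state_name(code):
    """Map a stored postal code to a display name; None for unknown codes."""
    return STATES.get(code)
```

So state_name("MI") gives "Michigan", while an invalid code like "XQ" simply maps to None.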

Dave.


Re: Relational Databasing on busy webserver

2008-09-23 Thread David Ashley
On Tue, Sep 23, 2008 at 11:13 AM, Olexandr Melnyk [EMAIL PROTECTED] wrote:

  On 9/23/08, David Ashley [EMAIL PROTECTED] wrote:

 For example, for the 50 states, a lot of programmers would put this logic
 in the web script and just store the two-letter postal code in the database
 table (but with no separate table for states).  The mapping from "MI" to
 "Michigan" would occur in the web scripts, as would the logic to know that
 "XQ" is not a state.


 Keep in mind that, as you add more countries, maintaining in-script
 code/name mappings of state codes will become more and more cumbersome.
 Another problem will arise if you want to view user information, along with
 full country and state names, from a different programming language.

 Plus, if the same query is run very often and the table is almost static,
 chances are high that the result will be in the query cache.


The different programming language issue is one I hadn't thought about.
Good catch.

I program in PHP only, and I run queries manually only if I need to debug
something hard.  But I could see the issue you cited arising in a more
complex environment.  I could even see a mixed web application where the
web presence is a mixture of PHP, Python, and Perl.


Re: Patch for mysql

2008-09-22 Thread David Ashley
On Mon, Sep 22, 2008 at 7:54 AM, Colin Charles [EMAIL PROTECTED] wrote:

 Nagaraj S wrote:

 Hi Gurus,

 Good day! How do I apply MySQL bug-fix patches? Is it just a matter of
 updating or replacing some code in a C file and recompiling the source?


 Depending on what your patch looks like, you may or may not use the Unix
 patch(1) utility.

 Anyone able to supply some simple patching steps?  I have a MySQL server
 running on a Solaris 10 system.


 Why not just use the method that Solaris provides to update the MySQL
 package?

 Applying patches requires you to recompile MySQL, and if you're still
 asking how to apply a patch (and you're not clear what patch this might
 be), I'd avoid this method of getting things done.

One possible ambiguity is that "patch" means different things to different
people.

To hard-core *nix users, "patch" strictly means the behavior of the patch
program (man patch).

However, to certain others, "patch" may mean any bugfix mechanism (updates).

I suspect that the OP used "patch" where the lingo of the list would have
been "update".


Re: Appropriate Design

2008-09-17 Thread David Ashley
On Wed, Sep 17, 2008 at 5:46 AM, Jeffrey Santos [EMAIL PROTECTED] wrote:

 I'm not sure if this is the right list to ask such a question; if not, any
 direction as to the appropriate forum would be much appreciated!

 This may be a simple question, but I'm still on the early stages of my
 learning about databases.  I'm developing an application that will require
 information from various sources.  Since what might be considered required
 information about those sources will vary (occasionally dramatically), I'm
 unsure as to the correct design for the database tables.  In other words,
 I'm left with (I believe) two choices:

 1.   Create a source table that will contain, among other things, every
 possible field for a source type and sort out what is considered required
 information at the application level.

 2.   Create a few source tables detailing the required information about
 each source type.  The only way I can think of doing this is something
 along the lines of:

 sourceFields: fieldId, fieldName, 
 sourceRequireds: reqId, typeId, fieldId, 
 sourceInfo: infoId, fieldId, sourceId, 

 and then pull only the appropriate data from the database at query time
 using JOINs.

 Which of these options (or a third I have no idea about!) would be
 appropriate here?

 If possible, maybe a general conventional wisdom statement would greatly
 help my education on these matters!

 Thank you,

 Jeffrey Santos


The second solution (multiple tables) is generally the only acceptable
solution.

The general rule (for one table or many) is whether the fields in the single
table are fixed in quantity, or are repetitive enough to be tedious to
manipulate.

For example, assume you want a database of people in your state and the cars
they own.  Clearly, since people vary widely in the number of cars they own
and since each car has similar data, putting fields like car1vin, car2vin,
car3vin in the people table is the wrong solution.

Your problem description is rather vague, but the way I'm reading it is that
your sources vary widely in the fields that need to be recorded.  In that
case, you'd generally want a table of sources (each instance of a source), a
table of fields (each field that MAY be associated with a source), and a
table of sourcefields (the actual data for a given field with a given
source).
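A minimal sketch of that three-table layout, using Python's built-in sqlite3 as a stand-in for MySQL (the table and column names are guesses based on Jeffrey's sketch, not a definitive schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources      (sourceId INTEGER PRIMARY KEY, sourceType TEXT);
    CREATE TABLE fields       (fieldId INTEGER PRIMARY KEY, fieldName TEXT);
    -- One row per (source, field) pair that actually has data.
    CREATE TABLE sourcefields (sourceId INTEGER, fieldId INTEGER, value TEXT);
""")
conn.execute("INSERT INTO sources VALUES (1, 'book')")
conn.executemany("INSERT INTO fields VALUES (?, ?)",
                 [(1, 'Author'), (2, 'Title'), (3, 'URL')])
# A book records Author and Title; it simply has no URL row at all.
conn.executemany("INSERT INTO sourcefields VALUES (?, ?, ?)",
                 [(1, 1, 'Knuth'), (1, 2, 'TAOCP')])

# Pull every recorded field for source 1 with a JOIN.
rows = conn.execute("""
    SELECT f.fieldName, sf.value
    FROM sourcefields sf JOIN fields f ON f.fieldId = sf.fieldId
    WHERE sf.sourceId = 1
    ORDER BY f.fieldId
""").fetchall()
```

The point is that a source only carries rows for the fields it actually has, so wildly different source types share one schema.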

But you really need to solidify the description of your problem or have a
simplest example so people can really help you.  I don't fully understand
your description.

Dave.


Re: Appropriate Design

2008-09-17 Thread David Ashley
On Wed, Sep 17, 2008 at 11:38 AM, Jeffrey Santos [EMAIL PROTECTED] wrote:

 Hi David,

 My project involves storing information that could have been gathered from
 wildly different sources.  You can think of this part of the database as a
 sort of bibliography for the sources of that information.  If I gathered
 information from a book, for instance, there are some generally required
 fields such as "Author", "Title", ... etc.  A website, on the other hand,
 would also require fields such as "URL" and "Date Visited", and so on and
 so forth.

 I hope that's more clear!


Ah, OK, the essence of your problem is what is called in some programming
languages a "variant record".  One table is related to another table in a
one-to-many relationship, but the items in the second table may vary wildly
in what needs to be stored.

http://en.wikibooks.org/wiki/Programming:Ada:Types:record#Variant_record

The simplest real-life example I can think of would be to make a record of
each citizen of the world and all of their property.  There is a one-to-many
relationship between people and property.  The problem is that each item of
property has different attributes: a computer will have different things
you want to record than a goat.

I'm not qualified to even guess at this question in the context of MySQL.
I'm sure others can be more helpful.

The two approaches that come to mind are:

a) For simple problems, give the second table an enumerated discriminator
field and a set of additional fields that is the union of everything
required, with some fields left unused depending on the discriminator.

b) A more sophisticated approach involving named attributes.
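A sketch of approach (a), again with sqlite3 standing in for MySQL; the people-and-property names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'kind' is the discriminator; each row uses only the columns that
# make sense for its kind and leaves the rest NULL.
conn.execute("""
    CREATE TABLE property (
        owner     TEXT,
        kind      TEXT,      -- 'computer' or 'goat'
        cpu_ghz   REAL,      -- computers only
        ram_gb    INTEGER,   -- computers only
        breed     TEXT       -- goats only
    )
""")
conn.execute("INSERT INTO property VALUES ('Ann', 'computer', 2.6, 2, NULL)")
conn.execute("INSERT INTO property VALUES ('Ann', 'goat', NULL, NULL, 'Alpine')")

goats = conn.execute(
    "SELECT owner, breed FROM property WHERE kind = 'goat'").fetchall()
```

This stays simple while the union of fields is small; once it grows, the named-attribute layout in (b) scales better.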

I've gotta bow out of this one.  For all I know, MySQL might support variant
records directly.  I'm not worthy, I'm not worthy ...

Dave.


Re: MySQL not running on fresh LAMP install

2008-09-16 Thread David Ashley
On Tue, Sep 16, 2008 at 11:02 AM, Dave M G [EMAIL PROTECTED] wrote:

 MySQL List,

 I'm going through a post-hard-drive-crash process of reinstalling my whole
 operating system, and MySQL is acting very strange.

 I installed it as I always have done, by using Ubuntu's Synaptic Package
 Manager to install all the default packages for a LAMP server. Every time
 before I've done this it has worked like a dream.

 It seemed to work when I first installed it yesterday, but today, when
 trying to access phpMyAdmin, I got this error:
 #2002 - The server is not responding (or the local MySQL server's socket is
 not correctly configured)

 I figured I'd give MySQL a bit of a kick just to see if that would help:

 [EMAIL PROTECTED]:~$ sudo /etc/init.d/mysqld restart

 But it gave me this new and unusual response:

 sudo: /etc/init.d/mysqld: command not found

This is only a narrow part of your problem, but did you forget rc.d as
part of the path?

i.e. /etc/rc.d/init.d/mysqld ...

??


Re: How many key buffer size should be allocate?

2008-09-08 Thread David Ashley
On Mon, Sep 8, 2008 at 12:47 AM, Dan Nelson [EMAIL PROTECTED] wrote:

 In the last episode (Sep 08), Yi Wang said:
  According to the MySQL manual, I changed key_buffer_size from 8M to
  512M.  But in MySQL Administrator, the key buffer hitrate seems to stay
  unchanged, and key buffer usage stays below 4M.

  The total db size is about 200M, and the system's load is slight.  The
  key buffer size was 8M before the change, and the hitrate was about 100%
  intermittently.

 If your database is only 200M, then your indexes are probably less than
 half that (if you total up all of your .MYI files, that's the total
 size of all your indexes).  Setting key_buffer_size larger than your
 total index size doesn't do any good.

The other thing that struck me about this particular application is that
with typical RAM size for a modern computer, and with a lightly loaded
server, the ENTIRE database will effectively be cached:

http://en.wikipedia.org/wiki/Page_cache

Because of the small database size, I'm not confident that using the key
cache features of MySQL will lead to a significant performance increase.

There will be some performance increase, naturally, because it is cheaper to
find index values locally in MySQL than to make the fseek() calls or
whatever through the OS.  But I also have to consider that the OS calls
usually involve an open file handle and primarily integer operations, and
are highly optimized.  I'm just not sure what kind of a performance increase
would result from using the key cache features of MySQL for such a small
database.

The OP should definitely explore these features, for (a) academic value, (b)
future expansion of the database, (c) other databases, (d) in case he is
running the database on an older server, and (e) in case the server is not
lightly loaded.  But in general for a 200M database ... I'm not sure of the
benefit.

I've never actually tried any of this, so I could be totally wrong about the
performance implications with a small database ...
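Following Dan's rule of thumb above (key_buffer_size need not exceed the combined size of the .MYI files), a quick way to total them from a script; the data directory path is whatever your server actually uses:

```python
import os

def total_index_bytes(datadir):
    """Sum the sizes of all MyISAM .MYI index files under datadir."""
    total = 0
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if name.endswith(".MYI"):
                total += os.path.getsize(os.path.join(root, name))
    return total

# e.g. total_index_bytes("/var/lib/mysql") -> total index bytes; setting
# key_buffer_size much above this number buys nothing.
```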

Dave.


Re: Large Query Question.

2008-09-04 Thread David Ashley
On Thu, Sep 4, 2008 at 10:38 AM, mos [EMAIL PROTECTED] wrote:
Jim,

The problem is likely that your index is not defined properly. Use EXPLAIN
in front of the query to see if it can use just one index from each table.

I would try building a compound index on

Products: (RecordReference, FeedId)

ProductContributors: (RecordReference, RowType)

This should get it to execute the join and where clause using just one index
from each table. Give that a try and see if it speeds things up. :)

Mike



I concur.

The SELECT time is going to resemble something like:

K_1 * F_1(number_of_records_in_database) + K_2 * F_2(number_of_records_selected)

If the indices are effective, F_1 = log(N), but if the indices are not
effective, F_1 = N.

One thing you may want to try, to narrow down the problem, is retrieving
just 100 records (with a LIMIT clause) and seeing how that affects the
speed; then try the full set and see how it differs.

If they aren't very different, then it is an F_1 problem.

But if they are different, then it is a K_2 / F_2 problem.

As far as K_2 or F_2 problems ...

Another possibility is that you are using ORDER BY on a large result set
that isn't indexed for an effective sort.  Try dropping the ORDER BY and see
what happens.

My view of how MySQL might work internally is perhaps naive.  But sorting
can be worst case O(N**2).
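To make the index check concrete, here is a sketch using Python's sqlite3 (so the statement is EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN); the Products columns follow Mike's suggestion above, but the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (RecordReference TEXT, FeedId INTEGER);
    CREATE INDEX idx_products ON Products (RecordReference, FeedId);
""")
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("r%d" % i, i % 3) for i in range(100)])

# With the compound index the WHERE clause becomes a single index
# search, not a full table scan; the query plan confirms it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM Products WHERE RecordReference = 'r5' AND FeedId = 2"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

If the plan says SEARCH using the index you are in F_1 = log(N) territory; a SCAN means F_1 = N.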

Dave.

 On 3-Sep-08, at 3:02 PM, mos wrote:

 Jim,
 Retrieving 100,000 rows will always take some time. Do you really
 need to return that many rows? Are you selecting just the columns you need?
 What are the slow queries?

 Mike

 At 12:05 PM 9/3/2008, Jim Leavitt wrote:

 Greetings List,

 We have a medium-large size database application which we are trying
 to optimize and I have a few questions.

 Server Specs
 1 Dual Core 2.6 Ghz
 2GB Ram

 Database Specs
 51 Tables
 Min 10 rows, Max 100 rows
 Total size approx 2GB

 My.cnf
 [mysqld]
 set-variable=local-infile=0
 log-slow-queries=slow-queries.log
 datadir=/var/lib/mysql
 socket=/var/lib/mysql/mysql.sock
 old_passwords=1
 key_buffer = 512M
 max_allowed_packet=4M
 sort_buffer_size = 512M
 read_buffer_size = 512M
 read_rnd_buffer_size = 256M
 record_buffer = 256M
 myisam_sort_buffer_size = 512M
 thread_cache = 128
 query_cache_limit = 1M
 query_cache_type = 1
 query_cache_size = 32M
 join_buffer = 512M
 table_cache = 512


 We are having trouble with certain queries which are returning
 anywhere from 100,000 - 300,000 rows.  Total query time is taking approx
 1 - 2 mins depending on load.  Is there anything in our conf file
 which could improve our performance?  Are there any hardware
 recommendations that could help us improve the speed?  Would more
 memory help us?  Any comments or recommendations are greatly
 appreciated.

 Thanks much.


 Jim Leavitt
 Developer
 Treefrog Interactive Inc. (http://www.treefrog.ca)
 Bringing the Internet to Life





 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]


 Jim Leavitt
 Developer
 Treefrog Interactive Inc. (http://www.treefrog.ca)
 Bringing the Internet to Life
 ph: 905-836-4442 ext 104
 fx: 905-895-6561








Fwd: Large Query Question.

2008-09-04 Thread David Ashley

Addendum:  I misremembered the SQL keywords.  The clause for retrieving only
the first 100 records is LIMIT, not COUNT.

Also, where I wrote ORDER BY, the same sorting cost can apply to GROUP BY.

Oopsie.


Re: Large Query Question.

2008-09-03 Thread David Ashley
On Wed, Sep 3, 2008 at 1:05 PM, Jim Leavitt [EMAIL PROTECTED] wrote:


 We are having trouble with certain queries which are returning anywhere
 from 100,000 - 300,000 rows.  Total query time is taking approx 1 - 2 mins
 depending on load.  Is there anything in our conf file which could improve
 our performance?  Are there any hardware recommendations that could help us
 improve the speed?  Would more memory help us?  Any comments or
 recommendations are greatly appreciated.

Returning 100,000 to 300,000 rows will take some time no matter how you
slice it.

A more common approach is to be sure that the database is organized for
O(log N) retrieval, then to retrieve only the records you need (the ones you
need to display, for example), then to execute a second query to get more,
and then a third query, etc.

O(log N) retrieval = indices for the columns and the database arranged so
that equality and ordering are implemented using native data types.
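The second-query-to-get-more idea is plain LIMIT/OFFSET paging.  A sketch with sqlite3 standing in for MySQL (the table and page size are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [("item%03d" % i,) for i in range(300)])

def fetch_page(conn, page, per_page=100):
    """Fetch one screenful of rows; the client issues a second query
    for the next page instead of pulling the whole result set."""
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page, page * per_page)).fetchall()

first = fetch_page(conn, 0)    # rows 1-100
second = fetch_page(conn, 1)   # rows 101-200
```

Each query stays cheap as long as the ORDER BY column is indexed (here the primary key).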

What is your application?

Do you really need all those rows at one time?