Re: Ancient, unsolved high-CPU problem
Rene,

How are you querying the database during normal use? What kind of applications are you using?

~Jeffrey Santos

On Mon, Sep 22, 2008 at 2:46 PM, Rene Fournier [EMAIL PROTECTED] wrote:

> Uptime: 1054977  Threads: 10  Questions: 15576766  Slow queries: 229
> Opens: 489  Flush tables: 1  Open tables: 483  Queries per second avg: 14.765
>
> I know what the slow queries are -- some take 20-30 seconds to compute, and they are normal. The number of open tables seems high, no? The database that gets 95% of the load has ~35 tables in total. As for cron jobs, I have a number of command-line PHP scripts that perform regular queries. They've been running for about 10 days now. The current high-CPU state started a couple of days ago.

On 22-Sep-08, at 8:30 PM, Martin Gainty wrote:

> Curious: do you have any cron jobs starting to execute? What does mysqladmin status show?
>
> Martin

> From: [EMAIL PROTECTED]
> To: mysql@lists.mysql.com
> Subject: Ancient, unsolved high-CPU problem
> Date: Mon, 22 Sep 2008 19:41:25 +0200
>
> For the longest time, I've had a strange problem with MySQL. Basically, after a certain amount of time -- sometimes a few days, sometimes a couple of weeks -- its CPU usage will go from a steady 20-30% to 80-90%. Actual load and number of queries stay the same; nothing else changes. If I shut down MySQL and restart it (not the server), CPU% goes back to normal. What could this be?
>
> (Xserve G5 2GHz, 8GB, 3x250GB RAID5, Mac OS X 10.4.11, MySQL 5.0.51a)
>
> ...Rene
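For anyone chasing a similar creep, the usual next step after mysqladmin status is to look at what the server is actually doing while the CPU is pegged. A minimal diagnostic sketch, assuming standard MySQL 5.0 commands; which counters to watch is a judgment call, not something established in this thread:

    -- What is running right now, and for how long?
    SHOW FULL PROCESSLIST;

    -- Counters that often explain creeping CPU: temporary tables
    -- spilling to disk and joins without usable indexes.
    SHOW GLOBAL STATUS LIKE 'Created_tmp%';
    SHOW GLOBAL STATUS LIKE 'Select_full_join';

    -- Table-cache pressure (compare Rene's Opens: 489 vs Open tables: 483).
    SHOW VARIABLES LIKE 'table_cache';

Taking two SHOW GLOBAL STATUS snapshots a minute apart shows which counters are actually moving while the server is in its high-CPU state.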
Appropriate Design
I'm not sure if this is the right list to ask such a question; if not, any direction to the appropriate forum would be much appreciated!

This may be a simple question, but I'm still in the early stages of learning about databases. I'm developing an application that will require information from various sources. Since what might be considered required information about those sources will vary (occasionally dramatically), I'm unsure of the correct design for the database tables. In other words, I'm left with (I believe) two choices:

1. Create a source table that contains, among other things, every possible field for a source type, and sort out what counts as required information at the application level.

2. Create a few source tables detailing the required information about each source type. The only way I can think of doing this is something along the lines of:

    sourceFields:    fieldId, fieldName
    sourceRequireds: reqId, typeId, fieldId
    sourceInfo:      infoId, fieldId, sourceId

and then pull only the appropriate data from the database at query time using JOINs.

Which of these options (or a third I have no idea about!) would be appropriate here? If possible, a general conventional-wisdom statement would greatly help my education on these matters!

Thank you,

Jeffrey Santos
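For concreteness, here is a minimal DDL sketch of option 2 as listed above. The column types, the ENGINE choice, the sourceTypes table, and the value column in sourceInfo are all assumptions; the original note lists only column names:

    CREATE TABLE sourceTypes (
      typeId   INT AUTO_INCREMENT PRIMARY KEY,
      typeName VARCHAR(64) NOT NULL              -- e.g. 'book', 'website'
    ) ENGINE=InnoDB;

    CREATE TABLE sourceFields (
      fieldId   INT AUTO_INCREMENT PRIMARY KEY,
      fieldName VARCHAR(64) NOT NULL             -- e.g. 'Author', 'URL'
    ) ENGINE=InnoDB;

    -- Which fields are required for which source type.
    CREATE TABLE sourceRequireds (
      reqId   INT AUTO_INCREMENT PRIMARY KEY,
      typeId  INT NOT NULL,
      fieldId INT NOT NULL,
      FOREIGN KEY (typeId)  REFERENCES sourceTypes (typeId),
      FOREIGN KEY (fieldId) REFERENCES sourceFields (fieldId)
    ) ENGINE=InnoDB;

    -- The actual value recorded for one field of one concrete source.
    CREATE TABLE sourceInfo (
      infoId   INT AUTO_INCREMENT PRIMARY KEY,
      sourceId INT NOT NULL,
      fieldId  INT NOT NULL,
      value    TEXT,
      FOREIGN KEY (fieldId) REFERENCES sourceFields (fieldId)
    ) ENGINE=InnoDB;

Pulling "only the appropriate data" is then one JOIN, e.g. the required fields for type 2: SELECT f.fieldName FROM sourceRequireds r JOIN sourceFields f ON f.fieldId = r.fieldId WHERE r.typeId = 2;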
Re: Appropriate Design
Kevin,

Thank you for the extensive reply; the information you provide makes perfect sense. I have read about normalization, so I was anticipating that choice #2 would probably be more appropriate.

In terms of the specifics of the project: this portion of the database would act as a sort of bibliography. It would hold the source information for data represented elsewhere in the database; however, there are no real constraints on what type of source it can be. I'd like to define some standards such as book, article, website, and even self-authored types, but in reality there are possibly hundreds of different types of sources that the information can derive from, each with its own fields (with some admitted overlap).

With that last admission in mind, would it make further sense to have a pre-built but extensible table of fields, and just do the linking from the standard fields table to the source type in a different table? In other words, the requireds table would just hold foreign keys to the source types table and the fields table:

    sourceTypes:     id=1, name=book; id=2, name=article; ...
    sourceFields:    id=1, name=Author; ...
    sourceRequireds: id=1, type=1, field=1; id=2, type=2, field=1; ...

Thank you again for your help!

~Jeffrey Santos

On Wed, Sep 17, 2008 at 7:19 AM, Kevin Hunter [EMAIL PROTECTED] wrote:

At 5:46am -0400 on Wed, 17 Sep 2008, Jeffrey Santos wrote:

> I'm developing an application that will require information from various sources. Since what might be considered required information about those sources will vary (occasionally dramatically), I'm unsure as to the correct design for the database tables. In other words I'm left with (I believe) two choices: 1. Create a source table that will contain, among other things, every possible field for a source type and sort out what is considered required information at the application level. [...] If possible, maybe a general conventional wisdom statement would greatly help my education on these matters!

This is a common pitfall for newcomers to data modeling: trying to think of everything up front, which is almost always an impossible task. Better to think of a minimal case to jump-start the coding process, and then add model components later as necessary.

The second pitfall is keeping model logic in the application. This violates two similar principles: Single Point of Authority and Don't Repeat Yourself. You are using a database so that it can maintain the structure and integrity of your data. There is absolutely no other compelling reason to use a DB. (If you didn't need integrity and structure, you'd use a simpler and mucho faster flat file.) Let the DB do its job and be the single point of authority. The application should certainly do things the right way, putting data where it needs to go, but it should not be the application's /responsibility/ to keep data integrity and structure. If you rely on your application to maintain your structure, you presume that your programmers are perfect and will think of every little detail. (They're not, and they won't. I promise.)

As a general technology, databases receive a lot of work so that application developers don't have to sweat the little details. Like making sure that every user in a table has a last name. Like making sure the user typed a 0 instead of an O. Like defining constraints so that developers don't have to check that an account has enough money to make a withdrawal; all they need to know is that it didn't go through.

The other principle, DRY, also holds true. At the point you have your application trying to maintain data constraints, you will inevitably have lots of repeated or similar code to maintain similar data cases. That's a different kind of nightmare. Similarly, within the DB, it's silly to define multiple columns for similar data. That's spreadsheet-think. Like col_toyota, col_ford, col_chevy, col_lexus, col_buick. No. Make a single column as a foreign key to another table.

Some keywords to use with Google: normalize, normalization, foreign keys, foreign key constraints, innodb. Wikipedia is often a good starting point. Once you've learned the concept, I can't stress enough that you should normalize, normalize, normalize.

> 2. Create a few source tables detailing the required information about each source type. The only way I can think of doing this is something along the lines of: [...] and then pull only the appropriate data from the database at query time using JOINs.

This is closer to a better approach. That said, be more specific about the problem you're trying to solve and folks may be better able to provide useful advice.

Kevin
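To make the "spreadsheet-think" point concrete, here is a sketch of the single-column-plus-foreign-key shape Kevin describes. The table and column names are invented for illustration:

    CREATE TABLE makes (
      makeId INT AUTO_INCREMENT PRIMARY KEY,
      name   VARCHAR(32) NOT NULL          -- 'toyota', 'ford', 'chevy', ...
    ) ENGINE=InnoDB;

    -- One row per car; the make is a single foreign-key column instead
    -- of col_toyota, col_ford, col_chevy, ... spread across the table.
    CREATE TABLE cars (
      carId  INT AUTO_INCREMENT PRIMARY KEY,
      vin    CHAR(17) NOT NULL,
      makeId INT NOT NULL,
      FOREIGN KEY (makeId) REFERENCES makes (makeId)
    ) ENGINE=InnoDB;

With InnoDB, the FOREIGN KEY clause is what lets the database, not the application, refuse a car whose make does not exist -- which is exactly the single-point-of-authority argument above.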
Re: Appropriate Design
In my example given earlier, which accounts for possible overlap in required fields, would your paragraph still apply? E.g., I would probably need all the field information every time I access a source type's requireds, so that would fall under your description; however, that would mean having repeated instances of, for example, Author fields under the different source types. Which becomes more problematic: the JOIN overhead or the repetition of similar data?

Thank you,

~Jeffrey Santos

On Wed, Sep 17, 2008 at 11:21 AM, Martin Gainty [EMAIL PROTECTED] wrote:

Good Morning,

Joins can easily result in FTS (full table scans), so you might want to consider foreign keys to child tables. Assuming you start with sourceRequireds as your topmost parent table, with foreign keys to the (child) sourceInfo and sourceFields tables, keep in mind that de-normalising either child table later could be time-consuming. If you know there exists a 1:1 relationship between sourceRequireds and all columns in one of the tables (e.g. sourceFields), incorporating each of the sourceFields columns into the parent sourceRequireds will save the JOIN and any cartesian result-set processing later. Anyone?

Martin

> Date: Wed, 17 Sep 2008 09:57:05 -0400
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Appropriate Design
> CC: mysql@lists.mysql.com
>
> On Wed, Sep 17, 2008 at 5:46 AM, Jeffrey Santos [EMAIL PROTECTED] wrote:
>
> [original question quoted in full; see above]
>
> The second solution (multiple tables) is generally the only acceptable solution. The general rule (for one table or many) is whether the fields in the single table are fixed in quantity, or perhaps are repetitive enough to be tedious to manipulate. For example, assume you want a database of people in your state and the cars they own. Clearly, since people vary widely in the number of cars they own, and since each car has similar data, putting fields like car1vin, car2vin, car3vin in the people table is the wrong solution.
>
> Your problem description is rather vague, but the way I'm reading it is that your sources vary widely in the fields that need to be recorded. In that case, you'd generally want a table of sources (each instance of a source), a table of fields (each field that MAY be associated with a source), and a table of sourcefields (the actual data for a given field with a given source). But you really need to solidify the description of your problem, or have a simplest example, so people can really help you. I don't fully understand your description.
>
> Dave.
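A sketch of the three tables Dave describes; the names follow his prose, while the column types and the value column are assumptions:

    CREATE TABLE sources (
      sourceId INT AUTO_INCREMENT PRIMARY KEY
      -- plus whatever identifies the source itself (type, date added, ...)
    ) ENGINE=InnoDB;

    CREATE TABLE fields (
      fieldId INT AUTO_INCREMENT PRIMARY KEY,
      name    VARCHAR(64) NOT NULL          -- 'Author', 'Title', 'URL', ...
    ) ENGINE=InnoDB;

    -- The actual data for a given field of a given source.
    CREATE TABLE sourcefields (
      sourceId INT NOT NULL,
      fieldId  INT NOT NULL,
      value    TEXT,
      PRIMARY KEY (sourceId, fieldId),
      FOREIGN KEY (sourceId) REFERENCES sources (sourceId),
      FOREIGN KEY (fieldId)  REFERENCES fields (fieldId)
    ) ENGINE=InnoDB;

Pulling one source's full entry is then a single join: SELECT f.name, sf.value FROM sourcefields sf JOIN fields f ON f.fieldId = sf.fieldId WHERE sf.sourceId = 42;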
Re: Appropriate Design
Hi David,

My project involves storing information that could have been gathered from wildly different sources. You can think of this part of the database as a sort of bibliography for the sources of that information. If I gathered information from a book, for instance, there are some generally required fields such as Author, Title, etc. A website, on the other hand, would also require fields such as URL and Date Visited, and so on and so forth. I hope that's more clear!

~Jeffrey Santos

On Wed, Sep 17, 2008 at 9:57 AM, David Ashley [EMAIL PROTECTED] wrote:

> [original question and David's reply quoted in full; see above]
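In the tables sketched earlier in this thread, Jeffrey's book-versus-website distinction becomes rows, not columns. A hypothetical seed, with all IDs invented for illustration:

    INSERT INTO sourceTypes (typeId, typeName) VALUES (1, 'book'), (2, 'website');

    INSERT INTO sourceFields (fieldId, fieldName)
    VALUES (1, 'Author'), (2, 'Title'), (3, 'URL'), (4, 'Date Visited');

    -- Books require Author and Title; websites require Title, URL and
    -- Date Visited. Adding a new source type later is just more rows.
    INSERT INTO sourceRequireds (typeId, fieldId)
    VALUES (1, 1), (1, 2), (2, 2), (2, 3), (2, 4);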
Re: Appropriate Design
Thank you, that was a very helpful discussion. I'm worried a bit about scalability. This is a personal project, so I will be the primary user of the end result; however, if it is robust enough, I'd like to open it up to a larger audience. This type of scheme doesn't seem to lend itself to easily adding new fields or document types on the fly from within the application. Am I missing something, or is that a fair assessment? It is not necessarily a current requirement; however, if it may be added later, I'd like to consider the implications now.

We are also talking about small pieces of individual data, some of which might be self-authored, meaning that many of the fields that would normally be present in any work (such as Title, Publisher, etc.) won't be in most of them here. Would including them in the documents aggregate table be a "majority rules" type of decision?

Thank you again for your response,

~Jeffrey Santos

On Wed, Sep 17, 2008 at 12:47 PM, Geert-Jan Brits [EMAIL PROTECTED] wrote:

Hi Jeffrey,

David already gave a lot of valid points. Table-per-documenttype seems the way to go here. As to the 'best' db-scheme for your task, given your description you have to ask yourself a couple of questions. (Please note that by a document-type I mean a type like book, html page, etc.)

1. By variable fields per document, I gather you mean that fields can differ from document-type to document-type, right? But at the same time, for any given document-type you can infer beforehand what fields that document-type should contain, right? This would be ideal for table-per-documenttype: you can add a book table once you start integrating books, and afterwards create a table for, say, word documents with different fields altogether.

2. Do you want your application to be able to search over all documents of different document-types at once? The answer is probably yes. This means you will need one central table which contains foreign keys to each row in each of the table-per-documenttype tables. Call this table 'documents' or something (see below). Given that this is a requirement (I'm guessing here), you HAVE to do joins anyway when you want to search over ALL documents of all different types.

3. Given that 2 is true, it would be best in my opinion to put shared fields in the documents table. Fields like author, title, google-like summary, publication date, rating, filesize, content type, fetch date, modification date, etc. come to mind.

4. Normalization vs. speed. Normalization is a good thing... Having said that, when you often want to query only one document-type at a time, doing joins between a table-per-documenttype table and the documents table is a bit of a waste of time, especially with a large number of rows (when I hear talk about html pages, I think a lot of rows ;-) ). For this you could consider keeping the shared fields (author, publication date, etc.) redundant in each of the table-per-documenttype tables. Of course this means you have to keep these redundant pieces of information in a consistent state, otherwise madness lies ahead. For the most part your import application should take care of this, although perhaps other people know how to do this (partially) with MySQL. This requires some work / testing, but it would surely be worth it.

What you're left with is this:

    documents
    ---------
    id               (key; globally unique over all documents of all types)
    documentid       (foreign key; locally unique among documents of type documenttypeid)
    documenttypeid   (foreign key into documenttypes)
    author
    summary
    content-type
    publication-date
    fetch-date
    ...

    documenttypes
    -------------
    documenttypeid   (key)
    documenttype_name   (i.e. 1: Book, 2: HTML, 3: WORD-DOC, etc.)

and N different table-per-documenttype tables, for instance:

    Books
    -----
    documentid       (key)
    (redundant shared fields)
    book-specific fields

    HTML
    ----
    documentid       (key)
    (redundant shared fields)
    html-specific fields (rawhtml, cleanedhtml, stripped_html, etc.)

This enables you to:

- use the documenttypeid to filter based on documenttypes (only books and PDFs: no problem)
- use the documenttypeid to determine which table-per-documenttype table to join with (define a map structure in your application which holds the relation between documenttypeid and the table with which to join, or add the table name as a column to the documenttypes table, but the latter option would require joining with the documenttypes table, which adds overhead)
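A DDL sketch of the layout above. The column types, the choice of shared fields, and the book-specific column are assumptions, and only one per-type table is shown to keep it short:

    CREATE TABLE documenttypes (
      documenttypeid    INT AUTO_INCREMENT PRIMARY KEY,
      documenttype_name VARCHAR(32) NOT NULL      -- 'Book', 'HTML', ...
    ) ENGINE=InnoDB;

    CREATE TABLE documents (
      id               INT AUTO_INCREMENT PRIMARY KEY,  -- globally unique
      documentid       INT NOT NULL,  -- row id inside the per-type table;
                                      -- cannot be a single FOREIGN KEY,
                                      -- since it points into one of N tables
      documenttypeid   INT NOT NULL,
      author           VARCHAR(128),
      publication_date DATE,
      FOREIGN KEY (documenttypeid) REFERENCES documenttypes (documenttypeid)
    ) ENGINE=InnoDB;

    CREATE TABLE books (
      documentid       INT AUTO_INCREMENT PRIMARY KEY,
      author           VARCHAR(128),   -- redundant shared field
      publication_date DATE,           -- redundant shared field
      isbn             CHAR(13)        -- book-specific, invented
    ) ENGINE=InnoDB;

A search over everything touches only documents; fetching one known book is SELECT * FROM books WHERE documentid = ?, with no join at all, which is the speed-for-redundancy trade point 4 describes.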
UPDATE statement optimization
Hey all!

I'm a very novice MySQL user of the mentality of "get it working, then get it working the right way," feeling the best way to learn is to just do it. I got things working, but now I'm looking back and trying to get better efficiency out of my SQL statements. The database is set up like this:

- One table, using primary keys, called fcsets.
- A second table called fcusers, with a field vsets which is a comma-separated list of primary keys referring to the first table.

Users can dynamically delete entries from the first table, so I want to be able to remove that entry from all users who have it in their comma-separated vsets in fcusers. This is a PHP statement, but I think you'll get the general idea of how I'm doing it. I'm looking for suggestions on how to do this more efficiently:

    UPDATE fcusers
    SET vsets = TRIM(BOTH ',' FROM CONCAT_WS(',',
        SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET('{$setkey}', `vsets`) - 1),
        SUBSTRING_INDEX(`vsets`, ',',
            FIND_IN_SET(REVERSE('{$setkey}'), REVERSE(`vsets`)) * -1 + 1)))
    WHERE vsets REGEXP '^.*,?{$setkey},?.*$'

where $setkey refers to the key of the entry in fcsets being deleted.

Thanks!

- Jeff
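For readers decoding that expression, a worked example with invented values: suppose one row has vsets = '12,7,30' and $setkey is 7. The function behavior is standard MySQL; each SELECT below runs stand-alone:

    -- FIND_IN_SET('7', '12,7,30') = 2, so the first SUBSTRING_INDEX
    -- keeps the 2 - 1 = 1 element before the match:
    SELECT SUBSTRING_INDEX('12,7,30', ',', 1);               -- '12'

    -- Reversed, '7' sits at position 2 from the end, and 2 * -1 + 1 = -1
    -- keeps the 1 element after the match:
    SELECT SUBSTRING_INDEX('12,7,30', ',', -1);              -- '30'

    -- CONCAT_WS glues the halves back together and TRIM drops any
    -- stray leading/trailing comma:
    SELECT TRIM(BOTH ',' FROM CONCAT_WS(',', '12', '30'));   -- '12,30'

So every matching row gets its list split, spliced, and rewritten, which is the per-row string work the replies below are reacting to.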
RE: UPDATE statement optimization
I understand, and it's actually the original way I was designing the databases. The only thing that stopped me from doing it that way was the fact that the sets a user belongs to (vsets) is also dynamic: the user can change which sets he/she belongs to. Meaning that if there are 50 sets and 50 users, then we have the possibility of 2,500 rows in the table for a relatively small population. Since each user will most likely have created their own sets, with the ability to create multiple ones, the dynamic is probably more along the lines of a 3:1 set-to-user ratio, if not higher. So: 1,000 users, 3,000 sets, 3,000,000 possible rows. While this is an extreme case, the possibility remains.

Now, having said that, the way I'm doing it isn't efficient, due to the processing time it takes to delete from the comma-separated list for each of 1,000 users every time a set gets deleted. But is it not equally inefficient to have to record and store 3 million rows? I'm not very familiar with the back-end structure of how MySQL actually stores information, so I recognize the comparison might not even be valid.

- Jeff

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 10, 2005 1:26 PM
To: Jeffrey Santos
Cc: mysql@lists.mysql.com
Subject: Re: UPDATE statement optimization

Jeffrey Santos [EMAIL PROTECTED] wrote on 10/10/2005 01:07:03 PM:

> [original question quoted in full; see above]

You are having a hard time managing your set-to-user relationships because of your schema (database design). You need to normalize your data. That means you need to get rid of the comma-separated list on fcusers. You need to make one entry in some table for each association a user has with a set. I assume there is more information in the fcusers table than just which sets a user belongs to, and you don't want multiple copies of that information floating about. That means you need to make a third table, called something like `users_sets`, just to contain the pairs of PK values of your users and the PK values of the sets they belong to. It would look something like:

    CREATE TABLE users_sets (
      users_id INT,
      sets_id  INT,
      PRIMARY KEY (users_id, sets_id)
    );

Does that make sense? You create one row in users_sets for each time a user and a set become associated. When that association ends, delete the appropriate row. You might be thinking that working with multiple rows of data seems like more work; it won't be. Adding and subtracting sets to users (or users to sets) becomes simple INSERT and DELETE statements. Everything will move much faster.

Can you see the pattern, or would you like more details?

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
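With the users_sets table in place, the operation that started this thread collapses to plain, index-friendly statements. A sketch, with 42 and 7 as invented key values:

    -- A set was deleted: remove it from every user that had it.
    -- This replaces the whole SUBSTRING_INDEX/FIND_IN_SET UPDATE.
    DELETE FROM users_sets WHERE sets_id = 42;

    -- A user joins or leaves a set:
    INSERT INTO users_sets (users_id, sets_id) VALUES (7, 42);
    DELETE FROM users_sets WHERE users_id = 7 AND sets_id = 42;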
RE: UPDATE statement optimization
Shawn,

Thank you for the advice; I had a feeling you'd say something like that. I will reorganize my databases. It makes my code a little easier to follow, to boot :-P

Thanks again,

- Jeff

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 10, 2005 1:55 PM
To: Jeffrey Santos
Cc: mysql@lists.mysql.com
Subject: RE: UPDATE statement optimization

Unless you are on a severely underpowered machine, MySQL will handle 3 million rows without any problems. (If you are on such an underpowered machine, then your current process must absolutely CRAWL!!!) Don't worry about numbers in the millions when you are working with enterprise-quality database engines like MySQL. As long as you have the disk space, I wouldn't think twice about it. :-) As an example, those three million rows may use up to 15-18MB including indexes. You have WAY more free space than that on your disks, don't you?

As it is, you have to decompose 1,000 different lists of numbers, manually scan your decompositions for certain values, delete those certain values (if they exist), reconstitute each list, and update the original record. None of those activities (except for the final UPDATE) can possibly use an index, and you always have to do it to EVERY row! It's much, MUCH more practical to use the separate-table method (not to mention much, much faster).

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Jeffrey Santos [EMAIL PROTECTED] wrote on 10/10/2005 01:37:58 PM:

> [previous two messages quoted in full; see above]