Re: Ancient, unsolved high-CPU problem

2008-09-23 Thread Jeffrey Santos
Rene,

How are you querying the database during normal use?  What kind of
applications are you using?

~Jeffrey Santos

On Mon, Sep 22, 2008 at 2:46 PM, Rene Fournier [EMAIL PROTECTED] wrote:

 Uptime: 1054977  Threads: 10  Questions: 15576766  Slow queries: 229
 Opens: 489  Flush tables: 1  Open tables: 483  Queries per second avg: 14.765

 

 I know what the slow queries are--some that take 20-30 seconds to compute,
 and they are normal. The number of open tables seems high, no? The database
 that gets 95% of the load has ~35 tables in total.

 As for cron jobs, I have a number of command-line PHP scripts that perform
 regular queries. They've been running for about 10 days now. The current
 high CPU state started a couple days ago.




 On 22-Sep-08, at 8:30 PM, Martin Gainty wrote:

  curious if you have any cron jobs starting to execute?

 what does mysqladmin status show ?

 Martin


  From: [EMAIL PROTECTED]
  To: mysql@lists.mysql.com
  Subject: Ancient, unsolved high-CPU problem
  Date: Mon, 22 Sep 2008 19:41:25 +0200

 
  For the longest time, I've had a strange problem with MySQL.
  Basically, after a certain amount of time--sometimes a few days,
  sometimes a couple weeks--its CPU usage will go from a steady 20-30%
  to 80-90%. Actual load and number of queries are the same; nothing else
  changes.
 
  If I shut down MySQL and restart it (not the server), CPU% goes back to
  normal. What could this be?
 
  (Xserve G5 2GHz, 8GB, 3x250GB RAID5, Mac OS X 10.4.11, MySQL 5.0.51a)
 
  ...Rene
 
 






Appropriate Design

2008-09-17 Thread Jeffrey Santos
I'm not sure if this is the right list to ask such a question; if not, any
direction as to the appropriate forum would be much appreciated!

This may be a simple question, but I'm still in the early stages of my
learning about databases.  I'm developing an application that will require
information from various sources.  Since what might be considered required
information about those sources will vary (occasionally dramatically), I'm
unsure as to the correct design for the database tables.  In other words I'm
left with (I believe) two choices:

1.   Create a source table that will contain, among other things, every
possible field for a source type and sort out what is considered required
information at the application level.

2.   Create a few source tables detailing the required information about
each source type.  The only way I can think of doing this is something along
the lines of:

sourceFields: fieldId, fieldName, 
sourceRequireds: reqId, typeId, fieldId, 
sourceInfo: infoId, fieldId, sourceId, 

and then pull only the appropriate data from the database at query time
using JOINs.
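
Roughly, in SQL, I picture option 2 as something like this (just a sketch -- the
value column is my guess at where the actual data would live):

CREATE TABLE sourceFields (
    fieldId   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fieldName VARCHAR(64) NOT NULL
);

CREATE TABLE sourceRequireds (
    reqId   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    typeId  INT NOT NULL,    -- the source type this requirement belongs to
    fieldId INT NOT NULL     -- the field that is required for that type
);

CREATE TABLE sourceInfo (
    infoId   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fieldId  INT NOT NULL,   -- which field this value fills in
    sourceId INT NOT NULL,   -- which source the value belongs to
    value    TEXT            -- the actual data
);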

Which of these options (or a third I have no idea about!) would be
appropriate here?

If possible, maybe a general conventional wisdom statement would greatly
help my education on these matters!

Thank you,

Jeffrey Santos


Re: Appropriate Design

2008-09-17 Thread Jeffrey Santos
Kevin,

Thank you for the extensive reply, the information you provide makes perfect
sense.  I have read about normalization so was anticipating that choice #2
would probably be more appropriate.

In terms of the specifics of the project... This portion of the database
would act as a sort of bibliography.  It would hold the source information
for data represented elsewhere in the database; however, there are no real
constraints as to what type of source it can be.  I'd like to define some
standard types such as "book", "article", "website", and even "self-authored",
but in reality there are possibly hundreds of different types of sources
that the information can derive from, each with its own fields (with some
admitted overlap).

With that last admission in mind, would it make further sense to have a
pre-built but extensible table of fields, and just do the linking from that
standard fields table to the source type in a different table? In other
words, the requireds table would just hold foreign keys to the source
types table and the fields table.

sourceTypes: id=1, name=book, ...; id=2, name=article, ...
sourceFields: id=1, name=Author, ...
sourceRequireds: id=1, type=1, field=1; id=2, type=2, field=1, ...
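
At query time I imagine pulling the required fields for a given source type
with something like this (sketch only, using the column names above):

SELECT f.name AS required_field
FROM sourceRequireds r
JOIN sourceTypes  t ON t.id = r.type
JOIN sourceFields f ON f.id = r.field
WHERE t.name = 'book';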

Thank you again for your help!

~Jeffrey Santos


On Wed, Sep 17, 2008 at 7:19 AM, Kevin Hunter [EMAIL PROTECTED] wrote:

 At 5:46am -0400 on Wed, 17 Sep 2008, Jeffrey Santos wrote:
  I'm developing an application
  that will require information from various sources.  Since what
  might be considered required information about those sources will
  vary (occasionally dramatically), I'm unsure as to the correct
  design for the database tables.  In other words I'm left with (I
  believe) two choices:
 
  1.   Create a source table that will contain, among other things,
  every possible field for a source type and sort out what is
  considered required information at the application level.

  If possible, maybe a general conventional wisdom statement
  would greatly help my education on these matters!

 This is a common pitfall of noobs to data modeling.  The temptation is either
 to try to think of everything up front, which is almost always an
 impossible task, or to think of a minimal case to jump-start the coding
 process and then add model components later as necessary.

 The second pitfall is keeping model logic in the application.  This
 violates two similar principles: Single-Point-of-Authority and
 Don't-Repeat-Yourself.

 You are using a database so that it can maintain the structure and
 integrity of your data.  There is absolutely no other compelling reason
 to use a DB.  (If you didn't need integrity and structure, you'd use a
 simpler and mucho faster flat file.)  Let the DB do its job and be the
 single-point-of-authority.  The application should certainly do things
 the right way, putting data where it needs to go, but it should not be
 the application's /responsibility/ to keep data integrity and structure.
  If you rely on your application to maintain your structure, you presume
 that your programmers are perfect and will think of every little detail.
  (They're not and they won't.  I promise.)

 As a general technology, databases receive a lot of work so that
 application developers don't have to sweat the little details.  Like
 making sure that every user in a table has a last name.  Like making
 sure the user typed a 0 instead of an O.  Like defining constraints so
 that developers don't have to make sure an account has enough money to
 make a withdrawal.  All they need to know is that it didn't go through.

 The other principle of DRY also holds true.  At the point you have your
 application try to maintain data constraints, you will inevitably have
 lots of repeated or similar code to maintain similar data cases.  That's
 a different kind of nightmare.  Similarly, with the DB, it's silly to
 define multiple columns for similar data.  That's spreadsheet think.
 Like col_toyota, col_ford, col_chevy, col_lexus, col_buick.  No.  Make a
 single column as a foreign key to another table.
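
 A tiny sketch of what I mean (made-up table names; InnoDB so the foreign
 key is actually enforced):

 CREATE TABLE makes (
     make_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name    VARCHAR(32) NOT NULL     -- 'toyota', 'ford', 'chevy', ...
 ) ENGINE=InnoDB;

 CREATE TABLE cars (
     car_id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
     vin     CHAR(17),
     make_id INT NOT NULL,            -- one column, not col_toyota, col_ford, ...
     FOREIGN KEY (make_id) REFERENCES makes (make_id)
 ) ENGINE=InnoDB;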

 Some keywords to use with Google:

 normalize
 normalization
 foreign keys
 foreign key constraints
 innodb

 Wikipedia is often a good starting point.

 Once you've learned the concept, I can't stress enough that you should
 normalize, normalize, normalize.

  2.   Create a few source tables detailing the required information about
  each source type.  The only way I can think of doing this is something
 along
  the lines of:

  and then pull only the appropriate data from the database at query time
  using JOINs.

 This is closer to a better approach.  That said, be more specific about
 the problem you're trying to solve and folks may be better able to
 provide useful advice.

 Kevin



Re: Appropriate Design

2008-09-17 Thread Jeffrey Santos
In my example given earlier which accounts for possible overlap in required
fields, would your paragraph still apply?

E.g., I would probably need all the field information every time I access a
source type's requireds, so that would fall under your description; however,
that would mean having repeated instances of, for example, Author
fields under the different source types.  Which becomes more problematic:
the JOIN overhead or the repetition of similar data?

Thank you,

~Jeffrey Santos

On Wed, Sep 17, 2008 at 11:21 AM, Martin Gainty [EMAIL PROTECTED] wrote:

  Good Morning

 Joins can easily result in full table scans, so you might want to consider
 foreign keys to child tables.
 So, assuming you're starting with sourceRequireds as your topmost parent table
 with foreign keys to the (child) sourceInfo and sourceFields tables,
 keep in mind that later on de-normalising either child table could be
 time-consuming.
 If you know there exists a 1:1 relationship between sourceRequireds
 and all columns in any of the child tables, e.g. sourceFields,
 incorporating each of the sourceFields columns into the parent sourceRequireds
 will save the JOIN and any/all cartesian resultset processing later..
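
 A rough sketch of what I mean (assuming a strict 1:1 between sourceRequireds
 and sourceFields):

 -- normalised: sourceRequireds(reqId, typeId, fieldId) JOINed to
 --             sourceFields(fieldId, fieldName) on every lookup
 -- de-normalised alternative: fold the 1:1 columns into the parent
 CREATE TABLE sourceRequireds (
     reqId     INT NOT NULL PRIMARY KEY,
     typeId    INT NOT NULL,
     fieldId   INT NOT NULL,
     fieldName VARCHAR(64)    -- copied in from sourceFields; no JOIN to read it
 );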

 anyone ?
 Martin


  Date: Wed, 17 Sep 2008 09:57:05 -0400
  From: [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Subject: Re: Appropriate Design
  CC: mysql@lists.mysql.com

 
  On Wed, Sep 17, 2008 at 5:46 AM, Jeffrey Santos [EMAIL PROTECTED]
 wrote:
 
   I'm not sure if this is the right list to ask such a question; if not,
 any
   direction as to the appropriate forum would be much appreciated!
  
   This may be a simple question, but I'm still on the early stages of my
   learning about databases. I'm developing an application that will
 require
   information from various sources. Since what might be considered
 required
   information about those sources will vary (occasionally dramatically),
 I'm
   unsure as to the correct design for the database tables. In other words
   I'm
   left with (I believe) two choices:
  
   1. Create a source table that will contain, among other things, every
   possible field for a source type and sort out what is considered
 required
   information at the application level.
  
   2. Create a few source tables detailing the required information about
   each source type. The only way I can think of doing this is something
   along
   the lines of:
  
   sourceFields: fieldId, fieldName, 
   sourceRequireds: reqId, typeId, fieldId, 
   sourceInfo: infoId, fieldId, sourceId, 
  
   and then pull only the appropriate data from the database at query time
   using JOINs.
  
   Which of these options (or a third I have no idea about!) would be
   appropriate here?
  
   If possible, maybe a general conventional wisdom statement would
 greatly
   help my education on these matters!
  
   Thank you,
  
   Jeffrey Santos
 
 
  The second solution (multiple tables) is generally the only acceptable
  solution.
 
  The general rule (for one table or many) is whether the fields in the
 single
  table are fixed in quantity or perhaps are repetitive enough to be
 tedious
  or repetitive to manipulate.
 
  For example, assume you want a database of people in your state and the
 cars
  they own. Clearly, since people vary widely in the number of cars they
 own
   and since each car has similar data, putting fields like car1vin,
 car2vin,
  car3vin in the people table is the wrong solution.
 
  Your problem description is rather vague, but the way I'm reading it is
 that
  your sources vary widely in the fields that need to be recorded. In that
  case, you'd generally want a table of sources (each instance of a
 source), a
  table of fields (each field that MAY be associated with a source), and a
  table of sourcefields (the actual data for a given field with a given
  source).
 
  But you really need to solidify the description of your problem or have a
  simplest example so people can really help you. I don't fully understand
  your description.
 
  Dave.




Re: Appropriate Design

2008-09-17 Thread Jeffrey Santos
Hi David,

My project involves storing information that could have been gathered from
wildly different sources.  You can think of this part of the database as a
sort of bibliography for the sources of that information.  If I gathered
information from a book, for instance, there are some generally required
fields such as Author, Title, ... etc.  A website, on the other hand,
would also require fields such as URL and Date Visited and so on and so
forth.
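
Concretely, I picture the required fields defined with rows something like
these (made-up values, just to show the overlap between types):

INSERT INTO sourceTypes (id, name) VALUES (1, 'book'), (2, 'website');
INSERT INTO sourceFields (id, name)
    VALUES (1, 'Author'), (2, 'Title'), (3, 'URL'), (4, 'Date Visited');
-- a book requires Author and Title; a website requires Title, URL, Date Visited
INSERT INTO sourceRequireds (id, type, field)
    VALUES (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 2, 3), (5, 2, 4);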

I hope that's more clear!

~Jeffrey Santos

On Wed, Sep 17, 2008 at 9:57 AM, David Ashley [EMAIL PROTECTED] wrote:

 On Wed, Sep 17, 2008 at 5:46 AM, Jeffrey Santos [EMAIL PROTECTED]wrote:

 I'm not sure if this is the right list to ask such a question; if not, any
 direction as to the appropriate forum would be much appreciated!

 This may be a simple question, but I'm still on the early stages of my
 learning about databases.  I'm developing an application that will require
 information from various sources.  Since what might be considered required
 information about those sources will vary (occasionally dramatically), I'm
 unsure as to the correct design for the database tables.  In other words
 I'm
 left with (I believe) two choices:

 1.   Create a source table that will contain, among other things, every
 possible field for a source type and sort out what is considered required
 information at the application level.

 2.   Create a few source tables detailing the required information about
 each source type.  The only way I can think of doing this is something
 along
 the lines of:

 sourceFields: fieldId, fieldName, 
 sourceRequireds: reqId, typeId, fieldId, 
 sourceInfo: infoId, fieldId, sourceId, 

 and then pull only the appropriate data from the database at query time
 using JOINs.

 Which of these options (or a third I have no idea about!) would be
 appropriate here?

 If possible, maybe a general conventional wisdom statement would greatly
 help my education on these matters!

 Thank you,

 Jeffrey Santos


 The second solution (multiple tables) is generally the only acceptable
 solution.

  The general rule (for one table or many) is whether the fields in the
  single table are fixed in quantity, or are repetitive enough to be
  tedious to manipulate.

 For example, assume you want a database of people in your state and the
 cars they own.  Clearly, since people vary widely in the number of cars they
  own and since each car has similar data, putting fields like car1vin,
 car2vin, car3vin in the people table is the wrong solution.

 Your problem description is rather vague, but the way I'm reading it is
 that your sources vary widely in the fields that need to be recorded.  In
 that case, you'd generally want a table of sources (each instance of a
 source), a table of fields (each field that MAY be associated with a
 source), and a table of sourcefields (the actual data for a given field with
 a given source).
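
  In rough SQL, that might look something like this (column names are just
  guesses):

  CREATE TABLE sources (
      source_id   INT NOT NULL PRIMARY KEY,
      source_type VARCHAR(32)
  );
  CREATE TABLE fields (
      field_id   INT NOT NULL PRIMARY KEY,
      field_name VARCHAR(64)
  );
  CREATE TABLE sourcefields (
      source_id INT NOT NULL,    -- which source
      field_id  INT NOT NULL,    -- which field it fills in
      value     TEXT,            -- the actual data for that field and source
      PRIMARY KEY (source_id, field_id)
  );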

 But you really need to solidify the description of your problem or have a
  simple example so people can really help you.  I don't fully understand
 your description.

 Dave.



Re: Appropriate Design

2008-09-17 Thread Jeffrey Santos
Thank you, that was a very helpful discussion.

I'm worried a bit about scalability.  This is a personal project so I will
be the primary user of the end-result; however, if it is robust enough, I'd
like to open it up to a larger audience.  This type of scheme doesn't seem
to lend itself to easily adding new fields or document types on-the-fly
from within the application.  Am I missing something or is that a fair
assessment?  It is not necessarily a current requirement; however, if it may
be added later, I'd like to consider the implications now.

We are also talking about small pieces of individual data, some of which
might be self-authored, meaning that many of the fields that would
normally be present in any published work, such as Title or Publisher,
won't apply to most entries here. Would including them in the aggregate
documents table be a majority-rules type of decision?

Thank you again for your response,

~Jeffrey Santos

On Wed, Sep 17, 2008 at 12:47 PM, Geert-Jan Brits [EMAIL PROTECTED] wrote:

 Hi Jeffrey,

 David already gave a lot of valid points.
 Table-per-documenttype seems the way to go here.

 As to the 'best' db-scheme for your task given your description you have to
 ask yourself a couple of questions:
 Please note that with a document-type I mean a type like book, html page,
 etc.

 1. with variable fields per document I gather you mean that fields can
 differ from document-type to document-type right?
 But at the same time for any given document-type you can infer beforehand
 what fields this document-type should contain right?
 This would be ideal for table-per-documenttype. You can add a book-table
 once you start integrating books, and afterwards create a table for say
 word-documents with different fields altogether.

 2. do you want your application to be able to search over all documents of
 different document-types at once?
 The answer is probably yes.
 This means you will need to have 1 central table which contains foreign
 keys
 to each row in each of the table-per-documenttype tables.
 Call this table 'documents' or something (see below).
 Given that this is a requirement (I'm guessing here), you HAVE to do joins
 anyway when you want to search over ALL documents of all different types.

 3. given that 2 is true, it would be best in my opinion to put shared fields in
 the documents-table. Fields like author, title, google-like summary,
 publication_date, rating, filesize, content-type, fetch-date,
 modification-date etc. come to mind.

 4. normalization vs. speed.
 Normalization is a good thing... Having said that, when you often only want
 to query 1 document-type at a time, doing joins between 1
 table-per-documenttype table and the documents-table is a bit of a
 waste of time. Especially with large numbers of rows (when I hear talk about
 html-pages I think a lot of rows ;-).  For this you could consider keeping
 the shared fields (author, publication-date, etc)
 redundant in each of the table-per-documenttype tables.

 Of course this means that you have to keep these redundant pieces of
 information in a consistent state, otherwise madness lies ahead.
 For the most part your import-application should take care of this,
 although
 perhaps other people know how to do this (partially) with MySQL.
 This requires some work / testing, but it would surely be worth it.

 What you're left with is this:


 ---
 | documents
 ---
 |id k
 |documentid fk
 |documenttypeid fk
 |author
 |summary
 |content-type
 |publication-date
 |fetch-date
 |...

 Field ID is globally unique over all documents of all different types
 Field Documentid is locally unique to all documents of type documenttypeid
 Field documenttypeid points to a table documenttypes where all different
 documenttypes are described (i.e: 1:Book, 2:HTML, 3:WORD-DOC,etc)

 --
 |documenttypes
 --
 |documenttypeid k
 |documenttype_name

 And N different table-per-documenttype tables

 For instance

 --
 | Books
 --
 | documentid k
 | (redundant shared fields)
 | book specific fields


 --
 | HTML
 --
 | documentid k
 | (redundant shared fields)
 | html specific fields (rawhtml , cleanedhtml, stripped_html,etc)
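
 In MySQL DDL this could be something like the following (a minimal sketch;
 column types are guesses and the book-specific fields are invented):

 CREATE TABLE documenttypes (
     documenttypeid    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
     documenttype_name VARCHAR(32) NOT NULL   -- 'Book', 'HTML', 'WORD-DOC', ...
 ) ENGINE=InnoDB;

 CREATE TABLE documents (
     id               INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- globally unique
     documentid       INT NOT NULL,        -- locally unique within its type
     documenttypeid   INT NOT NULL,
     author           VARCHAR(255),        -- shared fields...
     summary          TEXT,
     publication_date DATE,
     FOREIGN KEY (documenttypeid) REFERENCES documenttypes (documenttypeid)
 ) ENGINE=InnoDB;

 CREATE TABLE books (
     documentid INT NOT NULL PRIMARY KEY,
     author     VARCHAR(255),              -- redundant shared field
     isbn       VARCHAR(20),               -- book-specific field (invented)
     page_count INT                        -- book-specific field (invented)
 ) ENGINE=InnoDB;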



 This enables you to:
 - use the documenttypeid to filter based on documenttypes (only books and
 pdf's, no problem)
 - use the documenttypeid to determine with which table-per-documenttype
 table to join.
    (define a map-structure in your application which holds the
 relation between documenttypeid and the table with which to join, or you
 could add the table-name as
    a column to the documenttypes-table, but the latter option would
 require joining with the documenttypes-table which

UPDATE statement optimization

2005-10-10 Thread Jeffrey Santos
Hey all!

 

I'm a very novice MySQL user with the mentality of "get it working, then get
it working the right way," feeling the best way to learn is to just do it.
I got things working, but now I'm looking back and trying to get better
efficiency out of my SQL statements.

 

Database is setup like this:

 

Have one table using primary keys called fcsets

Have a second table called fcusers with a field vsets which is a comma
separated list of primary keys referring to the first table

 

Users can dynamically delete entries from the first table, so I want to be
able to remove that entry from all users who have it in their csl vsets in
fcusers.  This is a PHP statement but I think you'll get the general idea of
how I'm doing it.  Am looking for some suggestions on how to do this more
efficiently:

 

UPDATE fcusers SET vsets=TRIM(BOTH ',' FROM CONCAT_WS(',',
SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET('{$setkey}', `vsets`) - 1),
SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET(REVERSE('{$setkey}'),
REVERSE(`vsets`)) * -1 + 1))) WHERE vsets REGEXP '^.*,?{$setkey},?.*$'

 

Where $setkey obviously refers to the key of the entry in fcsets being
deleted.

 

Thanks!

 

- Jeff



RE: UPDATE statement optimization

2005-10-10 Thread Jeffrey Santos
I understand, and it's actually the original way I was designing the
databases.  The only thing that stopped me from doing it that way was the fact
that the sets a user belongs to (vsets) is also dynamic; the user can change
which sets he/she belongs to.  Meaning that if there are 50 sets and 50
users then we have the possibility of 2500 rows in the table for a
relatively small population.  Since each user will have most likely created
their own sets and the ability to create multiple ones, the dynamic is
probably more along the lines of a 3:1 set-to-user ratio, if not higher. So.

 

1000 users, 3000 sets = 3,000,000 possible rows.  While this is an extreme case,
the possibility remains.  Now having said that, the way I'm doing it isn't
efficient due to the processing time it would take to delete from the CSL for
each of the 1000 users every time a set gets deleted.  But, is it not equally
inefficient to have to record and store 3 million rows?  I'm not very
familiar with the backend structure of how MySQL actually stores information,
so I recognize the comparison might not even be valid.

 

- Jeff

 

  _  

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 10, 2005 1:26 PM
To: Jeffrey Santos
Cc: mysql@lists.mysql.com
Subject: Re: UPDATE statement optimization

 



Jeffrey Santos [EMAIL PROTECTED] wrote on 10/10/2005 01:07:03 PM:

 Hey all!
 
  
 
 I'm a very novice MYSQL user of the mentality of get it working, then get
 it working the right way, feeling the best way to learn is to just do it.
 I got things working but now I'm looking back and trying to get better
 efficiency out of my SQL statements.
 
  
 
 Database is setup like this:
 
  
 
 Have one table using primary keys called fcsets
 
 Have a second table called fcusers with a field vsets which is a comma
 separated list of primary keys referring to the first table
 
  
 
 Users can dynamically delete entries from the first table, so I want to be
 able to remove that entry from all users who have it in their csl vsets in
 fcusers.  This is a PHP statement but I think you'll get the general idea
of
 how I'm doing it.  Am looking for some suggestions on how to do this more
 efficiently:
 
  
 
 UPDATE fcusers SET vsets=TRIM(BOTH ',' FROM CONCAT_WS(',',
 SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET('{$setkey}', `vsets`) - 1),
 SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET(REVERSE('{$setkey}'),
 REVERSE(`vsets`)) * -1 + 1))) WHERE vsets REGEXP '^.*,?{$setkey},?.*$'
 
  
 
 Where $setkey obviously refers to the key of the entry in fcsets being
 deleted.
 
  
 
 Thanks!
 
  
 
 - Jeff
 

You are having a hard time managing your set to user relationships because
of your schema (database design). 

You need to normalize your data. That means you need to get rid of the comma
separated list on fcusers. You need to make one entry in some table for each
association with a set each user has. 

I assume there is more information in the fcusers table other than just what
set a user belongs to and you don't want multiple copies of that information
floating about. That means you need to make a third table, called something
like `users_sets`, just to contain each pair of PK values of your users
and the PK values of the sets they belong to. It would look something like: 

CREATE TABLE users_sets (
    users_id int,
    sets_id int,
    PRIMARY KEY (users_id, sets_id)
)

Does that make sense? You create one row in users_sets for each time a user
and a set become associated. When that association ends, delete the
appropriate row. 

You might be thinking that working with multiple rows of data seems like
more work; it won't be. Adding and subtracting sets to users (or users to
sets) becomes a matter of simple INSERT and DELETE statements. Everything will
move much faster.
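
For example, something like this (made-up id values; {$setkey} as in your
original statement):

-- user 42 joins set 7
INSERT INTO users_sets (users_id, sets_id) VALUES (42, 7);

-- user 42 leaves set 7
DELETE FROM users_sets WHERE users_id = 42 AND sets_id = 7;

-- a set is deleted entirely: one simple DELETE replaces the string juggling
-- (an extra index on sets_id would make this fast, if you ever need it)
DELETE FROM users_sets WHERE sets_id = {$setkey};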

Can you see the pattern or would you like more details? 


Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



RE: UPDATE statement optimization

2005-10-10 Thread Jeffrey Santos
Shawn,

 

Thank you for the advice, I had a feeling you'd say something like that.
Will reorganize my databases.  Makes my code a little easier to follow to
boot :-P

 

Thanks again,

 

- Jeff

 

  _  

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 10, 2005 1:55 PM
To: Jeffrey Santos
Cc: mysql@lists.mysql.com
Subject: RE: UPDATE statement optimization

 


Unless you are on a severely underpowered machine, MySQL will handle 3
million rows without any problems. If you are on such an underpowered
machine, then your current process must absolutely CRAWL!!! Don't worry about
numbers in the millions when you are working with enterprise-quality
database engines, like MySQL. As long as you have the disk space, I wouldn't
think twice about it. :-) As an example, those three million rows may use up
to 15-18MB including indexes. You have WAY more free space than that on your
disks, don't you?

As it is, you have to decompose 1000 different lists of numbers, manually
scan your decompositions for certain values, delete those certain values (if
they exist), reconstitute each list, and update the original record. None of
those activities (except for the final UPDATE) can possibly  use an index
and you always have to do it to EVERY row!  It's much, MUCH more practical
to use the separate table method (not to mention much, much faster.) 

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine 



Jeffrey Santos [EMAIL PROTECTED] wrote on 10/10/2005 01:37:58 PM:

 I understand and it's actually the original way I was designing the 
 databases.  Only thing that stopped me from doing it that way was 
 the fact that the sets a user belongs to (vsets) is also dynamic, 
 the user can change which sets he/she belongs to.  Meaning that if 
 there are 50 sets and 50 users then we have the possibility of 2500 
 rows in the table for a relatively small population.  Since each 
 user will have most likely created their own sets and the ability to
 create multiple ones, the dynamic is probably more along the lines 
 of a 3:1 set to user ratio, if not higher. So. 
   
  1000 users, 3000 sets = 3,000,000 possible rows.  While this is an 
 extreme case, the possibility remains.  Now having said that, the 
 way I'm doing it isn't efficient due to processing time it would 
 take to delete from the CSL on each 1000 users every time a set gets
 deleted.  But, is it not equally inefficient to have to record and 
 store 3 million rows?  I'm not very familiar with the backend 
 structure of how mysql actually stores information so I recognize 
 the comparison might not even be valid. 
   
 - Jeff 
   
 
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 10, 2005 1:26 PM
 To: Jeffrey Santos
 Cc: mysql@lists.mysql.com
 Subject: Re: UPDATE statement optimization 
   
 
 
 Jeffrey Santos [EMAIL PROTECTED] wrote on 10/10/2005 01:07:03
PM:
 
  Hey all!
  
   
  
  I'm a very novice MYSQL user of the mentality of get it working, then
get
  it working the right way, feeling the best way to learn is to just do
it.
  I got things working but now I'm looking back and trying to get better
  efficiency out of my SQL statements.
  
   
  
  Database is setup like this:
  
   
  
  Have one table using primary keys called fcsets
  
  Have a second table called fcusers with a field vsets which is a
comma
  separated list of primary keys referring to the first table
  
   
  
  Users can dynamically delete entries from the first table, so I want to
be
  able to remove that entry from all users who have it in their csl vsets
in
  fcusers.  This is a PHP statement but I think you'll get the general
idea of
  how I'm doing it.  Am looking for some suggestions on how to do this
more
  efficiently:
  
   
  
  UPDATE fcusers SET vsets=TRIM(BOTH ',' FROM CONCAT_WS(',',
  SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET('{$setkey}', `vsets`) - 1),
  SUBSTRING_INDEX(`vsets`, ',', FIND_IN_SET(REVERSE('{$setkey}'),
  REVERSE(`vsets`)) * -1 + 1))) WHERE vsets REGEXP '^.*,?{$setkey},?.*$'
  
   
  
  Where $setkey obviously refers to the key of the entry in fcsets being
  deleted.
  
   
  
  Thanks!
  
   
  
  - Jeff
  
 
 You are having a hard time managing your set to user relationships 
 because of your schema (database design). 
 
 You need to normalize your data. That means you need to get rid of 
 the comma separated list on fcusers. You need to make one entry in 
 some table for each association with a set each user has. 
 
 I assume there is more information in the fcusers table other than 
 just what set a user belongs to and you don't want multiple copies 
 of that information floating about. That means you need to make a 
 third table, called something like  `users_sets`, just to contain 
 each pairs of PK values of your users  and the PK values of the sets
 they belong to. It would look something like: 
 
 CREATE TABLE users_sets ( 
 users_id int, 
 sets_id, 
 PRIMARY KEY