RE: Audit Table storage for Primary Key(s)

Rick James Fri, 31 May 2013 12:21:41 -0700

UUID PRIMARY KEY (or even secondary index) -- 
Once the table gets big enough (bigger than RAM cache), each row INSERTed (or 
SELECTed) will be a disk hit.  (Rule of Thumb:  only 100 hits/sec.)  This is 
because _random_ keys (like UUID) make caching useless.  Actually, the slowdown 
will be gradual.  For example, once the table is 5 times as big as the cache, 
80% (1-1/5) of the INSERTs/SELECTs will hit disk.
Bottom line -- Avoid UUIDs in huge tables, if at all possible.  (Exception:  
The bits in type-1 UUIDs can be rearranged to be roughly chronological.)


BIGINT -- You cannot possibly hit its max with any existing hardware.

MyISAM -- PRIMARY KEY is just another secondary index.  Secondary indexes are 
separate BTrees.

InnoDB -- PRIMARY KEY and data coexist in the same BTree.  Secondary indexes 
are separate BTrees.

So, assuming this "audit" table will be huge (too big to be cached), you need 
to carefully consider every index, both for writing and for reading.

You mentioned that you might audit 50 tables?  An index that starts with 
table_name would be inserting/selecting in 50 spots.  If the second part of the 
index is something 'chronological', such as an AUTO_INCREMENT or TIMESTAMP, 
then there would be 50 "hot spots" in the index.  This is quite efficient.  
INDEX(table_name, UUID) would be bad because of the randomness.

InnoDB may be the preferred engine, even though the footprint is bigger.  This 
is because careful design of the PK could lead to INSERTs into hot spot(s), 
plus SELECTs being able to take advantage of locality of reference.  With 
PRIMARY KEY(table_name, ...), and SELECT .. WHERE tablename='...', InnoDB will 
find all the rows together (fewer disk hits); MyISAM will find the data 
scattered (more disk hits, hence slower).

Another aspect...  Would your SELECTs say "WHERE ... AND timestamp BETWEEN..." 
?  And, would you _usually_ query _recent_ times?  If so, there could be a 
boost from doing both of these
** PARTITION BY RANGE(TO_DAYS(timestamp))
** Move timestamp to the _end_ of any indexes that it is in.

I would be happy to discuss these principles further.  To be able to discuss 
more specifically, please provide
** Your tentative SHOW CREATE TABLE
** how big you plan for the table to become (#rows or GB),
** how much RAM you have

> -----Original Message-----
> From: Carsten Pedersen [mailto:[email protected]]
> Sent: Friday, May 31, 2013 4:05 AM
> Cc: [MySQL]
> Subject: Re: Audit Table storage for Primary Key(s)
> 
> Based on the little information available, I would make a lookup field
> consisting of tablename and primary keys.
> 
> (although I still believe that storing this information in the database
> in the first place is probably the wrong approach, but to each his own)
> 
> / Carsten
> 
> On 31-05-2013 12:58, Neil Tompkins wrote:
> > The kind of look ups will be trying to diagnose when and by who
> > applied a update.  So the primary key of the audit is important.  My
> > question is for performance, should the primary key be stored as a
> > indexed field like I mentioned before, or should I have a actual
> > individual field per primary key
> >
> >
> > On Fri, May 31, 2013 at 11:03 AM, Carsten Pedersen
> > <[email protected] <mailto:[email protected]>> wrote:
> >
> >     Again: Unless you can give some idea as to the kind of lookups
> you
> >     will be performing (which fields? Temporal values? etc.), it is
> >     impossible to give advice on the table structure. I wouldn't
> blame
> >     anyone for not being able to do so; saving data for debugging
> will
> >     always be a moving target and almost by definition you don't know
> >     today what you'll be looking for tomorrow.
> >
> >     That's why I think that using CSV tables _the contents of which
> can
> >     subsequently be analyzed using any of a number of text file
> >     processing tools_ may indeed be your best initial option.
> >
> >     On UUIDs vs. INTs: (1) Please do yourself a favor and read up on
> how
> >     UUIDs are generated. If it's the same server that generates all
> the
> >     UUIDs, you won't get a lot of uniqueness for the amount of space
> >     you'll be using for your data and index; (2) Please do the math
> of
> >     just how many inserts you can do per second over the next 1.000
> >     years if you use a longint auto-increment field for your PK.
> >
> >     / Carsten
> >
> >     On 31-05-2013 11 <tel:31-05-2013%2011>:14, Neil Tompkins wrote:
> >
> >         Thanks for your response.  We expect to use the Audit log
> when
> >         looking into
> >         exceptions and/or any need to debug table updates.  I don't
> >         think a CSV
> >         table would be sufficient as we are wanting to use a
> interface
> >         to query
> >         this data at least on a daily basis if not weekly.
> >
> >         I use UUID because we have currently 54 tables, of which
> >         probably 30 will
> >         be audited.  So a INT PK wouldn't work because of the number
> of
> >         updates we
> >         are applying.
> >
> >
> >         On Fri, May 31, 2013 at 9:58 AM, Carsten Pedersen
> >         <[email protected] <mailto:[email protected]>>wrote:
> >
> >             On 30-05-2013 09:27, Neil Tompkins wrote:
> >
> >                 Hi,
> >
> >                 I've created a Audit table which tracks any changed
> >                 fields for multiple
> >                 tables.  In my Audit table I'm using a UUID for the
> >                 primary key.  However
> >                 I
> >                 need to have a reference back to the primary key(s)
> of
> >                 the table audited.
> >
> >                 At the moment I've a VARCHAR field which stores
> primary
> >                 keys like
> >
> >                 1
> >                 1|2013-05-29
> >                 2|2013-05-29
> >                 2
> >                 3
> >                 1|2|2
> >                 etc
> >
> >                 Is this the best approach, or should I have a
> individual
> >                 field in the
> >                 audit
> >                 table for all primary keys.  At the moment I think
> the
> >                 max number of
> >                 primary keys on any given table is 3
> >
> >                 Thanks
> >                 Neil
> >
> >
> >             First you need to ask yourself how you expect to use the
> >             table in the
> >             future. Will you be looking up the data on a regular
> basis?
> >             Or will lookups
> >             only be something you will do in exceptional situtions?
> >
> >             What is the intended goal of having a UUID for the
> primary
> >             key rather
> >             than, say, an integer - or having no PK at all?
> >
> >             My immediate thought when reading this was "why even
> store
> >             that data in a
> >             table?" - if it's a simple log, use a log file.
> Especially
> >             if you don't
> >             know how you intend to search for data later on. There
> are
> >             many tools that
> >             are far superior to SQL when it comes to searching for
> text
> >             strings. You
> >             could even consider having a CSV table, which will give
> you
> >             an SQL
> >             interface to said text file.
> >
> >             / Carsten
> >
> >
> >
> >             --
> >             MySQL General Mailing List
> >             For list archives: http://lists.mysql.com/mysql
> >             To unsubscribe: http://lists.mysql.com/mysql
> >
> >
> >
> >
> >     --
> >     MySQL General Mailing List
> >     For list archives: http://lists.mysql.com/mysql
> >     To unsubscribe: http://lists.mysql.com/mysql
> >
> >
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

RE: Audit Table storage for Primary Key(s)

Reply via email to