Re: Audit Table storage for Primary Key(s)

Andrew Moore Fri, 31 May 2013 16:08:04 -0700

Ah-ha, excuse my earlier response, I was under the impression you were
trying to track schema changes etc.


A


On Fri, May 31, 2013 at 7:54 PM, Rick James <[email protected]> wrote:

> UUID PRIMARY KEY (or even secondary index) --
> Once the table gets big enough (bigger than RAM cache), each row INSERTed
> (or SELECTed) will be a disk hit.  (Rule of Thumb:  only 100 hits/sec.)
>  This is because _random_ keys (like UUID) make caching useless.  Actually,
> the slowdown will be gradual.  For example, once the table is 5 times as
> big as the cache, 80% (1-1/5) of the INSERTs/SELECTs will hit disk.
> Bottom line -- Avoid UUIDs in huge tables, if at all possible.
>  (Exception:  The bits in type-1 UUIDs can be rearranged to be roughly
> chronological.)
>
> BIGINT -- You cannot possibly hit its max with any existing hardware.
>
> MyISAM -- PRIMARY KEY is just another secondary index.  Secondary indexes
> are separate BTrees.
>
> InnoDB -- PRIMARY KEY and data coexist in the same BTree.  Secondary
> indexes are separate BTrees.
>
> So, assuming this "audit" table will be huge (too big to be cached), you
> need to carefully consider every index, both for writing and for reading.
>
> You mentioned that you might audit 50 tables?  An index that starts with
> table_name would be inserting/selecting in 50 spots.  If the second part of
> the index is something 'chronological', such as an AUTO_INCREMENT or
> TIMESTAMP, then there would be 50 "hot spots" in the index.  This is quite
> efficient.  INDEX(table_name, UUID) would be bad because of the randomness.
>
> InnoDB may be the preferred engine, even though the footprint is bigger.
>  This is because careful design of the PK could lead to INSERTs into hot
> spot(s), plus SELECTs being able to take advantage of locality of
> reference.  With PRIMARY KEY(table_name, ...), and SELECT .. WHERE
> tablename='...', InnoDB will find all the rows together (fewer disk hits);
> MyISAM will find the data scattered (more disk hits, hence slower).
>
> Another aspect...  Would your SELECTs say "WHERE ... AND timestamp
> BETWEEN..." ?  And, would you _usually_ query _recent_ times?  If so, there
> could be a boost from doing both of these
> ** PARTITION BY RANGE(TO_DAYS(timestamp))
> ** Move timestamp to the _end_ of any indexes that it is in.
>
> I would be happy to discuss these principles further.  To be able to
> discuss more specifically, please provide
> ** Your tentative SHOW CREATE TABLE
> ** how big you plan for the table to become (#rows or GB),
> ** how much RAM you have
>
> > -----Original Message-----
> > From: Carsten Pedersen [mailto:[email protected]]
> > Sent: Friday, May 31, 2013 4:05 AM
> > Cc: [MySQL]
> > Subject: Re: Audit Table storage for Primary Key(s)
> >
> > Based on the little information available, I would make a lookup field
> > consisting of tablename and primary keys.
> >
> > (although I still believe that storing this information in the database
> > in the first place is probably the wrong approach, but to each his own)
> >
> > / Carsten
> >
> > On 31-05-2013 12:58, Neil Tompkins wrote:
> > > The kind of look ups will be trying to diagnose when and by who
> > > applied a update.  So the primary key of the audit is important.  My
> > > question is for performance, should the primary key be stored as a
> > > indexed field like I mentioned before, or should I have a actual
> > > individual field per primary key
> > >
> > >
> > > On Fri, May 31, 2013 at 11:03 AM, Carsten Pedersen
> > > <[email protected] <mailto:[email protected]>> wrote:
> > >
> > >     Again: Unless you can give some idea as to the kind of lookups
> > you
> > >     will be performing (which fields? Temporal values? etc.), it is
> > >     impossible to give advice on the table structure. I wouldn't
> > blame
> > >     anyone for not being able to do so; saving data for debugging
> > will
> > >     always be a moving target and almost by definition you don't know
> > >     today what you'll be looking for tomorrow.
> > >
> > >     That's why I think that using CSV tables _the contents of which
> > can
> > >     subsequently be analyzed using any of a number of text file
> > >     processing tools_ may indeed be your best initial option.
> > >
> > >     On UUIDs vs. INTs: (1) Please do yourself a favor and read up on
> > how
> > >     UUIDs are generated. If it's the same server that generates all
> > the
> > >     UUIDs, you won't get a lot of uniqueness for the amount of space
> > >     you'll be using for your data and index; (2) Please do the math
> > of
> > >     just how many inserts you can do per second over the next 1.000
> > >     years if you use a longint auto-increment field for your PK.
> > >
> > >     / Carsten
> > >
> > >     On 31-05-2013 11 <tel:31-05-2013%2011>:14, Neil Tompkins wrote:
> > >
> > >         Thanks for your response.  We expect to use the Audit log
> > when
> > >         looking into
> > >         exceptions and/or any need to debug table updates.  I don't
> > >         think a CSV
> > >         table would be sufficient as we are wanting to use a
> > interface
> > >         to query
> > >         this data at least on a daily basis if not weekly.
> > >
> > >         I use UUID because we have currently 54 tables, of which
> > >         probably 30 will
> > >         be audited.  So a INT PK wouldn't work because of the number
> > of
> > >         updates we
> > >         are applying.
> > >
> > >
> > >         On Fri, May 31, 2013 at 9:58 AM, Carsten Pedersen
> > >         <[email protected] <mailto:[email protected]>>wrote:
> > >
> > >             On 30-05-2013 09:27, Neil Tompkins wrote:
> > >
> > >                 Hi,
> > >
> > >                 I've created a Audit table which tracks any changed
> > >                 fields for multiple
> > >                 tables.  In my Audit table I'm using a UUID for the
> > >                 primary key.  However
> > >                 I
> > >                 need to have a reference back to the primary key(s)
> > of
> > >                 the table audited.
> > >
> > >                 At the moment I've a VARCHAR field which stores
> > primary
> > >                 keys like
> > >
> > >                 1
> > >                 1|2013-05-29
> > >                 2|2013-05-29
> > >                 2
> > >                 3
> > >                 1|2|2
> > >                 etc
> > >
> > >                 Is this the best approach, or should I have a
> > individual
> > >                 field in the
> > >                 audit
> > >                 table for all primary keys.  At the moment I think
> > the
> > >                 max number of
> > >                 primary keys on any given table is 3
> > >
> > >                 Thanks
> > >                 Neil
> > >
> > >
> > >             First you need to ask yourself how you expect to use the
> > >             table in the
> > >             future. Will you be looking up the data on a regular
> > basis?
> > >             Or will lookups
> > >             only be something you will do in exceptional situtions?
> > >
> > >             What is the intended goal of having a UUID for the
> > primary
> > >             key rather
> > >             than, say, an integer - or having no PK at all?
> > >
> > >             My immediate thought when reading this was "why even
> > store
> > >             that data in a
> > >             table?" - if it's a simple log, use a log file.
> > Especially
> > >             if you don't
> > >             know how you intend to search for data later on. There
> > are
> > >             many tools that
> > >             are far superior to SQL when it comes to searching for
> > text
> > >             strings. You
> > >             could even consider having a CSV table, which will give
> > you
> > >             an SQL
> > >             interface to said text file.
> > >
> > >             / Carsten
> > >
> > >
> > >
> > >             --
> > >             MySQL General Mailing List
> > >             For list archives: http://lists.mysql.com/mysql
> > >             To unsubscribe: http://lists.mysql.com/mysql
> > >
> > >
> > >
> > >
> > >     --
> > >     MySQL General Mailing List
> > >     For list archives: http://lists.mysql.com/mysql
> > >     To unsubscribe: http://lists.mysql.com/mysql
> > >
> > >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql
>
>

Re: Audit Table storage for Primary Key(s)

Reply via email to