Hi Jason,

On Fri, 2007-04-20 at 13:30 -0400, Jason McVetta wrote:
> I need to add real audit trail and change-notification support to an
> existing Django app and to one that is under development.  The best
> way to do this, it seems to me, is to add audit support to the Django
> framework itself.  My requirements are similar to those named by Paul
> Childs in his django-users post last June:
>       * rows, once written, are immutable -- system should be able to
>         operate without UPDATE and DELETE privileges on the audit db
>       * editing a record creates a new row (with higher version and/or
>         date) but does not touch old record
>       * deleting a record creates a new row with the
>         "record_is_deleted" flag set to true
>       * (optional) whenever a new row is written, user must supply a
>         comment explaining the change
>       * (optional) whenever a new row is written, an email is sent to
>         an address specified in the model
> To clarify, when I say "row" I mean a literal row in the database; and
> when I say "record" I mean the representation of a Django model.  A
> record will have many rows associated with it in the audit db, if it
> has been modified many times.  
> 
> Audit trail support would be enabled on a per-model basis, probably by
> including an AuditTrail inner class.  That inner class would then be
> used to specify audit options.  Beyond including this inner class, and
> a to-be-determined method for attaching comments to changes, audit
> trail support should be invisible to developers. 

In your descriptions below, you remove the ability for a developer to
use manual primary keys, by the sound of it, since one of your special
fields wants to be a single-column primary key. That's not invisible.

>  Both the existing admin log and the FullHistory branch are inadequate
> for my requirements.  

Understood. There's a bunch of trade-offs that have to be made with any
of these solutions. You have harder (as in stricter, not more difficult)
recording requirements, so your solution will be more invasive than
FullHistory. Uros Trebec, last year's Full History Summer of Code
student, and I went around and around on the trade-offs a bit when
working out an approach he could take. I think it will be hard to
reconcile both goals, so you've probably made a practical decision in
making requirements like yours a separate application.

> I will be working on this project as part of my Real Job(tm), so
> devoting time to it should not be a problem.  However, before I begin
> coding, I want the community's input on a few issues. 
> 
> What is the right way, at DB level, to implement the audit trail?  I
> had two ideas:
>      1. The table representing a model with AuditTrail enabled will
>         include fields such as "record_id", "row_id",
>         "record_version", "record_is_deleted", and "row_timestamp".
>         Row_id will be the primary key for the table, but will not be
>         meaningful for identifying a given record.  Record_id and
>         record_version will be unique together, and together
>         sufficient for locating the current version of a given record.
>         Reading the current record can be accomplished by a query like
>         "SELECT *, max(record_version) FROM table_name WHERE
>         record_is_deleted IS FALSE GROUP BY record_id", or a database
>         view encapsulating the same query. 
>         Advantage:  The audit table is guaranteed to be in sync with
>         the production data, since they are one and the same
>         Disadvantage:  Significantly heavier database load.  

I wouldn't go as far as to say "significantly heavier" without a lot of
profiling under operational loads. Using sensible table indexing,
including functional indexes on databases that have them, should make
that query perform reasonably well for many cases (gut feel based on a
bit of experience with large databases).
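For concreteness, here's the selection logic that query has to perform,
sketched in plain Python (field names follow your description; nothing
here is real Django API). One subtlety worth noting: the deleted check
has to apply to the newest row of each record, rather than filtering
rows out before grouping, or else a deleted record's older version
would reappear as "current":

```python
def current_records(rows):
    """Return the highest-version row for each record_id, dropping
    records whose newest row carries the deleted flag."""
    latest = {}
    for row in rows:
        key = row["record_id"]
        if (key not in latest
                or row["record_version"] > latest[key]["record_version"]):
            latest[key] = row
    # Only now filter: a record whose newest row is flagged deleted
    # has no current version at all.
    return [r for r in latest.values() if not r["record_is_deleted"]]
```

In SQL terms that means resolving max(record_version) per record_id
first (e.g. in a subquery) and testing record_is_deleted on the winner.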

A real drawback of this method that you don't mention is that adding
audit support requires changing the database table. So you can't add it
to standard (shipped with Django) or third-party modules easily.

>      2. The audit table, as described above, is written separately
>         from the working table used for reads.  It would be most
>         useful if the audit table could be written to a wholly
>         separate database.  The working table is unchanged from
>         today's Django. 
>         Advantage:  Fairly trivial increase in database load
>         Disadvantage:  In the event of an application error, it would
>         be possible for the working table and audit table to get out
>         of sync

You still have to change the main table for the model, though, right?
Otherwise you won't be able to have more than one tuple of data for each
model (due to constraint violations). This was a problem that existed in
the design of the FullHistory work as well: how to store arbitrary bits
of changed information without having to alter existing tables.

If you could find a way that all the audit-related information was in a
separate table, I would be tempted to implement it similar to
ContentTypes -- as a table that is related to the model(s) it governs
(maybe there is a shadow audit table for each real model table, but
that's a bit uncool from a database design perspective). The extra table
join won't really hurt you with today's databases and proper indexing.
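To make that concrete, here's a rough, framework-free sketch of what I
mean: an append-only audit store keyed the way ContentTypes keys
generic relations -- by a model label plus primary key. All of the
names here are illustrative, not an existing Django API:

```python
import json
import time

class AuditLog:
    """Append-only audit store, separate from the working tables.
    One logical table serves every audited model, keyed by
    (model label, object pk) in the style of ContentTypes."""

    def __init__(self):
        self._rows = []  # insert-only; no update or delete path exists

    def record(self, model_label, pk, data, deleted=False, comment=""):
        """Append a new versioned snapshot of one object."""
        version = 1 + sum(
            1 for r in self._rows
            if r["model"] == model_label and r["pk"] == pk
        )
        self._rows.append({
            "model": model_label,
            "pk": pk,
            "version": version,
            "deleted": deleted,
            "comment": comment,
            "data": json.dumps(data),   # serialized field values
            "timestamp": time.time(),
        })

    def history(self, model_label, pk):
        """All versions of one object, oldest first."""
        return [r for r in self._rows
                if r["model"] == model_label and r["pk"] == pk]
```

The working tables stay exactly as Django creates them today; only the
audit side needs this extra structure.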

> 
> Perhaps the core developers can tell me, will it be possible to do
> this as a contrib plugin?  I would, of course, rather do it as a
> plugin than hack on the Django core.  But either way, my ultimate goal
> is to create a high quality solution that can be included in the main
> trunk. 

I suspect you will be able to do most of it as a third-party app without
needing to hack on core. Some monkey patching -- a term I hate because
it suggests it is wrong somehow, but I'll stick with the common phrase
-- of the QuerySet class might be necessary (replacing QuerySet with a
version that understands how to get the "real" row for a model). You
might be able to get away with subclassing QuerySet in a lot of cases,
but I'm not sure that will give you everything.
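As a rough illustration of the subclassing route -- with stand-in
classes, since the real QuerySet internals are considerably more
involved -- the subclass narrows every fetch down to the current,
non-deleted rows before anything else sees them:

```python
class FakeQuerySet:
    """Stand-in for django.db.models.query.QuerySet."""

    def __init__(self, rows):
        self.rows = list(rows)

    def all(self):
        return self.rows

class AuditQuerySet(FakeQuerySet):
    """Only ever exposes the latest non-deleted version of each record."""

    def all(self):
        latest = {}
        for row in super().all():
            key = row["record_id"]
            if (key not in latest
                    or row["record_version"] > latest[key]["record_version"]):
                latest[key] = row
        return [r for r in latest.values() if not r["record_is_deleted"]]
```

The hard part in practice is making sure every code path that builds a
QuerySet for an audited model builds this subclass instead.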

Saving is going to be the interesting one. Replacing the save() method
on the Model class via monkey patching should be able to fix this,
although you have to make sure you insert your own save() method
sufficiently early, which will require some clear thinking.
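The patching itself might look something like this sketch -- again with
a stand-in for the real Model class, and invented names throughout.
The point is that rebinding save() on the class has to happen once,
early (at import time), before any instances get saved:

```python
class FakeModel:
    """Stand-in for django.db.models.Model."""

    audit_rows = []  # pretend audit table: insert-only

    def __init__(self, record_id, **fields):
        self.record_id = record_id
        self.fields = fields

    def save(self):
        pass  # the original UPDATE-in-place save we want to replace

def audited_save(self):
    """Turn every save into an insert of a new, higher-version row."""
    versions = [r["record_version"] for r in FakeModel.audit_rows
                if r["record_id"] == self.record_id]
    FakeModel.audit_rows.append({
        "record_id": self.record_id,
        "record_version": max(versions, default=0) + 1,
        "record_is_deleted": False,
        "fields": dict(self.fields),  # snapshot, not a live reference
    })

# The patch: rebind save() on the class, once, before any saves happen.
FakeModel.save = audited_save
```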

If I were you, I'd be trying my hardest to make this non-invasive in
core. It's a bit of a fringe requirement and adds a lot of heavy
machinery to common code paths if it's in core, so it's going to have to
be very convincing to warrant hurting the common case (which is where
you don't need this at all).

> Lastly, would it have been more appropriate to post this to
> django-developers?

*shrug* It's kind of third-party development at the moment, so either
list works. If you get hung up on how the deep internals work at any
point, they're certainly good questions for the developers' list (just
because it will have less applicability to our broader userbase).
Developers read both lists, so relevant posts aren't going to be
overlooked more often on one list than the other.

Regards,
Malcolm

