Hoorah! I was able to complete my single PL/pgSQL function to create history
records tracking individual field value changes generically for all my tables.
Some developers call this an "audit trail", though an accountant might say
otherwise.
Thanks for all the replies.
Special thanks to John DeSoi for pointing me to a crucial code example to make
PL/pgSQL interpret:
"OLD." || myColumnNameVar
as:
OLD.first_name (for example)
The crucial line is:
EXECUTE 'SELECT ($1).' || ri.column_name || '::text' INTO t USING NEW;
found here:
http://wiki.postgresql.org/wiki/PL/pgSQL_Dynamic_Triggers
I'm new to SQL and Postgres, so I may be wrong but my interpretation of this is:
Since there appears no way to make the PL/pgSQL interpreter interpret our
desired string, the trick is to look outside of PL/pgSQL and instead use the
SQL interpreter. Brilliant! It works, and it seems to be fast enough, at least
for my needs.
I'll share my current code & table structure below. Caveat: This code has not
yet been thoroughly tested, nor has it been deployed. I only finalized it
today.
--Basil Bourque
[A] I'm working on a greenfield project, where:
• I've built every table to have a primary key of type uuid named exactly
"pkey_".
• Every table has a TIMESTAMPTZ field named "record_modified_".
My approach below hard-codes these assumptions.
[B] I have this "history_" table:
CREATE TABLE history_
(
pkey_ uuid NOT NULL DEFAULT uuid_generate_v1mc(), -- The primary key for this
table, though no primary key constraint was created (for the sake of
performance and conservation). This column and timestamp_ column are the only
two columns about this table itself. All other columns are about the
inserted/modified/deleted record in some other table.
table_name_ character varying(120) NOT NULL, -- Name of table whose row is
being affected (inserted, deleted, or modified).
column_name_ character varying(120) NOT NULL, -- Name of the column in some
other table whose row value is being modified. This column's value is empty
string if the operation was DELETE.
timestamp_ timestamp with time zone NOT NULL DEFAULT clock_timestamp(), --
The moment this record was created. Using the clock_timestamp() function as a
default, to capture the actual moment in time rather than moment when
transaction began.
db_user_name_ character varying(120) NOT NULL DEFAULT "current_user"(), --
The name of the Postgres user logged in to this database connection/session.
app_name_ character varying(120) NOT NULL DEFAULT
current_setting('application_name'::text), -- The name of the application
connected to the database. May also include the version number of app, and the
name of the human user authenticated within the app.
old_value_ character varying(120) NOT NULL DEFAULT ''::character varying,
new_value_ character varying(120) NOT NULL DEFAULT ''::character varying,
uuid_ uuid NOT NULL, -- The UUID of the row being affected, the row being
inserted, updated, or deleted. Assumes every table whose history is being
recorded uses the 'uuid' data type as its primary key.
operation_ character varying(120) NOT NULL, -- What database operation
resulted in this trigger running: INSERT, UPDATE, DELETE, or TRUNCATE.
table_oid_ oid NOT NULL, -- The oid of the table whose record is being
modified. May be helpful if a table name changes over time.
ordinal_position_of_column_ integer NOT NULL, -- The position of the affected
column in its table. Every new column gets a number, incremented by one for
each. This may be helpful when analyzing changes across a stretch of time
during which a column's name was changed. Apparently columns have no oid, so we
are recording this number instead.
transaction_began_ timestamp with time zone NOT NULL DEFAULT
transaction_timestamp() -- The time when the current transaction began. Can act
like a transaction identifier, to group multiple "history_" rows of the same
transaction together. This is not foolproof, as multiple transaction could
possibly start in the same split second moment. Assuming the computer's clock
has a fine resolution, this chance of a coincidence should be quite miniscule.
)
I do not have a primary constraint for this table. The "pkey_" column acts as a
primary key, but there is no need for an index or uniqueness testing for this
special table.
[C] For every table I want to track field-level value changes, I create a
trigger like this:
CREATE TRIGGER XXX_trigger_history_
AFTER INSERT OR UPDATE OR DELETE ON XXX_
FOR EACH ROW EXECUTE PROCEDURE make_history_();
where 'XXX' is the name of the table.
[D] I created this function:
CREATE OR REPLACE FUNCTION make_history_()
RETURNS TRIGGER
LANGUAGE plpgsql
AS
$BODY$
/* Purpose: Make a history of changes to most fields in the table calling
this trigger function.
This kind of history tracking is also known as an "audit trail".
This function works by detecting each change in value for important
fields in a certain table.
This trigger function then calls another function to create a row in
the "history_" table.
This kind of feature is often called an "audit trail" by software
developers. I avoid using that term in this context as a real
audit trail in accounting terms involves more than this simple field
change tracking.
*/
/* © 2011 Basil Bourque. This source code may be used freely forever by
anyone taking full responsibility for doing so, without warranty.
Thanks so very much to John DeSoi of the [email protected]
mailing list for pointing me to this crucial code example:
http://wiki.postgresql.org/wiki/PL/pgSQL_Dynamic_Triggers
Before reading that example, my previous efforts led me to conclude a
generic history facility written in PL/pgSQL was impossible.
*/
/* We make these assumptions about any table using this function in its
trigger:
• Has a primary key named "pkey_" of type uuid.
• Has a field tracking the datetime the record was last
modified, named "record_modified_" of type timestamptz.
• Is in the default/current schema.
While it might be nice to rewrite this function to escape these
assumptions, I've spent all my energies to get this far.
I welcome feedback from anyone who want to take this further.
*/
/* Notes:
The 'OLD' and 'NEW' variables represent the entire row whose
INSERT/UPDATE/DELETE caused this trigger to run.
The 'TG_xxx' variables are special variables created automatically by
Postgres for the trigger function.
For example, TG_OP indicates which modification operation is happening:
INSERT, UPDATE, or DELETE.
http://www.postgresql.org/docs/current/static/plpgsql-trigger.html
"clock_timestamp()" gets the actual time at the moment of execution. In
contrast, most other timestamp
functions return the time when the current transaction began.
For more information, see:
http://www.postgresql.org/docs/current/static/functions-datetime.html
The "history_" table also includes a column "transaction_began_"
defaulting to "transaction_timestamp()". This timestamp can act
like a transaction identifier, to group multiple "history_" rows of the
same transaction together. This is not foolproof, as
multiple transaction could possibly start in the same split second
moment. Assuming the computer's clock has a fine resolution,
this chance of a coincidence should be quite miniscule. If someone
knows a way to get a true transaction id, please share.
*/
DECLARE
ri RECORD; -- About this data type "RECORD":
http://www.postgresql.org/docs/current/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-RECORDS
oldValue TEXT;
newValue TEXT;
isColumnSignificant BOOLEAN;
isValueModified BOOLEAN;
BEGIN
/*RAISE NOTICE E'\n Running function: make_history_for_row_
----------------\n\n Operation: %\n Schema: %\n Table: %\n',
TG_OP,
TG_TABLE_SCHEMA,
TG_TABLE_NAME;*/
IF (TG_OP = 'INSERT') OR (TG_OP = 'UPDATE') THEN
NEW.record_modified_ = clock_timestamp(); -- Record the moment
this row is being saved.
FOR ri IN
-- Fetch a ResultSet listing columns defined for this
trigger's table.
SELECT ordinal_position, column_name, data_type
FROM information_schema.columns
WHERE
table_schema = quote_ident(TG_TABLE_SCHEMA)
AND table_name = quote_ident(TG_TABLE_NAME)
ORDER BY ordinal_position
LOOP
-- For each column in this trigger's table, copy the
OLD & NEW values into respective variables.
-- NEW value
EXECUTE 'SELECT ($1).' || ri.column_name || '::text'
INTO newValue USING NEW;
-- OLD value
IF (TG_OP = 'INSERT') THEN -- If operation is an
INSERT, we have no OLD value, so use an empty string.
oldValue := ''::varchar;
ELSE -- Else operation is an UPDATE, so capture the
OLD value.
EXECUTE 'SELECT ($1).' || ri.column_name ||
'::text' INTO oldValue USING OLD;
END IF;
-- Make noise for debugging.
/*RAISE NOTICE E'\n Column #: %\n Name: %\n Type: %\n
Old: %\n New: %\n',
ri.ordinal_position,
ri.column_name,
ri.data_type,
oldValue,
newValue;*/
-- ToDo: Add code to throw an Exception if the
primary key value is changing (other than from NULL on an INSERT).
-- ToDo: Add code to ignore columns whose data
type does not cast well to TEXT/VARCHAR.
-- Ignore some columns:
-- • Those whose names are marked with a
trailing x.
-- • The primary key.
-- • Our timestamp fields recording the
row's creation and most recent modification.
isColumnSignificant := (position( '_x_' in
ri.column_name ) < 1) AND (ri.column_name <> 'pkey_') AND (ri.column_name <>
'record_created_') AND (ri.column_name <> 'record_modified_');
IF isColumnSignificant THEN
isValueModified := oldValue <> newValue; -- If
this nthField in the table was modified, make history.
IF isValueModified THEN
/*RAISE NOTICE E'Inserting history_ row
for INSERT or UPDATE.\n';*/
INSERT INTO history_ ( operation_,
table_oid_, table_name_, uuid_, column_name_, ordinal_position_of_column_,
old_value_, new_value_ )
VALUES ( TG_OP, TG_RELID,
TG_TABLE_NAME, NEW.pkey_, ri.column_name::VARCHAR, ri.ordinal_position,
oldValue::VARCHAR, newValue::VARCHAR );
END IF;
END IF;
END LOOP;
RETURN NEW;
ELSIF (TG_OP = 'DELETE') THEN
/*RAISE NOTICE E'Inserting history_ row for DELETE.\n';*/
-- Similar to INSERT above, but refers to OLD instead of
NEW, and passes empty values for last 4 fields.
INSERT INTO history_ ( operation_, table_oid_, table_name_,
uuid_, column_name_, ordinal_position_of_column_, old_value_, new_value_ )
VALUES ( TG_OP, TG_RELID, TG_TABLE_NAME, OLD.pkey_,
''::VARCHAR, 0, ''::VARCHAR, ''::VARCHAR );
RETURN OLD;
END IF;
/* Should never reach this point. Branching in code above should always
reach a call to RETURN. */
RAISE EXCEPTION 'Unexpectedly reached the bottom of this function
without calling RETURN.';
END;
$BODY$;
<end of code>
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general