Re: [GENERAL] table versioning approach (not auditing)

2014-10-10 Thread Jim Nasby


On 10/7/14, 10:40 PM, Gavin Flower wrote:

>> Yeah, I'm pretty convinced at this point that history/versioning should be
>> built on top of a schema that always contains the current information, if for
>> no other reason than so you always have a PK that points to what's current in
>> addition to your history PKs.
>
> One of the motivations for having an effective_date was being able to put
> changes into the database ahead of time.


Yeah, allowing for future data makes things more interesting. My first 
inclination is that it's a completely separate requirement, and you would track 
the history of all records that you had at a point in time. Doing that means 
you can see things like someone changing the effective date from Nov. 1 to Dec. 
1. But clearly this is an area where you have to take the business case into 
account.
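
A rough sketch of that idea, in Python for brevity: every write is kept as its own version together with the date it was recorded, so a change to the effective date is itself visible in the history. The names `save`, `as_known_on` and `recorded_on` are made up for illustration, and this is only one possible reading of the requirement.

```python
from datetime import date

# Every version of every record ever written; nothing is updated in place.
history = []

def save(record_id, effective_date, price, recorded_on):
    """Append a new version; recorded_on is when we learned about it."""
    history.append({
        "record_id": record_id,
        "effective_date": effective_date,
        "price": price,
        "recorded_on": recorded_on,
    })

def as_known_on(record_id, known_on):
    """Latest version of the record that had been recorded by known_on."""
    versions = [
        v for v in history
        if v["record_id"] == record_id and v["recorded_on"] <= known_on
    ]
    return max(versions, key=lambda v: v["recorded_on"], default=None)

# A price change is entered ahead of time, then postponed a week later.
save(1, date(2014, 11, 1), 12.5, recorded_on=date(2014, 10, 1))
save(1, date(2014, 12, 1), 12.5, recorded_on=date(2014, 10, 8))

print(as_known_on(1, date(2014, 10, 5))["effective_date"])   # 2014-11-01
print(as_known_on(1, date(2014, 10, 10))["effective_date"])  # 2014-12-01
```

Whether the extra "recorded on" dimension is worth carrying depends on the business case, as noted above.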
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] table versioning approach (not auditing)

2014-10-07 Thread Gavin Flower


On 08/10/14 13:29, Jim Nasby wrote:

> On 10/6/14, 6:10 PM, Gavin Flower wrote:
>> Even if timestamps are used extensively, you'd have to be careful
>> joining on them. You may have information valid at T1 and changing at
>> T3, but the transaction has T2, where T1 < T2 < T3 - the appropriate
>> set of data would be associated with T1, so you would not get anywhere
>> trying to find data with a timestamp of T2 (unless you were very
>> lucky!).
>
> Yeah, this is why I think timestamps need to be shunned in favor of
> explicit pointers. Anyone that thinks timestamps are good enough
> hasn't thought the problem through completely. :)
>
> I also think there's potential value to storing full transaction
> information (presumably in a separate table): txid_current(),
> txid_current_snapshot(), now(), current_user, maybe some other stuff
> (client IP address?). That way you can tell exactly what created a
> history record. With appropriate shenanigans you can theoretically
> determine exactly what other history data would be visible at that
> time without using pointers (but man would that be ugly!)
>
>> Actually things like phone numbers are tricky.  Sometimes you may
>> want to use the current phone number, and not the one extant at that
>> time (as you want to phone the contact now), or you may still want
>> the old phone number (was the call to a specific number at date/time
>> legitimate & who do we charge the cost of the call to).
>
> Yeah, I'm pretty convinced at this point that history/versioning
> should be built on top of a schema that always contains the current
> information, if for no other reason than so you always have a PK that
> points to what's current in addition to your history PKs.

One of the motivations for having an effective_date was being able to
put changes into the database ahead of time.


Finding the current value uses the same logic as finding the value at any
other date/time, so you don't need a special schema to distinguish the
current state from anything else.  For example:


   DROP TABLE IF EXISTS stock;

   CREATE TABLE stock
   (
       id              text,
       effective_date  timestamptz,
       price           numeric
   );

   INSERT INTO stock
   (
       id,
       effective_date,
       price
   )
   VALUES
       ('y88', '2014-10-01', 12.0),
       ('x42', '2014-10-01', 12.1),
       ('x42', '2014-10-08', 12.2),
       ('x42', '2014-10-10', 12.3),
       ('x42', '2014-10-16', 12.4),
       ('z42', '2014-10-19', 12.5),
       ('z49', '2014-10-01', 12.6),
       ('z49', '2014-10-30', 12.7),
       ('z77', '2014-10-01', 12.8);

   -- One row per (id, effective_date); the DESC order matches the lookup below.
   CREATE UNIQUE INDEX primary_key ON stock (id ASC, effective_date DESC);

   -- Price of 'x42' in effect on 2014-10-11:
   -- the latest row dated on or before that day.
   SELECT
       s.price
   FROM
       stock s
   WHERE
       s.id = 'x42'
       AND s.effective_date <= '2014-10-11'
   ORDER BY
       s.effective_date DESC
   LIMIT 1;
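
The same lookup can be wrapped so that "current" is just another date. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server; the helper name `price_as_of` and the use of ISO date strings instead of timestamptz are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (id TEXT, effective_date TEXT, price NUMERIC, "
    "PRIMARY KEY (id, effective_date))"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [
        ("x42", "2014-10-01", 12.1),
        ("x42", "2014-10-08", 12.2),
        ("x42", "2014-10-10", 12.3),
        ("x42", "2014-10-16", 12.4),
    ],
)

def price_as_of(conn, stock_id, as_of):
    """Return the price in effect on as_of (ISO date), or None if none yet."""
    row = conn.execute(
        "SELECT price FROM stock "
        "WHERE id = ? AND effective_date <= ? "
        "ORDER BY effective_date DESC LIMIT 1",
        (stock_id, as_of),
    ).fetchone()
    return row[0] if row else None

print(price_as_of(conn, "x42", "2014-10-11"))  # a past date
print(price_as_of(conn, "x42", "2014-10-20"))  # "now" is simply a later date
```

ISO-formatted date strings sort in chronological order, which is what lets the plain string comparison stand in for a timestamp comparison here.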



Cheers,
Gavin

Re: [GENERAL] table versioning approach (not auditing)

2014-10-07 Thread Jim Nasby


On 10/6/14, 6:10 PM, Gavin Flower wrote:

> Even if timestamps are used extensively, you'd have to be careful joining on them.
> You may have information valid at T1 and changing at T3, but the transaction has T2,
> where T1 < T2 < T3 - the appropriate set of data would be associated with T1,
> so you would not get anywhere trying to find data with a timestamp of T2 (unless you
> were very lucky!).


Yeah, this is why I think timestamps need to be shunned in favor of explicit 
pointers. Anyone that thinks timestamps are good enough hasn't thought the 
problem through completely. :)

I also think there's potential value to storing full transaction information 
(presumably in a separate table): txid_current(), txid_current_snapshot(), 
now(), current_user, maybe some other stuff (client IP address?). That way you 
can tell exactly what created a history record. With appropriate shenanigans 
you can theoretically determine exactly what other history data would be 
visible at that time without using pointers (but man would that be ugly!)


> Actually things like phone numbers are tricky.  Sometimes you may want to use the
> current phone number, and not the one extant at that time (as you want to phone the
> contact now), or you may still want the old phone number (was the call to a
> specific number at date/time legitimate & who do we charge the cost of the call
> to).


Yeah, I'm pretty convinced at this point that history/versioning should be 
built on top of a schema that always contains the current information, if for 
no other reason than so you always have a PK that points to what's current in 
addition to your history PKs.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [GENERAL] table versioning approach (not auditing)

2014-10-06 Thread Gavin Flower

On 07/10/14 10:47, Jim Nasby wrote:

> On 10/2/14, 9:27 AM, Adam Brusselback wrote:
>> i've also tried to implement a database versioning using JSON to log
>> changes in tables. Here it is:
>> https://github.com/fxku/audit
>>
>> I've got two versioning tables, one storing information about all
>> transactions that happened and one where i put the JSON logs of row
>> changes of each table. I'm only logging old values and not complete
>> rows.
>>
>> Then I got a function that recreates a database state at a given time
>> into a separate schema - either to VIEWs, MVIEWs or TABLES. This
>> database state could then be indexed in order to work with it. You
>> can also reset the production state to the recreated past state.
>>
>> Unfortunately I've got no time to further work on it at the moment +
>> I have not done tests with many changes in the database so I can't
>> say if the recreation process scales well. One downside I've realised
>> is that using the json_agg function has limits when I've got binary
>> data. It gets too long. So I'm really looking forward to using JSONB.
>>
>> There are more plans in my mind. By having a Transaction_Log table it
>> should be possible to revert only certain transactions. I'm also
>> thinking of parallel versioning, e.g. different users are all working
>> with their version of the database and commit their changes to the
>> production state. As I've got a unique history ID for each table and
>> each row, I should be able to map the affected records.
>
> Sorry I'm coming late to this thread. I agree that getting interested
> people together would be a good idea. Is there another mailing list we
> can do that with?
>
> Versioning is also something I'm interested in, and have put a lot of
> thought into (if not much actual code :( ). I'll also make some
> general comments, if I may...
>
> I think timestamps should be *heavily avoided* in versioning, because
> they are frequently the wrong way to solve a problem. There are many
> use cases where you're trying to answer "What values were in place
> when X happened", and the simplest, most fool-proof way to answer that
> is that when you create a record for X, part of that record is a
> "history ID" that shows you the exact data used. For example, if
> you're creating an invoicing system that has versioning of customer
> addresses you would not try and join an invoice with its address
> using a timestamp; you would put an actual address_history_id in the
> invoice table.
>
> I thought I saw a reference to versioning sets of information. This is
> perhaps the trickiest part. You first have to think about the
> non-versioned sets (i.e. a customer may have many phone numbers) before
> you think about versioning the set. In this example, you want the
> history of the *set* of phone numbers, not of each individual number.
> Design it with full duplication of data first, don't think about
> normalizing until you have the full set versioning design.
>
> I understand the generic appeal of using something like JSON, but in
> reality I don't see it working terribly well. It's likely to be on the
> slow side, and it'll also be difficult to query from. Instead, I think
> it makes more sense to create actual history tables that derive their
> definition from the base table. I've got code that extracts
> information (column_name, data type, nullability) from a table (or
> even a table definition), and it's not that complex. With the work
> that's been done on capturing DDL changes it shouldn't be too hard to
> handle that automatically.

Yeah, my design was quite extensive and ensured all relevant information
was associated with the 'history id' (still need timestamps to find the
appropriate value), the powers that be watered it down somewhat (but
that was outside my control). Performance was not too critical, probably
less than 10 transactions per second at peak times. JSON, had yet to be
invented, but we would not have used it anyhow.

Even if timestamps are used extensively, you'd have to be careful
joining on them. You may have information valid at T1 and changing at
T3, but the transaction has T2, where T1 < T2 < T3 - the appropriate set
of data would be associated with T1, so you would not get anywhere trying
to find data with a timestamp of T2 (unless you were very lucky!).
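
To make the T1 < T2 < T3 point concrete, here is a small sketch in Python. The address values and the helper names `exact_match` and `version_at` are invented for the example; it only shows why an equality join on the timestamp finds nothing while a range lookup finds the T1 version.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical address versions, keyed by the time each became valid.
versions = [
    (datetime(2014, 10, 1), "12 Old Street"),   # T1
    (datetime(2014, 10, 20), "7 New Avenue"),   # T3
]
valid_from = [t for t, _ in versions]  # must stay sorted for bisect

def exact_match(ts):
    """Equality join on the timestamp: only works if ts is exactly T1 or T3."""
    return next((value for t, value in versions if t == ts), None)

def version_at(ts):
    """Range lookup: the latest version whose valid-from time is <= ts."""
    i = bisect_right(valid_from, ts)
    return versions[i - 1][1] if i else None

t2 = datetime(2014, 10, 7, 15, 30)  # transaction time, T1 < T2 < T3
print(exact_match(t2))  # None
print(version_at(t2))   # 12 Old Street
```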

Actually things like phone numbers are tricky. Sometimes you may want
to use the current phone number, and not the one extant at that time (as
you want to phone the contact now), or you may still want the old phone
number (was the call to a specific number at date/time legitimate & who
do we charge the cost of the call to).


Re: [GENERAL] table versioning approach (not auditing)

2014-10-06 Thread Jim Nasby

On 10/2/14, 9:27 AM, Adam Brusselback wrote:

> i've also tried to implement a database versioning using JSON to log changes in
> tables. Here it is:
> https://github.com/fxku/audit
>
> I've got two versioning tables, one storing information about all transactions
> that happened and one where i put the JSON logs of row changes of each table.
> I'm only logging old values and not complete rows.
>
> Then I got a function that recreates a database state at a given time into a
> separate schema - either to VIEWs, MVIEWs or TABLES. This database state could
> then be indexed in order to work with it. You can also reset the production
> state to the recreated past state.
>
> Unfortunately I've got no time to further work on it at the moment + I have not
> done tests with many changes in the database so I can't say if the recreation
> process scales well. One downside I've realised is that using the json_agg
> function has limits when I've got binary data. It gets too long. So I'm really
> looking forward to using JSONB.
>
> There are more plans in my mind. By having a Transaction_Log table it should be
> possible to revert only certain transactions. I'm also thinking of parallel
> versioning, e.g. different users are all working with their version of the
> database and commit their changes to the production state. As I've got a unique
> history ID for each table and each row, I should be able to map the affected
> records.

Sorry I'm coming late to this thread. I agree that getting interested people
together would be a good idea. Is there another mailing list we can do that
with?

Versioning is also something I'm interested in, and have put a lot of thought
into (if not much actual code :( ). I'll also make some general comments, if I
may...

I think timestamps should be *heavily avoided* in versioning, because they are frequently the wrong
way to solve a problem. There are many use cases where you're trying to answer "What values
were in place when X happened", and the simplest, most fool-proof way to answer that is that
when you create a record for X, part of that record is a "history ID" that shows you the
exact data used. For example, if you're creating an invoicing system that has versioning of
customer addresses you would not try and join an invoice with its address using a timestamp; you
would put an actual address_history_id in the invoice table.
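
As an illustration of the explicit pointer, here is a small sketch. It uses Python's sqlite3 module only so that it runs stand-alone; apart from address_history_id, the table layout and the helper names are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE address_history (
        address_history_id INTEGER PRIMARY KEY,
        customer_id        INTEGER NOT NULL,
        address            TEXT    NOT NULL
    );
    CREATE TABLE invoice (
        invoice_id         INTEGER PRIMARY KEY,
        customer_id        INTEGER NOT NULL,
        -- explicit pointer to the exact address version that was used
        address_history_id INTEGER NOT NULL
            REFERENCES address_history (address_history_id)
    );
""")

def record_address(customer_id, address):
    """Store a new address version and return its history ID."""
    cur = conn.execute(
        "INSERT INTO address_history (customer_id, address) VALUES (?, ?)",
        (customer_id, address),
    )
    return cur.lastrowid

def create_invoice(customer_id, address_history_id):
    """Create an invoice pinned to one specific address version."""
    cur = conn.execute(
        "INSERT INTO invoice (customer_id, address_history_id) VALUES (?, ?)",
        (customer_id, address_history_id),
    )
    return cur.lastrowid

def invoice_address(invoice_id):
    """Look up the address an invoice was issued to; no timestamps involved."""
    row = conn.execute(
        "SELECT h.address FROM invoice i "
        "JOIN address_history h USING (address_history_id) "
        "WHERE i.invoice_id = ?",
        (invoice_id,),
    ).fetchone()
    return row[0]

v1 = record_address(1, "12 Old Street")
inv = create_invoice(1, v1)
v2 = record_address(1, "7 New Avenue")  # the customer moves afterwards
print(invoice_address(inv))  # 12 Old Street
```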

I thought I saw a reference to versioning sets of information. This is perhaps
the trickiest part. You first have to think about the non-versioned sets (ie: a
customer may have many phone numbers) before you think about versioning the
set. In this example, you want the history of the *set* of phone numbers, not
of each individual number. Design it with full duplication of data first, don't
think about normalizing until you have the full set versioning design.
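
A sketch of the "full duplication first" design for a set, again with sqlite3 from Python so it is runnable as-is. The table names phone_set and phone_set_member and the helper functions are hypothetical; the point is only that each change stores a complete copy of the set under a new version ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phone_set (
        phone_set_id INTEGER PRIMARY KEY,   -- one row per version of the set
        customer_id  INTEGER NOT NULL
    );
    CREATE TABLE phone_set_member (
        phone_set_id INTEGER NOT NULL REFERENCES phone_set (phone_set_id),
        phone_number TEXT    NOT NULL,
        PRIMARY KEY (phone_set_id, phone_number)
    );
""")

def save_phone_set(customer_id, numbers):
    """Store a complete copy of the customer's numbers as a new set version."""
    cur = conn.execute(
        "INSERT INTO phone_set (customer_id) VALUES (?)", (customer_id,)
    )
    set_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO phone_set_member VALUES (?, ?)",
        [(set_id, number) for number in sorted(set(numbers))],
    )
    return set_id

def numbers_in_set(set_id):
    """Return every number that belonged to the given version of the set."""
    rows = conn.execute(
        "SELECT phone_number FROM phone_set_member "
        "WHERE phone_set_id = ? ORDER BY phone_number",
        (set_id,),
    )
    return [row[0] for row in rows]

v1 = save_phone_set(1, ["555-0100", "555-0101"])
v2 = save_phone_set(1, ["555-0101", "555-0199"])  # one number replaced
print(numbers_in_set(v1))  # ['555-0100', '555-0101']
print(numbers_in_set(v2))  # ['555-0101', '555-0199']
```

The unchanged number is stored twice, which is the duplication to accept at first; normalizing it away can come later.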

I understand the generic appeal of using something like JSON, but in reality I
don't see it working terribly well. It's likely to be on the slow side, and
it'll also be difficult to query from. Instead, I think it makes more sense to
create actual history tables that derive their definition from the base table.
I've got code that extracts information (column_name, data type, nullability)
from a table (or even a table definition), and it's not that complex. With the
work that's been done on capturing DDL changes it shouldn't be too hard to
handle that automatically.
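
To give a feel for how little is involved, here is a sketch of deriving a history table from a base table. This is not the code referred to above: it runs against SQLite from Python, where `PRAGMA table_info` plays the role that a catalog query (for example against information_schema.columns) would play in PostgreSQL, and the `_history` naming and the helper `history_table_ddl` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER NOT NULL, "
    "name TEXT NOT NULL, phone TEXT)"
)

def history_table_ddl(conn, table):
    """Build CREATE TABLE for <table>_history from the base table's columns.

    The table name is interpolated directly, so only pass trusted names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [f"{table}_history_id INTEGER PRIMARY KEY"]
    for _cid, name, col_type, notnull, _default, _pk in columns:
        lines.append(f"{name} {col_type}" + (" NOT NULL" if notnull else ""))
    body = ",\n    ".join(lines)
    return f"CREATE TABLE {table}_history (\n    {body}\n)"

ddl = history_table_ddl(conn, "customer")
conn.execute(ddl)
print(ddl)
```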


Re: [GENERAL] table versioning approach (not auditing)

2014-10-02 Thread Adam Brusselback

Ended up running for 28 min, but it did work as expected.

On Thu, Oct 2, 2014 at 10:27 AM, Adam Brusselback  wrote:

> Testing that now.  Initial results are not looking too performant.
> I have one single table which had 234575 updates done to it.  I am rolling
> back 13093 of them.  It's been running 20 min now, using 100% of a single
> core, and almost 0 disk.  No idea how long it'll run at this point.
>
> This is on an i5 desktop with 16 gigs of ram and an ssd.
>
> This is a pretty good test though, as it's a real world use case (even if
> the data was generated with PGBench).  We now know that area needs some
> work before it can be used for anything more than a toy database.
>
> Thanks,
> -Adam
>
> On Thu, Oct 2, 2014 at 7:52 AM, Felix Kunde  wrote:
>
>> Hey there
>>
>> Thanks again for the fix. I was able to merge it into my repo.
>> Also thanks for benchmarking audit. Very interesting results.
>> I wonder how the recreation of former database states scales when
>> processing many deltas.
>> Haven’t done a lot of testing in that direction.
>>
>> I will transfer the code soon to a more public repo on GitHub. As far as
>> I see I have to create an organization for that.
>>
>> Cheers
>> Felix
>>
>> *Sent:* Wednesday, 01 October 2014 at 17:09
>> *From:* "Adam Brusselback"
>> *To:* "Felix Kunde"
>> *Cc:* "pgsql-general@postgresql.org"
>> *Subject:* Re: [GENERAL] table versioning approach (not auditing)
>>   I know we're kinda hijacking this thread, so sorry for that.  If you'd
>> like to do that, i'd be more than happy to use it and push any fixes /
>> changes upstream.  I don't have much of a preference on the name either, as
>> long as it's something that makes sense.
>>
>> I would consider myself far from an expert though! Either way, more
>> people using a single solution is a good thing.
>>
>> As a side note, I did some benchmarking this morning and wanted to share
>> the results:
>> pgbench -i -s 140 -U postgres pgbench
>>
>> pgbench -c 4 -j 4 -T 600 -U postgres pgbench
>> no auditing tps: 2854
>> NOTE: Accounts are audited
>> auditing tps: 1278
>>
>> pgbench -c 2 -j 2 -N -T 300 -U postgres pgbench
>> no auditing tps: 2504
>> NOTE: Accounts are audited
>> auditing tps: 822
>>
>> pgbench -c 2 -j 2 -T 300 -U postgres pgbench
>> no auditing tps: 1836
>> NOTE: branches and tellers are audited, accounts are not
>> auditing tps: 505
>>
>> I'd love to see if there are some easy wins to boost the performance.
>>
>> On Wed, Oct 1, 2014 at 5:19 AM, Felix Kunde  wrote:
>>>
>>> Hey there. Thank you very much for that fix! That's why I'd like to have
>>> a joint development and joint testing. It's way more convincing for users
>>> to go for a solution that is tested by some experts than just by a random
>>> developer :)
>>>
>>> I'm open to create a new project and push the code there. Don't care
>>> about the name. Then we might figure out which parts are already good,
>>> which parts could be improved and where to go next. I think switching to
>>> JSONB for example will be easy, as it offers the same functions as JSON
>>> afaik.
>>>
>>>
>>> Sent: Tuesday, 30 September 2014 at 21:16
>>> From: "Adam Brusselback"
>>> To: "Felix Kunde"
>>> Cc: "pgsql-general@postgresql.org"
>>> Subject: Re: [GENERAL] table versioning approach (not auditing)
>>>
>>> Felix, I'd love to see a single, well maintained project. For example, I
>>> just found yours, and gave it a shot today after seeing this post.  I found
>>> a bug when an update command is issued, but the old and new values are all
>>> the same.  The trigger will blow up.  I've got a fix for that, but if we
>>> had one project that more than a handful of people used, stuff like that
>>> would be quashed very quickly.
>>>
>>> I love the design of it by the way. Any idea what it will take to move
>>> to JSONB for 9.4?
>>>
>>>
>>> On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde wrote:
>>> Hey
>>>
>>> yes i'm adding an additional key to each of my tables. First i wanted to
>>> use the primary key as one column in my audit_log table, but in some of my
>>> tables the PK consists of more than one column. Plus it's nice to have one
>>>

Re: [GENERAL] table versioning approach (not auditing)

2014-10-02 Thread Adam Brusselback

Testing that now.  Initial results are not looking too performant.
I have one single table which had 234575 updates done to it.  I am rolling
back 13093 of them.  It's been running 20 min now, using 100% of a single
core, and almost 0 disk.  No idea how long it'll run at this point.

This is on an i5 desktop with 16 gigs of ram and an ssd.

This is a pretty good test though, as it's a real world use case (even if
the data was generated with PGBench).  We now know that area needs some
work before it can be used for anything more than a toy database.

Thanks,
-Adam

On Thu, Oct 2, 2014 at 7:52 AM, Felix Kunde  wrote:

> Hey there
>
> Thanks again for the fix. I was able to merge it into my repo.
> Also thanks for benchmarking audit. Very interesting results.
> I wonder how the recreation of former database states scales when
> processing many deltas.
> Haven’t done a lot of testing in that direction.
>
> I will transfer the code soon to a more public repo on GitHub. As far as I
> see I have to create an organization for that.
>
> Cheers
> Felix
>
> *Sent:* Wednesday, 01 October 2014 at 17:09
> *From:* "Adam Brusselback"
> *To:* "Felix Kunde"
> *Cc:* "pgsql-general@postgresql.org"
> *Subject:* Re: [GENERAL] table versioning approach (not auditing)
>   I know we're kinda hijacking this thread, so sorry for that.  If you'd
> like to do that, i'd be more than happy to use it and push any fixes /
> changes upstream.  I don't have much of a preference on the name either, as
> long as it's something that makes sense.
>
> I would consider myself far from an expert though! Either way, more people
> using a single solution is a good thing.
>
> As a side note, I did some benchmarking this morning and wanted to share
> the results:
> pgbench -i -s 140 -U postgres pgbench
>
> pgbench -c 4 -j 4 -T 600 -U postgres pgbench
> no auditing tps: 2854
> NOTE: Accounts are audited
> auditing tps: 1278
>
> pgbench -c 2 -j 2 -N -T 300 -U postgres pgbench
> no auditing tps: 2504
> NOTE: Accounts are audited
> auditing tps: 822
>
> pgbench -c 2 -j 2 -T 300 -U postgres pgbench
> no auditing tps: 1836
> NOTE: branches and tellers are audited, accounts are not
> auditing tps: 505
>
> I'd love to see if there are some easy wins to boost the performance.
>
> On Wed, Oct 1, 2014 at 5:19 AM, Felix Kunde  wrote:
>>
>> Hey there. Thank you very much for that fix! That's why I'd like to have a
>> joint development and joint testing. It's way more convincing for users to
>> go for a solution that is tested by some experts than just by a random
>> developer :)
>>
>> I'm open to create a new project and push the code there. Don't care
>> about the name. Then we might figure out which parts are already good,
>> which parts could be improved and where to go next. I think switching to
>> JSONB for example will be easy, as it offers the same functions as JSON
>> afaik.
>>
>>
>> Sent: Tuesday, 30 September 2014 at 21:16
>> From: "Adam Brusselback"
>> To: "Felix Kunde"
>> Cc: "pgsql-general@postgresql.org"
>> Subject: Re: [GENERAL] table versioning approach (not auditing)
>>
>> Felix, I'd love to see a single, well maintained project. For example, I
>> just found yours, and gave it a shot today after seeing this post.  I found
>> a bug when an update command is issued, but the old and new values are all
>> the same.  The trigger will blow up.  I've got a fix for that, but if we
>> had one project that more than a handful of people used, stuff like that
>> would be quashed very quickly.
>>
>> I love the design of it by the way. Any idea what it will take to move to
>> JSONB for 9.4?
>>
>>
>> On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde wrote:
>> Hey
>>
>> yes i'm adding an additional key to each of my tables. First i wanted to
>> use the primary key as one column in my audit_log table, but in some of my
>> tables the PK consists of more than one column. Plus it's nice to have one
>> key that is called the same over all tables.
>>
>> To get a former state for one row at date x I need to join the latest
>> delta BEFORE date x with each delta AFTER date x. If I logged complete
>> rows, this joining part would not be necessary, but as I usually work with
>> spatial databases that have complex geometries and also image files, that
>> strategy uses too much disk space.
>>
>> If there are more users following a similar approach, I wonder why we don't
>> throw all the

Re: [GENERAL] table versioning approach (not auditing)

2014-10-01 Thread Adam Brusselback

I know we're kinda hijacking this thread, so sorry for that.  If you'd like
to do that, i'd be more than happy to use it and push any fixes / changes
upstream.  I don't have much of a preference on the name either, as long as
it's something that makes sense.

I would consider myself far from an expert though! Either way, more people
using a single solution is a good thing.

As a side note, I did some benchmarking this morning and wanted to share
the results:
pgbench -i -s 140 -U postgres pgbench

pgbench -c 4 -j 4 -T 600 -U postgres pgbench
no auditing tps: 2854
NOTE: Accounts are audited
auditing tps: 1278

pgbench -c 2 -j 2 -N -T 300 -U postgres pgbench
no auditing tps: 2504
NOTE: Accounts are audited
auditing tps: 822

pgbench -c 2 -j 2 -T 300 -U postgres pgbench
no auditing tps: 1836
NOTE: branches and tellers are audited, accounts are not
auditing tps: 505
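
Expressed as relative throughput loss, those figures work out as follows. This is just arithmetic on the numbers above; `throughput_drop` is a name made up for the sketch.

```python
# (pgbench options and audited tables, tps without auditing, tps with auditing)
runs = [
    ("-c 4 -j 4 -T 600, accounts audited", 2854, 1278),
    ("-c 2 -j 2 -N -T 300, accounts audited", 2504, 822),
    ("-c 2 -j 2 -T 300, branches and tellers audited", 1836, 505),
]

def throughput_drop(baseline_tps, audited_tps):
    """Fraction of throughput lost when auditing is switched on."""
    return 1 - audited_tps / baseline_tps

for label, baseline, audited in runs:
    print(f"{label}: {throughput_drop(baseline, audited):.1%} lower tps")
# prints 55.2%, 67.2% and 72.5% for the three runs
```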

I'd love to see if there are some easy wins to boost the performance.

On Wed, Oct 1, 2014 at 5:19 AM, Felix Kunde  wrote:

> Hey there. Thank you very much for that fix! That's why I'd like to have a
> joint development and joint testing. It's way more convincing for users to
> go for a solution that is tested by some experts than just by a random
> developer :)
>
> I'm open to create a new project and push the code there. Don't care about
> the name. Then we might figure out which parts are already good, which
> parts could be improved and where to go next. I think switching to JSONB
> for example will be easy, as it offers the same functions as JSON afaik.
>
>
> Sent: Tuesday, 30 September 2014 at 21:16
> From: "Adam Brusselback"
> To: "Felix Kunde"
> Cc: "pgsql-general@postgresql.org"
> Subject: Re: [GENERAL] table versioning approach (not auditing)
>
> Felix, I'd love to see a single, well maintained project. For example, I
> just found yours, and gave it a shot today after seeing this post.  I found
> a bug when an update command is issued, but the old and new values are all
> the same.  The trigger will blow up.  I've got a fix for that, but if we
> had one project that more than a handful of people used, stuff like that
> would be quashed very quickly.
>
> I love the design of it by the way. Any idea what it will take to move to
> JSONB for 9.4?
>
>
> On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde wrote:
> Hey
>
> yes i'm adding an additional key to each of my tables. First i wanted to
> use the primary key as one column in my audit_log table, but in some of my
> tables the PK consists of more than one column. Plus it's nice to have one
> key that is called the same over all tables.
>
> To get a former state for one row at date x I need to join the latest
> delta BEFORE date x with each delta AFTER date x. If I logged complete
> rows, this joining part would not be necessary, but as I usually work with
> spatial databases that have complex geometries and also image files, that
> strategy uses too much disk space.
>
> If there are more users following a similar approach, I wonder why we don't
> throw all the good ideas together, to have one solution that is tested,
> maintained and improved by more developers. This would be great.
>
> Felix
>
>
> Sent: Monday, 29 September 2014 at 23:25
> From: "Abelard Hoffman"
> To: "Felix Kunde"
> Cc: "pgsql-general@postgresql.org"
> Subject: Re: [GENERAL] table versioning approach (not auditing)
>
> Thank you Felix, Gavin, and Jonathan for your responses.
>
> Felix & Jonathan: both of you mention just storing deltas. But if you do
> that, how do you associate the delta record with the original row? Where's
> the PK stored, if it wasn't part of the delta?
>
> Felix, thank you very much for the example code. I took a look at your
> table schemas. I need to study it more, but it looks like the way you're
> handling the PK, is you're adding a separate synthetic key (audit_id) to
> each table that's being versioned. And then storing that key along with the
> delta.
>
> So then to find all the versions of a given row, you just need to join the
> audit row with the schema_name.table_name.audit_id column. Is that right?
> The only potential drawback there is there's no referential integrity
> between the audit_log.audit_id and the actual table.
>
> I do like that approach very much though, in that it eliminates the need
> to interrogate the json data in order to perform most queries.
>
> AH
>
>
>
> On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde <felix-ku...@gmx.de> wrote:
> Hey

Re: [GENERAL] table versioning approach (not auditing)

2014-10-01 Thread Felix Kunde

Hey there. Thank you very much for that fix! That's why I'd like to have a joint
development and joint testing. It's way more convincing for users to go for a
solution that is tested by some experts than just by a random developer :)

I'm open to create a new project and push the code there. Don't care about the
name. Then we might figure out which parts are already good, which parts could
be improved and where to go next. I think switching to JSONB for example will
be easy, as it offers the same functions as JSON afaik.

Sent: Tuesday, 30 September 2014 at 21:16
From: "Adam Brusselback"
To: "Felix Kunde"
Cc: "pgsql-general@postgresql.org"
Subject: Re: [GENERAL] table versioning approach (not auditing)

Felix, I'd love to see a single, well maintained project. For example, I just
found yours, and gave it a shot today after seeing this post. I found a bug
when an update command is issued, but the old and new values are all the same.
The trigger will blow up. I've got a fix for that, but if we had one project
that more than a handful of people used, stuff like that would be quashed very
quickly.

I love the design of it by the way. Any idea what it will take to move to JSONB
for 9.4?

On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde wrote:

Hey

yes i'm adding an additional key to each of my tables. First i wanted to use
the primary key as one column in my audit_log table, but in some of my tables
the PK consists of more than one column. Plus it's nice to have one key that is
called the same over all tables.

To get a former state for one row at date x I need to join the latest delta
BEFORE date x with each delta AFTER date x. If I logged complete rows, this
joining part would not be necessary, but as I usually work with spatial
databases that have complex geometries and also image files, that strategy
uses too much disk space.

If there are more users following a similar approach, I wonder why we don't throw
all the good ideas together, to have one solution that is tested, maintained
and improved by more developers. This would be great.

Felix

Sent: Monday, 29 September 2014 at 23:25
From: "Abelard Hoffman"
To: "Felix Kunde"
Cc: "pgsql-general@postgresql.org"
Subject: Re: [GENERAL] table versioning approach (not auditing)

Thank you Felix, Gavin, and Jonathan for your responses.

Felix & Jonathan: both of you mention just storing deltas. But if you do that,
how do you associate the delta record with the original row? Where's the PK
stored, if it wasn't part of the delta?

Felix, thank you very much for the example code. I took a look at your table
schemas. I need to study it more, but it looks like the way you're handling the
PK, is you're adding a separate synthetic key (audit_id) to each table that's
being versioned. And then storing that key along with the delta.

So then to find all the versions of a given row, you just need to join the
audit row with the schema_name.table_name.audit_id column. Is that right? The
only potential drawback there is there's no referential integrity between the
audit_log.audit_id and the actual table.

I do like that approach very much though, in that it eliminates the need to
interrogate the json data in order to perform most queries.

AH

On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde wrote:

Hey

i've also tried to implement a database versioning using JSON to log changes in
tables. Here it is:
https://github.com/fxku/audit

I've got two versioning tables, one storing information about all transactions
that happened and one where i put the JSON logs of row changes of each table.
I'm only logging old values and not complete rows.

Then I got a function that recreates a database state at a given time into a
separate schema - either to VIEWs, MVIEWs or TABLES. This database state could
then be indexed in order to work with it. You can also reset the production
state to the recreated past state.

Unfortunately I've got no time to further work on it at the moment + I have not
done tests with many changes in the database so I can't say if the recreation
process scales well. One downside I've realised is that using the json_agg
function has limits when I've got binary data. It gets too long. So I'm really
looking forward to using JSONB.

Re: [GENERAL] table versioning approach (not auditing)

2014-09-30 Thread Adam Brusselback

Felix, I'd love to see a single, well maintained project. For example, I
just found yours, and gave it a shot today after seeing this post.  I found
a bug when an update command is issued, but the old and new values are all
the same.  The trigger will blow up.  I've got a fix for that, but if we
had one project that more than a handful of people used, stuff like that
would be quashed very quickly.

I love the design of it by the way. Any idea what it will take to move to
JSONB for 9.4?


Re: [GENERAL] table versioning approach (not auditing)

2014-09-30 Thread Felix Kunde

Hey,

yes, I'm adding an additional key to each of my tables. At first I wanted to use
the primary key as one column in my audit_log table, but in some of my tables
the PK consists of more than one column. Plus it's nice to have one key that
has the same name across all tables.

To get a former state for one row at date x I need to join the latest delta
BEFORE date x with each delta AFTER date x. If I logged complete rows, this
joining step would not be necessary, but as I usually work with spatial
databases that have complex geometries and also image files, that strategy
consumes too much disk space.
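
As a rough illustration of rebuilding a past row from old-value deltas, here is a minimal Python sketch. It is not the code from the linked repository: the function name `state_at`, the integer timestamps, and the sample data are all assumptions made for the example, and it walks backwards from the current row instead of performing the join described above.

```python
def state_at(current_row, deltas, when):
    """Rebuild a row as it was at time `when`.

    `deltas` is a list of (changed_at, old_values) pairs, where old_values
    holds the pre-update values of the columns changed at `changed_at`.
    """
    row = dict(current_row)
    # Undo changes newest-first until we reach `when`.
    for changed_at, old_values in sorted(deltas, key=lambda d: d[0], reverse=True):
        if changed_at <= when:
            break
        row.update(old_values)
    return row


# Hypothetical history of one row (timestamps are plain integers here).
current = {"audit_id": 7, "name": "C", "height": 30}
deltas = [
    (10, {"name": "A"}),     # at t=10 name changed A -> B
    (20, {"height": 10}),    # at t=20 height changed 10 -> 30
    (30, {"name": "B"}),     # at t=30 name changed B -> C
]

at_15 = state_at(current, deltas, 15)
```

Because only changed columns are logged, the cost of a reconstruction grows with the number of deltas recorded after the requested point in time.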

If there are more users following a similar approach, I wonder why we don't throw
all the good ideas together, to have one solution that is tested, maintained
and improved by more developers. This would be great.
Felix

Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Jonathan Vanasco

On Sep 29, 2014, at 4:06 PM, Nick Guenther wrote:

> A newbie tangent question: how do you access the transaction serial? Is it 
> txid_current() as listed in 
> http://www.postgresql.org/docs/9.3/static/functions-info.html?

My implementations were ridiculously simple/naive in design, and existed 
entirely with under defined serials.  i'd just create a new record + id on a 
write operation, and then use it when logging all operations.

I had read up on a lot of (possibly better) ways to handle this using pg 
internals.  They all seemed more advanced than I needed.

> And does your implementation worry about multiple timelines? 

Not sure I understand this... but every object is given a revision id.  edits 
between consecutive revisions are allowed, edits spanning multiple revisions 
are rejected.
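
One possible reading of that rule, sketched in Python: an edit is accepted only if it was made against the record's current revision. The names `apply_edit` and `StaleRevisionError` are hypothetical, and the actual implementation (ORM plus PostgreSQL functions) is not shown in this thread.

```python
class StaleRevisionError(Exception):
    """Raised when an edit is based on a revision that is no longer current."""


def apply_edit(record, based_on_revision, changes):
    """Apply `changes` only if the edit was made against the current revision."""
    if based_on_revision != record["revision"]:
        raise StaleRevisionError(
            f"edit based on revision {based_on_revision}, "
            f"record is at revision {record['revision']}"
        )
    updated = {**record, **changes}
    updated["revision"] = record["revision"] + 1
    return updated


record = {"id": 1, "title": "draft", "revision": 0}
record = apply_edit(record, 0, {"title": "final"})   # accepted: 0 -> 1
```

A second edit that still claims to be based on revision 0 would now raise `StaleRevisionError`, since it would span more than one revision.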

On Sep 29, 2014, at 5:25 PM, Abelard Hoffman wrote:

> Felix & Jonathan: both of you mention just storing deltas. But if you do 
> that, how do you associate the delta record with the original row? Where's 
> the PK stored, if it wasn't part of the delta?

The logic I decided on, is this:

Revision 0
• Only the original record is stored
Revision 1
• Copy the original record into the revision store
Revision 1+
• Update the original record, store the deltas in the revision store
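
The steps above could be sketched roughly as follows. This is an illustration in plain Python, not the ORM code described in the message; the names `edit` and `revision_data`, the record layout, and the choice to store new values in each delta are assumptions.

```python
def edit(record, store, changes, txid):
    """Update `record` in place and log the edit in `store` (a list)."""
    if record["revision"] == 0:
        # First edit: keep a full copy of the untouched original.
        store.append({"revision": 0, "txid": None, "data": dict(record["data"])})
    # Log only the columns that actually change (assumed: new values).
    delta = {k: v for k, v in changes.items() if record["data"].get(k) != v}
    record["data"].update(delta)
    record["revision"] += 1
    store.append({"revision": record["revision"], "txid": txid, "data": delta})


def revision_data(store, n):
    """Rebuild the record as of revision n by replaying the stored entries."""
    data = {}
    for entry in store:
        if entry["revision"] <= n:
            data.update(entry["data"])
    return data


record = {"revision": 0, "data": {"title": "a", "body": "x"}}
store = []
edit(record, store, {"title": "b"}, txid=101)
edit(record, store, {"title": "b", "body": "y"}, txid=102)
```

Records that are never edited add nothing to the store, which matches the observation that most records are not edited at all.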

The reason why I chose this path, is that in my system:
• most records are not edited
• the records that are edited, are heavily edited

We use an ORM and it was simple to implement this pattern with it, and then 
write some functions in postgres to ensure it is adhered to.

When I need to pull data out:

• I can pull exact revisions out of the hstore for a given table/row 
using the revision ids as a key
• the revisions all contain the transaction id
• if I need to get more info about a given transaction, I can query the 
transactions table and get a list of all the objects that were edited within 
that transaction

If I wanted to ensure referential integrity, I could have used a table instead 
of an hstore (or json).  If the application grows much larger, it will probably 
be migrated to a model like that.  This approach just gave a lot of flexibility, 
minimized tables in the database, and was very easy to pull off.  I went 
with hstore because json didn't allow in-place updates at the time (I think it 
does now).


Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Abelard Hoffman

Thank you Felix, Gavin, and Jonathan for your responses.

Felix & Jonathan: both of you mention just storing deltas. But if you do
that, how do you associate the delta record with the original row? Where's
the PK stored, if it wasn't part of the delta?

Felix, thank you very much for the example code. I took a look at your
table schemas. I need to study it more, but it looks like the way you're
handling the PK is by adding a separate synthetic key (audit_id) to
each table that's being versioned, and then storing that key along with the
delta.

So then to find all the versions of a given row, you just need to join the
audit row with the schema_name.table_name.audit_id column. Is that right?
The only potential drawback there is there's no referential integrity
between the audit_log.audit_id and the actual table.

I do like that approach very much though, in that it eliminates the need to
interrogate the json data in order to perform most queries.

AH



Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Nick Guenther

On September 29, 2014 11:08:55 AM EDT, Jonathan Vanasco wrote:
>
>- use a "transaction" log.  every write session gets logged into the
>transaction table (serial, timestamp, user_id).  all updates to the
>recorded tables include the transaction's serial.  then there is a
>"transactions" table, that is just "transaction_serial ,  object_id ,
>object_action".  

A newbie tangent question: how do you access the transaction serial? Is it 
txid_current() as listed in 
http://www.postgresql.org/docs/9.3/static/functions-info.html?

And how do you actually make use of that information? I know from Bruce 
Momjian's excellent MVCC talk that postgres internally has a secret txid 
column on each row; can you somehow query on the secret column? And does your 
implementation worry about multiple timelines? 

My use case is dynamically allocated replication. Broadly, my algorithm is, 
for each client:
1) download a full copy of the current table
2) keep the connection open and send deltas (which are just inserts and 
deletes, for me)

I need 2 to begin *as if immediately* after 1.  Txids sound like exactly what 
I need, but without knowing how to handle them I fudged it: I open the queries 
for 1 and for 2 immediately after each other, before reading from either, so 
that they should be pinned to the same txid.  There's definitely a race 
condition that will show under load, though. I think the correct algorithm is:

1) ask the current txid X
2) start buffering deltas with txid > X
3) download the table as of X
4) download the buffer of deltas and listen for future ones
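
The four steps can be simulated in plain Python to show why buffering from X onwards closes the gap. This is only an in-memory sketch: `sync_client`, the tuple layout of a delta, and the sample data are assumptions, and it glosses over how a real database exposes and orders transactions.

```python
def sync_client(snapshot_rows, snapshot_txid, buffered_deltas):
    """Combine a snapshot taken at `snapshot_txid` with deltas buffered meanwhile.

    Each delta is (txid, op, row) with op being "insert" or "delete".
    Deltas at or below snapshot_txid are already reflected in the snapshot
    and are skipped, so nothing is applied twice.
    """
    rows = set(snapshot_rows)
    for txid, op, row in sorted(buffered_deltas, key=lambda d: d[0]):
        if txid <= snapshot_txid:
            continue
        if op == "insert":
            rows.add(row)
        elif op == "delete":
            rows.discard(row)
    return rows


# Snapshot as of X = 5; the buffer also caught one delta from before X.
snapshot = {"a", "b"}
deltas = [(4, "insert", "a"), (6, "insert", "c"), (7, "delete", "b")]
result = sync_client(snapshot, 5, deltas)
```

Starting the buffer before the snapshot is read means a delta can only be seen twice, never missed, and the txid comparison discards the duplicates.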

Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Jonathan Vanasco


In the past, to accomplish the same thing I've done this:

- store the data in hstore/json.  Instead of storing snapshots, I store deltas. 
I've been using a second table though, because it's improved performance on 
reads and writes.
- use a "transaction" log.  Every write session gets logged into the 
transaction table (serial, timestamp, user_id).  All updates to the recorded 
tables include the transaction's serial.  Then there is a "transactions" table, 
that is just "transaction_serial, object_id, object_action".

whenever I have needs for auditing or versioning, I can just query the 
transaction table for the records I want... then use that to grab the data out 
of hstore.
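
A small sketch of that two-table transaction log, using Python's built-in `sqlite3` in place of PostgreSQL so it runs anywhere. The table names `transaction_log` and `transaction_objects` and the helper functions are hypothetical; only the column lists come from the description above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE transaction_log (
        serial     INTEGER PRIMARY KEY,
        logged_at  TEXT NOT NULL,
        user_id    INTEGER NOT NULL
    );
    CREATE TABLE transaction_objects (
        transaction_serial INTEGER NOT NULL REFERENCES transaction_log(serial),
        object_id          INTEGER NOT NULL,
        object_action      TEXT NOT NULL
    );
""")


def log_write(user_id, logged_at, actions):
    """Record one write session and the objects it touched; return its serial."""
    cur = db.execute(
        "INSERT INTO transaction_log (logged_at, user_id) VALUES (?, ?)",
        (logged_at, user_id),
    )
    serial = cur.lastrowid
    db.executemany(
        "INSERT INTO transaction_objects VALUES (?, ?, ?)",
        [(serial, obj, action) for obj, action in actions],
    )
    return serial


def changes_by_user(user_id):
    """All (serial, object_id, object_action) rows for one user, oldest first."""
    return db.execute(
        """SELECT t.serial, o.object_id, o.object_action
           FROM transaction_log t
           JOIN transaction_objects o ON o.transaction_serial = t.serial
           WHERE t.user_id = ?
           ORDER BY t.serial, o.object_id""",
        (user_id,),
    ).fetchall()


log_write(42, "2014-09-29T11:08:55", [(849, "update"), (850, "insert")])
log_write(7, "2014-09-29T11:09:10", [(849, "update")])
```

With the serial stored alongside each delta, the rows returned here would then be used to look up the matching entries in the hstore/json store.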



Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Felix Kunde

Hey,

I've also tried to implement database versioning using JSON to log changes in 
tables. Here it is: https://github.com/fxku/audit
I've got two versioning tables, one storing information about all transactions 
that happened and one where I put the JSON logs of row changes of each table. 
I'm only logging old values and not complete rows.
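
To make "only old values" concrete, here is a minimal Python sketch of computing such a delta for one UPDATE. It is an illustration only, not the trigger code from the repository; `old_values_delta` and the sample rows are hypothetical.

```python
import json


def old_values_delta(old_row, new_row):
    """Return only the columns whose value changed, with their OLD values."""
    return {col: old for col, old in old_row.items() if new_row.get(col) != old}


old = {"id": 1, "name": "Main St", "lanes": 2}
new = {"id": 1, "name": "Main Street", "lanes": 2}

delta = old_values_delta(old, new)
log_entry = json.dumps(delta, sort_keys=True)  # what would go into the JSON log column
no_op = old_values_delta(old, old)             # an UPDATE that changes nothing yields {}
```

An UPDATE that changes nothing produces an empty delta, a case the logging code has to handle explicitly (for instance by skipping the log entry).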
 
Then I've got a function that recreates a database state at a given time in a 
separate schema - either as VIEWs, MVIEWs or TABLES. This database state could 
then be indexed in order to work with it. You can also reset the production 
state to the recreated past state.

Unfortunately I've got no time to work on it further at the moment, and I have 
not done tests with many changes in the database, so I can't say if the 
recreation process scales well. One downside I've realised is that using the 
json_agg function has limits when I've got binary data. It gets too long. So 
I'm really looking forward to using JSONB.

There are more plans in my mind. By having a Transaction_Log table it should be 
possible to revert only certain transactions. I'm also thinking of parallel 
versioning, e.g. different users are all working with their version of the 
database and commit their changes to the production state. As I've got a unique 
history ID for each table and each row, I should be able to map the affected 
records.

Have a look and tell me what you think of it.

Cheers
Felix
 

Re: [GENERAL] table versioning approach (not auditing)

2014-09-28 Thread Gavin Flower



I implemented a 2 table approach over 15 years ago for an insurance 
application.  I used both an /effective_date/ and an /as_at_date/; no 
triggers were involved.  I think a 2 table approach gives you more 
flexibility.


The /effective_date/ allowed changes to be made to the table in advance 
of when they were to become effective.


The /as_at_date/ allowed quotes to be made, valid for a period starting 
at the as_at_date.


End users did not query the database directly; all queries were precoded 
in a 4GL called Progress backed by an Oracle database.  The same could 
be done with a WildFly Java Enterprise AppServer (or some other 
middleware) and a PostgreSQL backend.


Different use case, but the concept is probably adaptable to your situation.

You may want a change table, that has a change_number that is in each 
type of table affected by a change.  This would help for query type #2.


I would be quite happy to contract to work out the appropriate schema 
and develop some SQL scripts to query & update the database, if you were 
interested.  My approach would be to create a minimal database with 
sample data to validate the schema design and SQL scripts.


Using a flag to indicate the current record seems inflexible: some 
changes may not take effect until some time in the future, and you can't 
query the database to see what the situation was at a particular point 
in the past.  For example: somebody complains about something that 
happened last Saturday near noon; how would you query the database to 
see what it was like then?
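
With an /effective_date/ on every version, that kind of question becomes a simple lookup. The Python sketch below is only an illustration: `version_at`, the `premium` field, and the sample dates are made up, and the /as_at_date/ dimension is left out for brevity.

```python
from datetime import datetime


def version_at(versions, when):
    """Return the version in force at `when`, or None if none applied yet.

    `versions` is a list of dicts with an `effective_date` key; the version
    in force is the one with the latest effective_date not after `when`.
    """
    in_force = [v for v in versions if v["effective_date"] <= when]
    return max(in_force, key=lambda v: v["effective_date"], default=None)


versions = [
    {"effective_date": datetime(2014, 9, 1), "premium": 100},
    {"effective_date": datetime(2014, 9, 29), "premium": 110},
    {"effective_date": datetime(2014, 11, 1), "premium": 120},  # entered ahead of time
]
last_saturday_noon = datetime(2014, 9, 27, 12, 0)
then = version_at(versions, last_saturday_noon)
```

The same lookup works for past and future dates, which a single "current" flag cannot express.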



Cheers,
Gavin

[GENERAL] table versioning approach (not auditing)

2014-09-28 Thread Abelard Hoffman

Hi. I need to maintain a record of all changes to certain tables to assist
in viewing history and reverting changes when necessary (customer service
makes an incorrect edit, etc.).

I have studied these two audit trigger examples:
https://wiki.postgresql.org/wiki/Audit_trigger
https://wiki.postgresql.org/wiki/Audit_trigger_91plus

I've also read about two other approaches to versioning:
1. maintain all versions in one table, with a flag to indicate which is the
current version
2. have a separate versions table for each real table, and insert into the
associated version table whenever an update or insert is done.

My current implementation is based on the wiki trigger examples, using a
single table, and a json column to record the row changes (rather than
hstore). What I like about that, in particular, is I can have a "global,"
chronological view of all versioned changes very easily.

But there are two types of queries I need to run.
1. Find all changes made by a specific user
2. Find all changes related to a specific record

#1 is simple to do. The versioning table has a user_id column of who made
the change, so I can query on that.

#2 is more difficult. I may want to fetch all changes to a group of tables
that are all related by foreign keys (e.g., find all changes to "user"
record 849, along with any changes to their "articles," "photos," etc.).
All of the data is in the json column, of course, but it seems like a pain
to try and build a query on the json column that can fetch all those
relationships (and if I mess it up, I probably won't generate any errors,
since the json is so free-form).

So my question is, do you think using the json approach is wrong for this
case? Does it seem better to have separate versioning tables associated
with each real table? Or another approach?

Thanks
