Re: [GENERAL] table versioning approach (not auditing)

2014-10-10 Thread Jim Nasby

On 10/7/14, 10:40 PM, Gavin Flower wrote:

>> Yeah, I'm pretty convinced at this point that history/versioning should be
>> built on top of a schema that always contains the current information, if for
>> no other reason than so you always have a PK that points to what's current in
>> addition to your history PKs.
>
> One of the motivations for having an effective_date was being able to put
> changes into the database ahead of time.


Yeah, allowing for future data makes things more interesting. My first 
inclination is that it's a completely separate requirement, and you would track 
the history of all records that you had at a point in time. Doing that means 
you can see things like someone changing the effective date from Nov. 1 to Dec. 
1. But clearly this is an area where you have to take the business case into 
account.
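
To make the "separate requirement" concrete, here is a minimal sketch of keeping a history of scheduled changes, so that an edit to the effective date itself stays visible. All table and column names (price_change, price_change_history, recorded_at) are hypothetical and only illustrate the idea; nothing here was proposed in the thread.

```sql
-- Hypothetical tables: names and columns are illustrative only.
CREATE TABLE price_change (
    price_change_id bigserial   PRIMARY KEY,
    item_id         text        NOT NULL,
    effective_date  timestamptz NOT NULL,  -- may lie in the future
    price           numeric     NOT NULL
);

-- One row per version of a price_change row, so that moving
-- effective_date from Nov. 1 to Dec. 1 leaves a trace.
CREATE TABLE price_change_history (
    price_change_history_id bigserial   PRIMARY KEY,
    price_change_id         bigint      NOT NULL REFERENCES price_change,
    effective_date          timestamptz NOT NULL,
    price                   numeric     NOT NULL,
    recorded_at             timestamptz NOT NULL DEFAULT now()
);

-- What was recorded for scheduled change 1 as of 2014-10-15?
SELECT h.effective_date, h.price
FROM price_change_history h
WHERE h.price_change_id = 1
  AND h.recorded_at <= '2014-10-15'
ORDER BY h.recorded_at DESC, h.price_change_history_id DESC
LIMIT 1;
```

Whether the second time axis is worth the extra table depends on the business case, as noted above.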
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table versioning approach (not auditing)

2014-10-07 Thread Gavin Flower

On 08/10/14 13:29, Jim Nasby wrote:

> On 10/6/14, 6:10 PM, Gavin Flower wrote:
>> Even if timestamps are used extensively, you'd have to be careful
>> joining on them. You may have information valid at T1 and changing at
>> T3, but the transaction has T2, where T1 < T2 < T3 - the appropriate
>> set of data would be associated with T1, and you would not get anywhere
>> trying to find data with a timestamp of T2 (unless you were very
>> lucky!).
>
> Yeah, this is why I think timestamps need to be shunned in favor of
> explicit pointers. Anyone that thinks timestamps are good enough
> hasn't thought the problem through completely. :)
>
> I also think there's potential value to storing full transaction
> information (presumably in a separate table): txid_current(),
> txid_current_snapshot(), now(), current_user, maybe some other stuff
> (client IP address?). That way you can tell exactly what created a
> history record. With appropriate shenanigans you can theoretically
> determine exactly what other history data would be visible at that
> time without using pointers (but man would that be ugly!)
>
>> Actually things like phone numbers are tricky.  Sometimes you may
>> want to use the current phone number, and not the one extant at that
>> time (as you want to phone the contact now), or you may still want
>> the old phone number (was the call to a specific number at date/time
>> legitimate & who do we charge the cost of the call to).
>
> Yeah, I'm pretty convinced at this point that history/versioning
> should be built on top of a schema that always contains the current
> information, if for no other reason than so you always have a PK that
> points to what's current in addition to your history PKs.

One of the motivations for having an effective_date was being able to
put changes into the database ahead of time.

Finding the current value uses the same logic as finding the value at any
other date/time - so you don't need a special schema to distinguish the
current state from anything else.  For example:


   -- Start from a clean slate so the example can be re-run.
   DROP TABLE IF EXISTS stock;

   CREATE TABLE stock
   (
       id             text,
       effective_date timestamptz,
       price          numeric
   );

   INSERT INTO stock
   (
       id,
       effective_date,
       price
   )
   VALUES
       ('y88', '2014-10-01', 12.0),
       ('x42', '2014-10-01', 12.1),
       ('x42', '2014-10-08', 12.2),
       ('x42', '2014-10-10', 12.3),
       ('x42', '2014-10-16', 12.4),
       ('z42', '2014-10-19', 12.5),
       ('z49', '2014-10-01', 12.6),
       ('z49', '2014-10-30', 12.7),
       ('z77', '2014-10-01', 12.8);

   -- One row per (id, effective_date); DESC matches the lookup order below.
   CREATE UNIQUE INDEX primary_key ON stock (id ASC, effective_date DESC);

   -- Price of 'x42' as of 2014-10-11: the newest row at or before that
   -- date (12.3 with the data above).
   SELECT
       s.price
   FROM
       stock s
   WHERE
       s.id = 'x42'
       AND s.effective_date <= '2014-10-11'
   ORDER BY
       s.effective_date DESC
   LIMIT 1;
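
The same lookup can be widened from one id to every id. The following is only a sketch that reuses the stock table above and relies on the PostgreSQL-specific DISTINCT ON clause; it is an illustration, not part of the original example.

```sql
-- Assumes the stock table from the example above.
-- Latest price per id as of a chosen instant; rows whose
-- effective_date is still in the future are ignored.
SELECT DISTINCT ON (s.id)
    s.id,
    s.effective_date,
    s.price
FROM
    stock s
WHERE
    s.effective_date <= now()   -- or any other date/time of interest
ORDER BY
    s.id,
    s.effective_date DESC;
```

Replacing now() with a past or future timestamp gives the state at that instant, which is the point made above: "current" is just one more date.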



Cheers,
Gavin


Re: [GENERAL] table versioning approach (not auditing)

2014-10-07 Thread Jim Nasby

On 10/6/14, 6:10 PM, Gavin Flower wrote:

> Even if timestamps are used extensively, you'd have to be careful joining on them.
> You may have information valid at T1 and changing at T3, but the transaction has T2,
> where T1 < T2 < T3 - the appropriate set of data would be associated with T1,
> and you would not get anywhere trying to find data with a timestamp of T2 (unless you
> were very lucky!).


Yeah, this is why I think timestamps need to be shunned in favor of explicit 
pointers. Anyone that thinks timestamps are good enough hasn't thought the 
problem through completely. :)

I also think there's potential value to storing full transaction information 
(presumably in a separate table): txid_current(), txid_current_snapshot(), 
now(), current_user, maybe some other stuff (client IP address?). That way you 
can tell exactly what created a history record. With appropriate shenanigans 
you can theoretically determine exactly what other history data would be 
visible at that time without using pointers (but man would that be ugly!)
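
A minimal sketch of such a separate table follows. The table and column names (transaction_info, customer_history and their columns) are hypothetical; the functions used are the ones listed above plus inet_client_addr() for the client IP address.

```sql
-- Hypothetical table; the message only lists the values worth capturing.
CREATE TABLE transaction_info (
    txid             bigint        PRIMARY KEY DEFAULT txid_current(),
    visible_snapshot txid_snapshot NOT NULL DEFAULT txid_current_snapshot(),
    started_at       timestamptz   NOT NULL DEFAULT now(),
    db_user          name          NOT NULL DEFAULT current_user,
    client_addr      inet          DEFAULT inet_client_addr()  -- NULL on Unix sockets
);

-- Run once per transaction, for example from a trigger when the first
-- history row of that transaction is written.
INSERT INTO transaction_info DEFAULT VALUES;

-- History rows then carry the txid as a pointer to who, when and what.
CREATE TABLE customer_history (
    customer_history_id bigserial PRIMARY KEY,
    customer_id         bigint NOT NULL,
    full_name           text,
    txid                bigint NOT NULL REFERENCES transaction_info
);
```

Because txid is the primary key, a second INSERT in the same transaction fails, so the "once per transaction" rule has to be enforced by whatever writes the history rows.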


> Actually things like phone numbers are tricky.  Sometimes you may want to use the
> current phone number, and not the one extant at that time (as you want to phone the
> contact now), or you may still want the old phone number (was the call to a
> specific number at date/time legitimate & who do we charge the cost of the call
> to).


Yeah, I'm pretty convinced at this point that history/versioning should be 
built on top of a schema that always contains the current information, if for 
no other reason than so you always have a PK that points to what's current in 
addition to your history PKs.


Re: [GENERAL] table versioning approach (not auditing)

2014-10-06 Thread Gavin Flower

On 07/10/14 10:47, Jim Nasby wrote:

> Sorry I'm coming late to this thread. I agree that getting interested
> people together would be a good idea. Is there another mailing list we
> can do that with?
>
> Versioning is also something I'm interested in, and have put a lot of
> thought into (if not much actual code :( ). I'll also make some
> general comments, if I may...
>
> I think timestamps should be *heavily avoided* in versioning, because
> they are frequently the wrong way to solve a problem. There are many
> use cases where you're trying to answer "What values were in place
> when X happened", and the simplest, most fool-proof way to answer that
> is that when you create a record for X, part of that record is a
> "history ID" that shows you the exact data used. For example, if
> you're creating an invoicing system that has versioning of customer
> addresses you would not try and join an invoice with its address
> using a timestamp; you would put an actual address_history_id in the
> invoice table.
>
> I thought I saw a reference to versioning sets of information. This is
> perhaps the trickiest part. You first have to think about the
> non-versioned sets (i.e., a customer may have many phone numbers) before
> you think about versioning the set. In this example, you want the
> history of the *set* of phone numbers, not of each individual number.
> Design it with full duplication of data first; don't think about
> normalizing until you have the full set versioning design.
>
> I understand the generic appeal of using something like JSON, but in
> reality I don't see it working terribly well. It's likely to be on the
> slow side, and it'll also be difficult to query from. Instead, I think
> it makes more sense to create actual history tables that derive their
> definition from the base table. I've got code that extracts
> information (column_name, data type, nullability) from a table (or
> even a table definition), and it's not that complex. With the work
> that's been done on capturing DDL changes it shouldn't be too hard to
> handle that automatically.



Yeah, my design was quite extensive and ensured all relevant information
was associated with the 'history id' (still need timestamps to find the
appropriate value), the powers that be watered it down somewhat (but
that was outside my control). Performance was not too critical, probably
less than 10 transactions per second at peak times.  JSON had yet to be
invented, but we would not have used it anyhow.


Even if timestamps are used extensively, you'd have to be careful
joining on them. You may have information valid at T1 and changing at
T3, but the transaction has T2, where T1 < T2 < T3 - the appropriate set
of data would be associated with T1, and you would not get anywhere trying
to find data with a timestamp of T2 (unless you were very lucky!).
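
The T1 < T2 < T3 case can be made concrete. This is only a sketch with hypothetical address_version and invoice tables (none of these names come from the thread): an equality join on the timestamp finds nothing unless the times happen to coincide, so the lookup has to pick the newest version at or before the transaction time.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE address_version (
    customer_id bigint      NOT NULL,
    valid_from  timestamptz NOT NULL,   -- T1, T3, ...
    address     text        NOT NULL,
    PRIMARY KEY (customer_id, valid_from)
);

CREATE TABLE invoice (
    invoice_id  bigserial   PRIMARY KEY,
    customer_id bigint      NOT NULL,
    created_at  timestamptz NOT NULL    -- T2
);

-- Equality join: matches only if T2 happens to equal some valid_from.
SELECT i.invoice_id, a.address
FROM invoice i
JOIN address_version a
  ON a.customer_id = i.customer_id
 AND a.valid_from  = i.created_at;

-- Range lookup: the newest version with valid_from <= T2.
SELECT i.invoice_id, a.address
FROM invoice i
CROSS JOIN LATERAL (
    SELECT v.address
    FROM address_version v
    WHERE v.customer_id = i.customer_id
      AND v.valid_from <= i.created_at
    ORDER BY v.valid_from DESC
    LIMIT 1
) a;
```

LATERAL needs PostgreSQL 9.3 or later; on older versions a correlated subquery in the select list does the same job.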


Actually things like phone numbers are tricky.  Sometimes you may want
to use the current phone number, and not the one extant at that time (as
you want to phone the contact now), or you may still want the old phone
number (was the call to a specific number at date/time legitimate & who
do we charge the cost of the call to).





Re: [GENERAL] table versioning approach (not auditing)

2014-10-06 Thread Jim Nasby

On 10/2/14, 9:27 AM, Adam Brusselback wrote:

> I've also tried to implement a database versioning using JSON to log changes in
> tables. Here it is: https://github.com/fxku/audit
>
> I've got two versioning tables, one storing information about all transactions
> that happened and one where I put the JSON logs of row changes of each table.
> I'm only logging old values and not complete rows.
>
> Then I got a function that recreates a database state at a given time into a
> separate schema - either to VIEWs, MVIEWs or TABLES. This database state could
> then be indexed in order to work with it. You can also reset the production
> state to the recreated past state.
>
> Unfortunately I've got no time to further work on it at the moment + I have not
> done tests with many changes in the database so I can't say if the recreation
> process scales well. One downside I've realised is that using the json_agg
> function has limits when I've got binary data. It gets too long. So I'm really
> looking forward to using JSONB.
>
> There are more plans in my mind. By having a Transaction_Log table it should be
> possible to revert only certain transactions. I'm also thinking of parallel
> versioning, e.g. different users are all working with their version of the
> database and commit their changes to the production state. As I've got a unique
> history ID for each table and each row, I should be able to map the affected
> records.

Sorry I'm coming late to this thread. I agree that getting interested people 
together would be a good idea. Is there another mailing list we can do that 
with?

Versioning is also something I'm interested in, and have put a lot of thought 
into (if not much actual code :( ). I'll also make some general comments, if I 
may...


I think timestamps should be *heavily avoided* in versioning, because they are
frequently the wrong way to solve a problem. There are many use cases where
you're trying to answer "What values were in place when X happened", and the
simplest, most fool-proof way to answer that is that when you create a record
for X, part of that record is a "history ID" that shows you the exact data
used. For example, if you're creating an invoicing system that has versioning
of customer addresses you would not try and join an invoice with its address
using a timestamp; you would put an actual address_history_id in the invoice
table.
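
A minimal sketch of the invoicing example. Only address_history_id is named in the message; every other table and column name here is an assumption made for illustration.

```sql
-- Hypothetical schema: only address_history_id is named in the message.
CREATE TABLE customer_address (           -- always holds the current address
    customer_id bigint PRIMARY KEY,
    address     text   NOT NULL
);

CREATE TABLE customer_address_history (   -- one row per version
    address_history_id bigserial PRIMARY KEY,
    customer_id        bigint NOT NULL REFERENCES customer_address,
    address            text   NOT NULL
);

CREATE TABLE invoice (
    invoice_id         bigserial PRIMARY KEY,
    customer_id        bigint NOT NULL REFERENCES customer_address,
    -- Explicit pointer to the exact address version that was used.
    address_history_id bigint NOT NULL REFERENCES customer_address_history
);

-- No timestamp arithmetic: the invoice names its address version directly.
SELECT i.invoice_id, h.address
FROM invoice i
JOIN customer_address_history h USING (address_history_id);
```

The foreign key also means the database itself guarantees that the referenced version exists, which a timestamp comparison cannot do.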

I thought I saw a reference to versioning sets of information. This is perhaps
the trickiest part. You first have to think about the non-versioned sets (i.e., a
customer may have many phone numbers) before you think about versioning the
set. In this example, you want the history of the *set* of phone numbers, not
of each individual number. Design it with full duplication of data first; don't
think about normalizing until you have the full set versioning design.
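
One way to picture "the history of the set", with full duplication and no attempt at normalizing yet. The table names and the sample numbers are hypothetical; this is a sketch of the idea, not a design from the thread.

```sql
-- Hypothetical tables; full duplication first, as suggested above.
CREATE TABLE customer_phone_set_history (
    phone_set_history_id bigserial PRIMARY KEY,
    customer_id          bigint NOT NULL
);

-- Every version stores the complete set of numbers, even unchanged ones.
CREATE TABLE customer_phone_set_member (
    phone_set_history_id bigint NOT NULL REFERENCES customer_phone_set_history,
    phone_number         text   NOT NULL,
    PRIMARY KEY (phone_set_history_id, phone_number)
);

-- Recording a new version of the set: one header row plus all members.
WITH new_set AS (
    INSERT INTO customer_phone_set_history (customer_id)
    VALUES (42)
    RETURNING phone_set_history_id
)
INSERT INTO customer_phone_set_member (phone_set_history_id, phone_number)
SELECT n.phone_set_history_id, p.phone_number
FROM new_set n
CROSS JOIN (VALUES ('555-0100'), ('555-0101')) AS p(phone_number);
```

A record that needs "the phone numbers at the time" would then store a single phone_set_history_id, in the same way an invoice stores an address_history_id.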

I understand the generic appeal of using something like JSON, but in reality I 
don't see it working terribly well. It's likely to be on the slow side, and 
it'll also be difficult to query from. Instead, I think it makes more sense to 
create actual history tables that derive their definition from the base table. 
I've got code that extracts information (column_name, data type, nullability) 
from a table (or even a table definition), and it's not that complex. With the 
work that's been done on capturing DDL changes it shouldn't be too hard to 
handle that automatically.
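
As a rough illustration of deriving a history table from a base table (this is not the code referred to above, and public.customer is a hypothetical base table), the catalog query and the LIKE clause below are standard PostgreSQL:

```sql
-- Assumes a base table public.customer already exists (hypothetical name).
-- Column name, data type and nullability, as mentioned above.
SELECT c.column_name, c.data_type, c.is_nullable
FROM information_schema.columns c
WHERE c.table_schema = 'public'
  AND c.table_name   = 'customer'
ORDER BY c.ordinal_position;

-- A history table that copies the base table's column definitions
-- and adds its own key.
CREATE TABLE customer_history (
    customer_history_id bigserial PRIMARY KEY,
    LIKE public.customer
);
```

By default LIKE copies column names, types and NOT NULL constraints but not defaults, indexes or keys, and it does not follow later DDL changes on the base table; that part would still need the DDL-capture work mentioned above.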




Re: [GENERAL] table versioning approach (not auditing)

2014-10-02 Thread Adam Brusselback
Ended up running for 28 min, but it did work as expected.

On Thu, Oct 2, 2014 at 10:27 AM, Adam Brusselback  wrote:

> Testing that now.  Initial results are not looking too performant.
> I have one single table which had 234575 updates done to it.  I am rolling
> back 13093 of them.  It's been running 20 min now, using 100% of a single
> core, and almost 0 disk.  No idea how long it'll run at this point.
>
> This is on an i5 desktop with 16 gigs of ram and an ssd.
>
> This is a pretty good test though, as it's a real world use case (even if
> the data was generated with PGBench).  We now know that area needs some
> work before it can be used for anything more than a toy database.
>
> Thanks,
> -Adam

Re: [GENERAL] table versioning approach (not auditing)

2014-10-02 Thread Adam Brusselback
Testing that now.  Initial results are not looking too performant.
I have one single table which had 234575 updates done to it.  I am rolling
back 13093 of them.  It's been running 20 min now, using 100% of a single
core, and almost 0 disk.  No idea how long it'll run at this point.

This is on an i5 desktop with 16 gigs of ram and an ssd.

This is a pretty good test though, as it's a real world use case (even if
the data was generated with PGBench).  We now know that area needs some
work before it can be used for anything more than a toy database.

Thanks,
-Adam

On Thu, Oct 2, 2014 at 7:52 AM, Felix Kunde  wrote:

> Hey there
>
> Thanks again for the fix. I was able to merge it into my repo.
> Also thanks for benchmarking audit. Very interesting results.
> I wonder how the recreation of former database states scales when
> processing many deltas.
> Haven’t done a lot of testing in that direction.
>
> I will transfer the code soon to a more public repo on GitHub. As far as I
> see I have to create an organization for that.
>
> Cheers
> Felix

Re: [GENERAL] table versioning approach (not auditing)

2014-10-01 Thread Adam Brusselback
I know we're kinda hijacking this thread, so sorry for that.  If you'd like
to do that, i'd be more than happy to use it and push any fixes / changes
upstream.  I don't have much of a preference on the name either, as long as
it's something that makes sense.

I would consider myself far from an expert though! Either way, more people
using a single solution is a good thing.

As a side note, I did some benchmarking this morning and wanted to share
the results:
pgbench -i -s 140 -U postgres pgbench

pgbench -c 4 -j 4 -T 600 -U postgres pgbench
no auditing tps: 2854
NOTE: Accounts are audited
auditing tps: 1278

pgbench -c 2 -j 2 -N -T 300 -U postgres pgbench
no auditing tps: 2504
NOTE: Accounts are audited
auditing tps: 822

pgbench -c 2 -j 2 -T 300 -U postgres pgbench
no auditing tps: 1836
NOTE: branches and tellers are audited, accounts are not
auditing tps: 505

I'd love to see if there are some easy wins to boost the performance.
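
For a quick read of those numbers, the relative slowdown can be computed directly from the tps figures. The query below only restates the results above; the column names are made up for the example.

```sql
-- Throughput lost with auditing enabled, from the tps figures above.
SELECT
    run,
    no_audit_tps,
    audit_tps,
    round(100.0 * (no_audit_tps - audit_tps) / no_audit_tps, 1) AS pct_slower
FROM (VALUES
    ('-c 4 -j 4 -T 600',    2854, 1278),
    ('-c 2 -j 2 -N -T 300', 2504,  822),
    ('-c 2 -j 2 -T 300',    1836,  505)
) AS r(run, no_audit_tps, audit_tps);
```

That works out to about 55.2, 67.2 and 72.5 percent fewer transactions per second for the three runs.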

On Wed, Oct 1, 2014 at 5:19 AM, Felix Kunde  wrote:

> Hey there. Thank you very much for that fix! That's why I'd like to have a
> joint development and joint testing. It's way more convincing for users to
> go for a solution that is tested by some experts than just by a random
> developer :)
>
> I'm open to creating a new project and pushing the code there. Don't care about
> the name. Then we might figure out which parts are already good, which
> parts could be improved and where to go next. I think switching to JSONB
> for example will be easy, as it offers the same functions as JSON afaik.

Re: [GENERAL] table versioning approach (not auditing)

2014-10-01 Thread Felix Kunde
Hey there. Thank you very much for that fix! That's why I'd like to have a joint
development and joint testing. It's way more convincing for users to go for a
solution that is tested by some experts than just by a random developer :)

I'm open to creating a new project and pushing the code there. Don't care about the
name. Then we might figure out which parts are already good, which parts could
be improved and where to go next. I think switching to JSONB for example will
be easy, as it offers the same functions as JSON afaik.
 

Sent: Tuesday, 30 September 2014 at 21:16
From: "Adam Brusselback"
To: "Felix Kunde"
Cc: "pgsql-general@postgresql.org"
Subject: Re: [GENERAL] table versioning approach (not auditing)

Felix, I'd love to see a single, well maintained project. For example, I just 
found yours, and gave it a shot today after seeing this post.  I found a bug 
when an update command is issued, but the old and new values are all the same.  
The trigger will blow up.  I've got a fix for that, but if we had one project 
that more than a handful of people used, stuff like that would be quashed very 
quickly.
 
I love the design of it by the way. Any idea what it will take to move to JSONB 
for 9.4? 
 
 
On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde wrote:

Hey

Yes, I'm adding an additional key to each of my tables. First I wanted to use
the primary key as one column in my audit_log table, but in some of my tables
the PK consists of more than one column. Plus it's nice to have one key that is
called the same over all tables.

To get a former state for one row at date x I need to join the latest delta
BEFORE date x with each delta AFTER date x. If I logged complete rows, this
joining part would not be necessary, but as I usually work with spatial
databases that have complex geometries and also image files, that strategy
would consume too much disk space.
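
As a rough illustration of rebuilding a row from old-value deltas, here is one possible formulation. It is not taken from the audit project: the parcel and parcel_delta tables are hypothetical, it needs PostgreSQL 9.5 or later for to_jsonb() and jsonb_object_agg(), and it ignores rows that were inserted or deleted after the date of interest.

```sql
-- Hypothetical, simplified tables.
CREATE TABLE parcel (
    audit_id   bigint PRIMARY KEY,
    owner_name text,
    area       numeric
);

CREATE TABLE parcel_delta (
    audit_id   bigint      NOT NULL,
    changed_at timestamptz NOT NULL,
    old_values jsonb       NOT NULL   -- only the changed columns, with their previous values
);

-- State of row 42 as of 2014-09-15: for every column take the old value
-- from the earliest delta after that date, else the current value.
SELECT jsonb_object_agg(s.key, s.value) AS row_as_of
FROM (
    SELECT DISTINCT ON (kv.key) kv.key, kv.value
    FROM (
        SELECT to_jsonb(p) AS doc, 'infinity'::timestamptz AS changed_at
        FROM parcel p
        WHERE p.audit_id = 42
        UNION ALL
        SELECT d.old_values, d.changed_at
        FROM parcel_delta d
        WHERE d.audit_id = 42
          AND d.changed_at > '2014-09-15'
    ) v
    CROSS JOIN LATERAL jsonb_each(v.doc) AS kv(key, value)
    ORDER BY kv.key, v.changed_at
) s;
```

The current row is given the timestamp 'infinity' so that any delta after the chosen date wins over it, and among those deltas the earliest one holds the value that was in place at that date.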
 
If there are more users following a similar approach, I wonder why we don't throw
all the good ideas together, to have one solution that is tested, maintained
and improved by more developers. This would be great.
 
Felix
 

Sent: Monday, 29 September 2014 at 23:25
From: "Abelard Hoffman"
To: "Felix Kunde"
Cc: "pgsql-general@postgresql.org"
Subject: Re: [GENERAL] table versioning approach (not auditing)

Thank you Felix, Gavin, and Jonathan for your responses.
 
Felix & Jonathan: both of you mention just storing deltas. But if you do that, 
how do you associate the delta record with the original row? Where's the PK 
stored, if it wasn't part of the delta?
 
Felix, thank you very much for the example code. I took a look at your table
schemas. I need to study it more, but it looks like the way you're handling the
PK is that you're adding a separate synthetic key (audit_id) to each table that's
being versioned, and then storing that key along with the delta.
 
So then to find all the versions of a given row, you just need to join the 
audit row with the schema_name.table_name.audit_id column. Is that right? The 
only potential drawback there is there's no referential integrity between the 
audit_log.audit_id and the actual table.
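
The join described here might look roughly like the sketch below. The tables are simplified stand-ins (the real schema in the audit project may differ), and the missing foreign key is deliberate, since one audit_log.audit_id column has to point into many different tables.

```sql
-- Hypothetical, simplified tables; the real audit schema may differ.
CREATE TABLE customer (
    customer_id bigint PRIMARY KEY,
    full_name   text,
    audit_id    bigserial UNIQUE       -- synthetic key added for versioning
);

CREATE TABLE audit_log (
    audit_log_id bigserial PRIMARY KEY,
    schema_name  text   NOT NULL,
    table_name   text   NOT NULL,
    audit_id     bigint NOT NULL,      -- no foreign key: it can point into any table
    changes      json
);

-- All logged versions of one customer row.
SELECT a.*
FROM customer c
JOIN audit_log a
  ON a.audit_id    = c.audit_id
 AND a.schema_name = 'public'
 AND a.table_name  = 'customer'
WHERE c.customer_id = 1;
```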
 
I do like that approach very much though, in that it eliminates the need to 
interrogate the json data in order to perform most queries.
 
AH
 
 
 
On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde wrote:

Hey

I've also tried to implement a database versioning using JSON to log changes in
tables. Here it is: https://github.com/fxku/audit

I've got two versioning tables, one storing information about all transactions
that happened and one where I put the JSON logs of row changes of each table.
I'm only logging old values and not complete rows.

Then I got a function that recreates a database state at a given time into a
separate schema - either to VIEWs, MVIEWs or TABLES. This database state could
then be indexed in order to work with it. You can also reset the production
state to the recreated past state.

Unfortunately I've got no time to further work on it at the moment + I have not
done tests with many changes in the database so I can't say if the recreation
process scales well. One downside I've realised is that using the json_agg
function has limits when I've got binary data. It gets too long. So I'm really
looking forward to using JSONB.

There are more plans in my mind. By having a Transaction_Log table it should be 
possible to revert only certain transactions. I'm also thinking of parallel 
versioning, e.g. different users are all working with their version of the 
database and commit their changes to the production state. As I've got a unique 
history ID for each table and each row

Re: [GENERAL] table versioning approach (not auditing)

2014-09-30 Thread Adam Brusselback
Felix, I'd love to see a single, well-maintained project. For example, I
just found yours, and gave it a shot today after seeing this post.  I found
a bug: when an UPDATE is issued but the old and new values are all the
same, the trigger blows up.  I've got a fix for that, but if we
had one project that more than a handful of people used, stuff like that
would be quashed very quickly.

I love the design of it by the way. Any idea what it will take to move to
JSONB for 9.4?



Re: [GENERAL] table versioning approach (not auditing)

2014-09-30 Thread Felix Kunde
Hey
 
Yes, I'm adding an additional key to each of my tables. At first I wanted to use 
the primary key as one column in my audit_log table, but in some of my tables 
the PK consists of more than one column. Plus it's nice to have one key that 
has the same name across all tables.
 
To get a former state for one row at date x, I need to join the latest delta 
BEFORE date x with each delta AFTER date x. If I logged complete rows, this 
joining step would not be necessary, but as I usually work with spatial 
databases that have complex geometries and also image files, that strategy 
would consume too much disk space.
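
A minimal sketch of rebuilding a past row state from old-value deltas, written in Python rather than SQL so it can be run stand-alone. It is not taken from the fxku/audit code: the function name `state_at`, the tuple layout of the log, and the sample columns are all assumptions made for illustration. It assumes each log entry holds only the columns an UPDATE overwrote, together with their previous values.

```python
from datetime import datetime


def state_at(current_row, deltas, when):
    """Rebuild a row as it looked at `when` (hypothetical helper).

    current_row: dict of the row's present column values.
    deltas: list of (changed_at, old_values) tuples, where old_values holds
            only the columns an UPDATE overwrote, with their previous values.
    """
    row = dict(current_row)
    # Undo changes newest-first, so the oldest change made after `when` wins.
    for changed_at, old_values in sorted(deltas, key=lambda d: d[0], reverse=True):
        if changed_at > when:
            row.update(old_values)
    return row


# Sample data (made up): the name went A -> B on Sept 1, then
# name B -> C and height 10 -> 30 on Sept 20.
current = {"audit_id": 7, "name": "C", "height": 30}
log = [
    (datetime(2014, 9, 1), {"name": "A"}),
    (datetime(2014, 9, 20), {"name": "B", "height": 10}),
]
```

With this layout, a column that was never changed after date x is simply read from the current row, which is why complete rows do not have to be logged.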
 
If more users are following a similar approach, I wonder why we don't pool 
all the good ideas into one solution that is tested, maintained 
and improved by more developers. That would be great.
 
Felix
 


Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Jonathan Vanasco

On Sep 29, 2014, at 4:06 PM, Nick Guenther wrote:

> A newbie tangent question: how do you access the transaction serial? Is it 
> txid_current() as listed in 
> http://www.postgresql.org/docs/9.3/static/functions-info.html?

My implementations were ridiculously simple/naive in design, and existed 
entirely with under defined serials.  I'd just create a new record + id on a 
write operation, and then use it when logging all operations.

I had read up on a lot of (possibly better) ways to handle this using pg 
internals.  They all seemed more advanced than I needed.


> And does your implementation worry about multiple timelines? 

Not sure I understand this... but every object is given a revision id.  Edits 
between consecutive revisions are allowed; edits spanning multiple revisions 
are rejected.


On Sep 29, 2014, at 5:25 PM, Abelard Hoffman wrote:

> Felix & Jonathan: both of you mention just storing deltas. But if you do 
> that, how do you associate the delta record with the original row? Where's 
> the PK stored, if it wasn't part of the delta?

The logic I decided on, is this:

Revision 0
• Only the original record is stored
Revision 1
• Copy the original record into the revision store
Revision 1+
• Update the original record, store the deltas in the revision store
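
The revision steps above can be sketched roughly as follows. This is an in-memory Python illustration only, not the ORM or Postgres functions described in this message; the class name `RevisionedRecord`, its attributes, and the choice to store new values in each delta are assumptions.

```python
class RevisionedRecord:
    """One live record plus a revision store (hypothetical names throughout)."""

    def __init__(self, data):
        self.data = dict(data)   # the "original record", always current
        self.revision = 0        # revision 0: nothing in the store yet
        self.store = {}          # revision id -> stored snapshot or delta

    def edit(self, changes, expected_revision):
        # Reject edits that were prepared against an older revision.
        if expected_revision != self.revision:
            raise ValueError("stale edit: record is at revision %d" % self.revision)
        if self.revision == 0:
            # First edit: copy the untouched original into the store.
            self.store[0] = dict(self.data)
        self.revision += 1
        # Store only the columns whose values actually change (assumption:
        # the delta holds the new values).
        delta = {k: v for k, v in changes.items() if self.data.get(k) != v}
        self.store[self.revision] = delta
        self.data.update(delta)
        return self.revision
```

Records that are never edited cost nothing extra in this layout, which fits the observation that most records are not edited while the edited ones are edited heavily.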

The reason why I chose this path, is that in my system:
• most records are not edited
• the records that are edited, are heavily edited

We use an ORM and it was simple to implement this pattern with it, and then 
write some functions in postgres to ensure it is adhered to.

When I need to pull data out:

• I can pull exact revisions out of the hstore for a given table/row using 
the revision ids as a key
• the revisions all contain the transaction id
• if I need to get more info about a given transaction, I can query the 
transactions table and get a list of all the objects that were edited within 
that transaction

If I wanted to ensure referential integrity, I could have used a table instead 
of an hstore (or json).  If the application grows much larger, it will probably 
be migrated to a model like that.  This approach just gave a lot of 
flexibility, minimized tables in the database, and was very easy to pull off.  
I went with hstore because json didn't allow in-place updates at the time (I 
think it does now).






Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Abelard Hoffman
Thank you Felix, Gavin, and Jonathan for your responses.

Felix & Jonathan: both of you mention just storing deltas. But if you do
that, how do you associate the delta record with the original row? Where's
the PK stored, if it wasn't part of the delta?

Felix, thank you very much for the example code. I took a look at your
table schemas. I need to study it more, but it looks like the way you're
handling the PK is by adding a separate synthetic key (audit_id) to
each table that's being versioned, and then storing that key along with the
delta.

So then to find all the versions of a given row, you just need to join the
audit row with the schema_name.table_name.audit_id column. Is that right?
The only potential drawback there is there's no referential integrity
between the audit_log.audit_id and the actual table.

I do like that approach very much though, in that it eliminates the need to
interrogate the json data in order to perform most queries.

AH





Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Nick Guenther


On September 29, 2014 11:08:55 AM EDT, Jonathan Vanasco wrote:
>
>- use a "transaction" log.  every write session gets logged into the
>transaction table (serial, timestamp, user_id).  all updates to the
>recorded tables include the transaction's serial.  then there is a
>"transactions" table, that is just "transaction_serial ,  object_id ,
>object_action".  

A newbie tangent question: how do you access the transaction serial? Is it 
txid_current() as listed in 
http://www.postgresql.org/docs/9.3/static/functions-info.html?

And how do you actually make use of that information? I know from Bruce 
Momjian's excellent MVCC talk 
that postgres internally has a secret txid column on each row; can you somehow 
query on the secret column? And does your implementation worry about multiple 
timelines? 

My use case is dynamically allocated replication. Broadly, my algorithm is that 
for each client I
1) download a full copy of the current table
2) keep the connection open and send deltas (which are just inserts and 
deletes, for me)

I need 2 to begin *as if immediately* after 1.  Txids sound like they are 
exactly what I need, but without knowing how to handle them I fudged it by 
opening the queries for 1 and 2 immediately after each other, before reading 
from either, so that they should be pinned to the same txid.  There's definitely 
a race condition that will show under load, though. I think the correct 
algorithm is:

1) ask the current txid X
2) start buffering deltas with txid > X
3) download the table as of X
4) download the buffer of deltas and listen for future ones
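
The four steps above can be simulated in memory as below. This is a toy Python model, not PostgreSQL code: the `Source` class, `apply_delta`, `subscribe`, and the `between_steps` hook (which stands in for writes racing with the download) are all hypothetical names, and the integer counter only plays the role of a txid.

```python
class Source:
    """Toy in-memory stand-in for the server side (not a PostgreSQL API)."""

    def __init__(self):
        self.txid = 0
        self.log = []          # (txid, op, value); op is "insert" or "delete"
        self.listeners = []

    def write(self, op, value):
        self.txid += 1
        delta = (self.txid, op, value)
        self.log.append(delta)
        for listener in self.listeners:
            listener(delta)

    def snapshot(self, as_of):
        # Replay the log up to and including txid `as_of`.
        rows = set()
        for txid, op, value in self.log:
            if txid <= as_of:
                apply_delta(rows, op, value)
        return rows


def apply_delta(rows, op, value):
    if op == "insert":
        rows.add(value)
    else:
        rows.discard(value)


def subscribe(source, between_steps=lambda: None):
    x = source.txid                          # 1) ask the current txid X
    buffer = []
    source.listeners.append(buffer.append)   # 2) buffer deltas with txid > X
    between_steps()                          # writes racing with the download
    rows = source.snapshot(as_of=x)          # 3) download the table as of X
    for txid, op, value in buffer:           # 4) drain the buffer in txid order
        if txid > x:
            apply_delta(rows, op, value)
    return rows                              # a real client would keep listening
```

The point of the ordering is that buffering starts before the snapshot is read, so no delta can fall into the gap between the two.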


Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Jonathan Vanasco

In the past, to accomplish the same thing I've done this:

- store the data in hstore/json.  Instead of storing snapshots, I store deltas. 
  I've been using a second table though, because it's improved performance on 
  reads and writes.
- use a "transaction" log.  Every write session gets logged into the 
  transaction table (serial, timestamp, user_id).  All updates to the recorded 
  tables include the transaction's serial.  Then there is a "transactions" table 
  that is just "transaction_serial, object_id, object_action".

Whenever I need auditing or versioning, I can just query the 
transaction table for the records I want... then use that to grab the data out 
of hstore.
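
A rough sketch of the two-table transaction log described above, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. The table names `transaction_log` and `transaction_objects`, the `logged_at` column, and the helper functions are assumptions; only the column list (serial, timestamp, user_id) and (transaction_serial, object_id, object_action) comes from the message.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One row per write session.
    CREATE TABLE transaction_log (
        serial    INTEGER PRIMARY KEY,
        logged_at TEXT NOT NULL,
        user_id   INTEGER NOT NULL
    );
    -- One row per object touched within a write session.
    CREATE TABLE transaction_objects (
        transaction_serial INTEGER NOT NULL REFERENCES transaction_log(serial),
        object_id          INTEGER NOT NULL,
        object_action      TEXT NOT NULL
    );
""")


def log_write(user_id, logged_at, actions):
    """Record one write session and the objects it touched."""
    cur = db.execute(
        "INSERT INTO transaction_log (logged_at, user_id) VALUES (?, ?)",
        (logged_at, user_id),
    )
    serial = cur.lastrowid
    db.executemany(
        "INSERT INTO transaction_objects VALUES (?, ?, ?)",
        [(serial, obj, action) for obj, action in actions],
    )
    return serial


def objects_changed_by(user_id):
    """All (object_id, object_action) pairs written by one user, oldest first."""
    return db.execute(
        """SELECT o.object_id, o.object_action
           FROM transaction_objects o
           JOIN transaction_log t ON t.serial = o.transaction_serial
           WHERE t.user_id = ?
           ORDER BY t.serial, o.rowid""",
        (user_id,),
    ).fetchall()
```

Because the user and the touched objects live in ordinary columns, a "find all changes made by a specific user" query needs no look inside the hstore/json payload.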







Re: [GENERAL] table versioning approach (not auditing)

2014-09-29 Thread Felix Kunde
Hey
 
I've also tried to implement database versioning using JSON to log changes in 
tables. Here it is: https://github.com/fxku/audit
I've got two versioning tables, one storing information about all transactions 
that happened and one where I put the JSON logs of row changes of each table. 
I'm only logging old values and not complete rows.
 
Then I've got a function that recreates a database state at a given time in a 
separate schema - either as VIEWs, MVIEWs or TABLES. This database state could 
then be indexed in order to work with it. You can also reset the production 
state to the recreated past state.
 
Unfortunately I've got no time to work on it further at the moment, and I have 
not done tests with many changes in the database, so I can't say if the 
recreation process scales well. One downside I've realised is that using the 
json_agg function has limits when I've got binary data. It gets too long. So 
I'm really looking forward to using JSONB.

I have more plans in mind. By having a Transaction_Log table it should be 
possible to revert only certain transactions. I'm also thinking of parallel 
versioning, e.g. different users are all working with their version of the 
database and commit their changes to the production state. As I've got a unique 
history ID for each table and each row, I should be able to map the affected 
records.

Have a look and tell me what you think of it.

Cheers
Felix
 



Re: [GENERAL] table versioning approach (not auditing)

2014-09-28 Thread Gavin Flower


I implemented a 2 table approach over 15 years ago for an insurance 
application.  I used both an /effective_date/ and an /as_at_date/; no 
triggers were involved.  I think a 2 table approach gives you more 
flexibility.


The /effective_date/ allowed changes to be made to the table in advance 
of when they were to become effective.


The /as_at_date/ allowed quotes to be made, valid for a period starting 
at the as_at_date.


End users did not query the database directly; all queries were precoded 
in a 4GL called Progress backed by an Oracle database.  The same could 
be done with a WildFly Java Enterprise application server (or some other 
middleware) and a PostgreSQL backend.


Different use case, but the concept is probably adaptable to your situation.

You may want a change table, that has a change_number that is in each 
type of table affected by a change.  This would help for query type #2.


I would be quite happy to contract to work out the appropriate schema 
and develop some SQL scripts to query & update the database, if you were 
interested.  My approach would be to create a minimal database with 
sample data to validate the schema design and SQL scripts.


Using a flag to indicate the current record seems inflexible: some 
changes may not take effect until some time in the future, and you can't 
query the database to see what the situation was at a particular point 
in the past.  For example: somebody complains about something that 
happened last Saturday near noon; how would you query the database to 
see what it was like then?
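
As an illustration of the point-in-time question, here is a small Python sketch of looking up the version in effect at a given moment when every version carries an effective_date. The function name `version_in_effect`, the tuple layout, and the sample premiums are made up for the example; the /as_at_date/ dimension is left out to keep it short.

```python
from bisect import bisect_right
from datetime import datetime


def version_in_effect(history, when):
    """Return the version whose effective_date is the latest one <= `when`.

    history: list of (effective_date, row) tuples sorted by effective_date.
    Returns None if nothing was in effect yet.
    """
    dates = [effective_date for effective_date, _ in history]
    i = bisect_right(dates, when)
    return history[i - 1][1] if i else None


# Made-up history, including one change entered ahead of time (Nov. 1).
history = [
    (datetime(2014, 1, 1), {"premium": 100}),
    (datetime(2014, 9, 25), {"premium": 120}),
    (datetime(2014, 11, 1), {"premium": 150}),
]

# "Last Saturday near noon" relative to this thread: 27 September 2014.
last_saturday_noon = datetime(2014, 9, 27, 12, 0)
```

A single "current" flag cannot answer either the past lookup or the future-dated change, whereas both fall out of the same comparison on effective_date.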



Cheers,
Gavin



[GENERAL] table versioning approach (not auditing)

2014-09-28 Thread Abelard Hoffman
Hi. I need to maintain a record of all changes to certain tables to assist
in viewing history and reverting changes when necessary (customer service
makes an incorrect edit, etc.).

I have studied these two audit trigger examples:
https://wiki.postgresql.org/wiki/Audit_trigger
https://wiki.postgresql.org/wiki/Audit_trigger_91plus

I've also read about two other approaches to versioning:
1. maintain all versions in one table, with a flag to indicate which is the
current version
2. have a separate versions table for each real table, and insert into the
associated version table whenever an update or insert is done.

My current implementation is based on the wiki trigger examples, using a
single table, and a json column to record the row changes (rather than
hstore). What I like about that, in particular, is I can have a "global,"
chronological view of all versioned changes very easily.

But there are two types of queries I need to run.
1. Find all changes made by a specific user
2. Find all changes related to a specific record

#1 is simple to do. The versioning table has a user_id column of who made
the change, so I can query on that.

#2 is more difficult. I may want to fetch all changes to a group of tables
that are all related by foreign keys (e.g., find all changes to "user"
record 849, along with any changes to their "articles," "photos," etc.).
All of the data is in the json column, of course, but it seems like a pain
to try and build a query on the json column that can fetch all those
relationships (and if I mess it up, I probably won't generate any errors,
since the json is so free-form).

So my question is, do you think using the json approach is wrong for this
case? Does it seem better to have separate versioning tables associated
with each real table? Or another approach?

Thanks