Re: Persist MVCC forever - retain history

2020-07-05 Thread Thomas Kellerer

Konstantin Knizhnik wrote on 05.07.2020 at 19:31:

I am surprised that you are saying you didn't feel big interest. My
reading of the thread is the opposite, that there was quite some
interest, but that there are technical challenges to overcome. So you
gave up on that work?

No, I have not given up.
But...
There are well-known problems with the proposed approach:
1. It does not support schema changes.
2. It is not compatible with DROP/TRUNCATE.
3. The presence of a large number of aborted transactions can slow down data access.
4. The semantics of joining tables with different timestamps are obscure.


Oracle partially solved this (at least 1, 2 and 4 - don't know about 3) by storing the old
versions in a separate table that is automatically managed if you enable the feature. If
a query uses AS OF to go "back in time", it's rewritten to access the
history table.
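For a flavor of the rewrite target, an Oracle Flashback Query of the kind described looks roughly like this (the table and timestamp are hypothetical):

```sql
-- Oracle's AS OF syntax; the query is transparently redirected to the
-- automatically managed history (undo) data for that point in time.
SELECT *
  FROM employees
    AS OF TIMESTAMP TO_TIMESTAMP('2020-07-01 12:00:00',
                                 'YYYY-MM-DD HH24:MI:SS')
 WHERE employee_id = 100;
```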

Thomas





Re: Persist MVCC forever - retain history

2020-07-05 Thread Konstantin Knizhnik




On 05.07.2020 08:48, Mitar wrote:

Hi!

On Fri, Jul 3, 2020 at 12:29 AM Konstantin Knizhnik wrote:

Did you read this thread:
https://www.postgresql.org/message-id/flat/78aadf6b-86d4-21b9-9c2a-51f1efb8a499%40postgrespro.ru
I have proposed a patch for supporting time travel (AS OF) queries.
But I didn't feel much interest in it from the community.

Oh, you went much further than me in this thinking. Awesome!

I am surprised that you are saying you didn't feel big interest. My
reading of the thread is the opposite, that there was quite some
interest, but that there are technical challenges to overcome. So you
gave up on that work?

No, I have not given up.
But...
There are well-known problems with the proposed approach:
1. It does not support schema changes.
2. It is not compatible with DROP/TRUNCATE.
3. The presence of a large number of aborted transactions can slow down
data access.

4. The semantics of joining tables with different timestamps are obscure.

I do not know how to address these issues. I am not sure how critical they
all are, or whether they make this approach unusable.
Also, there is a quite common opinion that time travel should be done at the
application level, and that we do not need to support it at the database kernel level.


I will be glad to continue working in this direction if there is some
interest in this topic and somebody is going to try/review this feature.
It is very difficult to find motivation for developing new features
if you are absolutely sure that they will never be accepted by the community.







Re: Persist MVCC forever - retain history

2020-07-04 Thread Mitar
Hi!

On Fri, Jul 3, 2020 at 12:29 AM Konstantin Knizhnik wrote:
> Did you read this thread:
> https://www.postgresql.org/message-id/flat/78aadf6b-86d4-21b9-9c2a-51f1efb8a499%40postgrespro.ru
> I have proposed a patch for supporting time travel (AS OF) queries.
> But I didn't feel much interest in it from the community.

Oh, you went much further than me in this thinking. Awesome!

I am surprised that you are saying you didn't feel big interest. My
reading of the thread is the opposite, that there was quite some
interest, but that there are technical challenges to overcome. So you
gave up on that work?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-03 Thread Adam Brusselback
> But I didn't feel much interest in it from the community.
Just FYI, it is something that I use in my database design now (just hacked
together using ranges / exclusion constraints), and I would love
a well-supported solution.

I've chimed in a couple of times as this feature has popped up in discussion
over the years, as I have seen others with similar needs do as well.
It just sometimes feels like spam to chime in merely to say "I'd find this
feature useful", so I try not to do that too much. I'd rather not step on the
community's toes.
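For readers curious what the hacked-together approach can look like, here is a minimal sketch with a range column and an exclusion constraint (all table and column names are hypothetical):

```sql
-- One row per version; the exclusion constraint guarantees that the
-- validity ranges for the same key never overlap.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE product_history (
    product_id bigint NOT NULL,
    price      numeric NOT NULL,
    valid      tstzrange NOT NULL DEFAULT tstzrange(now(), NULL),
    EXCLUDE USING gist (product_id WITH =, valid WITH &&)
);

-- "AS OF" query against the hand-maintained history:
SELECT price
  FROM product_history
 WHERE product_id = 42
   AND valid @> TIMESTAMPTZ '2020-01-01';
```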

-Adam


Re: Persist MVCC forever - retain history

2020-07-03 Thread Konstantin Knizhnik




On 02.07.2020 21:55, Mitar wrote:

Hi!

(Sorry if this was already discussed, it looks pretty obvious, but I
could not find anything.)

I was thinking and reading about how to design the schema to keep
records of all changes which happen to the table, at row granularity,
when I realized that all this is already done for me by PostgreSQL
MVCC. All rows (tuples) are already stored, with an internal version
field as well.

So I wonder, how could I hack PostgreSQL to disable vacuuming a table,
so that all tuples persist forever, and how could I make those
internal columns visible so that I could make queries asking for
results at the particular historical version of table state? My
understanding is that indices are already indexing over those internal
columns as well, so those queries over historical versions would be
efficient as well. Am I missing something which would make this not
possible?

Is this something I would have to run a custom version of PostgreSQL
or is this possible through an extension of sort?


Mitar


Did you read this thread:
https://www.postgresql.org/message-id/flat/78aadf6b-86d4-21b9-9c2a-51f1efb8a499%40postgrespro.ru
I have proposed a patch for supporting time travel (AS OF) queries.
But I didn't feel much interest in it from the community.





Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 7:51 PM Mark Dilger  wrote:
> I expect these issues to be less than half what you would need to resolve, 
> though much of the rest of it is less clear to me.

Thank you for this insightful input. I will think it over.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 12:16 PM Thomas Munro  wrote:
> This was a research topic in ancient times (somewhere I read that in
> some ancient version, VACUUM didn't originally remove tuples, it moved
> them to permanent write-only storage).  Even after the open source
> project began, there was a "time travel" feature, but it was removed
> in 6.2:

Very interesting. Thanks for sharing.

> There aren't indexes on those things.

Oh. My information is based on what I read in [1]. This is where I
realized that if PostgreSQL maintains those extra columns and indices,
then there is no point in replicating that by copying all that to
another table. So this is not true? Or not true anymore?

> If you want to keep track of all changes in a way that lets you query
> things as of historical times, including joins, and possibly including
> multiple time dimensions ("on the 2nd of Feb, what address did we
> think Fred lived at on the 1st of Jan?") you might want to read
> "Developing Time-Oriented Database Applications in SQL" about this,

Interesting. I checked it out a bit. I think this is not exactly what
I am searching for. My main motivation is reactive web applications,
where I can push changes of the (sub)state of the database to the web
app when that (sub)state changes. And if the web app is offline for some
time, it can come back and resync all the missed changes. Moreover,
the changes themselves are important (not just the last state) because
they allow one to merge with a potentially changed local state in the web
app while it was offline. So in a way it is logical replication and
replay, but at the database-to-client level.

[1] https://eng.uber.com/postgres-to-mysql-migration/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-02 Thread Mark Dilger



> On Jul 2, 2020, at 5:58 PM, Mitar  wrote:
> 
>> Plus, wrap-around and freezing aren’t just nice-to-have features.
> 
> Oh, I forgot about that. The transaction IDs (xmin/xmax) are still just
> 32 bits? So then for such a table with permanent MVCC these would have
> to be increased, to 64 bits or something. Then one would not have to do
> wrap-around protection, no?

I think what you propose is a huge undertaking, and would likely result in a
fork of Postgres not compatible with the public sources.  I do not recommend
the project.  But in answer to your question:

Yes, the values stored in the tuple header are 32 bits.  Take a look in 
access/htup_details.h.  You'll notice that HeapTupleHeaderData has a union:

union
{
    HeapTupleFields t_heap;
    DatumTupleFields t_datum;
}           t_choice;

If you check, HeapTupleFields and DatumTupleFields are the same size, each 
having three 32 bit values, though they mean different things.  You may need to 
expand types TransactionId, CommandId, and Oid to 64 bits, expand varlena 
headers to 64 bits, and typemods to 64 bits.  You may find that it is harder to 
just expand a subset of those, given the way these fields overlay in these 
unions.  There will be a lot of busy work going through the code to adjust
everything else to match.  Just updating printf style formatting in error 
messages may take a long time.

If you do choose to expand only some of the types, say just TransactionId and 
CommandId, you'll have to deal with the size mismatch between HeapTupleFields 
and DatumTupleFields.

Aborted transactions leave dead rows in your tables, and you may want to deal 
with that for performance reasons.  Even if you don't intend to remove deleted 
rows, because you are just going to keep them around for time travel purposes, 
you might still want to use vacuum to remove dead rows, those that never 
committed.

You'll need to think about how to manage the growing clog if you don't intend 
to truncate it periodically.  Or if you do intend to truncate clog 
periodically, you'll need to think about the fact that you have TransactionIds 
in your tables older than what clog knows about.

You may want to think about how keeping dead rows around affects index 
performance.

I expect these issues to be less than half what you would need to resolve, 
though much of the rest of it is less clear to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Persist MVCC forever - retain history

2020-07-02 Thread Jonah H. Harris
On Thu, Jul 2, 2020 at 2:56 PM Mitar  wrote:

> Hi!
>
> (Sorry if this was already discussed, it looks pretty obvious, but I
> could not find anything.)


There have been a couple of time-travel extensions done, each with their own
limitations. I don't believe a perfect implementation could be done without
bringing the functionality into core (which would be new functionality, given
all the changes). I'd say start with the extensions and go from there.

-- 
Jonah H. Harris


Re: Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

On Thu, Jul 2, 2020 at 12:12 PM David G. Johnston wrote:
> Even for a single table how would you go about specifying this in a 
> user-friendly way?  Then consider joins.

One general answer: you use query rewriting. But what is user-friendly
depends on the use case. For me, the main motivation for this is that
I would like to sync database and client state, including all
revisions of data. So it is pretty easy to then query, based on this
row revision, which rows are newer and sync them over. And then I
can show diffs of changes through time for that particular row.

I agree that reconstructing joins at one particular moment in
the past requires more information. But other solutions (like copying
all changes to a separate table in triggers) also require that
information: adding a timestamp column and so on. So I can just have a
timestamp column in my original (and only) table and a BEFORE
trigger which populates it with a timestamp. Then at a later time,
when I have all revisions of a row in one table, I can also query
based on the timestamp, with the PostgreSQL revision column helping me
address the issue of two changes happening at the same timestamp.
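The timestamp-column-plus-BEFORE-trigger idea can be sketched like this (the table, column, and function names are hypothetical):

```sql
-- Stamp each row version with a timestamp in a BEFORE trigger, so
-- history can be ordered even without access to internal columns.
CREATE TABLE doc (
    id         bigint NOT NULL,
    body       text,
    changed_at timestamptz
);

CREATE FUNCTION stamp_changed_at() RETURNS trigger AS $$
BEGIN
    NEW.changed_at := clock_timestamp();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER doc_stamp
    BEFORE INSERT OR UPDATE ON doc
    FOR EACH ROW EXECUTE FUNCTION stamp_changed_at();
```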

I still gain that a) I do not have to copy rows to another table b) I
do not have to vacuum. The only downside is that I have to rewrite
queries for the latest state to operate only on the latest state (or
maybe PostgreSQL could continue to do this for me like now, just allow
me to also access old versions).

>  If by “this” you mean leveraging MVCC you don’t; it isn’t suitable for 
> persistent temporal data.

Why not?

> The fundamental missing piece is that there is no concept of timestamp in 
> MVCC.

That can be added using a BEFORE trigger.

> Plus, wrap-around and freezing aren’t just nice-to-have features.

Oh, I forgot about that. The transaction IDs (xmin/xmax) are still just
32 bits? So then for such a table with permanent MVCC these would have
to be increased, to 64 bits or something. Then one would not have to do
wrap-around protection, no?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: Persist MVCC forever - retain history

2020-07-02 Thread Thomas Munro
On Fri, Jul 3, 2020 at 6:56 AM Mitar  wrote:
> I was thinking and reading about how to design the schema to keep
> records of all changes which happen to the table, at row granularity,
> when I realized that all this is already done for me by PostgreSQL
> MVCC. All rows (tuples) are already stored, with an internal version
> field as well.

This was a research topic in ancient times (somewhere I read that in
some ancient version, VACUUM didn't originally remove tuples, it moved
them to permanent write-only storage).  Even after the open source
project began, there was a "time travel" feature, but it was removed
in 6.2:

https://www.postgresql.org/docs/6.3/c0503.htm

> So I wonder, how could I hack PostgreSQL to disable vacuuming a table,
> so that all tuples persist forever, and how could I make those
> internal columns visible so that I could make queries asking for
> results at the particular historical version of table state? My
> understanding is that indices are already indexing over those internal
> columns as well, so those queries over historical versions would be
> efficient as well. Am I missing something which would make this not
> possible?

There aren't indexes on those things.

If you want to keep track of all changes in a way that lets you query
things as of historical times, including joins, and possibly including
multiple time dimensions ("on the 2nd of Feb, what address did we
think Fred lived at on the 1st of Jan?") you might want to read
"Developing Time-Oriented Database Applications in SQL" about this,
freely available as a PDF[1].  There's also a bunch of temporal
support in more recent SQL standards, not supported by PostgreSQL, and
it was designed by the author of that book.  There are people working
on trying to implement parts of the standard support for PostgreSQL.
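For a flavor of that standard temporal support, SQL:2011-style system versioning looks roughly as follows (standard-style syntax, not accepted by PostgreSQL; the table is hypothetical, and the bitemporal question above would additionally need an application-time period):

```sql
CREATE TABLE address (
    person    VARCHAR(100),
    street    VARCHAR(200),
    sys_start TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
    sys_end   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (sys_start, sys_end)
) WITH SYSTEM VERSIONING;

-- What address did we record for Fred as of this system time?
SELECT street
  FROM address FOR SYSTEM_TIME AS OF TIMESTAMP '2020-01-01 00:00:00'
 WHERE person = 'Fred';
```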

> Is this something I would have to run a custom version of PostgreSQL
> or is this possible through an extension of sort?

There are some extensions that offer some temporal support inspired by
the standard (I haven't used any of them so I can't comment on them).

[1] http://www2.cs.arizona.edu/~rts/publications.html




Re: Persist MVCC forever - retain history

2020-07-02 Thread David G. Johnston
On Thursday, July 2, 2020, Mitar  wrote:


> make queries asking for
> results at the particular historical version of table state?


Even for a single table, how would you go about specifying this in a
user-friendly way?  Then consider joins.


> Is this something I would have to run a custom version of PostgreSQL
> or is this possible through an extension of sort?
>

If by “this” you mean leveraging MVCC, you don’t; it isn’t suitable for
persistent temporal data.

The fundamental missing piece is that there is no concept of timestamp in
MVCC. Plus, wrap-around and freezing aren’t just nice-to-have features.

David J.


Persist MVCC forever - retain history

2020-07-02 Thread Mitar
Hi!

(Sorry if this was already discussed, it looks pretty obvious, but I
could not find anything.)

I was thinking and reading about how to design the schema to keep
records of all changes which happen to the table, at row granularity,
when I realized that all this is already done for me by PostgreSQL
MVCC. All rows (tuples) are already stored, with an internal version
field as well.

So I wonder, how could I hack PostgreSQL to disable vacuuming a table,
so that all tuples persist forever, and how could I make those
internal columns visible so that I could make queries asking for
results at the particular historical version of table state? My
understanding is that indices are already indexing over those internal
columns as well, so those queries over historical versions would be
efficient as well. Am I missing something which would make this not
possible?
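As a side note, the internal columns are in fact already selectable as hidden system columns; what is missing is retention and indexing, not visibility (table name hypothetical):

```sql
-- xmin/xmax record the inserting and deleting transaction IDs, and
-- ctid the physical row location; none of these is indexed, and old
-- versions disappear once VACUUM reclaims them.
SELECT xmin, xmax, ctid, * FROM my_table;
```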

Is this something I would have to run a custom version of PostgreSQL
or is this possible through an extension of sort?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m