Re: [Bucardo-general] Design of new txntime field

Jon Jensen Mon, 11 Sep 2017 08:32:03 -0700

On Sun, 10 Sep 2017, Greg Sabino Mullane wrote:

That won't sort unless you pad txid_current(), right? As a stringifiedinteger with varying digits, it'll depend on number of digits.
And actually the timestamp won't sort right as a string either, sinceit will have fewer digits sometimes when the microseconds end with 0and are truncated, leaving fewer string digits.
Excellent points, I'm glad you spotted that. I've been playing with thistoday, and we can run now() back through to_char to get the fullmicroseconds each time like this:
select to_char(now(), 'YYYY-MM-DD HH24:MI:SS.US');


Yes, that's better.

Then I started looking at formatting the txnid_current() output. Problemis, it's a bigint, which means quite a lot of padding for most, if notall, standard installations. Then I had another idea - what about thePID? Those are distributed sequentially or randomly, but all that reallymatters is that two backends who happen to have the exact same now()have diferent PIDs - which should always happen. Then the question ofhow big to make the string - PIDs can be forced quite high with someeffort, but for purposes of coliision avoidance, we probably only needsome of the digits. So I came up with:
select to_char(right(pg_backend_pid()::text, 4)::int, 'FM00000');
While PIDs can be negative on some systems, that should not be a problemeither - it still makes a good tie breaker.
We can also compress things a little - we want to keep the timestamphuman readable, but we don't need all the whitespace that now() gives bydefault. This, the final proposal becomes:
select to_char(now(), 'YYYYMMDD.HH24MISS.US.')::text || 
to_char(right(pg_backend_pid()::text, 5)::int,'FM00000');

The output looks like this:

20170910.143526.395762.04482
That's 28 characters, which is not too bad overall. And it is stillhuman readable, and sortable.

If we're going for space savings, it'd be just as well to leave out theunnecessary dots, wouldn't it? It's still pretty easily readable withoutthem if you know what and how wide the fields are.

So how does it do in the criteria I made for the new txntime?

* No collisions
The addition of the last five of the current PID should avoid collisions.
In theory, there may still be collisions - which we can reduce further by
going to 6 or 7 digits, but is that really worth it? On a system using
6 or more digits, you'd have to have two backends created exactly
100,000 PIDs apart (if sequential), or happening to have the same last
five digits (if random), a 9 in 10,0000 chance, or 99 in 10,000.
Think I just talked myself into 6 digits. :)

We could use the same approach with txnid_current() and keep only the lastfew digits, right? That would be a lot more consistent across OSes thanPID allocations are.

* Human readable
Not quite as intuitive as the existing timestamp, but fairly obvious and
easy to read. And we can document the right-hand-side of it. Might be
worth it to force it all to GMT as well.


+1 to using UTC.

* Disk space
We want the delta tables to be as small as possible. Right now, this
solution is a lot more space than a plain timestamp, but I don't
see an easy way around this problem right now.

It could probably be converted to 1 or 2 bigints, which would moreefficiently use the space available, and sort more quickly, at the cost ofno easy readability of the source inputs.

On rereading this, maybe we don't need the microseconds anymore either, as
the PID chunk should be enough to distinguish things? Or at least we could
do milliseconds, and save a few characters.


That's probably true.

Jon
_______________________________________________
Bucardo-general mailing list
[email protected]
https://mail.endcrypt.com/mailman/listinfo/bucardo-general

Re: [Bucardo-general] Design of new txntime field

Reply via email to