On Sun, 10 Sep 2017, Greg Sabino Mullane wrote:
That won't sort unless you pad txid_current(), right? As a stringified
integer with varying digits, it'll depend on number of digits.
And actually the timestamp won't sort right as a string either, since
it will have fewer digits sometimes when the microseconds end with 0
and are truncated, leaving fewer string digits.
Excellent points, I'm glad you spotted that. I've been playing with this
today, and we can run now() back through to_char to get the full
microseconds each time like this:
select to_char(now(), 'YYYY-MM-DD HH24:MI:SS.US');
Yes, that's better.
Then I started looking at formatting the txnid_current() output. Problem
is, it's a bigint, which means quite a lot of padding for most, if not
all, standard installations. Then I had another idea - what about the
PID? Those are distributed sequentially or randomly, but all that really
matters is that two backends who happen to have the exact same now()
have diferent PIDs - which should always happen. Then the question of
how big to make the string - PIDs can be forced quite high with some
effort, but for purposes of coliision avoidance, we probably only need
some of the digits. So I came up with:
select to_char(right(pg_backend_pid()::text, 4)::int, 'FM00000');
While PIDs can be negative on some systems, that should not be a problem
either - it still makes a good tie breaker.
We can also compress things a little - we want to keep the timestamp
human readable, but we don't need all the whitespace that now() gives by
default. This, the final proposal becomes:
select to_char(now(), 'YYYYMMDD.HH24MISS.US.')::text ||
to_char(right(pg_backend_pid()::text, 5)::int,'FM00000');
The output looks like this:
20170910.143526.395762.04482
That's 28 characters, which is not too bad overall. And it is still
human readable, and sortable.
If we're going for space savings, it'd be just as well to leave out the
unnecessary dots, wouldn't it? It's still pretty easily readable without
them if you know what and how wide the fields are.
So how does it do in the criteria I made for the new txntime?
* No collisions
The addition of the last five of the current PID should avoid collisions.
In theory, there may still be collisions - which we can reduce further by
going to 6 or 7 digits, but is that really worth it? On a system using
6 or more digits, you'd have to have two backends created exactly
100,000 PIDs apart (if sequential), or happening to have the same last
five digits (if random), a 9 in 10,0000 chance, or 99 in 10,000.
Think I just talked myself into 6 digits. :)
We could use the same approach with txnid_current() and keep only the last
few digits, right? That would be a lot more consistent across OSes than
PID allocations are.
* Human readable
Not quite as intuitive as the existing timestamp, but fairly obvious and
easy to read. And we can document the right-hand-side of it. Might be
worth it to force it all to GMT as well.
+1 to using UTC.
* Disk space
We want the delta tables to be as small as possible. Right now, this
solution is a lot more space than a plain timestamp, but I don't
see an easy way around this problem right now.
It could probably be converted to 1 or 2 bigints, which would more
efficiently use the space available, and sort more quickly, at the cost of
no easy readability of the source inputs.
On rereading this, maybe we don't need the microseconds anymore either, as
the PID chunk should be enough to distinguish things? Or at least we could
do milliseconds, and save a few characters.
That's probably true.
Jon
_______________________________________________
Bucardo-general mailing list
[email protected]
https://mail.endcrypt.com/mailman/listinfo/bucardo-general