Re: [HACKERS] Proposal for CSN based snapshots

2017-10-05 Thread Robert Haas
On Mon, Sep 25, 2017 at 10:17 AM, Alexander Kuzmenkov
 wrote:
> Here is some news about the CSN patch.
>
> * I merged it with master (58bd60995f), which now has the clog group update.
> With this optimization, CSN is now faster than master by about 15% at 100 to
> 400 clients (72 cores, pgbench tpcb-like, scale 500). It no longer degrades
> faster than master as it did before. Client counts above 400 were not
> measured.

Hmm, that's gratifying.

> * Querying for CSN of subtransactions was not implemented in the previous
> version of the patch, so I added it. I tested the performance on the
> tpcb-like pgbench script with some savepoints added, and it was
> significantly worse than on the master. The main culprit seems to be the
> ProcArrayLock taken in GetSnapshotData, GetRecentGlobalXmin,
> ProcArrayEndTransaction. Although it is only taken in shared mode, just
> reading the current lock mode and writing the same value back takes about
> 10% CPU. Maybe we could do away with some of these locks, but there is some
> interplay with imported snapshots and replication slots which I don't
> understand well. I plan to investigate this next.

That's not so good, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2017-08-02 Thread Alexander Kuzmenkov

What problem exactly are you seeing in the clog: is it contention around
CLOGControlLock, or is accessing the CLOG generally slower? If the
former, then we already have a patch [1] to address it.

It's the contention around CLogControlLock. Thank you for the pointer;
next time I'll try it with the group clog update patch.



--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Proposal for CSN based snapshots

2017-08-01 Thread Amit Kapila
On Tue, Aug 1, 2017 at 7:41 PM, Alexander Kuzmenkov
 wrote:
> Hi all,
>
> So I did some more experiments on this patch.
>
> * I fixed the bug with duplicate tuples I mentioned in the previous message.
> Indeed, the oldestActiveXid could be advanced past the transaction's xid
> before it set the clog status. This happened because the oldestActiveXid is
> calculated based on the CSN log contents, and we wrote to the CSN log before
> writing to the clog. The fix is to write to the clog before the CSN log
> (TransactionIdAsyncCommitTree).
>
> * We can remove the exclusive locking on CSNLogControlLock when setting the
> CSN for a transaction (CSNLogSetPageStatus). When we assign a CSN to a
> transaction and its children, the atomicity is guaranteed by using an
> intermediate state (COMMITSEQNO_COMMITTING), so it doesn't matter if this
> function is not atomic in itself. The shared lock should suffice here.
>
> * At throughputs of about 100k TPS, we allocate ~1k CSN log pages per
> second. This is done with exclusive locking on the CSN log control lock, and
> noticeably increases contention. To alleviate this, I allocate new pages in
> batches (ExtendCSNLOG).
>
> * When advancing oldestActiveXid, we scan the CSN log to find an xid that is
> still in progress. To do that, we increment the xid and query its CSN using
> the high-level function, acquiring and releasing the lock and looking up the
> log page for each xid. I wrote some code to acquire the lock only once and
> then scan the pages (CSNLogSetPageStatus).
>
> * With bigger buffers, the linear page lookup code that the SLRU uses
> becomes slow. I added a shared dynahash table to speed up this lookup.
>
> * I merged in recent changes from master (up to 7e1fb4). Unfortunately I
> didn't have enough time to fix logical replication and snapshot import,
> so those are now completely broken.
>
> I ran some pgbench with these tweaks (tpcb-like, 72 cores, scale 500). The
> throughput is good at lower client counts (at 50 clients it's 35% higher
> than master), but then it degrades steadily. Beyond 250 clients it's
> already lower than master; see the attached graph. In perf reports the
> CSN-related things have almost vanished, and I see lots of time spent
> working with the clog. This is probably a situation where, by making some
> parts faster, the contention in other parts becomes worse and overall we
> have a performance loss.

Yeah, this happens sometimes and I have also observed this behavior.

> Hilariously, at some point I saw a big performance
> increase after adding some debug printfs. I wanted to try some things with
> the clog next, but for now I'm out of time.
>

What problem exactly are you seeing in the clog: is it contention around
CLOGControlLock, or is accessing the CLOG generally slower? If the
former, then we already have a patch [1] to address it.


[1] - https://commitfest.postgresql.org/14/358/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal for CSN based snapshots

2017-07-19 Thread Alexander Kuzmenkov

That's not actually a problem, as I am now reusing Heikki's older v4
going forward, but I wanted to let you know about it first.

Thank you, I'll look into that.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Proposal for CSN based snapshots

2017-07-18 Thread Michael Paquier
On Wed, Jun 21, 2017 at 1:48 PM, Alexander Kuzmenkov
 wrote:
> Glad to see you working on this! I've been studying this topic too. Attached
> you can find a recently rebased version of Heikki's v4 patch.
> I also fixed a bug that appeared on report-receipts isolation test:
> XidIsConcurrent should report that a transaction is concurrent to ours when
> its csn equals our snapshotcsn.

I am finally looking at rebasing things properly now, and getting more
familiar with the code in order to come up with a patch for the upcoming
commitfest (I can also see some diff simplifications to impact existing
applications less), and this v5 is very suspicious. You are adding code
that should actually be removed. One such block can be found at the
beginning of the patch:

+txid_snapshot_xip
+   
+
+   
That's not actually a problem, as I am now reusing Heikki's older v4
going forward, but I wanted to let you know about it first.
-- 
Michael




Re: [HACKERS] Proposal for CSN based snapshots

2017-06-21 Thread Alexander Kuzmenkov

On 21.06.2017 04:48, Michael Paquier wrote:

There has not been much activity on this thread for some time, and I
mentioned my intentions to some developers at the last PGCon. But I am
planning to study the work that has been done here more closely, with the
envisaged goal of presenting a patch for the first CF of PG11. Lots of
fun ahead.


Hi Michael,

Glad to see you working on this! I've been studying this topic too. 
Attached you can find a recently rebased version of Heikki's v4 patch.
I also fixed a bug that appeared on report-receipts isolation test: 
XidIsConcurrent should report that a transaction is concurrent to ours 
when its csn equals our snapshotcsn.


There is another bug where multiple tuples with the same primary key 
appear after a pgbench read-write test. Querying pgbench_branches with 
index scans disabled, I see several tuples with the same 'bid' value. 
Usually the updates do not touch the index, so there are no errors, but 
sometimes a HOT update is not possible and the update errors out with a 
unique-constraint violation, which is how I noticed the problem. I can't 
reproduce it on my machine and have to use a 72-core one.


For now I can conclude that the oldestActiveXid is not always updated 
correctly. In TransactionIdAsyncCommitTree, just before the transaction 
sets its clog status, its xid can be less than oldestActiveXid. Other 
transactions see it as aborted for some time before it writes to the 
clog (see TransactionIdGetCommitSeqNo), so they can update a tuple it 
deleted, and that leads to the duplication. Unfortunately, I haven't had 
time yet to investigate this further.

--

Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



csn-5.patch.gz
Description: application/gzip



Re: [HACKERS] Proposal for CSN based snapshots

2017-06-20 Thread Michael Paquier
On Wed, Aug 24, 2016 at 5:54 PM, Heikki Linnakangas  wrote:
> And here are the results on the 72 core machine (thanks again, Alexander!).
> The test setup was the same as on the 32-core machine, except that I ran it
> with more clients since the system has more CPU cores. In summary, in the
> best case, the patch increases throughput by about 10%. That peak is with 64
> clients. Interestingly, as the number of clients increases further, the gain
> evaporates, and the CSN version actually performs worse than unpatched
> master. I don't know why that is. One theory is that by eliminating one
> bottleneck, we're now hitting another bottleneck which doesn't degrade as
> gracefully when there's contention.
>
> Full results are available at
> https://hlinnaka.iki.fi/temp/csn-4-72core-results/.

There has not been much activity on this thread for some time, and I
mentioned my intentions to some developers at the last PGCon. But I am
planning to study the work that has been done here more closely, with the
envisaged goal of presenting a patch for the first CF of PG11. Lots of
fun ahead.
-- 
Michael




Re: [HACKERS] Proposal for CSN based snapshots

2016-12-22 Thread Amit Kapila
On Wed, Aug 24, 2016 at 2:24 PM, Heikki Linnakangas  wrote:
>
> And here are the results on the 72 core machine (thanks again, Alexander!).
> The test setup was the same as on the 32-core machine, except that I ran it
> with more clients since the system has more CPU cores. In summary, in the
> best case, the patch increases throughput by about 10%. That peak is with 64
> clients. Interestingly, as the number of clients increases further, the gain
> evaporates, and the CSN version actually performs worse than unpatched
> master. I don't know why that is. One theory is that by eliminating one
> bottleneck, we're now hitting another bottleneck which doesn't degrade as
> gracefully when there's contention.
>

Quite possible, and I think it could be due either to CLOGControlLock or
to WALWriteLock.  Are you planning to work on this for this release?  If
not, do you think we should meanwhile pursue Andres's "caching the
snapshot" idea?  Last time Mithun did some benchmarks [1] after fixing
some issues in the patch, and he found a noticeable performance increase.

[1] - 
https://www.postgresql.org/message-id/CAD__Ouic1Tvnwqm6Wf6j7Cz1Kk1DQgmy0isC7%3DOgX%2B3JtfGk9g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-24 Thread Alexander Korotkov
On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas 
wrote:

> On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:
>
>> On 08/22/2016 08:38 PM, Andres Freund wrote:
>>
>>> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
>>>
 I
 remember seeing ProcArrayLock contention very visible earlier, but I
 can't
 hit that now. I suspect you'd still see contention on bigger hardware,
 though, my laptop has only 4 cores. I'll have to find a real server for
 the
 next round of testing.

>>>
>>> Yea, I think that's true. I can just about see ProcArrayLock contention
>>> on my more powerful laptop, to see it really bad you need bigger
>>> hardware / higher concurrency.
>>>
>>
>> As soon as I sent my previous post, Vladimir Borodin kindly offered
>> access to a 32-core server for performance testing. Thanks Vladimir!
>>
>> I installed Greg Smith's pgbench-tools kit on that server, and ran some
>> tests. I'm seeing some benefit on "pgbench -N" workload, but only after
>> modifying the test script to use "-M prepared", and using Unix domain
>> sockets instead of TCP to connect. Apparently those things add enough
>> overhead to mask out the little difference.
>>
>> Attached is a graph with the results. Full results are available at
>> https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
>> improved throughput, measured in TPS, with >= 32 or so clients. The
>> biggest difference was with 44 clients, which saw about 5% improvement.
>>
>> So, not phenomenal, but it's something. I suspect that with more cores,
>> the difference would become more clear.
>>
>> As if on cue, Alexander Korotkov just offered access to a 72-core
>> system :-). Thanks! I'll run the same tests on that.
>>
>
> And here are the results on the 72 core machine (thanks again,
> Alexander!). The test setup was the same as on the 32-core machine, except
> that I ran it with more clients since the system has more CPU cores. In
> summary, in the best case, the patch increases throughput by about 10%.
> That peak is with 64 clients. Interestingly, as the number of clients
> increases further, the gain evaporates, and the CSN version actually
> performs worse than unpatched master. I don't know why that is. One theory
> is that by eliminating one bottleneck, we're now hitting another bottleneck
> which doesn't degrade as gracefully when there's contention.
>

Did you try to identify this second bottleneck with perf or something?
It would be nice to also run pgbench -S.  Also, it would be nice to check
something like 10% writes / 90% reads (which I believe is quite a typical
workload in real life).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-24 Thread Heikki Linnakangas

On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:

On 08/22/2016 08:38 PM, Andres Freund wrote:

On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:

I
remember seeing ProcArrayLock contention very visible earlier, but I can't
hit that now. I suspect you'd still see contention on bigger hardware,
though, my laptop has only 4 cores. I'll have to find a real server for the
next round of testing.


Yea, I think that's true. I can just about see ProcArrayLock contention
on my more powerful laptop, to see it really bad you need bigger
hardware / higher concurrency.


As soon as I sent my previous post, Vladimir Borodin kindly offered
access to a 32-core server for performance testing. Thanks Vladimir!

I installed Greg Smith's pgbench-tools kit on that server, and ran some
tests. I'm seeing some benefit on "pgbench -N" workload, but only after
modifying the test script to use "-M prepared", and using Unix domain
sockets instead of TCP to connect. Apparently those things add enough
overhead to mask out the little difference.

Attached is a graph with the results. Full results are available at
https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
improved throughput, measured in TPS, with >= 32 or so clients. The
biggest difference was with 44 clients, which saw about 5% improvement.

So, not phenomenal, but it's something. I suspect that with more cores,
the difference would become more clear.

As if on cue, Alexander Korotkov just offered access to a 72-core
system :-). Thanks! I'll run the same tests on that.


And here are the results on the 72 core machine (thanks again, 
Alexander!). The test setup was the same as on the 32-core machine, 
except that I ran it with more clients since the system has more CPU 
cores. In summary, in the best case, the patch increases throughput by 
about 10%. That peak is with 64 clients. Interestingly, as the number of 
clients increases further, the gain evaporates, and the CSN version 
actually performs worse than unpatched master. I don't know why that is. 
One theory is that by eliminating one bottleneck, we're now hitting another 
bottleneck which doesn't degrade as gracefully when there's contention.


Full results are available at 
https://hlinnaka.iki.fi/temp/csn-4-72core-results/.


- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-23 Thread Andres Freund
On 2016-08-23 18:18:57 +0300, Heikki Linnakangas wrote:
> I installed Greg Smith's pgbench-tools kit on that server, and ran some
> tests. I'm seeing some benefit on "pgbench -N" workload, but only after
> modifying the test script to use "-M prepared", and using Unix domain
> sockets instead of TCP to connect. Apparently those things add enough
> overhead to mask out the little difference.

To make the problem more apparent, how are the differences for something
as extreme as just a
SELECT 1;
or a
SELECT 1 FROM pgbench_accounts WHERE aid = 1;

Greetings,

Andres Freund




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-23 Thread Heikki Linnakangas

On 08/22/2016 08:38 PM, Andres Freund wrote:

On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:

I
remember seeing ProcArrayLock contention very visible earlier, but I can't
hit that now. I suspect you'd still see contention on bigger hardware,
though, my laptop has only 4 cores. I'll have to find a real server for the
next round of testing.


Yea, I think that's true. I can just about see ProcArrayLock contention
on my more powerful laptop, to see it really bad you need bigger
hardware / higher concurrency.


As soon as I sent my previous post, Vladimir Borodin kindly offered 
access to a 32-core server for performance testing. Thanks Vladimir!


I installed Greg Smith's pgbench-tools kit on that server, and ran some 
tests. I'm seeing some benefit on "pgbench -N" workload, but only after 
modifying the test script to use "-M prepared", and using Unix domain 
sockets instead of TCP to connect. Apparently those things add enough 
overhead to mask out the little difference.


Attached is a graph with the results. Full results are available at 
https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch 
improved throughput, measured in TPS, with >= 32 or so clients. The 
biggest difference was with 44 clients, which saw about 5% improvement.


So, not phenomenal, but it's something. I suspect that with more cores, 
the difference would become more clear.


As if on cue, Alexander Korotkov just offered access to a 72-core 
system :-). Thanks! I'll run the same tests on that.


- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-23 Thread Amit Kapila
On Mon, Aug 22, 2016 at 11:11 PM, Robert Haas  wrote:
> On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas  wrote:
>>> But what about the best case?  If we create a scenario where there are
>>> no open read-write transactions at all and (somehow) lots and lots of
>>> ProcArrayLock contention, how much does this help?
>>
>> I ran some quick pgbench tests on my laptop, but didn't see any meaningful
>> benefit. I think the best I could see is about 5% speedup, when running
>> "pgbench -S", with 900 idle connections sitting in the background. On the
>> positive side, I didn't see much slowdown either. (Sorry, I didn't record
>> the details of those tests, as I was testing many different options and I
>> didn't see a clear difference either way.)
>
> That's not very exciting.
>
>> It seems that Amit's PGPROC batch clearing patch was very effective. I
>> remember seeing ProcArrayLock contention very visible earlier, but I can't
>> hit that now. I suspect you'd still see contention on bigger hardware,
>> though, my laptop has only 4 cores. I'll have to find a real server for the
>> next round of testing.
>
> It's good that those patches were effective, and I bet that approach
> could be further refined, too.  However, I think Amit may have
> mentioned in an internal meeting that he was able to generate some
> pretty serious ProcArrayLock contention with some of his hash index
> patches applied.  I don't remember the details, though.
>

Yes, that's right.  We are seeing ProcArrayLock as a bottleneck at
high concurrency for hash-index reads.  On a quick look, it seems the
patch still acquires ProcArrayLock in shared mode in GetSnapshotData(),
although the duration is reduced.  I think it is worth trying the workload
where we are seeing ProcArrayLock as a bottleneck, but I am not sure it
can drastically improve the situation, because as per our initial
analysis, the lock acquisition and release itself causes the problem.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Andres Freund
On 2016-08-22 13:41:57 -0400, Robert Haas wrote:
> On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas  wrote:
> >> But what about the best case?  If we create a scenario where there are
> >> no open read-write transactions at all and (somehow) lots and lots of
> >> ProcArrayLock contention, how much does this help?
> >
> > I ran some quick pgbench tests on my laptop, but didn't see any meaningful
> > benefit. I think the best I could see is about 5% speedup, when running
> > "pgbench -S", with 900 idle connections sitting in the background. On the
> > positive side, I didn't see much slowdown either. (Sorry, I didn't record
> > the details of those tests, as I was testing many different options and I
> > didn't see a clear difference either way.)
> 
> That's not very exciting.

I think it's neither exciting nor worrying - the benefit of the pgproc
batch clearing itself wasn't apparent on small hardware either...




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas  wrote:
>> But what about the best case?  If we create a scenario where there are
>> no open read-write transactions at all and (somehow) lots and lots of
>> ProcArrayLock contention, how much does this help?
>
> I ran some quick pgbench tests on my laptop, but didn't see any meaningful
> benefit. I think the best I could see is about 5% speedup, when running
> "pgbench -S", with 900 idle connections sitting in the background. On the
> positive side, I didn't see much slowdown either. (Sorry, I didn't record
> the details of those tests, as I was testing many different options and I
> didn't see a clear difference either way.)

That's not very exciting.

> It seems that Amit's PGPROC batch clearing patch was very effective. I
> remember seeing ProcArrayLock contention very visible earlier, but I can't
> hit that now. I suspect you'd still see contention on bigger hardware,
> though, my laptop has only 4 cores. I'll have to find a real server for the
> next round of testing.

It's good that those patches were effective, and I bet that approach
could be further refined, too.  However, I think Amit may have
mentioned in an internal meeting that he was able to generate some
pretty serious ProcArrayLock contention with some of his hash index
patches applied.  I don't remember the details, though.

>> Because there's only a purpose to trying to minimize the losses if
>> there are some gains to which we can look forward.
>
> Aside from the potential performance gains, this slashes a lot of
> complicated code:
>
>  70 files changed, 2429 insertions(+), 6066 deletions(-)
>
> That removed code is quite mature at this point, and I'm sure we'll add some
> code back to this patch as it evolves, but still.

That's interesting, but it might just mean we're replacing well-tested
code with new, buggy code.  By the time you fix all the performance
regressions, those numbers could be a lot closer together.

> Also, I'm looking forward for a follow-up patch, to track snapshots in
> backends at a finer level, so that vacuum could remove tuples more
> aggressively, if you have pg_dump running for days. CSN snapshots isn't a
> strict requirement for that, but it makes it simpler, when you can represent
> a snapshot with a small fixed-size integer.

That would certainly be nice, but I think we need to be careful not to
sacrifice too much trying to get there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Andres Freund
Hi,

On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
> I ran some quick pgbench tests on my laptop, but didn't see any meaningful
> benefit. I think the best I could see is about 5% speedup, when running
> "pgbench -S", with 900 idle connections sitting in the background. On the
> positive side, I didn't see much slowdown either. (Sorry, I didn't record
> the details of those tests, as I was testing many different options and I
> didn't see a clear difference either way.)

Hm. Does the picture change if those are idle in transaction, after
assigning an xid?


> It seems that Amit's PGPROC batch clearing patch was very effective.

It usually breaks down if you have a mixed read/write workload - might
be worthwhile prototyping that.


> I
> remember seeing ProcArrayLock contention very visible earlier, but I can't
> hit that now. I suspect you'd still see contention on bigger hardware,
> though, my laptop has only 4 cores. I'll have to find a real server for the
> next round of testing.

Yea, I think that's true. I can just about see ProcArrayLock contention
on my more powerful laptop, to see it really bad you need bigger
hardware / higher concurrency.


Greetings,

Andres Freund




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Heikki Linnakangas

On 08/22/2016 07:49 PM, Robert Haas wrote:

Nice to see you working on this again.

On Mon, Aug 22, 2016 at 12:35 PM, Heikki Linnakangas  wrote:

A sequential scan of a table like that with 10 million rows took about 700
ms on my laptop, when the hint bits are set, without this patch. With this
patch, if there's a snapshot holding back the xmin horizon, so that we need
to check the CSN log for every XID, it took about 3 ms. So we have some
optimization work to do :-). I'm not overly worried about that right now, as
I think there's a lot of room for improvement in the SLRU code. But that's
the next thing I'm going to work on.


So the worst case for this patch is obviously bad right now and, as
you say, that means that some optimization work is needed.

But what about the best case?  If we create a scenario where there are
no open read-write transactions at all and (somehow) lots and lots of
ProcArrayLock contention, how much does this help?


I ran some quick pgbench tests on my laptop, but didn't see any 
meaningful benefit. I think the best I could see is about 5% speedup, 
when running "pgbench -S", with 900 idle connections sitting in the 
background. On the positive side, I didn't see much slowdown either. 
(Sorry, I didn't record the details of those tests, as I was testing 
many different options and I didn't see a clear difference either way.)


It seems that Amit's PGPROC batch clearing patch was very effective. I 
remember seeing ProcArrayLock contention very visible earlier, but I 
can't hit that now. I suspect you'd still see contention on bigger 
hardware, though, my laptop has only 4 cores. I'll have to find a real 
server for the next round of testing.



Because there's only a purpose to trying to minimize the losses if
there are some gains to which we can look forward.


Aside from the potential performance gains, this slashes a lot of 
complicated code:


 70 files changed, 2429 insertions(+), 6066 deletions(-)

That removed code is quite mature at this point, and I'm sure we'll add 
some code back to this patch as it evolves, but still.


Also, I'm looking forward for a follow-up patch, to track snapshots in 
backends at a finer level, so that vacuum could remove tuples more 
aggressively, if you have pg_dump running for days. CSN snapshots isn't 
a strict requirement for that, but it makes it simpler, when you can 
represent a snapshot with a small fixed-size integer.


Yes, seeing some direct performance gains would be nice too.

- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Robert Haas
Nice to see you working on this again.

On Mon, Aug 22, 2016 at 12:35 PM, Heikki Linnakangas  wrote:
> A sequential scan of a table like that with 10 million rows took about 700
> ms on my laptop, when the hint bits are set, without this patch. With this
> patch, if there's a snapshot holding back the xmin horizon, so that we need
> to check the CSN log for every XID, it took about 3 ms. So we have some
> optimization work to do :-). I'm not overly worried about that right now, as
> I think there's a lot of room for improvement in the SLRU code. But that's
> the next thing I'm going to work on.

So the worst case for this patch is obviously bad right now and, as
you say, that means that some optimization work is needed.

But what about the best case?  If we create a scenario where there are
no open read-write transactions at all and (somehow) lots and lots of
ProcArrayLock contention, how much does this help?

Because there's only a purpose to trying to minimize the losses if
there are some gains to which we can look forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Heikki Linnakangas

On 08/22/2016 07:35 PM, Heikki Linnakangas wrote:

And here's a new patch version...


And here's the attachment I forgot, *sigh*..

- Heikki



csn-4.patch.gz
Description: application/gzip



Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Heikki Linnakangas
And here's a new patch version. Still lots of work to do, especially in 
performance testing, and minimizing the worst-case performance hit.


On 08/09/2016 03:16 PM, Heikki Linnakangas wrote:

Next steps:

* Hot standby feedback is broken, now that CSN != LSN again. Will have
to switch this back to using an "oldest XID", rather than a CSN.

* I plan to replace pg_subtrans with a special range of CSNs in the
csnlog. Something like, start the CSN counter at 2^32 + 1, and use CSNs
< 2^32 to mean "this is a subtransaction, parent is XXX". One less SLRU
to maintain.

* Put per-proc xmin back into procarray. I removed it, because it's not
necessary for snapshots or GetOldestSnapshot() (which replaces
GetOldestXmin()) anymore. But on second thoughts, we still need it for
deciding when it's safe to truncate the csnlog.

* In this patch, HeapTupleSatisfiesVacuum() is rewritten to use an
"oldest CSN", instead of "oldest xmin", but that's not strictly
necessary. To limit the size of the patch, I might revert those changes
for now.


I did all of the above. This patch is now much smaller, as I didn't 
change all the places that used to deal with global-xmin's, like I did 
earlier. The oldest-xmin is now computed pretty like it always has been.
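
A toy model in plain C of the csnlog encoding described in the quoted list 
above, where CSNs below 2^32 stand for "subtransaction, parent is XXX"; the 
type and function names here are illustrative, not the patch's:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionIdModel;
typedef uint64_t CommitSeqNoModel;

/* Real commit sequence numbers start above 2^32 ... */
#define FIRST_NORMAL_CSN  (((CommitSeqNoModel) 1 << 32) + 1)
/* ... so any nonzero stored value below 2^32 marks a subtransaction. */
#define SUBXACT_LIMIT     ((CommitSeqNoModel) 1 << 32)

static bool
CsnlogEntryIsSubtransaction(CommitSeqNoModel stored)
{
    return stored != 0 && stored < SUBXACT_LIMIT;
}

static TransactionIdModel
CsnlogEntryGetParent(CommitSeqNoModel stored)
{
    /* only meaningful when CsnlogEntryIsSubtransaction(stored) is true */
    return (TransactionIdModel) stored;
}

static CommitSeqNoModel
CsnlogEntryForSubtransaction(TransactionIdModel parentXid)
{
    return (CommitSeqNoModel) parentXid;    /* always < 2^32 */
}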



* Rewrite the way RecentGlobalXmin is updated. As Alvaro pointed out in
his review comments two years ago, that was quite complicated. And I'm
worried that the lazy scheme I had might not allow pruning fast enough.
I plan to make it more aggressive, so that whenever the currently oldest
transaction finishes, it's responsible for advancing the "global xmin"
in shared memory. And the way it does that, is by scanning the csnlog,
starting from the current "global xmin", until the next still
in-progress XID. That could be a lot, if you have a very long-running
transaction that ends, but we'll see how it performs.


I ripped out all that, and created a GetRecentGlobalXmin() function that 
computes a global-xmin value when needed, like GetOldestXmin() does. 
Seems most straightforward. Since we no longer get a RecentGlobalXmin 
value essentially for free in GetSnapshotData(), as we no longer scan 
the proc array, it's better to compute the value only when needed.



* Performance testing. Clearly this should have a performance benefit,
at least under some workloads, to be worthwhile. And not regress.


I wrote a little C module to create a "worst-case" table. Every row in 
the table has a different xmin, and the xmin values are shuffled across 
the table, to defeat any caching.


A sequential scan of a table like that with 10 million rows took about 
700 ms on my laptop, when the hint bits are set, without this patch. 
With this patch, if there's a snapshot holding back the xmin horizon, so 
that we need to check the CSN log for every XID, it took about 3 ms. 
So we have some optimization work to do :-). I'm not overly worried 
about that right now, as I think there's a lot of room for improvement 
in the SLRU code. But that's the next thing I'm going to work on.


- Heikki





Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Wed, Aug 10, 2016 at 8:26 PM, Greg Stark  wrote:

> On Wed, Aug 10, 2016 at 5:54 PM, Alexander Korotkov
>  wrote:
> > Oh, I found that I underestimated the complexity of async commit...  :)
>
> Indeed. I think Tom's attitude was right even if the specific
> conclusion was wrong. While I don't think removing async commit is
> viable I think it would be a laudable goal if we can remove some of
> the complication in it. I normally describe async commit as "just like
> a normal commit but don't block on the commit" but it's actually a bit
> more complicated.
>
> AIUI the additional complexity is that while async commits are visible
> to everyone immediately, other non-async transactions can be committed
> but then be in limbo for a while before they are visible to others. So
> other sessions will see the async commit "jump ahead" of any non-async
> transactions even if those other transactions were committed first.
> Any standbys will see the non-async transaction in the logs before the
> async transaction and in a crash it's possible to lose the async
> transaction even though it was visible but not the sync transaction
> that wasn't.
>
> Complexity like this makes it hard to implement other features such as
> CSNs. IIRC this already bit hot standby as well. I think it would be a
> big improvement if we had a clear, well defined commit order that was
> easy to explain and easy to reason about when new changes are being
> made.
>

Heikki, Ants, Greg, thank you for the explanation.  You restored order in
my personal world.
Now I see that introducing a separate CSN sequence, not equal to the LSN,
makes sense.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Kevin Grittner
On Wed, Aug 10, 2016 at 12:26 PM, Greg Stark  wrote:

> Complexity like this makes it hard to implement other features such as
> CSNs. IIRC this already bit hot standby as well. I think it would be a
> big improvement if we had a clear, well defined commit order that was
> easy to explain and easy to reason about when new changes are being
> made.

And here I was getting concerned that there was no mention of
"apparent order of execution" for serializable transactions --
which does not necessarily match either the order of LSNs from
commit records or the order of CSNs.  The order in which transactions become
visible is clearly a large factor in determining AOoE, but it is
secondary to looking at whether a transaction modified data based
on reading the "before" image of a data set modified by a
concurrent transaction.

I still think that our best bet for avoiding anomalies when using
logical replication in complex environments is for logical
replication to apply transactions in apparent order of execution.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Ants Aasma
On Wed, Aug 10, 2016 at 7:54 PM, Alexander Korotkov
 wrote:
> Oh, I found that I underestimated the complexity of async commit...  :)
>
> Do I understand right that async commit now works as follows?
> 1) An async transaction confirms the commit before flushing WAL.
> 2) Other transactions see the effect of an async transaction only when its WAL
> is flushed.
> 3) In the session that just committed the async transaction, the effect of this
> transaction is visible immediately (before the WAL is flushed). Is that true?

Current code, simplified:

XactLogCommitRecord()
if (synchronous_commit)
    XLogFlush()
ProcArrayEndTransaction() // Become visible

The issue we are discussing is that with CSNs, the "become visible"
portion must occur in CSN order. If CSN == LSN, then async
transactions whose commit record comes after a sync commit record must
wait for that sync record to be flushed before they can become visible.
The simplest solution is to not require CSN == LSN and just assign a CSN
value immediately before becoming visible.
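
For illustration, a sketch of that ordering (nextCommitSeqNo and
CSNLogSetCommitSeqNo are illustrative names here, not necessarily the
patch's actual API):

XactLogCommitRecord()
if (synchronous_commit)
    XLogFlush()
csn = ++nextCommitSeqNo         // CSN assigned in visibility order
CSNLogSetCommitSeqNo(xid, csn)  // become visible, independent of WAL flush order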

Regards,
Ants Aasma




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Greg Stark
On Wed, Aug 10, 2016 at 5:54 PM, Alexander Korotkov
 wrote:
> Oh, I found that I underestimated the complexity of async commit...  :)

Indeed. I think Tom's attitude was right even if the specific
conclusion was wrong. While I don't think removing async commit is
viable I think it would be a laudable goal if we can remove some of
the complication in it. I normally describe async commit as "just like
a normal commit but don't block on the commit" but it's actually a bit
more complicated.

AIUI the additional complexity is that while async commits are visible
to everyone immediately, other non-async transactions can be committed
but then be in limbo for a while before they are visible to others. So
other sessions will see the async commit "jump ahead" of any non-async
transactions even if those other transactions were committed first.
Any standbys will see the non-async transaction in the logs before the
async transaction and in a crash it's possible to lose the async
transaction even though it was visible but not the sync transaction
that wasn't.

Complexity like this makes it hard to implement other features such as
CSNs. IIRC this already bit hot standby as well. I think it would be a
big improvement if we had a clear, well defined commit order that was
easy to explain and easy to reason about when new changes are being
made.


-- 
greg




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Heikki Linnakangas

On 08/10/2016 07:54 PM, Alexander Korotkov wrote:

Do I understand right that async commit now works as follows?
1) An async transaction confirms the commit before flushing WAL.


Yes.


2) Other transactions see the effect of an async transaction only when its WAL
is flushed.


No. Other transactions also see the effects of the async transaction 
before it's flushed.



3) In the session that just committed the async transaction, the effect of this
transaction is visible immediately (before the WAL is flushed). Is that true?


Yes. (The same session is not a special case, per previous point. All 
sessions see the async transaction as committed, even before it's flushed.)


- Heikki





Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Wed, Aug 10, 2016 at 6:09 PM, Heikki Linnakangas  wrote:

> On 08/10/2016 05:51 PM, Tom Lane wrote:
>
>> Heikki Linnakangas  writes:
>>
>>> On 08/10/2016 05:09 PM, Tom Lane wrote:
>>>
 Uh, what?  That's not the semantics we have today, and I don't see why
 it's necessary or a good idea.  Once the commit is in the WAL stream,
 any action taken on the basis of seeing the commit must be later in
 the WAL stream.  So what's the problem?

>>>
>> I was talking about synchronous commits in the above. A synchronous
>>> commit is not made visible to other transactions, until the commit WAL
>>> record is flushed to disk.
>>>
>>
>> [ thinks for a bit... ]  Oh, OK, that's because we don't treat a
>> transaction as committed until its clog bit is set *and* it's not
>> marked as running in the PGPROC array.  And sync transactions will
>> flush WAL in between.
>>
>
> Right.
>
> Still, having to invent CSNs seems like a huge loss for this design.
>> Personally I'd give up async commit first.  If we had only sync commit,
>> the rule could be "xact LSN less than snapshot threshold and less than
>> WAL flush position", and we'd not need CSNs.  I know some people like
>> async commit, but it's there because it was easy and cheap in our old
>> design, not because it's the world's greatest feature and worth giving
>> up performance for.
>>
>
> I don't think that's a very popular opinion (I disagree, for one).
> Asynchronous commits are a huge performance boost for some applications.
> The alternative is fsync=off, and I don't want to see more people doing
> that. SSDs have made the penalty of an fsync much smaller, but it's still
> there.
>
> Hmm. There's one more possible way this could all work. Let's have CSN ==
> LSN, also for asynchronous commits. A snapshot is the current insert
> position, but also make note of the current flush position, when you take a
> snapshot. Now, when you use the snapshot, if you ever see an XID that
> committed between the snapshot's insert position and the flush position,
> wait for the WAL to be flushed up to the snapshot's insert position at that
> point. With that scheme, an asynchronous commit could return to the
> application without waiting for a flush, but if someone actually looks at
> the changes the transaction made, then that transaction would have to wait.
> Furthermore, we could probably skip that waiting too, if the reading
> transaction is also using synchronous_commit=off.
>
> That's slightly different from the current behaviour. A transaction that
> runs with synchronous_commit=on, and reads data that was modified by an
> asynchronous transaction, would take a hit. But I think that would be
> acceptable.


Oh, I found that I underestimated the complexity of async commit...  :)

Do I understand right that async commit now works as follows?
1) An async transaction confirms the commit before flushing WAL.
2) Other transactions see the effect of an async transaction only when its WAL
is flushed.
3) In the session that just committed the async transaction, the effect of this
transaction is visible immediately (before the WAL is flushed). Is that true?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Ants Aasma
On Wed, Aug 10, 2016 at 6:09 PM, Heikki Linnakangas  wrote:
> Hmm. There's one more possible way this could all work. Let's have CSN ==
> LSN, also for asynchronous commits. A snapshot is the current insert
> position, but also make note of the current flush position, when you take a
> snapshot. Now, when you use the snapshot, if you ever see an XID that
> committed between the snapshot's insert position and the flush position,
> wait for the WAL to be flushed up to the snapshot's insert position at that
> point. With that scheme, an asynchronous commit could return to the
> application without waiting for a flush, but if someone actually looks at
> the changes the transaction made, then that transaction would have to wait.
> Furthermore, we could probably skip that waiting too, if the reading
> transaction is also using synchronous_commit=off.
>
> That's slightly different from the current behaviour. A transaction that
> runs with synchronous_commit=on, and reads data that was modified by an
> asynchronous transaction, would take a hit. But I think that would be
> acceptable.

My vector clock proposal would allow for pretty much exactly the
current behavior.

To simplify, there would be lastSyncCommitSeqNo and
lastAsyncCommitSeqNo variables in ShmemVariableCache. Transaction
commit would choose which one to update based on synchronous_commit
setting and embed the value of the setting into CSN log. Snapshots
would contain both values, when checking for CSN visibility use the
value of the looked up synchronous_commit setting to decide which
value to compare against. Standby's replaying commit records would
just update both values, resulting in transactions becoming visible in
xlog order, as they do today. The scheme would allow for inventing a
new xlog record/replication message communicating visibility ordering.
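
A toy model in plain C of that two-counter visibility rule (the struct and
function names are illustrative, not patch code):

#include <stdbool.h>
#include <stdint.h>

/* One counter per commit flavor, as described above. */
typedef struct
{
    uint64_t lastSyncCommitSeqNo;   /* highest CSN whose commit WAL is flushed */
    uint64_t lastAsyncCommitSeqNo;  /* highest CSN assigned to any commit */
} VariableCacheModel;

/* A snapshot copies both counters when it is taken. */
typedef struct
{
    uint64_t syncSeqNo;
    uint64_t asyncSeqNo;
} SnapshotCountersModel;

/*
 * A committed transaction is visible if its CSN is at or below the snapshot
 * counter matching the synchronous_commit setting recorded for it in the
 * CSN log.
 */
static bool
CommitIsVisible(const SnapshotCountersModel *snap, uint64_t commitCsn,
                bool wasAsyncCommit)
{
    uint64_t threshold = wasAsyncCommit ? snap->asyncSeqNo : snap->syncSeqNo;

    return commitCsn <= threshold;
}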

However I don't see why inventing a separate CSN concept is a large
problem. Quite the opposite, unless there is a good reason that I'm
missing, it seems better to not unnecessarily conflate commit record
durability and transaction visibility ordering. Not having them tied
together allows for an external source to provide CSN values, allowing
for interesting distributed transaction implementations. E.g. using a
timestamp as the CSN, à la Google Spanner and the TrueTime API.

Regards,
Ants Aasma




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Joshua D. Drake

On 08/10/2016 09:04 AM, Stephen Frost wrote:

* Joshua D. Drake (j...@commandprompt.com) wrote:

+1 for Robert here, removing async commit is a non-starter. It is
PostgreSQL performance 101 that you disable synchronous commit
unless you have a specific data/business requirement that needs it.
Specifically because of how much faster Pg is with async commit.


I agree that we don't want to get rid of async commit, but, for the
archive, I wouldn't recommend using it unless you specifically understand
and accept that trade-off, so I wouldn't lump it into a "PostgreSQL
performance 101" group- that's increasing work_mem, shared_buffers, WAL
size, etc.  Accepting that you're going to lose *committed* transactions
on a crash requires careful thought and consideration of what you're
going to do when that happens, not the other way around.


Yes, Stephen, you are correct, which is why I said, "unless you have a 
specific data/business requirement that needs it".


Thanks!

jD



Thanks!

Stephen




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Stephen Frost
* Joshua D. Drake (j...@commandprompt.com) wrote:
> +1 for Robert here, removing async commit is a non-starter. It is
> PostgreSQL performance 101 that you disable synchronous commit
> unless you have a specific data/business requirement that needs it.
> Specifically because of how much faster Pg is with async commit.

I agree that we don't want to get rid of async commit, but, for the
archive, I wouldn't recommend using it unless you specifically understand
and accept that trade-off, so I wouldn't lump it into a "PostgreSQL
performance 101" group- that's increasing work_mem, shared_buffers, WAL
size, etc.  Accepting that you're going to lose *committed* transactions
on a crash requires careful thought and consideration of what you're
going to do when that happens, not the other way around.

Thanks!

Stephen




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Joshua D. Drake

On 08/10/2016 08:28 AM, Robert Haas wrote:

On Wed, Aug 10, 2016 at 11:09 AM, Heikki Linnakangas  wrote:

Still, having to invent CSNs seems like a huge loss for this design.
Personally I'd give up async commit first.  If we had only sync commit,
the rule could be "xact LSN less than snapshot threshold and less than
WAL flush position", and we'd not need CSNs.  I know some people like
async commit, but it's there because it was easy and cheap in our old
design, not because it's the world's greatest feature and worth giving
up performance for.


I don't think that's a very popular opinion (I disagree, for one).
Asynchronous commits are a huge performance boost for some applications. The
alternative is fsync=off, and I don't want to see more people doing that.
SSDs have made the penalty of an fsync much smaller, but it's still there.


Uh, yeah.  Asynchronous commit can be 100 times faster on some
realistic workloads.  If we remove it, many people will have to decide
between running with fsync=off and abandoning PostgreSQL altogether.
That doesn't strike me as a remotely possible line of attack.


+1 for Robert here, removing async commit is a non-starter. It is 
PostgreSQL performance 101 that you disable synchronous commit unless 
you have a specific data/business requirement that needs it. 
Specifically because of how much faster Pg is with async commit.


Sincerely,

jD







--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Robert Haas
On Wed, Aug 10, 2016 at 11:09 AM, Heikki Linnakangas  wrote:
>> Still, having to invent CSNs seems like a huge loss for this design.
>> Personally I'd give up async commit first.  If we had only sync commit,
>> the rule could be "xact LSN less than snapshot threshold and less than
>> WAL flush position", and we'd not need CSNs.  I know some people like
>> async commit, but it's there because it was easy and cheap in our old
>> design, not because it's the world's greatest feature and worth giving
>> up performance for.
>
> I don't think that's a very popular opinion (I disagree, for one).
> Asynchronous commits are a huge performance boost for some applications. The
> alternative is fsync=off, and I don't want to see more people doing that.
> SSDs have made the penalty of an fsync much smaller, but it's still there.

Uh, yeah.  Asynchronous commit can be 100 times faster on some
realistic workloads.  If we remove it, many people will have to decide
between running with fsync=off and abandoning PostgreSQL altogether.
That doesn't strike me as a remotely possible line of attack.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Heikki Linnakangas

On 08/10/2016 05:51 PM, Tom Lane wrote:

Heikki Linnakangas  writes:

On 08/10/2016 05:09 PM, Tom Lane wrote:

Uh, what?  That's not the semantics we have today, and I don't see why
it's necessary or a good idea.  Once the commit is in the WAL stream,
any action taken on the basis of seeing the commit must be later in
the WAL stream.  So what's the problem?



I was talking about synchronous commits in the above. A synchronous
commit is not made visible to other transactions, until the commit WAL
record is flushed to disk.


[ thinks for a bit... ]  Oh, OK, that's because we don't treat a
transaction as committed until its clog bit is set *and* it's not
marked as running in the PGPROC array.  And sync transactions will
flush WAL in between.


Right.


Still, having to invent CSNs seems like a huge loss for this design.
Personally I'd give up async commit first.  If we had only sync commit,
the rule could be "xact LSN less than snapshot threshold and less than
WAL flush position", and we'd not need CSNs.  I know some people like
async commit, but it's there because it was easy and cheap in our old
design, not because it's the world's greatest feature and worth giving
up performance for.


I don't think that's a very popular opinion (I disagree, for one). 
Asynchronous commits are a huge performance boost for some applications. 
The alternative is fsync=off, and I don't want to see more people doing 
that. SSDs have made the penalty of an fsync much smaller, but it's 
still there.


Hmm. There's one more possible way this could all work. Let's have CSN 
== LSN, also for asynchronous commits. A snapshot is the current insert 
position, but also make note of the current flush position, when you 
take a snapshot. Now, when you use the snapshot, if you ever see an XID 
that committed between the snapshot's insert position and the flush 
position, wait for the WAL to be flushed up to the snapshot's insert 
position at that point. With that scheme, an asynchronous commit could 
return to the application without waiting for a flush, but if someone 
actually looks at the changes the transaction made, then that 
transaction would have to wait. Furthermore, we could probably skip that 
waiting too, if the reading transaction is also using 
synchronous_commit=off.


That's slightly different from the current behaviour. A transaction that 
runs with synchronous_commit=on, and reads data that was modified by an 
asynchronous transaction, would take a hit. But I think that would be 
acceptable.
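
A toy model in plain C of that rule (not patch code; XLogFlushUpTo() is a
stand-in for the real WAL flush call):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrModel;

/* Under this scheme a snapshot is just two WAL positions. */
typedef struct
{
    XLogRecPtrModel insertPos;  /* WAL insert position at snapshot time */
    XLogRecPtrModel flushPos;   /* WAL flush position at snapshot time */
} SnapshotPosModel;

static void
XLogFlushUpTo(XLogRecPtrModel upto)
{
    /* stand-in: the real call would block until WAL up to 'upto' is on disk */
    (void) upto;
}

static bool
CommitVisibleToSnapshot(SnapshotPosModel *snap, XLogRecPtrModel commitLsn,
                        bool readerIsAsync)
{
    if (commitLsn > snap->insertPos)
        return false;               /* committed after the snapshot was taken */

    if (commitLsn > snap->flushPos && !readerIsAsync)
    {
        /*
         * The commit is within the snapshot but may not be durable yet:
         * wait for WAL up to the snapshot's insert position, and remember
         * that everything below it is now flushed.
         */
        XLogFlushUpTo(snap->insertPos);
        snap->flushPos = snap->insertPos;
    }
    return true;
}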


- Heikki





Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08/10/2016 05:09 PM, Tom Lane wrote:
>> Uh, what?  That's not the semantics we have today, and I don't see why
>> it's necessary or a good idea.  Once the commit is in the WAL stream,
>> any action taken on the basis of seeing the commit must be later in
>> the WAL stream.  So what's the problem?

> I was talking about synchronous commits in the above. A synchronous 
> commit is not made visible to other transactions, until the commit WAL 
> record is flushed to disk.

[ thinks for a bit... ]  Oh, OK, that's because we don't treat a
transaction as committed until its clog bit is set *and* it's not
marked as running in the PGPROC array.  And sync transactions will
flush WAL in between.

Still, having to invent CSNs seems like a huge loss for this design.
Personally I'd give up async commit first.  If we had only sync commit,
the rule could be "xact LSN less than snapshot threshold and less than
WAL flush position", and we'd not need CSNs.  I know some people like
async commit, but it's there because it was easy and cheap in our old
design, not because it's the world's greatest feature and worth giving
up performance for.
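
Spelled out, the rule above is just a two-comparison predicate. A hedged
sketch in plain C, with hypothetical names:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t LSN;

/*
 * Sync-commit-only visibility rule: the transaction's commit LSN must be
 * below the snapshot threshold, and that much WAL must already be flushed.
 */
static bool
xact_visible(LSN commit_lsn, LSN snapshot_lsn, LSN flushed_upto)
{
    return commit_lsn < snapshot_lsn && commit_lsn < flushed_upto;
}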

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Heikki Linnakangas

On 08/10/2016 05:09 PM, Tom Lane wrote:

Heikki Linnakangas  writes:

Imagine that you have a stream of normal, synchronous, commits. They get
assigned LSNs: 1, 2, 3, 4. They become visible to other transactions in
that order.



The way I described this scheme in the first emails on this thread, was
to use the current WAL insertion position as the snapshot. That's not
correct, though: you have to use the current WAL *flush* position as the
snapshot. Otherwise you would see the results of a transaction that
hasn't been flushed to disk yet, i.e. which might still get lost, if you
pull the power plug before the flush happens. So you have to use the
last flush position as the snapshot.


Uh, what?  That's not the semantics we have today, and I don't see why
it's necessary or a good idea.  Once the commit is in the WAL stream,
any action taken on the basis of seeing the commit must be later in
the WAL stream.  So what's the problem?


I was talking about synchronous commits in the above. A synchronous 
commit is not made visible to other transactions, until the commit WAL 
record is flushed to disk.


You could argue that that doesn't need to be so, because indeed any 
action taken on the basis of seeing the commit must be later in the WAL 
stream. But that's what asynchronous commits are for. For synchronous 
commits, we have a higher standard.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Tom Lane
Heikki Linnakangas  writes:
> Imagine that you have a stream of normal, synchronous, commits. They get 
> assigned LSNs: 1, 2, 3, 4. They become visible to other transactions in 
> that order.

> The way I described this scheme in the first emails on this thread, was 
> to use the current WAL insertion position as the snapshot. That's not 
> correct, though: you have to use the current WAL *flush* position as the 
> snapshot. Otherwise you would see the results of a transaction that 
> hasn't been flushed to disk yet, i.e. which might still get lost, if you 
> pull the power plug before the flush happens. So you have to use the 
> last flush position as the snapshot.

Uh, what?  That's not the semantics we have today, and I don't see why
it's necessary or a good idea.  Once the commit is in the WAL stream,
any action taken on the basis of seeing the commit must be later in
the WAL stream.  So what's the problem?

> Now, if you do an asynchronous commit, the effects of that should become 
> visible immediately, without waiting for the next flush. You can't do 
> that, if its CSN == LSN.

This distinction is completely arbitrary, and unlike the way it works now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Heikki Linnakangas

On 08/10/2016 04:34 PM, Alexander Korotkov wrote:

On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas  wrote:


I switched to using a separate counter for CSNs. CSN is no longer the same
as the commit WAL record's LSN. While I liked the conceptual simplicity of
CSN == LSN a lot, and the fact that the standby would see the same commit
order as master, I couldn't figure out how to make async commits to work.


I didn't get the async commits problem at first glance.  AFAICS, the difference
between a sync commit and an async one is only that an async commit doesn't
wait for the WAL to be flushed.  But an async commit still receives an LSN.
Could you describe it in more detail?


Imagine that you have a stream of normal, synchronous, commits. They get 
assigned LSNs: 1, 2, 3, 4. They become visible to other transactions in 
that order.


The way I described this scheme in the first emails on this thread, was 
to use the current WAL insertion position as the snapshot. That's not 
correct, though: you have to use the current WAL *flush* position as the 
snapshot. Otherwise you would see the results of a transaction that 
hasn't been flushed to disk yet, i.e. which might still get lost, if you 
pull the power plug before the flush happens. So you have to use the 
last flush position as the snapshot.


Now, if you do an asynchronous commit, the effects of that should become 
visible immediately, without waiting for the next flush. You can't do 
that, if its CSN == LSN.


Perhaps you could make an exception for async commits, so that the CSN 
of an async commit is not the commit record's LSN, but the current WAL 
flush position, so that it becomes visible to others immediately. But 
then, how do you distinguish two async transactions that commit roughly 
at the same time, so that the flush position at time of commit is the 
same for both? And now you've given up on the CSN == LSN property, for 
async commits, anyway.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas  wrote:

> (Reviving an old thread)
>
> I spent some time dusting off this old patch, to implement CSN snapshots.
> Attached is a new patch, rebased over current master, and with tons of
> comments etc. cleaned up. There's more work to be done here, I'm posting
> this to let people know I'm working on this again. And to have a backup on
> the 'net :-).
>

Great! It's very nice to see that you're working on this patch.
After PGCon I had rebased your patch onto master, but you already did much
more.


> I switched to using a separate counter for CSNs. CSN is no longer the same
> as the commit WAL record's LSN. While I liked the conceptual simplicity of
> CSN == LSN a lot, and the fact that the standby would see the same commit
> order as master, I couldn't figure out how to make async commits to work.
>

I didn't get the async commits problem at first glance.  AFAICS, the difference
between a sync commit and an async one is only that an async commit doesn't
wait for the WAL to be flushed.  But an async commit still receives an LSN.
Could you describe it in more detail?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Wed, Aug 10, 2016 at 2:10 PM, Heikki Linnakangas  wrote:

> Yeah, if the csnlog access turns out to be too expensive, we could do
> something like this. In theory, you can always convert a CSN snapshot into
> an old-style list of XIDs, by scanning the csnlog between the xmin and
> xmax. That could be expensive if the distance between xmin and xmax is
> large, of course. But as you said, you can have various hybrid forms, where
> you use a list of XIDs of some range as a cache, for example.
>
> I'm hopeful that we can simply make the csnlog access fast enough, though.
> Looking up an XID in a sorted array is O(log n), while looking up an XID in
> the csnlog is O(1). That ignores all locking and different constant
> factors, of course, but it's not a given that accessing the csnlog has to
> be slower than a binary search of an XID array.


FYI, I'm still a fan of the idea of rewriting the XID with the CSN in the tuple,
in the same way we write hint bits now.
I'm going to make a prototype of this approach, which should be enough for
performance measurements.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Heikki Linnakangas

On 08/10/2016 01:03 PM, Ants Aasma wrote:

On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas  wrote:

(Reviving an old thread)

I spent some time dusting off this old patch, to implement CSN snapshots.
Attached is a new patch, rebased over current master, and with tons of
comments etc. cleaned up. There's more work to be done here, I'm posting
this to let people know I'm working on this again. And to have a backup on
the 'net :-).


Great to hear. I hope to find some time to review this. In the
meanwhile here is a brain dump of my thoughts on the subject.


Thanks!


* Performance testing. Clearly this should have a performance benefit, at
least under some workloads, to be worthwhile. And not regress.


I guess this is the main question. GetSnapshotData is probably going
to be faster, but the second important benefit that I saw was the
possibility of reducing locking around common activities. This of
course needs to guided by profiling - evils of premature optimization
and all that.


Yep.

For the record, one big reason I'm excited about this is that a snapshot 
can be represented as small fixed-size value. And the reason that's 
exciting is that it makes it more feasible for each backend to publish 
their snapshots, which in turn makes it possible to vacuum old tuples 
more aggressively. In particular, if you have one very long-running 
transaction (think pg_dump), and a stream of transactions, you could 
vacuum away tuples that were inserted after the long-running transaction 
started and deleted before any of the shorter transactions started. You 
could do that without CSN snapshots, but it's a lot simpler with them. 
But that's a separate story.



But that said, my gut tells me that the most important spots are:

* Looking up CSNs for XidVisibleInSnapshot. Current version gets a
shared lock on CSNLogControlLock. That might get nasty, for example
when two concurrent sequential scans running on different cores or
sockets hit a patch of transactions within their xmin-xmax interval.


I'm not too worried about that particular case. Shared LWLocks scale 
pretty well these days, thanks to Andres Freund's work in this area. A 
mix of updates and reads on the csnlog might become a scalability 
problem though. We'll find out once we start testing.



* GetSnapshotData. With tps on read only queries potentially pushing
into the millions, it would be awfully nice if contending on a single
cache line could be avoided. Currently it acquires ProcArrayLock and
CommitSeqNoLock to atomically update MyPgXact->snapshotcsn. The
pattern I had in mind to make this lock free would be to read a CSN,
publish value, check against latest OldestGlobalSnapshot value and
loop if necessary, and on the other side, calculate optimistic oldest
snapshot, publish and then recheck, if necessary, republish.


Yeah, I haven't spent any serious effort into optimizing this yet. For 
the final version, should definitely do something like that.
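
A rough sketch of that publish-and-recheck handshake, using C11 atomics and a
single published slot for brevity (the real thing would scan one slot per
backend; all names here are illustrative, not the patch's):

#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t CSN;

static _Atomic CSN next_csn;                /* global commit sequence counter */
static _Atomic CSN oldest_global_snapshot;  /* horizon no snapshot may precede */
static _Atomic CSN my_snapshot;             /* this backend's published snapshot */

/* Reader: take a snapshot without holding any lock. */
static CSN
take_snapshot(void)
{
    for (;;)
    {
        CSN csn = atomic_load(&next_csn);

        /* Publish first, so concurrent horizon calculations can see us. */
        atomic_store(&my_snapshot, csn);

        /* If the horizon already moved past us, we lost the race; retry. */
        if (csn >= atomic_load(&oldest_global_snapshot))
            return csn;
    }
}

/* Writer: advance the horizon optimistically, then recheck the published slot. */
static void
advance_oldest_snapshot(CSN candidate)
{
    atomic_store(&oldest_global_snapshot, candidate);

    CSN published = atomic_load(&my_snapshot);
    if (published < candidate)
        atomic_store(&oldest_global_snapshot, published);   /* republish lower */
}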



* While commits are not as performance critical it may still be
beneficial to defer clog updates and perform them in larger batches to
reduce clog lock traffic.


Hmm. I doubt batching them is going to help much. But it should be 
straightforward to use atomic operations here too, to reduce locking.



I think I have achieved some clarity on my idea of snapshots that
migrate between being CSN and XID based. The essential problem with
old CSN based snapshots is that the CSN log for their xmin-xmax
interval needs to be kept around in a quick to access datastructure.
And the problem with long running transactions is twofold, first is
that they need a well defined location where to publish the visibility
info for their xids and secondly, they enlarge the xmin-xmax interval
of all concurrent snapshots, needing a potentially huge amount of CSN
log to be kept around. One way to deal with it is to just accept it
and use a SLRU as in this patch.

My new insight is that a snapshot doesn't need to be either-or CSN or
XIP (xid in progress) array based, but it can also be a hybrid. There
would be a sorted array of in progress xids and a non-overlapping CSN
xmin-xmax interval where CSN log needs to be consulted. As snapshots
age they scan the CSN log and incrementally build their XIP array,
basically lazily constructing same data structure used in snapshots
today. Old in progress xids need a slot for themselves to be kept
around, but all new snapshots being taken would immediately classify
them as old, store them in XIP and not include them in their CSN xmin.
Under this scheme the amount of CSN log strictly needed is a
reasonably sized ring buffer for recent xids (probably sized based on
max conn), a sparse map for long transactions (bounded by max conn)
and some slack for snapshots still being converted. A SLRU or similar
is still needed because there is no way to ensure timely conversion of
all snapshots, but by properly timing the notification the probability
of actually using it should be 

Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Ants Aasma
On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas  wrote:
> (Reviving an old thread)
>
> I spent some time dusting off this old patch, to implement CSN snapshots.
> Attached is a new patch, rebased over current master, and with tons of
> comments etc. cleaned up. There's more work to be done here, I'm posting
> this to let people know I'm working on this again. And to have a backup on
> the 'net :-).

Great to hear. I hope to find some time to review this. In the
meanwhile here is a brain dump of my thoughts on the subject.

> I switched to using a separate counter for CSNs. CSN is no longer the same
> as the commit WAL record's LSN. While I liked the conceptual simplicity of
> CSN == LSN a lot, and the fact that the standby would see the same commit
> order as master, I couldn't figure out how to make async commits to work.

I had an idea how that would work. In short lastCommitSeqNo would be a
vector clock. Each tier of synchronization requiring different amount
of waiting between xlog insert and visibility would get a position in
the vector. Commits then need to be ordered only with respect to other
commits within the same tier. The vector position also needs to be
stored in the CSN log, probably by appropriating a couple of bits from
the CSN value.

I'm not sure if doing it this way would be worth the complexity, but
as far as I can see it would work. However I think keeping CSN and LSN
separate is a good idea, as it keeps the door open to using an
external source for the CSN value, enabling stuff like cheap multi
master globally consistent snapshots.
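
For concreteness, one way the tier could be carried in the CSN value itself,
by appropriating a couple of high bits; the widths and names below are
assumptions for illustration, not the patch's format:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t CSN;

#define CSN_TIER_BITS   2
#define CSN_TIER_SHIFT  (64 - CSN_TIER_BITS)
#define CSN_SEQ_MASK    ((UINT64_C(1) << CSN_TIER_SHIFT) - 1)

/* Tiers: how long a commit waits between xlog insert and becoming visible. */
enum { CSN_TIER_SYNC = 0, CSN_TIER_ASYNC = 1 };

static inline CSN
csn_pack(unsigned tier, uint64_t seqno)
{
    return ((CSN) tier << CSN_TIER_SHIFT) | (seqno & CSN_SEQ_MASK);
}

static inline unsigned
csn_tier(CSN csn)
{
    return (unsigned) (csn >> CSN_TIER_SHIFT);
}

static inline uint64_t
csn_seqno(CSN csn)
{
    return csn & CSN_SEQ_MASK;
}

/* Commits only need to be ordered against commits in the same tier. */
static inline bool
csn_precedes_within_tier(CSN a, CSN b)
{
    return csn_tier(a) == csn_tier(b) && csn_seqno(a) < csn_seqno(b);
}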

> Next steps:
>
> * Hot standby feedback is broken, now that CSN != LSN again. Will have to
> switch this back to using an "oldest XID", rather than a CSN.
>
> * I plan to replace pg_subtrans with a special range of CSNs in the csnlog.
> Something like, start the CSN counter at 2^32 + 1, and use CSNs < 2^32 to
> mean "this is a subtransaction, parent is XXX". One less SLRU to maintain.

That's a great idea. I had a similar idea to enable lock-free writing
into data structures, have 2^32 COMMITSEQNO_INPROGRESS values that
embed the xid owning the slot. This way a writer can just blindly CAS
into data structures without worrying about overwriting useful stuff
due to race conditions.
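
A sketch of how the 64-bit csnlog values could be partitioned to cover both
of these ideas; the exact ranges are assumptions for illustration only:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

/* 0 .. 2^32-1: "this xid is a subtransaction; the value is its parent xid". */
#define CSN_SUBTRANS_MAX     UINT64_C(0xFFFFFFFF)
/* 2^32 .. 2^33-1: "commit in progress; the low bits are the owning xid". */
#define CSN_COMMITTING_BASE  (UINT64_C(1) << 32)
/* Everything above that is an ordinary commit sequence number. */
#define CSN_FIRST_NORMAL     (UINT64_C(1) << 33)

static inline CommitSeqNo
csn_subtrans_parent(TransactionId parent)
{
    return (CommitSeqNo) parent;
}

static inline CommitSeqNo
csn_committing_marker(TransactionId xid)
{
    return CSN_COMMITTING_BASE | xid;
}

static inline bool
csn_is_subtrans(CommitSeqNo csn)
{
    return csn <= CSN_SUBTRANS_MAX;
}

static inline bool
csn_is_committing(CommitSeqNo csn)
{
    return csn >= CSN_COMMITTING_BASE && csn < CSN_FIRST_NORMAL;
}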

> * Put per-proc xmin back into procarray. I removed it, because it's not
> necessary for snapshots or GetOldestSnapshot() (which replaces
> GetOldestXmin()) anymore. But on second thoughts, we still need it for
> deciding when it's safe to truncate the csnlog.
>
> * In this patch, HeapTupleSatisfiesVacuum() is rewritten to use an "oldest
> CSN", instead of "oldest xmin", but that's not strictly necessary. To limit
> the size of the patch, I might revert those changes for now.
>
> * Rewrite the way RecentGlobalXmin is updated. As Alvaro pointed out in his
> review comments two years ago, that was quite complicated. And I'm worried
> that the lazy scheme I had might not allow pruning fast enough. I plan to
> make it more aggressive, so that whenever the currently oldest transaction
> finishes, it's responsible for advancing the "global xmin" in shared memory.
> And the way it does that, is by scanning the csnlog, starting from the
> current "global xmin", until the next still in-progress XID. That could be a
> lot, if you have a very long-running transaction that ends, but we'll see
> how it performs.

I had a similar idea. This would be greatly improved by a hybrid
snapshot scheme that could make the tail end of the CSN log a lot
sparser. More on that below.
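
A hedged sketch of the advancement loop described in the quoted paragraph
above; csnlog_get() and the in-progress marker are hypothetical stand-ins,
not the patch's interfaces:

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define COMMITSEQNO_IN_PROGRESS UINT64_C(0)         /* assumed marker value */

extern CommitSeqNo csnlog_get(TransactionId xid);   /* hypothetical lookup */

/*
 * Starting at the current global xmin, step over xids whose fate is already
 * decided (committed or aborted) and stop at the first xid that is still in
 * progress.  The caller stores the result as the new global xmin.
 */
static TransactionId
advance_global_xmin(TransactionId global_xmin, TransactionId next_xid)
{
    while (global_xmin < next_xid &&
           csnlog_get(global_xmin) != COMMITSEQNO_IN_PROGRESS)
        global_xmin++;

    return global_xmin;
}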

> * Performance testing. Clearly this should have a performance benefit, at
> least under some workloads, to be worthwhile. And not regress.

I guess this is the main question. GetSnapshotData is probably going
to be faster, but the second important benefit that I saw was the
possibility of reducing locking around common activities. This of
course needs to guided by profiling - evils of premature optimization
and all that. But that said, my gut tells me that the most important
spots are:

* Looking up CSNs for XidVisibleInSnapshot. Current version gets a
shared lock on CSNLogControlLock. That might get nasty, for example
when two concurrent sequential scans running on different cores or
sockets hit a patch of transactions within their xmin-xmax interval.

* GetSnapshotData. With tps on read only queries potentially pushing
into the millions, it would be awfully nice if contending on a single
cache line could be avoided. Currently it acquires ProcArrayLock and
CommitSeqNoLock to atomically update MyPgXact->snapshotcsn. The
pattern I had in mind to make this lock free would be to read a CSN,
publish value, check against latest OldestGlobalSnapshot value and
loop if necessary, and on the other side, calculate optimistic oldest
snapshot, publish and then recheck, if necessary, republish.

* While commits are not as performance critical it may still be
beneficial to defer clog updates and perform them in 

Re: [HACKERS] Proposal for CSN based snapshots

2015-07-27 Thread Alexander Korotkov
On Sat, Jul 25, 2015 at 11:39 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 24 July 2015 at 19:21, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
  It depends on the exact design we use to get that. Certainly we do not want
  them if they cause a significant performance regression.

 Yeah.  I think the performance worries expressed so far are:

 - Currently, if you see an XID that is between the XMIN and XMAX of
 the snapshot, you hit CLOG only on first access.  After that, the
 tuple is hinted.  With this approach, the hint bit doesn't avoid
 needing to hit CLOG anymore, because it's not enough to know whether
 or not the tuple committed; you have to know the CSN at which it
 committed, which means you have to look that up in CLOG (or whatever
 SLRU stores this data).  Heikki mentioned adding some caching to
 ameliorate this problem, but it sounds like he was worried that the
 impact might still be significant.


 This seems like the heart of the problem. Changing a snapshot from a list
 of xids into one number is easy. Making XidInMVCCSnapshot() work is the
 hard part because there needs to be a translation/lookup from CSN to
 determine if it contains the xid.

 That turns CSN into a reference to a cached snapshot, or a reference by
 which a snapshot can be derived on demand.


I got the problem. Currently, once we set hint bits we don't have to visit
CLOG anymore. With CSN snapshots that is not so. We still have to translate
the XID into a CSN in order to compare it with the snapshot CSN. In the version
of the CSN patch in this thread we still have XMIN and XMAX in the snapshot.
AFAICS with CSN snapshots XMIN and XMAX are not strictly required to express
a snapshot; they were kept as an optimization. That restricts usage of the
XID => CSN map to a given range of XIDs. However, with long-running transactions
the [XMIN; XMAX] range could be very wide, and we could use the XID => CSN map
heavily over a wide range of XIDs.

As I can see in the Huawei PGCon talk, a "Dense Map" in shared memory is proposed
for the XID => CSN transformation. With a large enough Dense Map we can do
most XID => CSN transformations with a single shared memory access. The PGCon
talk gives us pgbench results. However, pgbench doesn't run any long
transactions. With long-running transactions we can fall outside the Dense Map
for a significant part of the XID => CSN transformations. Dilip, did you test
your patch with long transactions?
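
As a rough illustration of that Dense Map idea with a fallback for
long-running transactions (sizes and names are assumptions, not the Huawei
design):

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define DENSE_MAP_SIZE 65536                    /* window of recent xids */

typedef struct
{
    TransactionId base_xid;                     /* oldest xid still in the window */
    CommitSeqNo   csn[DENSE_MAP_SIZE];          /* indexed by xid % DENSE_MAP_SIZE */
} DenseCsnMap;

extern CommitSeqNo csnlog_get_slow(TransactionId xid);  /* hypothetical SLRU path */

static CommitSeqNo
xid_to_csn(const DenseCsnMap *map, TransactionId xid)
{
    /* Fast path: one shared-memory read when the xid is inside the window. */
    if (xid >= map->base_xid && xid - map->base_xid < DENSE_MAP_SIZE)
        return map->csn[xid % DENSE_MAP_SIZE];

    /* Long-running transactions widen [XMIN; XMAX] beyond the window,
     * pushing those lookups onto the slower path. */
    return csnlog_get_slow(xid);
}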

I'm also thinking about a different option for optimizing this. When we set
hint bits we could also replace XMIN/XMAX with the CSN. In that case we wouldn't
need to do the XID => CSN transformation for this tuple anymore. However, we
have 32-bit XIDs for now. We could also have 32-bit CSNs, but that would
double our troubles with wraparound: we would have two counters that
could wrap around. That could return us to thoughts about 64-bit XIDs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2015-07-25 Thread Simon Riggs
On 24 July 2015 at 19:21, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
  It depends on the exact design we use to get that. Certainly we do not want
  them if they cause a significant performance regression.

 Yeah.  I think the performance worries expressed so far are:

 - Currently, if you see an XID that is between the XMIN and XMAX of
 the snapshot, you hit CLOG only on first access.  After that, the
 tuple is hinted.  With this approach, the hint bit doesn't avoid
 needing to hit CLOG anymore, because it's not enough to know whether
 or not the tuple committed; you have to know the CSN at which it
 committed, which means you have to look that up in CLOG (or whatever
 SLRU stores this data).  Heikki mentioned adding some caching to
 ameliorate this problem, but it sounds like he was worried that the
 impact might still be significant.


This seems like the heart of the problem. Changing a snapshot from a list
of xids into one number is easy. Making XidInMVCCSnapshot() work is the
hard part because there needs to be a translation/lookup from CSN to
determine if it contains the xid.

That turns CSN into a reference to a cached snapshot, or a reference by
which a snapshot can be derived on demand.


 - Mixing synchronous_commit=on and synchronous_commit=off won't work
 as well, because if the LSN ordering of commit records matches the
 order in which transactions become visible, then an async-commit
 transaction can't become visible before a later sync-commit
 transaction.  I expect we might just decide we can live with this, but
 it's worth discussing in case people feel otherwise.


Using the Commit LSN as CSN seems interesting, but it is not the only
choice.

Commit LSN is not the precise order in which commits become visible because
of the race condition between marking commit in WAL and marking commit in
clog. That problem is accentuated by mixing async and sync, but that is not
the only source of racing.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal for CSN based snapshots

2015-07-24 Thread Robert Haas
On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
 It depends on the exact design we use to get that. Certainly we do not want
 them if they cause a significant performance regression.

Yeah.  I think the performance worries expressed so far are:

- Currently, if you see an XID that is between the XMIN and XMAX of
the snapshot, you hit CLOG only on first access.  After that, the
tuple is hinted.  With this approach, the hint bit doesn't avoid
needing to hit CLOG anymore, because it's not enough to know whether
or not the tuple committed; you have to know the CSN at which it
committed, which means you have to look that up in CLOG (or whatever
SLRU stores this data).  Heikki mentioned adding some caching to
ameliorate this problem, but it sounds like he was worried that the
impact might still be significant.

- Mixing synchronous_commit=on and synchronous_commit=off won't work
as well, because if the LSN ordering of commit records matches the
order in which transactions become visible, then an async-commit
transaction can't become visible before a later sync-commit
transaction.  I expect we might just decide we can live with this, but
it's worth discussing in case people feel otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2015-07-24 Thread Simon Riggs
On 24 July 2015 at 14:43, Alexander Korotkov a.korot...@postgrespro.ru
wrote:


 What is the current state of this patch? Does the community want CSN-based
 snapshots?


CSN snapshots could give us

1. More scalable GetSnapshotData() calls
2. Faster GetSnapshotData() calls
3. Smaller snapshots which could be passed around more usefully between
distributed servers

It depends on the exact design we use to get that. Certainly we do not want
them if they cause a significant performance regression.


 Now, our company has resources to work on CSN-based snapshots for 9.6. If
 Huawei does not publish their patch, we can pick up this work from the last
 community version of this patch.


Sounds good.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal for CSN based snapshots

2015-07-24 Thread Andres Freund
On 2015-07-24 14:21:39 -0400, Robert Haas wrote:
 - Mixing synchronous_commit=on and synchronous_commit=off won't work
 as well, because if the LSN ordering of commit records matches the
 order in which transactions become visible, then an async-commit
 transaction can't become visible before a later sync-commit
 transaction.  I expect we might just decide we can live with this, but
 it's worth discussing in case people feel otherwise.

I'm not following this anymore. Even though a couple months back I
apparently did. Heikki's description of the problem was:

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:
 One thorny issue came up in discussions with other hackers on this in PGCon:
 
 When a transaction is committed asynchronously, it becomes visible to other
 backends before the commit WAL record is flushed. With CSN-based snapshots,
 the order that transactions become visible is always based on the LSNs of
 the WAL records. This is a problem when there is a mix of synchronous and
 asynchronous commits:
 
 If transaction A commits synchronously with commit LSN 1, and transaction B
 commits asynchronously with commit LSN 2, B cannot become visible before A.
 And we cannot acknowledge B as committed to the client until it's visible to
 other transactions. That means that B will have to wait for A's commit
 record to be flushed to disk, before it can return, even though it was an
 asynchronous commit.


AFAICS the xlog insertion order isn't changed relevantly by the patch?

Right now the order for sync commit = on is:

XactLogCommitRecord();
XLogFlush();
TransactionIdCommitTree();
ProcArrayEndTransaction();

and for sync commit = off it's:
XactLogCommitRecord();
TransactionIdAsyncCommitTree();
ProcArrayEndTransaction();

with the CSN patch it's:

sync on:
TransactionIdSetCommitting();
XactLogCommitRecord();
XLogFlush(XactLastRecEnd);
TransactionIdCommitTree();

sync off:
TransactionIdSetCommitting();
XactLogCommitRecord();
TransactionIdCommitTree();

How does this cause the above described problem? Yes, mixing sync/async
commits can give somewhat weird visibility semantics, but that's not
really new, is it?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2015-07-24 Thread Alexander Korotkov
On Wed, Aug 27, 2014 at 10:42 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 08/27/2014 09:40 AM, Heikki Linnakangas wrote:

 On 08/27/2014 08:23 AM, Jeff Davis wrote:

 On Tue, 2014-08-26 at 13:45 +0300, Heikki Linnakangas wrote:

 Yeah. This patch in the current state is likely much much slower than
 unpatched master, except in extreme cases where you have thousands of
 connections and short transactions so that without the patch, you spend
 most of the time acquiring snapshots.


 What else are you looking to accomplish with this patch during this
 'fest? Bug finding? Design review? Performance testing?


 Design review, mostly. I know the performance still sucks. Although if
 you can foresee some performance problems, aside from the extra CSNLOG
 lookups, it would be good to know.


 I think for this commitfest, I've gotten as much review of this patch that
 I can hope for. Marking as Returned with Feedback. But of course, feel
 free to continue reviewing and commenting ;-).


What is the current state of this patch? Does the community want CSN-based
snapshots?
The last email in the lists was in August 2014. Huawei talked about their further
research on this idea at PGCon and promised to publish their patch as open
source.
http://www.pgcon.org/2015/schedule/events/810.en.html
However, it isn't published yet, and we don't know how long we might have to
wait for it.
Now, our company has resources to work on CSN-based snapshots for 9.6. If
Huawei does not publish their patch, we can pick up this work from the last
community version of this patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-27 Thread Heikki Linnakangas

On 08/27/2014 08:23 AM, Jeff Davis wrote:

On Tue, 2014-08-26 at 13:45 +0300, Heikki Linnakangas wrote:

Yeah. This patch in the current state is likely much much slower than
unpatched master, except in extreme cases where you have thousands of
connections and short transactions so that without the patch, you spend
most of the time acquiring snapshots.


What else are you looking to accomplish with this patch during this
'fest? Bug finding? Design review? Performance testing?


Design review, mostly. I know the performance still sucks. Although if 
you can foresee some performance problems, aside from the extra CSNLOG 
lookups, it would be good to know.



I think there's already at least one design issue to consider, which is
whether we care about CLOG/CSNLOG access for hinted records where the
xid > snapshot->xmin (that is, accesses that previously would have
looked at xip). Would more discussion help here or do we need to wait
for performance numbers?


I think my current plan is to try to make that CSNLOG lookup fast. In 
essence, rewrite SLRUs to be more performant. That would help with the 
current clog, too, which would make it more feasible to set hint bits 
less often. In particular, avoid dirtying pages just to set hint bits. 
I'm not sure if that's enough - you can't beat checking a single hint 
bit in the same cache line as the XID - but I think it's worth a try.
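
One shape that backend-private cache could take is a small direct-mapped
xid -> csn table consulted before the shared csnlog; everything below is an
illustrative sketch (csnlog_get() and csn_is_final() are hypothetical):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define CSN_CACHE_SLOTS 1024                /* per backend, direct-mapped */

typedef struct
{
    TransactionId xid;                      /* 0 = empty slot */
    CommitSeqNo   csn;
} CsnCacheEntry;

static CsnCacheEntry csn_cache[CSN_CACHE_SLOTS];

extern CommitSeqNo csnlog_get(TransactionId xid);   /* hypothetical shared lookup */
extern bool csn_is_final(CommitSeqNo csn);          /* committed or aborted? */

static CommitSeqNo
cached_xid_to_csn(TransactionId xid)
{
    CsnCacheEntry *slot = &csn_cache[xid % CSN_CACHE_SLOTS];
    CommitSeqNo csn;

    if (slot->xid == xid)
        return slot->csn;                   /* hit: no locks, no shared memory */

    csn = csnlog_get(xid);

    /* Cache only final states; an in-progress xid may still change. */
    if (csn_is_final(csn))
    {
        slot->xid = xid;
        slot->csn = csn;
    }
    return csn;
}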


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-27 Thread Heikki Linnakangas

On 08/27/2014 09:40 AM, Heikki Linnakangas wrote:

On 08/27/2014 08:23 AM, Jeff Davis wrote:

On Tue, 2014-08-26 at 13:45 +0300, Heikki Linnakangas wrote:

Yeah. This patch in the current state is likely much much slower than
unpatched master, except in extreme cases where you have thousands of
connections and short transactions so that without the patch, you spend
most of the time acquiring snapshots.


What else are you looking to accomplish with this patch during this
'fest? Bug finding? Design review? Performance testing?


Design review, mostly. I know the performance still sucks. Although if
you can foresee some performance problems, aside from the extra CSNLOG
lookups, it would be good to know.


I think for this commitfest, I've gotten as much review of this patch 
that I can hope for. Marking as Returned with Feedback. But of course, 
feel free to continue reviewing and commenting ;-).


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Jeff Davis
On Fri, 2014-06-13 at 13:24 +0300, Heikki Linnakangas wrote:
 Attached is a new patch. It now keeps the current pg_clog unchanged, but 
 adds a new pg_csnlog besides it. pg_csnlog is more similar to 
 pg_subtrans than pg_clog: it's not WAL-logged, is reset at startup, and 
 segments older than GlobalXmin can be truncated.

It appears that this patch weakens the idea of hint bits. Even if
HEAP_XMIN_COMMITTED is set, it still needs to find out if it's in the
snapshot.

That's fast if the xid is less than snap->xmin, but otherwise it needs
to do a fetch from the csnlog, which is exactly what the hint bits are
designed to avoid. And we can't get around this, because the whole point
of this patch is to remove the xip array from the snapshot.

If the transaction was committed a long time ago, then we could set
PD_ALL_VISIBLE and the VM bit, and a scan wouldn't even look at the hint
bit. If it was committed recently, then it's probably greater than the
recentXmin. I think there's still room for a hint bit to technically be
useful, but it seems quite narrow unless I'm missing something (and a
narrowly-useful hint bit doesn't seem to be useful at all).

I'm not complaining, and I hope this is not a showstopper for this
patch, but I think it's worth discussing.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Heikki Linnakangas

On 08/26/2014 12:03 PM, Jeff Davis wrote:

On Fri, 2014-06-13 at 13:24 +0300, Heikki Linnakangas wrote:

Attached is a new patch. It now keeps the current pg_clog unchanged, but
adds a new pg_csnlog besides it. pg_csnlog is more similar to
pg_subtrans than pg_clog: it's not WAL-logged, is reset at startup, and
segments older than GlobalXmin can be truncated.


It appears that this patch weakens the idea of hint bits. Even if
HEAP_XMIN_COMMITTED is set, it still needs to find out if it's in the
snapshot.

That's fast if the xid is less than snap->xmin, but otherwise it needs
to do a fetch from the csnlog, which is exactly what the hint bits are
designed to avoid. And we can't get around this, because the whole point
of this patch is to remove the xip array from the snapshot.


Yeah. This patch in the current state is likely much much slower than 
unpatched master, except in extreme cases where you have thousands of 
connections and short transactions so that without the patch, you spend 
most of the time acquiring snapshots.


My current thinking is that access to the csnlog will need to be made 
faster. Currently, it's just another SLRU, but I'm sure we can do better 
than that. Or add a backend-private cache on top of it; that might 
already alleviate it enough..


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Jeff Davis
On Tue, 2014-08-26 at 13:45 +0300, Heikki Linnakangas wrote:
 My current thinking is that access to the csnlog will need to be made 
 faster. Currently, it's just another SLRU, but I'm sure we can do better 
 than that. Or add a backend-private cache on top of it; that might 
 already alleviate it enough..

Aren't those the same ideas that have been proposed for eliminating hint
bits?

If this patch makes taking snapshots much faster, then perhaps it's
enough incentive to follow through on one of those ideas. I am certainly
open to removing hint bits (which was probably clear from earlier
discussions) if there are significant wins elsewhere and the penalty is
not too great.

To continue the discussion on this patch, let's assume that we make
TransactionIdGetCommitLSN sufficiently fast. If that's the case, can't
we make HeapTupleSatisfiesMVCC look more like:

  LsnMin = TransactionIdGetCommitLSN(xmin);
  if (!COMMITLSN_IS_COMMITTED(LsnMin))
     LsnMin = BIG_LSN;

  LsnMax = TransactionIdGetCommitLSN(xmax);
  if (!COMMITLSN_IS_COMMITTED(LsnMax))
     LsnMax = BIG_LSN;

  return (snapshot->snapshotlsn >= LsnMin &&
          snapshot->snapshotlsn < LsnMax);

There would need to be some handling for locked tuples, or tuples
related to the current transaction, of course. But I still think it
would turn out simpler; perhaps by enough to save a few cycles.

Regards,
Jeff Davis






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Greg Stark
On Tue, Aug 26, 2014 at 11:45 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 It appears that this patch weakens the idea of hint bits. Even if
 HEAP_XMIN_COMMITTED is set, it still needs to find out if it's in the
 snapshot.

 That's fast if the xid is less than snap->xmin, but otherwise it needs
 to do a fetch from the csnlog, which is exactly what the hint bits are
 designed to avoid. And we can't get around this, because the whole point
 of this patch is to remove the xip array from the snapshot.


 Yeah. This patch in the current state is likely much much slower than
 unpatched master, except in extreme cases where you have thousands of
 connections and short transactions so that without the patch, you spend most
 of the time acquiring snapshots.


Interesting analysis.

I suppose the equivalent of hint bits would be to actually write the
CSN of the transaction into the record when the hint bit is set.

I don't immediately see how to make that practical. One thought would
be to have a list of xids in the page header with their corresponding
csn -- which starts to sound a lot like Oralce's Interested
Transaction List. But I don't see how to make that work for the
hundreds of possible xids on the page.

The worst case for visibility resolution is that you have a narrow table
that has random-access DML happening all the time, each update is a
short transaction, and there is a very high rate of such transactions
spread out uniformly over a very large table. That means any given
page has over 200 rows with random xids spread over a very large range
of xids.

Currently the invariant hint bits give us is that each xid needs to be
looked up in the clog only a more or less fixed number of times, in
that scenario only once since the table is very large and the
transactions short lived.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Jeff Davis
On Tue, 2014-08-26 at 19:25 +0100, Greg Stark wrote:
 I don't immediately see how to make that practical. One thought would
 be to have a list of xids in the page header with their corresponding
 csn -- which starts to sound a lot like Oracle's Interested
 Transaction List. But I don't see how to make that work for the
 hundreds of possible xids on the page.

I feel like that's moving in the wrong direction. That's still causing a
lot of modifications to a data page when the data is not changing, and
that's bad for a lot of cases that I'm interested in (checksums are one
example).

We are mixing two kinds of data: user data and visibility information.
Each is changed under different circumstances and has different
characteristics, and I'm beginning to think they shouldn't be mixed at
all.

What if we just devised a structure specially designed to hold
visibility information, put all of the visibility information there, and
data pages would only change where there is a real, user-initiated
I/U/D. Vacuum could still clear out dead tuples from the data area, but
it would do the rest of its work on the visibility structure. It could
even be a clever structure that could compress away large static areas
until they become active again.

Maybe this wouldn't work for all tables, but could be an option for big
tables with low update rates.

 The worst case for visibility resolution is that you have a narrow table
 that has random-access DML happening all the time, each update is a
 short transaction, and there is a very high rate of such transactions
 spread out uniformly over a very large table. That means any given
 page has over 200 rows with random xids spread over a very large range
 of xids.

That's not necessarily a bad case, unless the CLOG/CSNLOG lookup is a
significant fraction of the effort to update a tuple.

That would be a bad case if you introduce scans into the equation as
well, but that's not a problem if the all-visible bit is set.

 Currently the invariant hint bits give us is that each xid needs to be
 looked up in the clog only a more or less fixed number of times, in
 that scenario only once since the table is very large and the
 transactions short lived.

A backend-local cache might accomplish that, as well (would still need
to do a lookup, but no locks or contention). There would be some
challenges around invalidation (for xid wraparound) and pre-warming the
cache (so establishing a lot of connections doesn't cause a lot of CLOG
access).

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Greg Stark
On Tue, Aug 26, 2014 at 8:32 PM, Jeff Davis pg...@j-davis.com wrote:
 We are mixing two kinds of data: user data and visibility information.
 Each is changed under different circumstances and has different
 characteristics, and I'm beginning to think they shouldn't be mixed at
 all.

 What if we just devised a structure specially designed to hold
 visibility information, put all of the visibility information there, and
 data pages would only change where there is a real, user-initiated
 I/U/D. Vacuum could still clear out dead tuples from the data area, but
 it would do the rest of its work on the visibility structure. It could
 even be a clever structure that could compress away large static areas
 until they become active again.

Well fundamentally the reason the visibility information is with the
user data is so that we don't need to do two i/os to access the data.
The whole point of hint bits is to guarantee that after some amount of
time you can read data directly out of the heap page and use it
without doing any additional I/O.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Jeff Davis
On Tue, 2014-08-26 at 21:00 +0100, Greg Stark wrote:
 Well fundamentally the reason the visibility information is with the
 user data is so that we don't need to do two i/os to access the data.
 The whole point of hint bits is to guarantee that after some amount of
 time you can read data directly out of the heap page and use it
 without doing any additional I/O.

If the data is that static, then the visibility information would be
highly compressible, and surely in shared_buffers already.

(Yes, it would need to be pinned, which has a cost.)

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Amit Kapila
On Tue, Aug 26, 2014 at 11:55 PM, Greg Stark st...@mit.edu wrote:
 Interesting analysis.

 I suppose the equivalent of hint bits would be to actually write the
 CSN of the transaction into the record when the hint bit is set.

 I don't immediately see how to make that practical. One thought would
 be to have a list of xids in the page header with their corresponding
 csn -- which starts to sound a lot like Oracle's Interested
 Transaction List. But I don't see how to make that work for the
 hundreds of possible xids on the page.

Here we can limit the number of such entries (xid-csn pairs) that
can exist in the block header, and in case it starts to overflow, use
some alternate location.  Another idea could be to reserve the space
whenever a new transaction starts to operate on the block and fill in
the CSN later.  If the maximum number of available slots gets filled
up, we can either use an alternate location or perhaps block the new
transaction until a slot becomes available.  Here I am assuming that
vacuum would clear the slots in blocks as and when such transactions
become visible to everyone, or maybe a transaction which didn't find
any slot could also try to clear such slots.
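
A rough sketch of such a per-page slot array with an overflow escape; the
slot count and layout are assumptions for illustration, not a proposed
on-disk format:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define PAGE_CSN_SLOTS 8

typedef struct
{
    TransactionId xid;          /* 0 = free slot */
    CommitSeqNo   csn;          /* 0 = reserved, CSN not yet filled in */
} PageCsnSlot;

typedef struct
{
    bool        overflowed;     /* some xids on this page must use the csnlog */
    PageCsnSlot slots[PAGE_CSN_SLOTS];
} PageCsnHeader;

/* Reserve a slot when a transaction first touches the page; the CSN is
 * filled in at commit time (or by vacuum).  Returns false on overflow. */
static bool
page_reserve_csn_slot(PageCsnHeader *hdr, TransactionId xid)
{
    for (int i = 0; i < PAGE_CSN_SLOTS; i++)
    {
        if (hdr->slots[i].xid == xid || hdr->slots[i].xid == 0)
        {
            hdr->slots[i].xid = xid;
            return true;
        }
    }
    hdr->overflowed = true;     /* readers fall back to the csnlog */
    return false;
}

/* Look up a tuple's xid on this page; returns 0 for "not here, use the csnlog". */
static CommitSeqNo
page_lookup_csn(const PageCsnHeader *hdr, TransactionId xid)
{
    for (int i = 0; i < PAGE_CSN_SLOTS; i++)
        if (hdr->slots[i].xid == xid)
            return hdr->slots[i].csn;
    return 0;
}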


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal for CSN based snapshots

2014-08-26 Thread Jeff Davis
On Tue, 2014-08-26 at 13:45 +0300, Heikki Linnakangas wrote:
 Yeah. This patch in the current state is likely much much slower than 
 unpatched master, except in extreme cases where you have thousands of 
 connections and short transactions so that without the patch, you spend 
 most of the time acquiring snapshots.

What else are you looking to accomplish with this patch during this
'fest? Bug finding? Design review? Performance testing?

I haven't had a good track record with my performance testing recently,
so I'm unlikely to be much help there.

It seems a bit early for bug hunting, unless you think it will raise
possible design issues.

I think there's already at least one design issue to consider, which is
whether we care about CLOG/CSNLOG access for hinted records where the
xid > snapshot->xmin (that is, accesses that previously would have
looked at xip). Would more discussion help here or do we need to wait
for performance numbers?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-07-10 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 Attached is a new patch. It now keeps the current pg_clog unchanged,
 but adds a new pg_csnlog besides it. pg_csnlog is more similar to
 pg_subtrans than pg_clog: it's not WAL-logged, is reset at startup,
 and segments older than GlobalXmin can be truncated.
 
 This addresses the disk space consumption, and simplifies pg_upgrade.
 
 There are no other significant changes in this new version, so it's
 still very much WIP. But please take a look!

Thanks.  I've been looking at this.  It needs a rebase; I had to apply it
to commit 89cf2d52030 (Wed Jul 2 13:11:05 2014 -0400) because of
conflicts with the commit after that; it applies with some fuzz.  I read
the discussion in this thread and the README, to try to understand how
this is supposed to work.  It looks pretty good.

One thing not quite clear to me is the now-static RecentXmin,
GetRecentXmin(), AdvanceGlobalRecentXmin, and the like.  I found the
whole thing pretty confusing.  I noticed that ShmemVariableCache->recentXmin
is read without a lock in GetSnapshotData(), but exclusive ProcArrayLock
is taken to write to it in GetRecentXmin().  I think we can write it
without a lock also, since we assume that storing an xid is atomic.
With that change, I think you can make the acquisition of ProcArrayLock
to walk the pgproc list use shared mode, not exclusive.

Another point is that RecentXmin is no longer used directly, except in
TransactionIdIsActive().  But that routine is only used for an Assert()
now, so I think it's fine to just have GetRecentXmin() return the value
and not set RecentXmin as a side effect; my point is that ISTM
RecentXmin can be removed altogether, which makes that business simpler.


As far as GetOldestXmin() is concerned, that routine now ignores its
arguments.  Is that okay?  I imagine it's just a natural consequence of
how things work now.   [ ... reads some more code ... ]  Oh, now it's
only used to determine a freeze limit.  I think you should just remove
the arguments and make that whole thing simpler.  I was going to suggest
that AdvanceRecentGlobalXmin() could receive a possibly-NULL Relation
argument to pass down to GetOldestSnapshotGuts() which can make use of
it for a tigther limit, but since OldestXmin is only for freezing, maybe
there is no point in this extra complication.

Regarding the pendingGlobalXmin stuff, I didn't find any documentation.
I think you just need to write a very long comment on top of
AdvanceRecentGlobalXmin() to explain what it's doing.  After going over
the code half a dozen times I think I understand what the idea is, and it
seems sound.

Not sure about the advancexmin_counter thing.  Maybe in some cases
sessions will only run 9 transactions or less and then finish, in which
case it will only get advanced at checkpoint time, which would be way
too seldom.  Maybe it would work to place the counter in shared memory:

acquire(spinlock);
if (ShmemVariableCache->counter++ >= some_number)
{
   ShmemVariableCache->counter = 0;
   do_update = true;
}
release(spinlock);
if (do_update)
   AdvanceRecentGlobalXmin();
(Maybe we can forgo the spinlock altogether?  If the value gets out of
hand once in a while, it shouldn't really matter.)  Perhaps some_number
should be scaled to some fraction of MaxBackends, but not less
than X (32?) and no more than Y (128?).  Or maybe just use the constant 10,
as the current code does.  But it'd be the total number of transactions, not
transactions in the current backend.
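
For the forgo-the-spinlock variant, a sketch with a relaxed C11 atomic
counter; the names and the AdvanceRecentGlobalXmin() signature are assumed
here, and, as noted, an occasionally imprecise count is acceptable:

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t xact_counter;       /* would live in shared memory */

extern void AdvanceRecentGlobalXmin(void);  /* signature assumed */

static void
maybe_advance_global_xmin(uint32_t every_n_xacts)
{
    /* Relaxed increment: a lost update now and then merely delays the
     * advancement slightly, which is fine for this purpose. */
    uint32_t n = atomic_fetch_add_explicit(&xact_counter, 1,
                                           memory_order_relaxed);

    if (n % every_n_xacts == 0)
        AdvanceRecentGlobalXmin();
}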


I think it'd be cleaner to have a distinct typedef to use when
XLogRecPtr is used as a snapshot representation.  Right now there is no
clarity on whether we're interested in the xlog position itself or whether
it's a snapshot value.


HeapTupleSatisfiesVacuum[X]() and various callers need updates to their
comments: when OldestXmin is mentioned, it should be OldestSnapshot.

I noticed that SubTransGetTopmostTransaction() is now only called from
predicate.c, and it's pretty constrained in what it wants.  I'm not
implying that we want to do anything in your patch about it, other than
perhaps add a note to it that we may want to examine it later for
possible changes.

I haven't gone over the transam.c, clog.c changes yet.

I attach a couple of minor tweaks to the README, mostly for clarity (but
also update clog -> csnlog in a couple of places).

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index d11c817..23479b6 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -247,24 +247,24 @@ What we actually enforce is strict serialization of commits and rollbacks
 with snapshot-taking. We use the LSNs generated by Write-Ahead-Logging as
 a convenient monotonically increasing counter, to serialize commits with
 snapshots. Each commit is naturally assigned an LSN; it's the LSN of the
-commit WAL record. 

Re: [HACKERS] Proposal for CSN based snapshots

2014-06-30 Thread Simon Riggs
On 12 May 2014 17:14, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 05/12/2014 06:26 PM, Andres Freund wrote:

 With the new commit-in-progress status in clog, we won't need the
 sub-committed clog status anymore. The commit-in-progress status will
 achieve the same thing.

 Wouldn't that cause many spurious waits? Because commit-in-progress
 needs to be waited on, but a sub-committed xact surely not?


 Ah, no. Even today, a subxid isn't marked as sub-committed, until you commit
 the top-level transaction. The sub-commit state is a very transient state
 during the commit process, used to make the commit of the sub-transactions
 and the top-level transaction appear atomic. The commit-in-progress state
 would be a similarly short-lived state. You mark the subxids and the top xid
 as commit-in-progress just before the XLogInsert() of the commit record, and
 you replace them with the real LSNs right after XLogInsert().

My understanding is that we aim to speed up the rate of Snapshot
reads. Which may make it feasible to store autonomous transactions in
shared memory, hopefully DSM.

The above mechanism sounds like it might slow down transaction commit.
Could we prototype that and measure the speed?

More generally, do we have any explicit performance goals or
acceptance criteria for this patch?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-06-19 Thread Amit Langote
Hi,

On Fri, Jun 13, 2014 at 7:24 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Yeah, that seems like a better design, after all.

 Attached is a new patch. It now keeps the current pg_clog unchanged, but
 adds a new pg_csnlog besides it. pg_csnlog is more similar to pg_subtrans
 than pg_clog: it's not WAL-logged, is reset at startup, and segments older
 than GlobalXmin can be truncated.

 This addresses the disk space consumption, and simplifies pg_upgrade.

 There are no other significant changes in this new version, so it's still
 very much WIP. But please take a look!


Thanks for working on this important patch. I know this patch is still
largely a WIP but I would like to report an observation.

I applied this patch and did a few pgbench runs with 32 clients (this
is on a not-so-powerful VM, by the way). Perhaps you suspect such a
thing already, but I observed a relatively large percentage of time
being spent in XLogInsert().

Perhaps XLogCtlInsert.insertpos_lck contention via GetXLogInsertRecPtr()?

--
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 9:00 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 06/18/2014 12:41 AM, Robert Haas wrote:
 On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer cr...@2ndquadrant.com wrote:
  On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
  Yeah. To recap, the failure mode is that if the master crashes and
  restarts, the transaction becomes visible in the master even though it
  was never replicated.
 
  Wouldn't another pg_clog bit for the transaction be able to sort that out?
 How?

 A flag to indicate that the tx is locally committed but hasn't been
 confirmed by a streaming synchronous replica, so it must not become
 visible until the replica confirms it or SR is disabled.

 Then scan pg_clog on start / replica connect and ask the replica to
 confirm local commit for those tx's.

 No?

No.  Otherwise, one of those bits could get changed after a backend
takes a snapshot and before it finishes using it - so that the
transaction snapshot is in effect changing underneath it.  You could
avoid that by memorizing the contents of CLOG when taking a snapshot,
but that would defeat the whole purpose of CSN-based snapshots, which
is to make the snapshot small and fixed-size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-18 Thread Craig Ringer
On 06/18/2014 09:12 PM, Robert Haas wrote:
 No.  Otherwise, one of those bits could get changed after a backend
 takes a snapshot and before it finishes using it - so that the
 transaction snapshot is in effect changing underneath it.  You could
 avoid that by memorizing the contents of CLOG when taking a snapshot,
 but that would defeat the whole purpose of CSN-based snapshots, which
 is to make the small and fixed-size.

Ah.

Thank you. I appreciate the explanation.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-17 Thread Robert Haas
On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
 Yeah. To recap, the failure mode is that if the master crashes and
 restarts, the transaction becomes visible in the master even though it
 was never replicated.

 Wouldn't another pg_clog bit for the transaction be able to sort that out?

How?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-17 Thread Craig Ringer
On 06/18/2014 12:41 AM, Robert Haas wrote:
 On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer cr...@2ndquadrant.com wrote:
  On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
  Yeah. To recap, the failure mode is that if the master crashes and
  restarts, the transaction becomes visible in the master even though it
  was never replicated.
 
  Wouldn't another pg_clog bit for the transaction be able to sort that out?
 How?

A flag to indicate that the tx is locally committed but hasn't been
confirmed by a streaming synchronous replica, so it must not become
visible until the replica confirms it or SR is disabled.

Then scan pg_clog on start / replica connect and ask the replica to
confirm local commit for those tx's.

No?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-15 Thread Craig Ringer
On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
 
 Yeah. To recap, the failure mode is that if the master crashes and
 restarts, the transaction becomes visible in the master even though it
 was never replicated.

Wouldn't another pg_clog bit for the transaction be able to sort that out?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-03 Thread Robert Haas
On Fri, May 30, 2014 at 2:38 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 (forgot to answer this question)

 On 05/30/2014 06:27 PM, Andres Freund wrote:

 On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:

 * I expanded pg_clog to 64-bits per XID, but people suggested keeping
 pg_clog as is, with two bits per commit, and adding a new SLRU for the
 commit LSNs beside it. Probably will need to do something like that to
 avoid
 bloating the clog.


 It also influences how on-disk compatibility is dealt with. So: How are
 you planning to deal with on-disk compatibility?


 Have pg_upgrade read the old clog and write it out in the new format.

That doesn't address Bruce's concern about CLOG disk consumption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2014-06-03 Thread Greg Stark
On Tue, Jun 3, 2014 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote:
 That doesn't address Bruce's concern about CLOG disk consumption.

Well, we would only need the xid-lsn mapping for transactions since
globalxmin. For anything older we would just need the committed bit. So we
could maintain two structures, one like our current clog going back
until the freeze_max_age and one with 32 bits (or 64 bits?) per xid
but only going back as far as the globalxmin. There are a myriad of
compression techniques we could use on a sequence of mostly similar,
mostly increasing numbers in a small range too. But I suspect they
wouldn't really be necessary.
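
To make that two-structure idea concrete, here is a minimal C sketch
(the array sizes, names, and in-memory stand-ins are all invented for
illustration; the real structures would be SLRUs on disk): xids older
than the global xmin only need their committed bit, while newer xids
need the full commit LSN to compare against the snapshot.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;   /* commit-record LSN; 0 = not committed */

/* Hypothetical in-memory stand-ins for the two structures described above:
 * a dense 1-bit-per-xid committed map for xids older than global_xmin, and
 * a 64-bit-per-xid CSN map for xids newer than that. */
#define XID_RANGE 4096
static uint8_t       committed_bits[XID_RANGE / 8];
static CommitSeqNo   csn_map[XID_RANGE];
static TransactionId global_xmin = 1000;

static bool
old_xid_committed(TransactionId xid)
{
    return (committed_bits[xid / 8] >> (xid % 8)) & 1;
}

/* Is 'xid' visible to a snapshot taken at WAL position 'snapshot_csn'? */
static bool
xid_visible(TransactionId xid, CommitSeqNo snapshot_csn)
{
    if (xid < global_xmin)
        return old_xid_committed(xid);      /* the compact structure suffices */

    /* Recent xid: the commit LSN is needed to compare with the snapshot. */
    CommitSeqNo csn = csn_map[xid % XID_RANGE];

    return csn != 0 && csn <= snapshot_csn;
}

int
main(void)
{
    csn_map[1234 % XID_RANGE] = 42;        /* pretend xid 1234 committed at LSN 42 */
    return xid_visible(1234, 100) ? 0 : 1; /* visible to a snapshot taken at LSN 100 */
}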

Here's another thought. I don't see how to make this practical but it
would be quite convenient if it could be made so. The first time an
xid is looked up and is determined to be committed then all we'll ever
need in the future is the lsn. If we could replace the xid with the
LSN of the commit record right there in the page then future viewers
would be able to determine if it's visible without looking in the clog
or this new clog xid-lsn mapping. If that was the full LSN it would
never need to be frozen either. The problem with this is that the LSN
is big and actually moves faster than the xid. We could play the same
games with putting the lsn segment in the page header but it's
actually entirely feasible to have a snapshot that extends over
several segments. Way easier than a snapshot that extends over several
xid epochs. (This does make having a CSN like Oracle kind of tempting
after all)

-- 
greg




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Greg Stark
On Fri, May 30, 2014 at 3:59 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 If transaction A commits synchronously with commit LSN 1, and transaction B
 commits asynchronously with commit LSN 2, B cannot become visible before A.
 And we cannot acknowledge B as committed to the client until it's visible to
 other transactions. That means that B will have to wait for A's commit
 record to be flushed to disk, before it can return, even though it was an
 asynchronous commit.


I thought that's what happens now.

What's more of a concern is synchronous replication. We currently
have a hack that makes transactions committed locally invisible to
other transactions even though they've committed and synced to disk
until the slave responds that it's received the transaction. (I think
this is bogus personally, it just shifts the failure modes around. If
we wanted to do it properly we would have to do two-phase commit.)

I guess it still works because we don't support having synchronous
replication for just some transactions and not others. It would be
nice to support that but I think it would mean making it work like
local synchronous commit. It would only affect how long the commit
blocks, not when other transactions see the committed data.

-- 
greg




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Heikki Linnakangas

On 05/30/2014 06:09 PM, Greg Stark wrote:

On Fri, May 30, 2014 at 3:59 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

If transaction A commits synchronously with commit LSN 1, and transaction B
commits asynchronously with commit LSN 2, B cannot become visible before A.
And we cannot acknowledge B as committed to the client until it's visible to
other transactions. That means that B will have to wait for A's commit
record to be flushed to disk, before it can return, even though it was an
asynchronous commit.



I thought that's what happens now.

What's more of a concern is synchronousl replication. We currently
have a hack that makes transactions committed locally invisible to
other transactions even though they've committed and synced to disk
until the slave responds that it's received the transaction. (I think
this is bogus personally, it just shifts the failure modes around. If
we wanted to do it properly we would have to do two-phase commit.)


Yeah. To recap, the failure mode is that if the master crashes and 
restarts, the transaction becomes visible in the master even though it 
was never replicated.



I guess it still works because we don't support having synchronous
replication for just some transactions and not others. It would be
nice to support that but I think it would mean making it work like
local synchronous commit. It would only affect how long the commit
blocks, not when other transactions see the committed data.


Actually, we do support that, see synchronous_commit=local.

- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Andres Freund
Hi,

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:
 So, here's a first version of the patch. Still very much WIP.

Cool.

 One thorny issue came up in discussions with other hackers on this in PGCon:
 
 When a transaction is committed asynchronously, it becomes visible to other
 backends before the commit WAL record is flushed. With CSN-based snapshots,
 the order that transactions become visible is always based on the LSNs of
 the WAL records. This is a problem when there is a mix of synchronous and
 asynchronous commits:
 
 If transaction A commits synchronously with commit LSN 1, and transaction B
 commits asynchronously with commit LSN 2, B cannot become visible before A.
 And we cannot acknowledge B as committed to the client until it's visible to
 other transactions. That means that B will have to wait for A's commit
 record to be flushed to disk, before it can return, even though it was an
 asynchronous commit.

 I personally think that's annoying, but we can live with it. The most common
 usage of synchronous_commit=off is to run a lot of transactions in that
 mode, setting it in postgresql.conf. And it wouldn't completely defeat the
 purpose of mixing synchronous and asynchronous commits either: an
 asynchronous commit still only needs to wait for any already-logged
 synchronous commits to be flushed to disk, not the commit record of the
 asynchronous transaction itself.

I have a hard time believing that users won't hate us for such a
regression. It's pretty common to mix both sorts of transactions and
this will - by my guesstimate - dramatically reduce throughput for the
async backends.

 * Logical decoding is broken. I hacked on it enough that it looks roughly
 sane and it compiles, but didn't spend more time to debug.

I think we can live with it not working for the first few
iterations. I'll look into it once the patch has stabilized a bit.

 * I expanded pg_clog to 64-bits per XID, but people suggested keeping
 pg_clog as is, with two bits per commit, and adding a new SLRU for the
 commit LSNs beside it. Probably will need to do something like that to avoid
 bloating the clog.

It also influences how on-disk compatibility is dealt with. So: How are
you planning to deal with on-disk compatibility?

 * Add some kind of backend-private caching of clog, to make it faster to
 access. The visibility checks are now hitting the clog a lot more heavily
 than before, as you need to check the clog even if the hint bits are set, if
 the XID falls between xmin and xmax of the snapshot.

That'll hurt a lot in concurrent scenarios :/. Have you measured how
'wide' xmax-xmin usually is? I wonder if we could just copy a range of
values from the clog when we start scanning

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Heikki Linnakangas

On 05/30/2014 06:27 PM, Andres Freund wrote:

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:

One thorny issue came up in discussions with other hackers on this in PGCon:

When a transaction is committed asynchronously, it becomes visible to other
backends before the commit WAL record is flushed. With CSN-based snapshots,
the order that transactions become visible is always based on the LSNs of
the WAL records. This is a problem when there is a mix of synchronous and
asynchronous commits:

If transaction A commits synchronously with commit LSN 1, and transaction B
commits asynchronously with commit LSN 2, B cannot become visible before A.
And we cannot acknowledge B as committed to the client until it's visible to
other transactions. That means that B will have to wait for A's commit
record to be flushed to disk, before it can return, even though it was an
asynchronous commit.



I personally think that's annoying, but we can live with it. The most common
usage of synchronous_commit=off is to run a lot of transactions in that
mode, setting it in postgresql.conf. And it wouldn't completely defeat the
purpose of mixing synchronous and asynchronous commits either: an
asynchronous commit still only needs to wait for any already-logged
synchronous commits to be flushed to disk, not the commit record of the
asynchronous transaction itself.


I have a hard time believing that users won't hate us for such a
regression. It's pretty common to mix both sorts of transactions and
this will - by my guesstimate - dramatically reduce throughput for the
async backends.


Yeah, it probably would. Not sure how many people would care.

For an asynchronous commit, we could store the current WAL flush 
location as the commit LSN, instead of the location of the commit 
record. That would break the property that LSN == commit order, but that 
property is fundamentally incompatible with having async commits become 
visible without flushing previous transactions. Or we could even make it 
configurable, it would be fairly easy to support both behaviors.
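
For illustration, a minimal sketch of that configurable choice (the
function and variable names are invented, not the patch's code): a
synchronous commit keeps its commit record's LSN as the CSN, preserving
LSN == commit order, while an asynchronous commit could instead take the
already-flushed WAL position so it never waits on earlier, unflushed
commit records.

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical knob corresponding to "make it configurable". */
static bool async_csn_follows_commit_record = false;

/* Pick the CSN under which this commit becomes visible. */
XLogRecPtr
choose_commit_csn(bool synchronous, XLogRecPtr commit_record_lsn,
                  XLogRecPtr flushed_up_to)
{
    if (synchronous || async_csn_follows_commit_record)
        return commit_record_lsn;   /* LSN order == visibility order */

    /*
     * Alternative behavior for asynchronous commits: become visible at the
     * current flush position, i.e. without waiting for any not-yet-flushed
     * records, at the price of giving up LSN == commit order.
     */
    return flushed_up_to;
}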



* Logical decoding is broken. I hacked on it enough that it looks roughly
sane and it compiles, but didn't spend more time to debug.


I think we can live with it not working for the first few
iterations. I'll look into it once the patch has stabilized a bit.


Thanks!


* I expanded pg_clog to 64-bits per XID, but people suggested keeping
pg_clog as is, with two bits per commit, and adding a new SLRU for the
commit LSNs beside it. Probably will need to do something like that to avoid
bloating the clog.


It also influences how on-disk compatibility is dealt with. So: How are
you planning to deal with on-disk compatibility?


* Add some kind of backend-private caching of clog, to make it faster to
access. The visibility checks are now hitting the clog a lot more heavily
than before, as you need to check the clog even if the hint bits are set, if
the XID falls between xmin and xmax of the snapshot.


That'll hurt a lot in concurrent scenarios :/. Have you measured how
'wide' xmax-xmin usually is?


That depends entirely on the workload. The worst case is a mix of a 
long-running transaction and a lot of short transaction. It could grow 
to millions of transactions or more in that case.



I wonder if we could just copy a range of
values from the clog when we start scanning


I don't think that's practical, if the xmin-xmax gap is wide.

Perhaps we should take the bull by the horns and make clog faster to 
look up. If we e.g. mmapped the clog file into backend-private address 
space, we could avoid all the locking overhead of an SLRU. On platforms 
with atomic 64-bit instructions, you could read the clog with just a memory 
barrier. Even on other architectures, you'd only need a spinlock.
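
As an illustration of that lock-free access pattern, here is a small
sketch using standard C11 atomics over a plain in-memory array (the real
target would be the mmapped clog file; all names are invented). On
platforms with native 64-bit atomics these calls compile down to plain
loads and stores plus the required barriers; elsewhere the C11
implementation falls back to a lock, much like the spinlock mentioned
above.

#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

/* Stand-in for one clog segment mapped into backend-private address space. */
#define SLOTS_PER_SEGMENT 4096
static _Atomic uint64_t csn_slots[SLOTS_PER_SEGMENT];

/* Reader: no SLRU page lock, just an atomic 64-bit load. */
CommitSeqNo
csn_read(TransactionId xid)
{
    return atomic_load_explicit(&csn_slots[xid % SLOTS_PER_SEGMENT],
                                memory_order_acquire);
}

/* Writer (the committing backend): a single atomic 64-bit store. */
void
csn_write(TransactionId xid, CommitSeqNo csn)
{
    atomic_store_explicit(&csn_slots[xid % SLOTS_PER_SEGMENT], csn,
                          memory_order_release);
}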


- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-30 Thread Heikki Linnakangas

(forgot to answer this question)

On 05/30/2014 06:27 PM, Andres Freund wrote:

On 2014-05-30 17:59:23 +0300, Heikki Linnakangas wrote:

* I expanded pg_clog to 64-bits per XID, but people suggested keeping
pg_clog as is, with two bits per commit, and adding a new SLRU for the
commit LSNs beside it. Probably will need to do something like that to avoid
bloating the clog.


It also influences how on-disk compatibility is dealt with. So: How are
you planning to deal with on-disk compatibility?


Have pg_upgrade read the old clog and write it out in the new format.

- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Bruce Momjian
On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote:
 Some of the stuff in here will be influence whether your freezing
 replacement patch gets in. Do you plan to further pursue that one?
 
 Not sure. I got to the point where it seemed to work, but I got a
 bit of a cold feet proceeding with it. I used the page header's LSN
 field to define the epoch of the page, but I started to feel
 uneasy about it. I would be much more comfortable with an extra
 field in the page header, even though that uses more disk space. And
 requires dealing with pg_upgrade.

FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a
pg_upgrade issue anyway.

I am not excited about a 32x increase in clog size, especially since we
already do freezing at 200M transactions to allow for more aggressive
clog trimming.  Extrapolating that out, it means we would freeze every
6.25M transactions.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Robert Haas
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote:
 Some of the stuff in here will be influence whether your freezing
 replacement patch gets in. Do you plan to further pursue that one?

 Not sure. I got to the point where it seemed to work, but I got a
 bit of a cold feet proceeding with it. I used the page header's LSN
 field to define the epoch of the page, but I started to feel
 uneasy about it. I would be much more comfortable with an extra
 field in the page header, even though that uses more disk space. And
 requires dealing with pg_upgrade.

 FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a
 pg_upgrade issue anyway.

 I am not excited about a 32x increase in clog size, especially since we
 already do freezing at 200M transactions to allow for more aggressive
 clog trimming.  Extrapolating that out, it means we would freeze every
 6.25M transactions.

It seems better to allow clog to grow larger than to force
more-frequent freezing.

If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact.  Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more.  So we could read the old pg_clog files and write
new summary files.  Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted.  Once we've written and fsync'd the
summary files, we could throw away the original files.  That might
leave us with a smaller pg_clog than what we have today.
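
A rough sketch of that summarization pass, assuming purely for
illustration that the detailed format stores a 64-bit CSN per xid with 0
meaning aborted or never committed; the output is the 1-bit-per-xid
format described above, and once it has been written and fsync'd the
detailed files could be dropped.

#include <stddef.h>
#include <stdint.h>

typedef uint64_t CommitSeqNo;   /* 0 = aborted or never committed (assumed) */

/*
 * Collapse 'nxids' detailed entries (64 bits each) into a summary of one
 * bit per xid: 1 = committed, 0 = aborted.  Only valid once every xid in
 * the range precedes the xmin of every snapshot, so the LSN itself is no
 * longer needed by anyone.
 */
void
summarize_clog_range(const CommitSeqNo *detailed, uint8_t *summary,
                     size_t nxids)
{
    for (size_t i = 0; i < nxids; i++)
    {
        if (detailed[i] != 0)
            summary[i / 8] |= (uint8_t) (1u << (i % 8));
        else
            summary[i / 8] &= (uint8_t) ~(1u << (i % 8));
    }
}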

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Andres Freund
On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
 On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian br...@momjian.us wrote:
  On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote:
  Some of the stuff in here will be influence whether your freezing
  replacement patch gets in. Do you plan to further pursue that one?
 
  Not sure. I got to the point where it seemed to work, but I got a
  bit of a cold feet proceeding with it. I used the page header's LSN
  field to define the epoch of the page, but I started to feel
  uneasy about it. I would be much more comfortable with an extra
  field in the page header, even though that uses more disk space. And
  requires dealing with pg_upgrade.
 
  FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a
  pg_upgrade issue anyway.
 
  I am not excited about a 32x increase in clog size, especially since we
  already do freezing at 200M transactions to allow for more aggressive
  clog trimming.  Extrapolating that out, it means we would freeze every
  6.25M transactions.

The default setting imo is far too low for a database of any relevant
activity. If I had the stomach for the fight around it I'd suggest
increasing it significantly by default. People with small databases
won't be hurt significantly because they simply don't have that many
transactions, and autovacuum will normally get around to the cleanup
long before that.

 It seems better to allow clog to grow larger than to force
 more-frequent freezing.

Yes.

 If the larger clog size is a show-stopper (and I'm not sure I have an
 intelligent opinion on that just yet), one way to get around the
 problem would be to summarize CLOG entries after-the-fact.  Once an
 XID precedes the xmin of every snapshot, we don't need to know the
 commit LSN any more.  So we could read the old pg_clog files and write
 new summary files.  Since we don't need to care about subcommitted
 transactions either, we could get by with just 1 bit per transaction,
 1 = committed, 0 = aborted.  Once we've written and fsync'd the
 summary files, we could throw away the original files.  That might
 leave us with a smaller pg_clog than what we have today.

I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Bruce Momjian
On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote:
  If the larger clog size is a show-stopper (and I'm not sure I have an
  intelligent opinion on that just yet), one way to get around the
  problem would be to summarize CLOG entries after-the-fact.  Once an
  XID precedes the xmin of every snapshot, we don't need to know the
  commit LSN any more.  So we could read the old pg_clog files and write
  new summary files.  Since we don't need to care about subcommitted
  transactions either, we could get by with just 1 bit per transaction,
  1 = committed, 0 = aborted.  Once we've written and fsync'd the
  summary files, we could throw away the original files.  That might
  leave us with a smaller pg_clog than what we have today.
 
 I think the easiest way for now would be to have pg_clog with the same
 format as today and a rangewise much smaller pg_csn storing the lsns
 that are needed. That'll leave us with pg_upgrade'ability without
 needing to rewrite pg_clog during the upgrade.

Yes, I like the idea of storing the CSN separately.  One reason the
2-bit clog is so good is that we know we have atomic 1-byte writes on
all platforms.  Can we assume atomic 64-bit writes?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Andres Freund
On 2014-05-15 16:13:49 -0400, Bruce Momjian wrote:
 On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote:
   If the larger clog size is a show-stopper (and I'm not sure I have an
   intelligent opinion on that just yet), one way to get around the
   problem would be to summarize CLOG entries after-the-fact.  Once an
   XID precedes the xmin of every snapshot, we don't need to know the
   commit LSN any more.  So we could read the old pg_clog files and write
   new summary files.  Since we don't need to care about subcommitted
   transactions either, we could get by with just 1 bit per transaction,
   1 = committed, 0 = aborted.  Once we've written and fsync'd the
   summary files, we could throw away the original files.  That might
   leave us with a smaller pg_clog than what we have today.
  
  I think the easiest way for now would be to have pg_clog with the same
  format as today and a rangewise much smaller pg_csn storing the lsns
  that are needed. That'll leave us with pg_upgrade'ability without
  needing to rewrite pg_clog during the upgrade.
 
 Yes, I like the idea of storing the CSN separately.  One reason the
 2-bit clog is so good is that we know we have atomic 1-byte writes on
 all platforms.

I don't think we rely on that anywhere. And in fact we don't have the
ability to do so for arbitrary bytes - lots of platforms can do that
only on specifically aligned bytes.

We rely on being able to atomically (as in reading either the before or
the after value, never a torn one) write/read TransactionIds, but that's
it, I think?

  Can we assume atomic 64-bit writes?

Not on 32bit platforms.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
  On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian br...@momjian.us wrote:

  If the larger clog size is a show-stopper (and I'm not sure I have an
  intelligent opinion on that just yet), one way to get around the
  problem would be to summarize CLOG entries after-the-fact.  Once an
  XID precedes the xmin of every snapshot, we don't need to know the
  commit LSN any more.  So we could read the old pg_clog files and write
  new summary files.  Since we don't need to care about subcommitted
  transactions either, we could get by with just 1 bit per transaction,
  1 = committed, 0 = aborted.  Once we've written and fsync'd the
  summary files, we could throw away the original files.  That might
  leave us with a smaller pg_clog than what we have today.
 
 I think the easiest way for now would be to have pg_clog with the same
 format as today and a rangewise much smaller pg_csn storing the lsns
 that are needed. That'll leave us with pg_upgrade'ability without
 needing to rewrite pg_clog during the upgrade.

Err, we're proposing a patch to add timestamps to each commit,
http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org
which does so in precisely this way.

The idea that pg_csn or pg_committs can be truncated much earlier than
pg_clog has its merit, no doubt.  If we can make sure that the atomicity
is sane, +1 from me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Andres Freund
On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
   On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian br...@momjian.us wrote:
 
   If the larger clog size is a show-stopper (and I'm not sure I have an
   intelligent opinion on that just yet), one way to get around the
   problem would be to summarize CLOG entries after-the-fact.  Once an
   XID precedes the xmin of every snapshot, we don't need to know the
   commit LSN any more.  So we could read the old pg_clog files and write
   new summary files.  Since we don't need to care about subcommitted
   transactions either, we could get by with just 1 bit per transaction,
   1 = committed, 0 = aborted.  Once we've written and fsync'd the
   summary files, we could throw away the original files.  That might
   leave us with a smaller pg_clog than what we have today.
  
  I think the easiest way for now would be to have pg_clog with the same
  format as today and a rangewise much smaller pg_csn storing the lsns
  that are needed. That'll leave us with pg_upgrade'ability without
  needing to rewrite pg_clog during the upgrade.
 
 Err, we're proposing a patch to add timestamps to each commit,
 http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org
 which does so in precisely this way.

I am not sure where my statements above conflict with committs?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-15 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote:
  Andres Freund wrote:
   On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian br...@momjian.us wrote:
  
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact.  Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more.  So we could read the old pg_clog files and write
new summary files.  Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted.  Once we've written and fsync'd the
summary files, we could throw away the original files.  That might
leave us with a smaller pg_clog than what we have today.
   
   I think the easiest way for now would be to have pg_clog with the same
   format as today and a rangewise much smaller pg_csn storing the lsns
   that are needed. That'll leave us with pg_upgrade'ability without
   needing to rewrite pg_clog during the upgrade.
  
  Err, we're proposing a patch to add timestamps to each commit,
  http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org
  which does so in precisely this way.
 
 I am not sure where my statements above conflict with committs?

I didn't say it did ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-13 Thread Amit Kapila
On Mon, May 12, 2014 at 7:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 In theory, we could use a snapshot LSN as the cutoff-point for
 HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
 that makes me feel uneasy.

To accomplish this, won't an XID-CSN map table be required, and how will
it be maintained (i.e., when to clear and add an entry to that map table)?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-13 Thread Heikki Linnakangas

On 05/13/2014 09:44 AM, Amit Kapila wrote:

On Mon, May 12, 2014 at 7:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy.


To accomplish this won't XID-CSN map table be required and how will
it be maintained (means when to clear and add a entry to that map table)?


Not sure I understand. The clog is a mapping from XID to CSN. What 
vacuum needs to know is whether the xmin and/or xmax is visible to 
everyone (and whether they committed or aborted). To determine that, it 
needs the oldest still active snapshot LSN. That can be found by 
scanning the proc array. It's pretty much the same as a regular MVCC 
visibility check, but using the oldest still-active snapshot.
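
In other words, the check vacuum needs boils down to something like this
small sketch (the names and the sentinel values are assumptions for
illustration, not the patch's actual definitions):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t CommitSeqNo;

#define CSN_IN_PROGRESS ((CommitSeqNo) 0)   /* assumed sentinels, for */
#define CSN_ABORTED     ((CommitSeqNo) 1)   /* illustration only      */

/*
 * Can vacuum treat a tuple whose xmin/xmax has commit value 'csn' (as read
 * from pg_clog) as committed and visible to every backend?
 * 'oldest_snapshot_csn' is the LSN of the oldest still-active snapshot,
 * obtained by scanning the proc array.
 */
bool
csn_visible_to_everyone(CommitSeqNo csn, CommitSeqNo oldest_snapshot_csn)
{
    if (csn == CSN_IN_PROGRESS || csn == CSN_ABORTED)
        return false;                   /* still running, or rolled back */
    return csn <= oldest_snapshot_csn;  /* committed before every snapshot */
}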


- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-13 Thread Heikki Linnakangas

On 05/13/2014 08:08 AM, Rajeev rastogi wrote:

The core of the design is to store the LSN of the commit record in
pg_clog. Currently, we only store 2 bits per transaction there,
indicating if the transaction committed or not, but the patch will
expand it to 64 bits, to store the LSN. To check the visibility of an
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and
compared with the snapshot's LSN.

Won't it be a bit inefficient to look into pg_clog to read the XID's commit
LSN for every visibility check?


Maybe. If no hint bit is set on the tuple, you have to check the clog 
anyway to determine if the tuple is committed. And for XIDs older 
than xmin or newer than xmax, you don't need to check pg_clog. But it's 
true that for tuples with hint bit set, and xmin < XID < xmax, you have 
to check the pg_clog in the new system, when currently you only need to 
do a binary search of the local array in the snapshot. My gut feeling is 
that it won't be significantly slower in practice. If it becomes a 
problem, some rearrangement of the pg_clog code might help, or you could 
build a cache of XID-CSN mappings that you've already looked up in 
SnapshotData. So I don't think that's going to be a show-stopper.
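
One possible shape for such a cache, sketched with invented names and a
stand-in for the clog lookup (this is not the patch's data structure,
only an illustration of the idea): a small direct-mapped array hanging
off the snapshot, consulted before touching pg_clog, and only
remembering xids whose state can no longer change.

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define CSN_IN_PROGRESS ((CommitSeqNo) 0)   /* assumed sentinel */

/* Stand-in for the pg_clog read; not a real PostgreSQL function. */
extern CommitSeqNo clog_get_csn(TransactionId xid);

#define CSN_CACHE_SLOTS 64                  /* must be a power of two */

typedef struct CsnCacheEntry
{
    TransactionId xid;                      /* 0 = empty slot */
    CommitSeqNo   csn;
} CsnCacheEntry;

typedef struct SnapshotCsnCache
{
    CsnCacheEntry slots[CSN_CACHE_SLOTS];
} SnapshotCsnCache;

/* Return the commit CSN for 'xid', going to the clog only on a cache miss. */
CommitSeqNo
cached_csn_lookup(SnapshotCsnCache *cache, TransactionId xid)
{
    CsnCacheEntry *e = &cache->slots[xid & (CSN_CACHE_SLOTS - 1)];

    if (e->xid == xid)
        return e->csn;                      /* hit: no clog access at all */

    CommitSeqNo csn = clog_get_csn(xid);    /* miss: one clog access */

    /* Only remember final states; an in-progress xid may change later. */
    if (csn != CSN_IN_PROGRESS)
    {
        e->xid = xid;
        e->csn = csn;
    }
    return csn;
}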


- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-13 Thread Amit Kapila
On Tue, May 13, 2014 at 1:59 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 05/13/2014 09:44 AM, Amit Kapila wrote:

 To accomplish this won't XID-CSN map table be required and how will
 it be maintained (means when to clear and add a entry to that map table)?


 Not sure I understand. The clog is a mapping from XID to CSN.

The case I was referring to is for xmin < XID < xmax, for which you have
mentioned below that you are planning to refer directly to pg_clog.  This
is, I think, one of the main places where the new design can have an impact
on performance, but as you said it is better to first do the implementation
based on pg_clog rather than jumping directly to optimizing by maintaining an
XID to CSN mapping.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-13 Thread Rajeev rastogi
On 13 May 2014 14:06, Heikki Linnakangas

  The core of the design is to store the LSN of the commit record in
  pg_clog. Currently, we only store 2 bits per transaction there,
  indicating if the transaction committed or not, but the patch will
  expand it to 64 bits, to store the LSN. To check the visibility of
  an XID in a snapshot, the XID's commit LSN is looked up in pg_clog,
  and compared with the snapshot's LSN.
  Won't it be a bit inefficient to look into pg_clog to read the XID's
  commit LSN for every visibility check?
 
 Maybe. If no hint bit is set on the tuple, you have to check the clog
 anyway to determine if the tuple is committed. And if for XIDs older
 than xmin or newer than xmax, you don't need to check pg_clog. But it's
 true that for tuples with hint bit set, and xmin < XID < xmax, you have
 to check the pg_clog in the new system, when currently you only need to
 do a binary search of the local array in the snapshot. My gut feeling
 is that it won't be significantly slower in practice. If it becomes a
 problem, some rearrangement pg_clog code might help, or you could build
 a cache of XID-CSN mappings that you've alread looked up in
 SnapshotData. So I don't think that's going to be a show-stopper.

Yes, definitely it should not be a show-stopper. This can be optimized later 
by the methods you mentioned, and also by some cut-off technique based on 
which we can decide that an XID beyond a certain range will always be 
visible, thereby avoiding the look-up in pg_clog.

Thanks and Regards,
Kumar Rajeev Rastogi




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Heikki Linnakangas

On 01/24/2014 02:10 PM, Rajeev rastogi wrote:

We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on this.


I started looking into this, and plan to work on this for 9.5. It's a 
big project, so any help is welcome. The design I have in mind is to use 
the LSN of the commit record as the CSN (as Greg Stark suggested).


Some problems and solutions I have been thinking of:

The core of the design is to store the LSN of the commit record in 
pg_clog. Currently, we only store 2 bits per transaction there, 
indicating if the transaction committed or not, but the patch will 
expand it to 64 bits, to store the LSN. To check the visibility of an 
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and 
compared with the snapshot's LSN.


Currently, before consulting the clog for an XID's status, it is 
necessary to first check if the transaction is still in progress by 
scanning the proc array. To get rid of that requirement, just before 
writing the commit record in the WAL, the backend will mark the clog 
slot with a magic value that says I'm just about to commit. After 
writing the commit record, it is replaced with the record's actual LSN. 
If a backend sees the magic value in the clog, it will wait for the 
transaction to finish the insertion, and then check again to get the 
real LSN. I'm thinking of just using XactLockTableWait() for that. This 
mechanism makes the insertion of a commit WAL record and updating the 
clog appear atomic to the rest of the system.
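
The reader side of that protocol could look roughly like the sketch
below (the sentinel value and the helper functions are stand-ins
invented for illustration; the wait would be XactLockTableWait() as
suggested above):

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define CSN_COMMITTING UINT64_MAX   /* assumed "just about to commit" sentinel */

/* Stand-ins; not real PostgreSQL functions. */
extern CommitSeqNo clog_get_csn(TransactionId xid);
extern void        wait_for_transaction_end(TransactionId xid);

/*
 * Fetch the commit CSN of 'xid'.  If the committing backend has set the
 * sentinel but not yet replaced it with the commit record's LSN, wait for
 * it to finish and look again.
 */
CommitSeqNo
csn_fetch(TransactionId xid)
{
    CommitSeqNo csn = clog_get_csn(xid);

    if (csn == CSN_COMMITTING)
    {
        wait_for_transaction_end(xid);   /* released once the commit is done */
        csn = clog_get_csn(xid);         /* now a real LSN (or aborted) */
    }
    return csn;
}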


With this mechanism, taking a snapshot is just a matter of reading the 
current WAL insertion point. There is no need to scan the proc array, 
which is good. However, it probably still makes sense to record an xmin 
and an xmax in SnapshotData, for performance reasons. An xmax, in 
particular, will allow us to skip checking the clog for transactions 
that will surely not be visible. We will no longer track the latest 
completed XID or the xmin like we do today, but we can use 
SharedVariableCache->nextXid as a conservative value for xmax, and keep 
a cached global xmin value in shared memory, updated when convenient, 
that can be just copied to the snapshot.
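
Put together, taking a snapshot could then be as cheap as the following
sketch (the struct and the three helpers are invented stand-ins for
reading the WAL insert position and the shared-memory values mentioned
above):

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

typedef struct CsnSnapshot
{
    XLogRecPtr    csn;    /* WAL insert position at snapshot time */
    TransactionId xmin;   /* copied from a lazily maintained shared value */
    TransactionId xmax;   /* conservative: nextXid at snapshot time */
} CsnSnapshot;

/* Stand-ins for shared-memory reads; not real PostgreSQL functions. */
extern XLogRecPtr    read_wal_insert_position(void);
extern TransactionId read_next_xid(void);
extern TransactionId read_cached_global_xmin(void);

CsnSnapshot
get_csn_snapshot(void)
{
    CsnSnapshot snap;

    snap.csn  = read_wal_insert_position();  /* the snapshot itself */
    snap.xmax = read_next_xid();             /* xids >= xmax cannot be visible */
    snap.xmin = read_cached_global_xmin();   /* xids < xmin need no clog check */
    return snap;
}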


In theory, we could use a snapshot LSN as the cutoff-point for 
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but 
that makes me feel uneasy. In any case, I think we'll need a cut-off 
point defined as an XID rather than an LSN for freezing purposes. In 
particular, we need a cut-off XID to determine how far the pg_clog can 
be truncated, and to store in relfrozenxid. So, we will still need the 
concept of a global oldest xmin.


When a snapshot is just an LSN, taking a snapshot can no longer 
calculate an xmin, like we currently do (there will be a snapshot LSN in 
place of an xmin in the proc array). So we will need a new mechanism to 
calculate the global oldest xmin. First scan the proc array to find the 
oldest still in-progress XID. That - 1 will become the new oldest global 
xmin, after all currently active snapshots have finished. We don't want 
to sleep in GetOldestXmin(), waiting for the snapshots to finish, so we 
should periodically advance a system-wide oldest xmin value, for example 
whenever the walwrite process wakes up, so that when we need an 
oldest-xmin value, we will always have a fairly recently calculated 
value ready in shared memory.
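
One way to picture that two-step advancement is the sketch below (all
names are invented, and the per-backend snapshot LSNs in the proc array
are represented by a single "oldest registered snapshot" helper; this is
only one interpretation of the scheme described above): compute a
candidate from the oldest still-running xid, but publish it only after
every snapshot that was active at that time has gone away.

#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Stand-ins for proc-array and WAL reads; not real PostgreSQL functions. */
extern TransactionId oldest_inprogress_xid(void);
extern XLogRecPtr    oldest_registered_snapshot_csn(void);
extern XLogRecPtr    current_wal_insert_position(void);

/* In reality these would live in shared memory. */
static TransactionId published_oldest_xmin;
static TransactionId pending_xmin;      /* 0 = no pending candidate (sketch) */
static XLogRecPtr    pending_barrier;

/* Called periodically, e.g. whenever the walwriter wakes up. */
void
advance_oldest_xmin(void)
{
    /*
     * Step 2: publish an earlier candidate once every snapshot that was
     * active when it was computed has finished, i.e. once the oldest
     * registered snapshot is newer than the barrier we noted down.
     */
    if (pending_xmin != 0 &&
        oldest_registered_snapshot_csn() >= pending_barrier)
    {
        published_oldest_xmin = pending_xmin;
        pending_xmin = 0;
    }

    /* Step 1: derive a new candidate from the oldest still-running xid. */
    if (pending_xmin == 0)
    {
        pending_xmin    = oldest_inprogress_xid() - 1;
        pending_barrier = current_wal_insert_position();
    }
}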


- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Andres Freund
On 2014-05-12 16:56:51 +0300, Heikki Linnakangas wrote:
 On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
 We are also planning to implement CSN based snapshot.
 So I am curious to know whether any further development is happening on this.
 
 I started looking into this, and plan to work on this for 9.5. It's a big
 project, so any help is welcome. The design I have in mind is to use the LSN
 of the commit record as the CSN (as Greg Stark suggested).

Cool.

I haven't fully thought it through but I think it should make some of
the decoding code simpler. And it should greatly simplify the hot
standby code.

Some of the stuff in here will influence whether your freezing
replacement patch gets in. Do you plan to further pursue that one?

 The core of the design is to store the LSN of the commit record in pg_clog.
 Currently, we only store 2 bits per transaction there, indicating if the
 transaction committed or not, but the patch will expand it to 64 bits, to
 store the LSN. To check the visibility of an XID in a snapshot, the XID's
 commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.

We'll continue to need some of the old states? You plan to use values
that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
etc?
How do you plan to deal with subtransactions?

 Currently, before consulting the clog for an XID's status, it is necessary
 to first check if the transaction is still in progress by scanning the proc
 array. To get rid of that requirement, just before writing the commit record
 in the WAL, the backend will mark the clog slot with a magic value that says
 I'm just about to commit. After writing the commit record, it is replaced
 with the record's actual LSN. If a backend sees the magic value in the clog,
 it will wait for the transaction to finish the insertion, and then check
 again to get the real LSN. I'm thinking of just using XactLockTableWait()
 for that. This mechanism makes the insertion of a commit WAL record and
 updating the clog appear atomic to the rest of the system.

So it's quite possible that clog will become more of a contention point
due to the doubled amount of writes.

 In theory, we could use a snapshot LSN as the cutoff-point for
 HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
 that makes me feel uneasy.

It'd possibly also end up being less efficient because you'd visit the
clog for potentially quite some transactions to get the LSN.

Greetings,

Andres Freund




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Heikki Linnakangas

On 05/12/2014 05:41 PM, Andres Freund wrote:

I haven't fully thought it through but I think it should make some of
the decoding code simpler. And it should greatly simplify the hot
standby code.


Cool. I was worried it might conflict with the logical decoding stuff in 
some fundamental way, as I'm not really familiar with it.



Some of the stuff in here will be influence whether your freezing
replacement patch gets in. Do you plan to further pursue that one?


Not sure. I got to the point where it seemed to work, but I got a bit 
of cold feet proceeding with it. I used the page header's LSN field to 
define the epoch of the page, but I started to feel uneasy about it. I 
would be much more comfortable with an extra field in the page header, 
even though that uses more disk space. And requires dealing with pg_upgrade.



The core of the design is to store the LSN of the commit record in pg_clog.
Currently, we only store 2 bits per transaction there, indicating if the
transaction committed or not, but the patch will expand it to 64 bits, to
store the LSN. To check the visibility of an XID in a snapshot, the XID's
commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.


We'll continue to need some of the old states? You plan to use values
that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
etc?


Exactly.

Using 64 bits per XID instead of just 2 will obviously require a lot 
more disk space, so we might actually want to still support the old clog 
format too, as an archive format. The clog for old transactions could 
be converted to the more compact 2-bits per XID format (or even just 1 bit).



How do you plan to deal with subtransactions?


pg_subtrans will stay unchanged. We could possibly merge it with 
pg_clog, reserving some 32-bit chunk of values that are not valid LSNs 
to mean an uncommitted subtransaction, with the parent XID. That assumes 
that you never need to look up the parent of an already-committed 
subtransaction. I thought that was true at first, but I think the SSI 
code looks up the parent of a committed subtransaction, to find its 
predicate locks. Perhaps it could be changed, but seems best to leave it 
alone for now; there will be a lot of code churn anyway.
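
For what it's worth, the reserved-range idea could be encoded along
these lines (a sketch only; the constants are invented, and it merely
illustrates packing "uncommitted subtransaction with this parent XID"
into values that can never be valid commit-record LSNs, here assuming
real LSNs stay below 2^63):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t ClogEntry;     /* one 64-bit slot per xid, as in the patch */

/* Assumed special values that are never valid commit-record LSNs. */
#define ENTRY_IN_PROGRESS ((ClogEntry) 0)
#define ENTRY_ABORTED     ((ClogEntry) 1)
#define SUBXACT_FLAG      ((ClogEntry) 1 << 63)   /* high bit: parent pointer */

static inline ClogEntry
encode_uncommitted_subxact(TransactionId parent)
{
    return SUBXACT_FLAG | (ClogEntry) parent;
}

static inline bool
entry_is_uncommitted_subxact(ClogEntry e)
{
    return (e & SUBXACT_FLAG) != 0;
}

static inline TransactionId
entry_get_parent(ClogEntry e)
{
    return (TransactionId) (e & UINT32_MAX);
}

/* Everything else is interpreted as the commit record's LSN. */
static inline bool
entry_is_committed(ClogEntry e)
{
    return e != ENTRY_IN_PROGRESS && e != ENTRY_ABORTED &&
           (e & SUBXACT_FLAG) == 0;
}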


I think we can get rid of the sub-XID array in PGPROC. It's currently 
used to speed up TransactionIdIsInProgress(), but with the patch it will 
no longer be necessary to call TransactionIdIsInProgress() every time 
you check the visibility of an XID, so it doesn't need to be so fast 
anymore.


With the new commit-in-progress status in clog, we won't need the 
sub-committed clog status anymore. The commit-in-progress status will 
achieve the same thing.



Currently, before consulting the clog for an XID's status, it is necessary
to first check if the transaction is still in progress by scanning the proc
array. To get rid of that requirement, just before writing the commit record
in the WAL, the backend will mark the clog slot with a magic value that says
I'm just about to commit. After writing the commit record, it is replaced
with the record's actual LSN. If a backend sees the magic value in the clog,
it will wait for the transaction to finish the insertion, and then check
again to get the real LSN. I'm thinking of just using XactLockTableWait()
for that. This mechanism makes the insertion of a commit WAL record and
updating the clog appear atomic to the rest of the system.


So it's quite possible that clog will become more of a contention point
due to the doubled amount of writes.


Yeah. OTOH, each transaction will take more space in the clog, which 
will spread the contention across more pages. And I think there are ways 
to mitigate contention in clog, if it becomes a problem. We could make 
the locking more fine-grained than one lock per page, use atomic 64-bit 
reads/writes on platforms that support it, etc.



In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy.


It'd possibly also end up being less efficient because you'd visit the
clog for potentially quite some transactions to get the LSN.


True.

- Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Robert Haas
On Mon, May 12, 2014 at 10:41 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-05-12 16:56:51 +0300, Heikki Linnakangas wrote:
 On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
 We are also planning to implement CSN based snapshot.
 So I am curious to know whether any further development is happening on 
 this.

 I started looking into this, and plan to work on this for 9.5. It's a big
 project, so any help is welcome. The design I have in mind is to use the LSN
 of the commit record as the CSN (as Greg Stark suggested).

 Cool.

Yes, very cool.  I remember having some concerns about using the LSN
of the commit record as the CSN.  I think the biggest one was the need
to update clog with the CSN before the commit record had been written,
which your proposal to store a temporary sentinel value there until
the commit has completed might address.  However, I wonder what
happens if you write the commit record and then the attempt to update
pg_clog fails.  I think you'll have to PANIC, which kind of sucks.  It
would be nice to pin the pg_clog page into the SLRU before writing the
commit record so that we don't have to fear needing to re-read it
afterwards, but the SLRU machinery doesn't currently have that notion.

Another thing to think about is that LSN = CSN will make things much
more difficult if we ever want to support multiple WAL streams with a
separate LSN sequence for each.  Perhaps you'll say that's a pipe
dream anyway, and I agree it's probably 5 years out, but I think it
may be something we'll want eventually.  With synthetic CSNs those
systems are more decoupled.  OTOH, one advantage of LSN = CSN is that
the commit order as seen on the standby would always match the commit
order as seen on the master, which is currently not true, and would be
a very nice property to have.

I think we're likely to find that system performance is quite
sensitive to any latency in updating the global-xmin.  One thing about
the present system is that if you take a snapshot while a very old
transaction is still running, you're going to use that as your
global-xmin for the entire lifetime of your transaction.  It might be
possible, with some of the rejiggering you're thinking about, to
arrange things so that there are opportunities for processes to roll
forward their notion of the global-xmin, making HOT pruning more
efficient.  Whether anything good happens there or not is sort of a
side issue, but we need to make sure the efficiency of HOT pruning
doesn't regress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Ants Aasma
On Mon, May 12, 2014 at 4:56 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/24/2014 02:10 PM, Rajeev rastogi wrote:

 We are also planning to implement CSN based snapshot.
 So I am curious to know whether any further development is happening on
 this.


 I started looking into this, and plan to work on this for 9.5. It's a big
 project, so any help is welcome. The design I have in mind is to use the LSN
 of the commit record as the CSN (as Greg Stark suggested).

I did do some coding work on this, but the free time I used to work on
this basically disappeared with a child in the family. I guess what I
have has bitrotted beyond recognition. However I may still have some
insight that may be of use.

From your comments I presume that you are going with the original,
simpler approach proposed by Robert to simply keep the XID-CSN map
around forever and probe it for all visibility lookups that lie
outside of the xmin-xmax range? As opposed to the more complex hybrid
approach I proposed that keeps a short-term XID-CSN map and lazily
builds conventional list-of-concurrent-XIDs snapshots for long-lived
snapshots. I think that would be prudent, as the simpler approach
needs mostly the same groundwork and if it turns out to work well
enough, simpler is always better.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Ants Aasma
On Mon, May 12, 2014 at 6:09 PM, Robert Haas robertmh...@gmail.com wrote:
 However, I wonder what
 happens if you write the commit record and then the attempt to update
 pg_clog fails.  I think you'll have to PANIC, which kind of sucks.

CLOG IO error while committing is already a PANIC: SimpleLruReadPage()
does SlruReportIOError(), which in turn does ereport(ERROR), while
inside a critical section initiated in RecordTransactionCommit().

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Greg Stark
On Mon, May 12, 2014 at 2:56 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Currently, before consulting the clog for an XID's status, it is necessary
 to first check if the transaction is still in progress by scanning the proc
 array. To get rid of that requirement, just before writing the commit record
 in the WAL, the backend will mark the clog slot with a magic value that says
 I'm just about to commit. After writing the commit record, it is replaced
 with the record's actual LSN. If a backend sees the magic value in the clog,
 it will wait for the transaction to finish the insertion, and then check
 again to get the real LSN. I'm thinking of just using XactLockTableWait()
 for that. This mechanism makes the insertion of a commit WAL record and
 updating the clog appear atomic to the rest of the system.


Would it be useful to store the current WAL insertion point along with
the about-to-commit flag, so it's effectively a promise that this
transaction will commit no earlier than XXX? That should allow most
transactions to decide if those records are visible or not, unless
they're very recent transactions which started in that short window
while the committing transaction was in the process of committing.
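
If that were done, a reader that sees the flag could sometimes decide
without waiting at all, roughly as in this sketch (the names and the
flag/LSN pairing are invented): the commit record, wherever it ends up,
will have an LSN of at least the promised value, so any snapshot older
than that value can already treat the transaction as not visible.

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct PendingCommit
{
    bool       committing;       /* the "about to commit" flag */
    XLogRecPtr no_earlier_than;  /* WAL insert position stored with the flag */
} PendingCommit;

/*
 * True if the snapshot can already conclude the transaction is NOT visible,
 * without waiting for the commit to finish: its eventual commit LSN will be
 * at least 'no_earlier_than', which lies beyond this snapshot.
 */
bool
invisible_without_waiting(const PendingCommit *pc, XLogRecPtr snapshot_csn)
{
    return pc->committing && pc->no_earlier_than > snapshot_csn;
}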

-- 
greg




Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Heikki Linnakangas

On 05/12/2014 06:26 PM, Andres Freund wrote:

With the new commit-in-progress status in clog, we won't need the
sub-committed clog status anymore. The commit-in-progress status will
achieve the same thing.

Wouldn't that cause many spurious waits? Because commit-in-progress
needs to be waited on, but a sub-committed xact surely not?


Ah, no. Even today, a subxid isn't marked as sub-committed, until you 
commit the top-level transaction. The sub-commit state is a very 
transient state during the commit process, used to make the commit of 
the sub-transactions and the top-level transaction appear atomic. The 
commit-in-progress state would be a similarly short-lived state. You 
mark the subxids and the top xid as commit-in-progress just before the 
XLogInsert() of the commit record, and you replace them with the real 
LSNs right after XLogInsert().
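
A minimal sketch of that sequence, written as standalone C rather than actual
PostgreSQL code (write_commit_record() and wait_for_xact() are assumed
stand-ins for XLogInsert() and XactLockTableWait(), and the flat clog_lsn
array stands in for the SLRU):

#include <stdint.h>
#include <stdatomic.h>

#define COMMIT_IN_PROGRESS  UINT64_MAX         /* magic "about to commit" value */

static _Atomic uint64_t clog_lsn[1024];        /* per-XID commit LSN; 0 = none yet */

extern uint64_t write_commit_record(uint32_t xid);  /* assumed: appends WAL, returns LSN */
extern void     wait_for_xact(uint32_t xid);        /* assumed: like XactLockTableWait() */

void commit_transaction(uint32_t xid, const uint32_t *subxids, int nsubxids)
{
    /* 1. advertise that the commit is imminent */
    atomic_store(&clog_lsn[xid], COMMIT_IN_PROGRESS);
    for (int i = 0; i < nsubxids; i++)
        atomic_store(&clog_lsn[subxids[i]], COMMIT_IN_PROGRESS);

    /* 2. write the commit record; its LSN becomes the CSN */
    uint64_t lsn = write_commit_record(xid);

    /* 3. replace the markers with the real LSN right afterwards */
    for (int i = 0; i < nsubxids; i++)
        atomic_store(&clog_lsn[subxids[i]], lsn);
    atomic_store(&clog_lsn[xid], lsn);
}

/* reader side: fetch the commit LSN, waiting out an in-flight commit */
uint64_t get_commit_lsn(uint32_t xid)
{
    uint64_t v = atomic_load(&clog_lsn[xid]);
    while (v == COMMIT_IN_PROGRESS)
    {
        wait_for_xact(xid);                    /* committer is inside XLogInsert() */
        v = atomic_load(&clog_lsn[xid]);
    }
    return v;                                  /* 0 means no commit recorded */
}

The intermediate marker is what makes the WAL insertion and the clog update
look atomic to readers: they either see no commit LSN at all, or they wait out
the short window and then see the final one.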


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Andres Freund
On 2014-05-12 19:14:55 +0300, Heikki Linnakangas wrote:
 On 05/12/2014 06:26 PM, Andres Freund wrote:
 With the new commit-in-progress status in clog, we won't need the
 sub-committed clog status anymore. The commit-in-progress status will
 achieve the same thing.
 Wouldn't that cause many spurious waits? Because commit-in-progress
 needs to be waited on, but a sub-committed xact surely not?
 
 Ah, no. Even today, a subxid isn't marked as sub-committed, until you commit
 the top-level transaction. The sub-commit state is a very transient state
 during the commit process, used to make the commit of the sub-transactions
 and the top-level transaction appear atomic. The commit-in-progress state
 would be a similarly short-lived state. You mark the subxids and the top xid
 as commit-in-progress just before the XLogInsert() of the commit record, and
 you replace them with the real LSNs right after XLogInsert().

Ah, right. Forgot that detail...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Ants Aasma
On Mon, May 12, 2014 at 7:10 PM, Greg Stark st...@mit.edu wrote:
 Would it be useful to store the current WAL insertion point along with
 the about to commit flag so it's effectively a promise that this
 transaction will commit no earlier than XXX. That should allow most
 transactions to decide if those records are visible or not unless
 they're very recent transactions which started in that short window
 while the committing transaction was in the process of committing.

I don't believe this is worth the complexity. The contention window is
extremely short here.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-05-12 Thread Rajeev rastogi
On 12 May 2014 19:27, Heikki Linnakangas Wrote:
 
 On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
  We are also planning to implement a CSN-based snapshot.
  So I am curious to know whether any further development is happening
 on this.
 
 I started looking into this, and plan to work on this for 9.5. It's a
 big project, so any help is welcome. The design I have in mind is to
 use the LSN of the commit record as the CSN (as Greg Stark suggested).

Great !

 Some problems and solutions I have been thinking of:
 
 The core of the design is to store the LSN of the commit record in
 pg_clog. Currently, we only store 2 bits per transaction there,
 indicating if the transaction committed or not, but the patch will
 expand it to 64 bits, to store the LSN. To check the visibility of an
 XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and
 compared with the snapshot's LSN.
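
As a back-of-the-envelope illustration of what the widening means for clog
addressing (assuming the usual 8 kB clog page; this is not the patch's code):

#include <stdint.h>

#define CLOG_PAGE_SIZE        8192             /* assumed BLCKSZ-sized clog page */

/* today: 2 bits per xact -> 32768 transactions per page */
#define XACTS_PER_PAGE_2BIT   (CLOG_PAGE_SIZE * 8 / 2)

/* with a 64-bit commit LSN per xact -> 1024 transactions per page */
#define XACTS_PER_PAGE_64BIT  (CLOG_PAGE_SIZE / sizeof(uint64_t))

static inline int xid_to_page(uint32_t xid)
{
    return (int) (xid / XACTS_PER_PAGE_64BIT);
}

static inline uint64_t *xid_to_entry(uint64_t *page, uint32_t xid)
{
    return page + (xid % XACTS_PER_PAGE_64BIT);
}

Each clog page would then cover 1024 transactions instead of 32768, so the
clog grows roughly 32-fold and gets consulted more often.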

Won't it be a bit inefficient to look into pg_clog to read the XID's commit
LSN for every visibility check?
 
 With this mechanism, taking a snapshot is just a matter of reading the
 current WAL insertion point. There is no need to scan the proc array,
 which is good. However, it probably still makes sense to record an xmin
 and an xmax in SnapshotData, for performance reasons. An xmax, in
 particular, will allow us to skip checking the clog for transactions
 that will surely not be visible. We will no longer track the latest
 completed XID or the xmin like we do today, but we can use
 SharedVariableCache->nextXid as a conservative value for xmax, and keep
 a cached global xmin value in shared memory, updated when convenient,
 that can be just copied to the snapshot.

I think we can update xmin whenever the transaction whose XID equals
xmin gets committed (i.e. in ProcArrayEndTransaction).
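
For illustration, a rough standalone sketch of the snapshot and visibility
rules quoted above; the extern helpers (current WAL insert position, cached
global xmin, next XID, and the clog probe) are assumptions, not real APIs:

#include <stdbool.h>
#include <stdint.h>

typedef struct CSNSnapshot
{
    uint64_t snapshot_lsn;    /* WAL insert position at snapshot time */
    uint32_t xmin;            /* xids below this committed before any live snapshot */
    uint32_t xmax;            /* xids at or above this can never be visible */
} CSNSnapshot;

extern uint64_t current_wal_insert_lsn(void);  /* assumed */
extern uint32_t cached_global_xmin(void);      /* assumed, updated lazily */
extern uint32_t shared_next_xid(void);         /* assumed, e.g. the nextXid counter */
extern uint64_t get_commit_lsn(uint32_t xid);  /* assumed clog probe, 0 = not committed */

void take_snapshot(CSNSnapshot *snap)
{
    /* no proc array scan: just read a few shared counters */
    snap->snapshot_lsn = current_wal_insert_lsn();
    snap->xmin = cached_global_xmin();
    snap->xmax = shared_next_xid();
}

bool xid_visible_in_snapshot(uint32_t xid, const CSNSnapshot *snap)
{
    if (xid >= snap->xmax)
        return false;                          /* started after the snapshot */

    uint64_t commit_lsn = get_commit_lsn(xid);
    if (commit_lsn == 0)
        return false;                          /* aborted or still in progress */

    if (xid < snap->xmin)
        return true;                           /* committed before any live snapshot */

    /* committed: visible iff the commit happened before the snapshot */
    return commit_lsn <= snap->snapshot_lsn;
}

The xmax test is what lets lookups for clearly-too-new transactions skip the
clog entirely.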

Thanks and Regards,
Kumar Rajeev Rastogi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2014-01-24 Thread Rajeev rastogi
On 11th June 2013, Markus Wanner wrote:

  Agreed. Postgres-R uses a CommitOrderId, which is very similar in
  concept, for example.
 
  Do you think having this snapshot scheme would be helpful for
 Postgres-R?
 
 Yeah, it could help to reduce patch size, after a rewrite to use such a
 CSN.
 
  Or why do you need to tell apart aborted from in-progress
  transactions by CSN?
 
  I need to detect aborted transactions so they can be discarded during
  the eviction process, otherwise the sparse array will fill up. They
  could also be filtered out by cross-referencing uncommitted slots
 with
  the procarray. Having the abort case do some additional work to make
  xid assignment cheaper looks like a good tradeoff.
 
 I see.
 
  Sparse buffer needs to be at least big enough to fit CSN slots for
  the xids of all active transactions and non-overflowed
  subtransactions. At the current level PGPROC_MAX_CACHED_SUBXIDS=64,
  the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends
  = 101.6KB per buffer, or 203KB total in the default configuration.
 
  A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I
  guess the given 16 bytes includes alignment to 8 byte boundaries.
 Sounds good.
 
  8 byte alignment for CSNs is needed for atomic if not something else.
 
 Oh, right, atomic writes.
 
  I think the size could be cut in half by using a base value for CSNs
  if we assume that no xid is active for longer than 2B transactions as
  is currently the case. I didn't want to include the complication in
  the first iteration, so I didn't verify if that would have any
  gotchas.
 
 In Postgres-R, I effectively used a 32-bit order id which wraps around.
 
 In this case, I guess adjusting the base value will get tricky.
 Wrapping could probably be used as well, instead.
 
  The number of times each cache line can be invalidated is bounded by
  8.
 
 Hm.. good point.
 

We are also planning to implement a CSN-based snapshot.
So I am curious to know whether any further development is happening on this.
If not, then what is the reason?

Am I missing something?

Thanks and Regards,
Kumar Rajeev Rastogi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2013-06-11 Thread Markus Wanner
Ants,

the more I think about this, the more I start to like it.

On 06/07/2013 02:50 PM, Ants Aasma wrote:
 On Fri, Jun 7, 2013 at 2:59 PM, Markus Wanner mar...@bluegap.ch wrote:
 Agreed. Postgres-R uses a CommitOrderId, which is very similar in
 concept, for example.
 
 Do you think having this snapshot scheme would be helpful for Postgres-R?

Yeah, it could help to reduce patch size, after a rewrite to use such a CSN.

 Or why do you need to tell apart aborted from in-progress transactions
 by CSN?
 
 I need to detect aborted transactions so they can be discarded during
 the eviction process, otherwise the sparse array will fill up. They
 could also be filtered out by cross-referencing uncommitted slots with
 the procarray. Having the abort case do some additional work to make
 xid assignment cheaper looks like a good tradeoff.

I see.

 Sparse buffer needs to be at least big enough to fit CSN slots for the
 xids of all active transactions and non-overflowed subtransactions. At
 the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out
 at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer,
 or 203KB total in the default configuration.

 A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess
 the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.
 
 8 byte alignment for CSNs is needed for atomic if not something else.

Oh, right, atomic writes.

 I think the size could be cut in half by using a base value for CSNs
 if we assume that no xid is active for longer than 2B transactions as
 is currently the case. I didn't want to include the complication in
 the first iteration, so I didn't verify if that would have any
 gotchas.

In Postgres-R, I effectively used a 32-bit order id which wraps around.

In this case, I guess adjusting the base value will get tricky. Wrapping
could probably be used as well, instead.
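
For instance, the usual modular-comparison trick (essentially what
TransactionIdPrecedes() does for XIDs) copes with the wraparound, as long as
live ids stay within 2^31 of each other:

#include <stdbool.h>
#include <stdint.h>

/* illustrative only: compare wrapping 32-bit order ids by interpreting
 * their difference as signed */
static inline bool order_id_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}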

 The number of times each cache line can be invalidated is
 bounded by 8.

Hm.. good point.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2013-06-07 Thread Markus Wanner
Ants,

On 06/07/2013 12:42 AM, Ants Aasma wrote:
 Given the recent ideas being thrown about changing how freezing and
 clog is handled and MVCC catalog access I thought I would write out
 the ideas that I have had about speeding up snapshots in case there is
 an interesting tie in with the current discussions.

Thanks for this write-up.

 To refresh your memory the basic idea is to change visibility
 determination to be based on a commit sequence number (CSN for short)
 - an 8-byte number incremented on every commit representing the total
 ordering of commits. To take a snapshot in this scheme you only need
 to know the value of last assigned CSN, all transactions with XID less

You mean with a *CSN* less than or equal to that number? There's
certainly no point in comparing a CSN with an XID.

 than or equal to that number were committed at the time of the
 snapshots, everything above wasn't committed. Besides speeding up
 snapshot taking, this scheme can also be a building block for
 consistent snapshots in a multi-master environment with minimal
 communication.

Agreed. Postgres-R uses a CommitOrderId, which is very similar in
concept, for example.

 The main tricky part about this scheme is finding the CSN that was
 assigned to each XIDs in face of arbitrarily long transactions and
 snapshots using only a bounded amount of shared memory. The secondary
 tricky part is doing this in a way that doesn't need locks for
 visibility determination as that would kill any hope of a performance
 gain.

Agreed. Paraphrasing, you can also say that CSNs can only ever identify
completed transactions, where XIDs can be used to identify transactions
in progress as well.

[ We cannot possibly get rid of XIDs entirely for that reason. And the
mapping between CSNs and XIDs has some additional cost. ]

 We need to keep around CSN slots for all currently running
 transactions and CSN slots of transactions that are concurrent with
 any active CSN based snapshot (xid.csn > min(snapshot.csn)). To do
 this I propose the following data structures to do the XID-to-CSN
 mapping. For most recently assigned XIDs there is a ringbuffer of
 slots that contain the CSN values of the XIDs or special CSN values
 for transactions that haven't completed yet, aborted transactions or
 subtransactions. I call this the dense CSN map. Looking up a CSN of a
 XID from the ringbuffer is just a trivial direct indexing into the
 ring buffer.
 
 For long running transactions the ringbuffer may do a full circle
 before a transaction commits. Such CSN slots along with slots that are
 needed for older snapshots are evicted from the dense buffer into a
 sorted array of XID-CSN pairs, or the sparse mapping. For locking
 purposes there are two sparse buffers, one of them being active the
 other inactive, more on that later. Looking up the CSN value of a XID
 that has been evicted into the sparse mapping is a matter of
 performing a binary search to find the slot and reading the CSN value.

I like this idea of dense and sparse areas. Seems like a simple
enough, yet reasonably compact representation that might work well in
practice.
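
A toy model of the lookup path (not the actual patch; the sizes and special
values are made up) might look like this:

#include <stdint.h>

#define CSN_IN_PROGRESS  UINT64_MAX
#define CSN_ABORTED      (UINT64_MAX - 1)

#define DENSE_SLOTS      8192                  /* power of two for cheap modulo */

typedef struct { uint32_t xid; uint64_t csn; } SparseSlot;

typedef struct
{
    uint64_t   dense[DENSE_SLOTS];             /* indexed by xid % DENSE_SLOTS */
    uint32_t   dense_oldest_xid;               /* smallest xid still in the ring */
    SparseSlot sparse[1024];                   /* evicted slots, sorted by xid */
    int        nsparse;
} CsnMap;

uint64_t xid_to_csn(const CsnMap *map, uint32_t xid)
{
    /* recent xids: direct indexing into the ring buffer */
    if (xid >= map->dense_oldest_xid)
        return map->dense[xid % DENSE_SLOTS];

    /* older xids: binary search of the sorted sparse array */
    int lo = 0, hi = map->nsparse - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (map->sparse[mid].xid == xid)
            return map->sparse[mid].csn;
        if (map->sparse[mid].xid < xid)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    /* not found: evicted to the CSN log, or committed before all snapshots */
    return 0;
}

The dense path is a single array read; only transactions that have aged out of
the ring pay for the binary search over the sparse array.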

 Because snapshots can remain active for an unbounded amount of time
 and there can be unbounded amount of active snapshots, even the sparse
 mapping can fill up.

I don't think the number of active snapshots matters - after all, they
could all refer to the same CSN. So that number shouldn't have any
influence on the size of the sparse mapping.

What does matter is the number of transactions referenced by such a
sparse map. You are of course correct in that this number is equally
unbounded.

 To handle that case, each backend advertises its
 lowest snapshot number csn_min. When values need to be evicted from
 the sparse mapping, they are evicted in CSN order and written into the
 CSN log - a series of CSN-XID pairs. Backends that may still be
 concerned about those values are then notified that values that they
 might need to use have been evicted. Backends with invalidated
 snapshots can then convert their snapshots to regular list of
 concurrent XIDs snapshots at their leisure.
 
 To convert a CSN based snapshot to XID based, a backend would first
 scan the shared memory structures for xids up to snapshot.xmax for
 CSNs that are concurrent to the snapshot and insert the XIDs into the
 snapshot, then read in the CSN log starting from snapshots CSN,
 looking for xid's less than the snapshots xmax. After this the
 snapshot can be handled like current snapshots are handled.

Hm, I dislike the requirement to maintain two different snapshot formats.

Also mind that snapshot conversions - however unlikely you choose to
make them - may well result in bursts as multiple processes may need to
do such a conversion, all starting at the same point in time.
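
For concreteness, a rough sketch of such a conversion, with the shared-map
scan and the CSN log reader left as assumed helpers rather than real functions
from the proposal:

#include <stdint.h>

typedef struct { uint32_t xid; uint64_t csn; } CsnLogEntry;

typedef struct
{
    uint64_t  csn;            /* snapshot CSN */
    uint32_t  xmax;
    uint32_t  xip[1024];      /* concurrent xids, as in today's snapshots */
    int       xcnt;
} Snapshot;

/* assumed helpers: scan the shared dense+sparse slots, and read the CSN log */
extern int shared_map_collect(uint32_t xmax, uint64_t csn,
                              uint32_t *xids, int max);
extern int csn_log_read_from(uint64_t csn, CsnLogEntry *entries, int max);

void convert_to_xid_snapshot(Snapshot *snap)
{
    /* 1. xids still in shared memory whose CSN is after the snapshot */
    snap->xcnt = shared_map_collect(snap->xmax, snap->csn, snap->xip, 1024);

    /* 2. xids already evicted to the CSN log since the snapshot's CSN */
    CsnLogEntry log[256];
    int n = csn_log_read_from(snap->csn, log, 256);
    for (int i = 0; i < n && snap->xcnt < 1024; i++)
        if (log[i].xid < snap->xmax)
            snap->xip[snap->xcnt++] = log[i].xid;

    /* from here on the snapshot can be used like a conventional one */
}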

 Rolling back a transaction
 --
 
 Rolling back a transaction is basically the same as committing, but
 the CSN slots need to be stamped with an AbortedCSN.

Is that 

Re: [HACKERS] Proposal for CSN based snapshots

2013-06-07 Thread Greg Stark
On Thu, Jun 6, 2013 at 11:42 PM, Ants Aasma a...@cybertec.at wrote:
 To refresh your memory the basic idea is to change visibility
 determination to be based on a commit sequence number (CSN for short)
 - an 8-byte number incremented on every commit representing the total
 ordering of commits

I think we would just use the LSN of the commit record which is
effectively the same but doesn't require a new counter.
I don't think this changes anything though.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >