Re: [HACKERS] [RFC] LSN Map

2015-07-06 Thread Heikki Linnakangas

On 02/24/2015 04:55 AM, Robert Haas wrote:

On Mon, Feb 23, 2015 at 12:52 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Dunno, but Jim's got a point. This is a maintenance burden to all indexams,
if they all have to remember to update the LSN map separately. It needs to
be done in some common code, like in PageSetLSN or XLogInsert or something.

Aside from that, isn't this horrible from a performance point of view? The
patch doubles the buffer manager traffic, because any update to any page
will also need to modify the LSN map. This code is copied from the
visibility map code, but we got away with it there because the VM only needs
to be updated the first time a page is modified. Subsequent updates will
know the visibility bit is already cleared, and don't need to access the
visibility map.

And scalability: Whether you store one value for every N pages, or the LSN
of every page, this is going to have a huge effect of focusing contention on
the LSN pages. Currently, if ten backends operate on ten different heap
pages, for example, they can run in parallel. There will be some contention
on the WAL insertions (much less in 9.4 than before). But with this patch,
they will all fight for the exclusive lock on the single LSN map page.

You'll need to find a way to not update the LSN map on every update. For
example, only update the LSN page on the first update after a checkpoint
(although that would still have a big contention focusing effect right after
a checkpoint).


I think it would make more sense to do this in the background.
Suppose there's a background process that reads the WAL and figures
out which buffers it touched, and then updates the LSN map
accordingly.  Then the contention-focusing effect disappears, because
all of the updates to the LSN map are being made by the same process.
You need some way to make sure the WAL sticks around until you've
scanned it for changed blocks - but that is mighty close to what a
physical replication slot does, so it should be manageable.


If you implement this as a background process that reads WAL, as Robert 
suggested, you could perhaps implement it completely in an extension. 
That'd be a nice way to get started quickly, even if we later want to 
integrate this into the backend.


This is marked in the commitfest as Needs Review, but ISTM this got 
its fair share of review back in February. Marking as Returned with 
Feedback.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] LSN Map

2015-02-23 Thread Heikki Linnakangas

On 01/13/2015 01:22 PM, Marco Nenciarini wrote:

Il 08/01/15 20:18, Jim Nasby ha scritto:

On 1/7/15, 3:50 AM, Marco Nenciarini wrote:

The current implementation tracks only heap LSN. It currently does not
track any kind of indexes, but this can be easily added later.


Would it make sense to do this at a buffer level, instead of at the heap
level? That means it would handle both heap and indexes.
  I don't know if LSN is visible that far down though.


Where exactly are you thinking of handling it?


Dunno, but Jim's got a point. This is a maintenance burden to all 
indexams, if they all have to remember to update the LSN map separately. 
It needs to be done in some common code, like in PageSetLSN or 
XLogInsert or something.


Aside from that, isn't this horrible from a performance point of view? 
The patch doubles the buffer manager traffic, because any update to any 
page will also need to modify the LSN map. This code is copied from the 
visibility map code, but we got away with it there because the VM only 
needs to be updated the first time a page is modified. Subsequent 
updates will know the visibility bit is already cleared, and don't need 
to access the visibility map.


And scalability: Whether you store one value for every N pages, or the 
LSN of every page, this is going to have a huge effect of focusing 
contention on the LSN pages. Currently, if ten backends operate on ten 
different heap pages, for example, they can run in parallel. There will 
be some contention on the WAL insertions (much less in 9.4 than before). 
But with this patch, they will all fight for the exclusive lock on the 
single LSN map page.


You'll need to find a way to not update the LSN map on every update. For 
example, only update the LSN page on the first update after a checkpoint 
(although that would still have a big contention focusing effect right 
after a checkpoint).


- Heikki





Re: [HACKERS] [RFC] LSN Map

2015-02-23 Thread Robert Haas
On Mon, Feb 23, 2015 at 12:52 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Dunno, but Jim's got a point. This is a maintenance burden to all indexams,
 if they all have to remember to update the LSN map separately. It needs to
 be done in some common code, like in PageSetLSN or XLogInsert or something.

 Aside from that, isn't this horrible from a performance point of view? The
 patch doubles the buffer manager traffic, because any update to any page
 will also need to modify the LSN map. This code is copied from the
 visibility map code, but we got away with it there because the VM only needs
 to be updated the first time a page is modified. Subsequent updates will
 know the visibility bit is already cleared, and don't need to access the
 visibility map.

 And scalability: Whether you store one value for every N pages, or the LSN
 of every page, this is going to have a huge effect of focusing contention on
 the LSN pages. Currently, if ten backends operate on ten different heap
 pages, for example, they can run in parallel. There will be some contention
 on the WAL insertions (much less in 9.4 than before). But with this patch,
 they will all fight for the exclusive lock on the single LSN map page.

 You'll need to find a way to not update the LSN map on every update. For
 example, only update the LSN page on the first update after a checkpoint
 (although that would still have a big contention focusing effect right after
 a checkpoint).

I think it would make more sense to do this in the background.
Suppose there's a background process that reads the WAL and figures
out which buffers it touched, and then updates the LSN map
accordingly.  Then the contention-focusing effect disappears, because
all of the updates to the LSN map are being made by the same process.
You need some way to make sure the WAL sticks around until you've
scanned it for changed blocks - but that is mighty close to what a
physical replication slot does, so it should be manageable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] LSN Map

2015-01-13 Thread Marco Nenciarini
Il 08/01/15 20:18, Jim Nasby ha scritto:
 On 1/7/15, 3:50 AM, Marco Nenciarini wrote:
 The current implementation tracks only heap LSN. It currently does not
 track any kind of indexes, but this can be easily added later.
 
 Would it make sense to do this at a buffer level, instead of at the heap
 level? That means it would handle both heap and indexes.
  I don't know if LSN is visible that far down though.

Where exactly are you thinking of handling it?

 
 Also, this pattern is repeated several times; it would be good to put it
 in its own function:
 + lsnmap_pin(reln, blkno, lmbuffer);
 + lsnmap_set(reln, blkno, lmbuffer, lsn);
 + ReleaseBuffer(lmbuffer);

Right.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] [RFC] LSN Map

2015-01-08 Thread Jim Nasby

On 1/7/15, 3:50 AM, Marco Nenciarini wrote:

The current implementation tracks only heap LSN. It currently does not
track any kind of indexes, but this can be easily added later.


Would it make sense to do this at a buffer level, instead of at the heap level? 
That means it would handle both heap and indexes. I don't know if LSN is 
visible that far down though.

Also, this pattern is repeated several times; it would be good to put it in 
its own function:
+   lsnmap_pin(reln, blkno, lmbuffer);
+   lsnmap_set(reln, blkno, lmbuffer, lsn);
+   ReleaseBuffer(lmbuffer);
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] [RFC] LSN Map

2015-01-07 Thread Marco Nenciarini
Hi Hackers,

In order to make incremental backup
(https://wiki.postgresql.org/wiki/Incremental_backup) efficient, we
need a way to track the LSN of a page such that we can retrieve it
without reading the actual block. Below is my proposal for how to
achieve it.

LSN Map
---

The purpose of the LSN map is to quickly know if a page of a relation
has been modified after a specified checkpoint.

Implementation
--

We create an additional fork which contains a raw stream of LSNs. To
limit the space used, every entry represents the maximum LSN of a
fixed-size group of blocks. I arbitrarily chose a group size of 2048
blocks, which is equivalent to 16MB of heap data; this means we need
64k entries to track one terabyte of heap.

Name


I've called this map the LSN map, and I've named the corresponding fork
file lm.

WAL logging
---

At the moment the map is not WAL-logged, but it is updated during WAL
replay. I'm not deep enough into WAL mechanics to tell whether the
current approach is sane or whether we should change it.

Current limits
--

The current implementation tracks only heap LSNs. It does not yet
track any kind of index, but this can easily be added later. The
implementation of commands that rewrite the whole table can be
improved: CLUSTER uses shared memory buffers instead of writing the
map directly to disk, and moving a table to another tablespace
simply drops the map instead of updating it correctly.

Further ideas
-

The current implementation updates an entry in the map every time a
block gets its LSN bumped, but we really only need to know which is the
first checkpoint that contains expired data. So setting the entry to
the last checkpoint's LSN is probably enough, and will reduce the number
of writes. To implement this we only need a backend-local copy of the
last checkpoint LSN, which is updated during each XLogInsert. Again,
I'm not deep enough into replication mechanics to see whether this
approach would work on a standby using restartpoints instead of
checkpoints. Please advise on the best way to implement it.

Conclusions


This code is incomplete, and the xlog replay part must be
improved/fixed, but I think it's a good start toward this feature.
I would appreciate any review, advice, or critique.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

From 89a943032f0a10fd093c126d15fbf81e5861dbe3 Mon Sep 17 00:00:00 2001
From: Marco Nenciarini marco.nenciar...@2ndquadrant.it
Date: Mon, 3 Nov 2014 17:52:27 +0100
Subject: [PATCH] LSN Map

This is a WIP. Only heap is supported. No indexes, no sequences.
---
 src/backend/access/heap/Makefile  |   2 +-
 src/backend/access/heap/heapam.c  | 239 ++--
 src/backend/access/heap/hio.c |  11 +-
 src/backend/access/heap/lsnmap.c  | 336 ++
 src/backend/access/heap/pruneheap.c   |  10 +
 src/backend/access/heap/rewriteheap.c |  37 +++-
 src/backend/catalog/storage.c |   8 +
 src/backend/commands/tablecmds.c  |   5 +-
 src/backend/commands/vacuumlazy.c |  35 +++-
 src/backend/storage/smgr/smgr.c   |   1 +
 src/common/relpath.c  |   5 +-
 src/include/access/hio.h  |   3 +-
 src/include/access/lsnmap.h   |  28 +++
 src/include/common/relpath.h  |   5 +-
 src/include/storage/smgr.h|   1 +
 15 files changed, 687 insertions(+), 39 deletions(-)
 create mode 100644 src/backend/access/heap/lsnmap.c
 create mode 100644 src/include/access/lsnmap.h

diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..776ee7d 100644
*** a/src/backend/access/heap/Makefile
--- b/src/backend/access/heap/Makefile
*** subdir = src/backend/access/heap
*** 12,17 
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o 
visibilitymap.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o 
visibilitymap.o lsnmap.o
  
  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 21e9d06..9486562 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 48,53 
--- 48,54 
  #include access/tuptoaster.h
  #include access/valid.h
  #include access/visibilitymap.h
+ #include access/lsnmap.h
  #include access/xact.h
  #include access/xlog.h
  #include access/xloginsert.h
*** heap_insert(Relation relation, HeapTuple
*** 2067,2073 
TransactionId xid = GetCurrentTransactionId();
HeapTuple   heaptup;
Buffer  buffer;
!   

Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Bruce Momjian
On Wed, Jan  7, 2015 at 10:50:38AM +0100, Marco Nenciarini wrote:
 Implementation
 --
 
 We create an additional fork which contains a raw stream of LSNs. To
 limit the space used, every entry represent the maximum LSN of a group
 of blocks of a fixed size. I chose arbitrarily the size of 2048
 which is equivalent to 16MB of heap data, which means that we need 64k
 entry to track one terabyte of heap.

I like the idea of summarizing the LSN to keep its size reasonable.  Have
you done any measurements to determine how much backup can be skipped
using this method for a typical workload, i.e. how many 16MB page ranges
are not modified in a typical span between incremental backups?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Alvaro Herrera
Bruce Momjian wrote:

 Have you done any measurements to determine how much backup can be
 skipped using this method for a typical workload, i.e. how many 16MB
 page ranges are not modified in a typical span between incremental
 backups?

That seems entirely dependent on the specific workload.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Bruce Momjian
On Wed, Jan  7, 2015 at 12:33:20PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  Have you done any measurements to determine how much backup can be
  skipped using this method for a typical workload, i.e. how many 16MB
  page ranges are not modified in a typical span between incremental
  backups?
 
 That seems entirely dependent on the specific workload.

Well, obviously.  Is that worth even stating?  

My question is whether there are enough workloads for this to be
generally useful, particularly considering the recording granularity,
hint bits, and freezing.  Do we have cases where 16MB granularity helps
compared to file or table-level granularity?  How would we even measure
the benefits?  How would the administrator know they are benefitting
from incremental backups vs complete backups, considering the complexity
of incremental restores?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Bruce Momjian wrote:
 Have you done any measurements to determine how much backup can be
 skipped using this method for a typical workload, i.e. how many 16MB
 page ranges are not modified in a typical span between incremental
 backups?

 That seems entirely dependent on the specific workload.

Maybe, but it's a reasonable question.  The benefit obtained from the
added complexity/overhead clearly goes to zero if you summarize too much,
and it's not at all clear that there's a sweet spot where you win.  So
I'd want to see some measurements demonstrating that this is worthwhile.

regards, tom lane

