Re: [HACKERS] [RFC] LSN Map
On 02/24/2015 04:55 AM, Robert Haas wrote:
> On Mon, Feb 23, 2015 at 12:52 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>> Dunno, but Jim's got a point. This is a maintenance burden to all indexams, if they all have to remember to update the LSN map separately. It needs to be done in some common code, like in PageSetLSN or XLogInsert or something.
>>
>> Aside from that, isn't this horrible from a performance point of view? The patch doubles the buffer manager traffic, because any update to any page will also need to modify the LSN map. This code is copied from the visibility map code, but we got away with it there because the VM only needs to be updated the first time a page is modified. Subsequent updates will know the visibility bit is already cleared, and don't need to access the visibility map.
>>
>> And scalability: whether you store one value for every N pages, or the LSN of every page, this is going to have a huge effect of focusing contention on the LSN pages. Currently, if ten backends operate on ten different heap pages, for example, they can run in parallel. There will be some contention on the WAL insertions (much less in 9.4 than before). But with this patch, they will all fight for the exclusive lock on the single LSN map page. You'll need to find a way to not update the LSN map on every update. For example, only update the LSN page on the first update after a checkpoint (although that would still have a big contention-focusing effect right after a checkpoint).
>
> I think it would make more sense to do this in the background. Suppose there's a background process that reads the WAL and figures out which buffers it touched, and then updates the LSN map accordingly. Then the contention-focusing effect disappears, because all of the updates to the LSN map are being made by the same process.
> You need some way to make sure the WAL sticks around until you've scanned it for changed blocks - but that is mighty close to what a physical replication slot does, so it should be manageable.

If you implement this as a background process that reads WAL, as Robert suggested, you could perhaps implement it completely in an extension. That'd be nice, even if we later want to integrate this into the backend, in order to get you started quickly.

This is marked in the commitfest as Needs Review, but ISTM this got its fair share of review back in February. Marking as Returned with Feedback.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] LSN Map
On 01/13/2015 01:22 PM, Marco Nenciarini wrote:
> On 08/01/15 20:18, Jim Nasby wrote:
>> On 1/7/15, 3:50 AM, Marco Nenciarini wrote:
>>> The current implementation tracks only heap LSN. It currently does not track any kind of indexes, but this can be easily added later.
>>
>> Would it make sense to do this at a buffer level, instead of at the heap level? That means it would handle both heap and indexes. I don't know if LSN is visible that far down though.
>
> Where exactly are you thinking of handling it?

Dunno, but Jim's got a point. This is a maintenance burden to all indexams, if they all have to remember to update the LSN map separately. It needs to be done in some common code, like in PageSetLSN or XLogInsert or something.

Aside from that, isn't this horrible from a performance point of view? The patch doubles the buffer manager traffic, because any update to any page will also need to modify the LSN map. This code is copied from the visibility map code, but we got away with it there because the VM only needs to be updated the first time a page is modified. Subsequent updates will know the visibility bit is already cleared, and don't need to access the visibility map.

And scalability: whether you store one value for every N pages, or the LSN of every page, this is going to have a huge effect of focusing contention on the LSN pages. Currently, if ten backends operate on ten different heap pages, for example, they can run in parallel. There will be some contention on the WAL insertions (much less in 9.4 than before). But with this patch, they will all fight for the exclusive lock on the single LSN map page. You'll need to find a way to not update the LSN map on every update. For example, only update the LSN page on the first update after a checkpoint (although that would still have a big contention-focusing effect right after a checkpoint).
- Heikki
Re: [HACKERS] [RFC] LSN Map
On Mon, Feb 23, 2015 at 12:52 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> Dunno, but Jim's got a point. This is a maintenance burden to all indexams, if they all have to remember to update the LSN map separately. It needs to be done in some common code, like in PageSetLSN or XLogInsert or something.
>
> Aside from that, isn't this horrible from a performance point of view? The patch doubles the buffer manager traffic, because any update to any page will also need to modify the LSN map. This code is copied from the visibility map code, but we got away with it there because the VM only needs to be updated the first time a page is modified. Subsequent updates will know the visibility bit is already cleared, and don't need to access the visibility map.
>
> And scalability: whether you store one value for every N pages, or the LSN of every page, this is going to have a huge effect of focusing contention on the LSN pages. Currently, if ten backends operate on ten different heap pages, for example, they can run in parallel. There will be some contention on the WAL insertions (much less in 9.4 than before). But with this patch, they will all fight for the exclusive lock on the single LSN map page. You'll need to find a way to not update the LSN map on every update. For example, only update the LSN page on the first update after a checkpoint (although that would still have a big contention-focusing effect right after a checkpoint).

I think it would make more sense to do this in the background. Suppose there's a background process that reads the WAL and figures out which buffers it touched, and then updates the LSN map accordingly. Then the contention-focusing effect disappears, because all of the updates to the LSN map are being made by the same process. You need some way to make sure the WAL sticks around until you've scanned it for changed blocks - but that is mighty close to what a physical replication slot does, so it should be manageable.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] LSN Map
On 08/01/15 20:18, Jim Nasby wrote:
> On 1/7/15, 3:50 AM, Marco Nenciarini wrote:
>> The current implementation tracks only heap LSN. It currently does not track any kind of indexes, but this can be easily added later.
>
> Would it make sense to do this at a buffer level, instead of at the heap level? That means it would handle both heap and indexes. I don't know if LSN is visible that far down though.

Where exactly are you thinking of handling it?

> Also, this pattern is repeated several times; it would be good to put it in its own function:
>
> +		lsnmap_pin(reln, blkno, lmbuffer);
> +		lsnmap_set(reln, blkno, lmbuffer, lsn);
> +		ReleaseBuffer(lmbuffer);

Right.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] LSN Map
On 1/7/15, 3:50 AM, Marco Nenciarini wrote:
> The current implementation tracks only heap LSN. It currently does not track any kind of indexes, but this can be easily added later.

Would it make sense to do this at a buffer level, instead of at the heap level? That means it would handle both heap and indexes. I don't know if LSN is visible that far down though.

Also, this pattern is repeated several times; it would be good to put it in its own function:

+		lsnmap_pin(reln, blkno, lmbuffer);
+		lsnmap_set(reln, blkno, lmbuffer, lsn);
+		ReleaseBuffer(lmbuffer);

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] [RFC] LSN Map
On Wed, Jan 7, 2015 at 10:50:38AM +0100, Marco Nenciarini wrote:
> Implementation
> --------------
>
> We create an additional fork which contains a raw stream of LSNs. To limit the space used, every entry represents the maximum LSN of a group of blocks of a fixed size. I arbitrarily chose a group size of 2048 blocks, which is equivalent to 16MB of heap data and means that we need 64k entries to track one terabyte of heap.

I like the idea of summarizing the LSNs to keep the map's size reasonable. Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] [RFC] LSN Map
Bruce Momjian wrote:
> Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?

That seems entirely dependent on the specific workload.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] [RFC] LSN Map
On Wed, Jan 7, 2015 at 12:33:20PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>> Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?
>
> That seems entirely dependent on the specific workload.

Well, obviously. Is that worth even stating? My question is whether there are enough workloads for this to be generally useful, particularly considering the recording granularity, hint bits, and freezing. Do we have cases where 16MB granularity helps compared to file- or table-level granularity? How would we even measure the benefits? How would the administrator know they are benefiting from incremental backups vs. complete backups, considering the complexity of incremental restores?

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] [RFC] LSN Map
Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> Bruce Momjian wrote:
>> Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?
>
> That seems entirely dependent on the specific workload.

Maybe, but it's a reasonable question. The benefit obtained from the added complexity/overhead clearly goes to zero if you summarize too much, and it's not at all clear that there's a sweet spot where you win. So I'd want to see some measurements demonstrating that this is worthwhile.

			regards, tom lane