Re: [HACKERS] Hooks to track changed pages for backup purposes
> On 13 Sep 2017, at 15:01, Tomas Vondra wrote:
>
> On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>>>> * I see there are conditions like this:
>>>>
>>>>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>>>
>>>> Why is it enough to restrict the block-tracking code to main fork?
>>>> Aren't we interested in all relation forks?
>>> fsm, vm and others are small enough to take them
>
> That seems like an optimization specific to your backup solution, not
> necessarily to others and/or to other possible use cases.
>
>>>> I guess you'll have to explain
>>>> what the implementation of the hooks is supposed to do, and why these
>>>> locations for hook calls are the right ones. It's damn impossible to
>>>> validate the patch without that information.
>>>>
>>>> Assuming you still plan to use the hook approach ...
>>> Yes, I still think hooking is a good idea, but you are right - I need
>>> a prototype first. I'll mark the patch as Returned with feedback before
>>> the prototype implementation.
>
> OK
>
>>>>> There are no arguments fed to this hook, so modules would not be able to
>>>>> analyze things in this context, except shared memory and process
>>>>> state?
>>>>>
>>>>> Those hooks are put in hot code paths, and could impact performance of
>>>>> WAL insertion itself.
>>>> I do not think sending a few bytes to a cached array is comparable to a disk
>>>> write of an XLog record. Checking the func ptr is even cheaper with correct
>>>> branch prediction.
>>>
>>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>>> only insert the XLOG records into the WAL buffer here, so why should there
>>> be any disk write related? Or do you mean the final commit?
>> Yes, I mean finally we will be waiting for disk. A hundred empty ptr
>> checks are negligible in comparison with disk.
>
> Aren't we doing these calls while holding XLog locks? IIRC there was
> quite a significant performance improvement after Heikki reduced the
> amount of code executed while holding the locks.
>>> But more importantly, doesn't this kind of information require some
>>> durability guarantees? I mean, if it gets lost during server crashes or
>>> restarts, doesn't that mean the incremental backups might miss some
>>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>>> achieve that, no?
>> We need durability only on the level of one segment. If we do not have
>> info from a segment we can just rescan it.
>> If we send a segment to S3 as one file, we are sure of its integrity. But
>> this IO can be async.
>>
>> PTRACK in its turn switches bits in the fork's buffers, which are written by
>> the checkpointer and... well... recovered during recovery. By the usual WAL
>> replay of recovery.
>
> But how do you do that from the hooks, if they only store the data into
> a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and
> then the system crashes and reboots. How do you know you have incomplete
> information from the WAL segment?
>
> Although, that's probably what wal_switch_hook() might do - sync the
> data whenever the WAL segment is switched. Right?
>
>>> From this POV, the idea to collect this information on the backup system
>>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>>> most promising. It moves the work to another system, the backup system
>>> can make it as durable as the WAL segments, etc.
>>
>> Well, in some not so rare cases users encrypt backups and send them to S3.
>> And there is no system with CPUs that can handle that WAL parsing.
>> Currently, I'm considering mocking up a prototype for wal-g, which works
>> exactly this way.
>
> Why couldn't there be a system with enough CPU power? Sure, if you want
> to do this, you'll need a more powerful system, but regular CPUs can do
> >1GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the
> database as part of archive_command, before the encryption, of course.
Based on the unanswered questions in the discussion in this thread, and the fact that no new version of the patch has been submitted, I'm marking this Returned with Feedback. Please re-submit the patch in a future commitfest when it is ready for a new review.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hooks to track changed pages for backup purposes
On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>>> * I see there are conditions like this:
>>>
>>>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>>
>>> Why is it enough to restrict the block-tracking code to main fork?
>>> Aren't we interested in all relation forks?
>> fsm, vm and others are small enough to take them

That seems like an optimization specific to your backup solution, not necessarily to others and/or to other possible use cases.

>>> I guess you'll have to explain
>>> what the implementation of the hooks is supposed to do, and why these
>>> locations for hook calls are the right ones. It's damn impossible to
>>> validate the patch without that information.
>>>
>>> Assuming you still plan to use the hook approach ...
>> Yes, I still think hooking is a good idea, but you are right - I need
>> a prototype first. I'll mark the patch as Returned with feedback before
>> the prototype implementation.

OK

>>>> There are no arguments fed to this hook, so modules would not be able to
>>>> analyze things in this context, except shared memory and process
>>>> state?
>>>>
>>>> Those hooks are put in hot code paths, and could impact performance of
>>>> WAL insertion itself.
>>> I do not think sending a few bytes to a cached array is comparable to a disk
>>> write of an XLog record. Checking the func ptr is even cheaper with correct
>>> branch prediction.
>>
>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>> only insert the XLOG records into the WAL buffer here, so why should there
>> be any disk write related? Or do you mean the final commit?
> Yes, I mean finally we will be waiting for disk. A hundred empty ptr
> checks are negligible in comparison with disk.

Aren't we doing these calls while holding XLog locks? IIRC there was quite a significant performance improvement after Heikki reduced the amount of code executed while holding the locks.

>> But more importantly, doesn't this kind of information require some
>> durability guarantees?
>> I mean, if it gets lost during server crashes or
>> restarts, doesn't that mean the incremental backups might miss some
>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>> achieve that, no?
> We need durability only on the level of one segment. If we do not have
> info from a segment we can just rescan it.
> If we send a segment to S3 as one file, we are sure of its integrity. But
> this IO can be async.
>
> PTRACK in its turn switches bits in the fork's buffers, which are written by
> the checkpointer and... well... recovered during recovery. By the usual WAL
> replay of recovery.

But how do you do that from the hooks, if they only store the data into a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and then the system crashes and reboots. How do you know you have incomplete information from the WAL segment?

Although, that's probably what wal_switch_hook() might do - sync the data whenever the WAL segment is switched. Right?

>> From this POV, the idea to collect this information on the backup system
>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>> most promising. It moves the work to another system, the backup system
>> can make it as durable as the WAL segments, etc.
>
> Well, in some not so rare cases users encrypt backups and send them to S3.
> And there is no system with CPUs that can handle that WAL parsing.
> Currently, I'm considering mocking up a prototype for wal-g, which works
> exactly this way.

Why couldn't there be a system with enough CPU power? Sure, if you want to do this, you'll need a more powerful system, but regular CPUs can do >1GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the database as part of archive_command, before the encryption, of course.
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Hooks to track changed pages for backup purposes
Hi! Thank you for your interest and the experiment results.

> On 13 Sep 2017, at 15:43, Ants Aasma wrote:
>
> On Thu, Aug 31, 2017 at 9:02 AM, Andrey Borodin wrote:
>> When we have accumulated diff blocknumbers for most of the segments we can
>> significantly speed up the method of WAL scanning. If we have blocknumbers
>> for all segments we can skip WAL scanning altogether.
>
> Have you measured that the WAL scanning is actually a significant
> issue? As a quick experiment I hacked up pg_waldump to just dump block
> references to stdout in binary format. It scanned 2.8GB of WAL in 3.17
> seconds, outputting 9.3M block refs per second. WAL was generated with
> pgbench, synchronous commit off, using 4 cores for 10 minutes - making
> the ratio of work from generating WAL to parsing it about 750:1.

No, I had not done this measurement myself. Sure, parsing WAL, when it is in RAM, is not very expensive. Though, it can be even cheaper before formatting the WAL. I just want to figure out what the best place for this is, if the backup executable is sharing CPUs with the postmaster.

Best regards, Andrey Borodin.
Re: [HACKERS] Hooks to track changed pages for backup purposes
On Thu, Aug 31, 2017 at 9:02 AM, Andrey Borodin wrote:
> When we have accumulated diff blocknumbers for most of the segments we can
> significantly speed up the method of WAL scanning. If we have blocknumbers
> for all segments we can skip WAL scanning altogether.

Have you measured that the WAL scanning is actually a significant issue? As a quick experiment I hacked up pg_waldump to just dump block references to stdout in binary format. It scanned 2.8GB of WAL in 3.17 seconds, outputting 9.3M block refs per second. WAL was generated with pgbench, synchronous commit off, using 4 cores for 10 minutes - making the ratio of work from generating WAL to parsing it about 750:1.

Regards, Ants Aasma
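For reference, when pg_waldump is run without such hacks it prints block references as text, in lines of roughly the form `blkref #0: rel 1663/16384/16385 fork main blk 42`. A minimal sketch of pulling the relfilenode and block number out of such lines follows; the exact field layout is an assumption here for illustration, not something stated in the thread, so check it against your server version's actual output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a pg_waldump-style block-reference line and extract the
 * relfilenode and block number of a main-fork reference.
 * Returns 1 on success, 0 if the line carries no such reference.
 * The "rel spc/db/rel fork main blk N" layout is an assumption.
 */
static int
parse_blkref(const char *line, unsigned *relnode, unsigned *blkno)
{
    unsigned    spc, db;
    const char *p = strstr(line, "rel ");

    if (p == NULL)
        return 0;
    /* Only main-fork references match, mirroring the MAIN_FORKNUM check. */
    if (sscanf(p, "rel %u/%u/%u fork main blk %u", &spc, &db, relnode, blkno) == 4)
        return 1;
    return 0;
}
```

A post-processing tool on the archive side could feed every output line through such a parser and accumulate the changed-block set per segment.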
Re: [HACKERS] Hooks to track changed pages for backup purposes
Hi Tomas! Thank you for looking into that patch.

> On 8 Sep 2017, at 1:53, Tomas Vondra wrote:
>
> A few more comments:
>
> * The patch defines wal_switch_hook, but it's never called.
That call was missing; that's a bug, thanks for spotting it.

> * I see there are conditions like this:
>
>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>
> Why is it enough to restrict the block-tracking code to main fork?
> Aren't we interested in all relation forks?
fsm, vm and others are small enough to take them

> I guess you'll have to explain
> what the implementation of the hooks is supposed to do, and why these
> locations for hook calls are the right ones. It's damn impossible to
> validate the patch without that information.
>
> Assuming you still plan to use the hook approach ...
Yes, I still think hooking is a good idea, but you are right - I need a prototype first. I'll mark the patch as Returned with feedback before the prototype implementation.

>>> There are no arguments fed to this hook, so modules would not be able to
>>> analyze things in this context, except shared memory and process
>>> state?
>>>
>>> Those hooks are put in hot code paths, and could impact performance of
>>> WAL insertion itself.
>> I do not think sending a few bytes to a cached array is comparable to a disk
>> write of an XLog record. Checking the func ptr is even cheaper with correct
>> branch prediction.
>
> That seems somewhat suspicious, for two reasons. Firstly, I believe we
> only insert the XLOG records into the WAL buffer here, so why should there
> be any disk write related? Or do you mean the final commit?
Yes, I mean finally we will be waiting for disk. A hundred empty ptr checks are negligible in comparison with disk.

> But more importantly, doesn't this kind of information require some
> durability guarantees? I mean, if it gets lost during server crashes or
> restarts, doesn't that mean the incremental backups might miss some
> buffers?
> I'd guess the hooks will have to do some sort of I/O, to
> achieve that, no?
We need durability only on the level of one segment. If we do not have info from a segment we can just rescan it.
If we send a segment to S3 as one file, we are sure of its integrity. But this IO can be async.

PTRACK in its turn switches bits in the fork's buffers, which are written by the checkpointer and... well... recovered during recovery. By the usual WAL replay of recovery.

> From this POV, the idea to collect this information on the backup system
> (WAL archive) by pre-processing the arriving WAL segments seems like the
> most promising. It moves the work to another system, the backup system
> can make it as durable as the WAL segments, etc.
Well, in some not so rare cases users encrypt backups and send them to S3. And there is no system with CPUs that can handle that WAL parsing. Currently, I'm considering mocking up a prototype for wal-g, which works exactly this way.

Your comments were very valuable; thank you for looking into the patch and joining the discussion.

Best regards, Andrey Borodin.
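The per-segment durability model Andrey describes - if the block list for a segment was lost, just rescan that segment - reduces on restart to a set difference between the archived segments and the segments for which a block list survived. A small self-contained sketch (the function and its interface are hypothetical, purely to illustrate the recovery rule):

```c
#include <string.h>

/*
 * Given the names of archived WAL segments and of segments for which a
 * changed-block list survived, collect the segments that must be
 * rescanned from raw WAL.  out must have room for narchived entries.
 * Returns the number of segments to rescan.
 */
static int
segments_to_rescan(const char *const *archived, int narchived,
                   const char *const *have_info, int ninfo,
                   const char **out)
{
    int nout = 0;

    for (int i = 0; i < narchived; i++)
    {
        int found = 0;

        for (int j = 0; j < ninfo; j++)
            if (strcmp(archived[i], have_info[j]) == 0)
                found = 1;
        if (!found)
            out[nout++] = archived[i];  /* no block list: rescan it */
    }
    return nout;
}
```

Because losing a list only costs a rescan, the flush of each list can indeed be asynchronous, as the message argues.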
Re: [HACKERS] Hooks to track changed pages for backup purposes
Hi,

On 09/01/2017 08:13 AM, Andrey Borodin wrote:
> Thank you for your reply, Michael! Your comments are valuable, especially in
> the world of backups.
>
>> On 31 Aug 2017, at 19:44, Michael Paquier wrote:
>> Such things are not Postgres-C like.
> Will be fixed.

A few more comments:

* The patch defines wal_switch_hook, but it's never called.

* I see there are conditions like this:

    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)

Why is it enough to restrict the block-tracking code to main fork? Aren't we interested in all relation forks?

>> I don't understand what xlog_begin_insert_hook() is good for.
> memset control structures and array of blocknos and relfilenodes of the size
> XLR_MAX_BLOCK_ID.

Why can't that happen in the other hooks? I guess you'll have to explain what the implementation of the hooks is supposed to do, and why these locations for hook calls are the right ones. It's damn impossible to validate the patch without that information.

Assuming you still plan to use the hook approach ...

>> There are no arguments fed to this hook, so modules would not be able to
>> analyze things in this context, except shared memory and process
>> state?
>>
>> Those hooks are put in hot code paths, and could impact performance of
>> WAL insertion itself.
> I do not think sending a few bytes to a cached array is comparable to a disk
> write of an XLog record. Checking the func ptr is even cheaper with correct
> branch prediction.

That seems somewhat suspicious, for two reasons. Firstly, I believe we only insert the XLOG records into the WAL buffer here, so why should there be any disk write related? Or do you mean the final commit?

But more importantly, doesn't this kind of information require some durability guarantees? I mean, if it gets lost during server crashes or restarts, doesn't that mean the incremental backups might miss some buffers? I'd guess the hooks will have to do some sort of I/O, to achieve that, no?
In any case, claims like this probably deserve some benchmarking.

>> So you basically move the cost of scanning WAL
>> segments for those blocks from any backup solution to the WAL
>> insertion itself. Really, wouldn't it be more simple to let for
>> example the archiver process create this meta-data if you just want
>> to take faster backups with a set of segments? Even better, you could
>> do a scan after archiving N segments, and then use M jobs to do this
>> work more quickly. (A set of background workers could do this job as
>> well.)
> I like the idea of doing this during archiving. It is a different trade-off
> between performance of OLTP and performance of backups. Essentially, it is
> parsing WAL some time before doing the backup. The best thing about it is
> the usage of CPUs that are usually spinning in an idle loop on backup
> machines.
>
>> In the backup/restore world, backups can be allowed to be taken at a
>> slow pace; what matters is being able to restore them quickly.
> Backups are taken much more often than they are restored.

I agree. The ability to take backups quickly (and often) is just as important as fast recovery.

>> In short, anything moving performance from an external backup code path
>> to a critical backend code path looks like a bad design to begin with.
>> So I am dubious that what you are proposing here is a good idea.
> I will think about it more. This proposal takes a vanishingly small part of
> backend performance, but, indeed, a nonzero part.

But I think this is not necessarily just about code paths - moving it to the archiver or a bunch of background workers may easily have just as negative an effect (or even worse), as it's using the same CPUs and has to actually re-read the WAL segments from disk (which may be a lot of I/O, particularly when done from multiple processes).

From this POV, the idea to collect this information on the backup system (WAL archive) by pre-processing the arriving WAL segments seems like the most promising.
It moves the work to another system, the backup system can make it as durable as the WAL segments, etc.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Hooks to track changed pages for backup purposes
Thank you for your reply, Michael! Your comments are valuable, especially in the world of backups.

> On 31 Aug 2017, at 19:44, Michael Paquier wrote:
> Such things are not Postgres-C like.
Will be fixed.

> I don't understand what xlog_begin_insert_hook() is good for.
It memsets the control structures and the array of blocknos and relfilenodes of size XLR_MAX_BLOCK_ID.

> There are no arguments fed to this hook, so modules would not be able to
> analyze things in this context, except shared memory and process
> state?
>
> Those hooks are put in hot code paths, and could impact performance of
> WAL insertion itself.
I do not think sending a few bytes to a cached array is comparable to a disk write of an XLog record. Checking the func ptr is even cheaper with correct branch prediction.

> So you basically move the cost of scanning WAL
> segments for those blocks from any backup solution to the WAL
> insertion itself. Really, wouldn't it be more simple to let for
> example the archiver process create this meta-data if you just want
> to take faster backups with a set of segments? Even better, you could
> do a scan after archiving N segments, and then use M jobs to do this
> work more quickly. (A set of background workers could do this job as
> well.)
I like the idea of doing this during archiving. It is a different trade-off between performance of OLTP and performance of backups. Essentially, it is parsing WAL some time before doing the backup. The best thing about it is the usage of CPUs that are usually spinning in an idle loop on backup machines.

> In the backup/restore world, backups can be allowed to be taken at a
> slow pace; what matters is being able to restore them quickly.
Backups are taken much more often than they are restored.

> In short, anything moving performance from an external backup code path
> to a critical backend code path looks like a bad design to begin with.
> So I am dubious that what you are proposing here is a good idea.
I will think about it more.
This proposal takes a vanishingly small part of backend performance, but, indeed, a nonzero part.

Again, thank you for your time and comments.

Best regards, Andrey Borodin.
Re: [HACKERS] Hooks to track changed pages for backup purposes
On Thu, Aug 31, 2017 at 3:02 PM, Andrey Borodin wrote:
> Here is the patch with hooks that I consider sufficient for implementation
> of incremental backup with pages tracking as an extension.
>
> Recently I was posting these things to the thread "Adding hook in BufferSync
> for backup purposes" [0], but here I start a separate thread since the
> Subject field of the previous discussion is technically wrong.
>
> Currently various incremental backups can use one of these methods to take
> a diff of a cluster since some LSN:
> 1. Check the LSN of every page
> 2. Scan WAL and collect block numbers of changed pages
>
> I propose adding hooks:
> 1. When a buffer is registered in WAL insertion.
>    This hook is supposed to place blocknumbers in a temporary storage, like
>    a backend-local static array.
> 2. When a WAL record insertion is started and finished, to transfer
>    blocknumbers to more concurrency-protected storage.
> 3. When the WAL segment is switched, to initiate async transfer of
>    accumulated blocknumbers to durable storage.
>
> When we have accumulated diff blocknumbers for most of the segments we can
> significantly speed up the method of WAL scanning. If we have blocknumbers
> for all segments we can skip WAL scanning altogether.
>
> I think that these proposed hooks can enable more efficient backups. What
> do you think?
>
> Any ideas will be appreciated. This patch is influenced by the code of
> PTRACK (Yury Zhuravlev and Postgres Professional).

+if (xlog_insert_buffer_hook)
+for(nblock = 0; nblock < xlogreader->max_block_id; nblock++)
+{
+if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
+{
+    xlog_insert_buffer_hook(xlogreader->blocks[nblock].blkno,
+                            xlogreader->blocks[nblock].rnode, true);
+}
+}
[...]
+if(xlog_begin_insert_hook)
+xlog_begin_insert_hook();

Such things are not Postgres-C like. The first one should use brackets, and the second one proper spacing after "if". I don't understand what xlog_begin_insert_hook() is good for.
There are no arguments fed to this hook, so modules would not be able to analyze things in this context, except shared memory and process state?

Those hooks are put in hot code paths, and could impact performance of WAL insertion itself. So you basically move the cost of scanning WAL segments for those blocks from any backup solution to the WAL insertion itself. Really, wouldn't it be more simple to let for example the archiver process create this meta-data if you just want to take faster backups with a set of segments? Even better, you could do a scan after archiving N segments, and then use M jobs to do this work more quickly. (A set of background workers could do this job as well.)

In the backup/restore world, backups can be allowed to be taken at a slow pace; what matters is being able to restore them quickly. In short, anything moving performance from an external backup code path to a critical backend code path looks like a bad design to begin with. So I am dubious that what you are proposing here is a good idea.

--
Michael
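The batching Michael suggests - scan only after archiving N segments, and spread the work over M jobs - amounts to partitioning the archived-segment list. A tiny sketch of one such partitioning (contiguous runs per job, so each job reads neighbouring segments sequentially); the function is purely illustrative, not from any patch:

```c
/*
 * Assign segment i (0..nsegments-1) to one of njobs workers so that each
 * worker scans a contiguous run of segments.  njobs must be >= 1.
 * Contiguous runs keep each worker's reads sequential on disk.
 */
static int
job_for_segment(int i, int nsegments, int njobs)
{
    int per_job = (nsegments + njobs - 1) / njobs;  /* ceiling division */

    return i / per_job;
}
```

With 10 archived segments and 3 jobs, each job gets at most 4 consecutive segments; a background-worker pool could use the same assignment.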
[HACKERS] Hooks to track changed pages for backup purposes
Hi hackers!

Here is the patch with hooks that I consider sufficient for implementation of incremental backup with pages tracking as an extension.

Recently I was posting these things to the thread "Adding hook in BufferSync for backup purposes" [0], but here I start a separate thread since the Subject field of the previous discussion is technically wrong.

Currently various incremental backups can use one of these methods to take a diff of a cluster since some LSN:
1. Check the LSN of every page
2. Scan WAL and collect block numbers of changed pages

I propose adding hooks:
1. When a buffer is registered in WAL insertion. This hook is supposed to place blocknumbers in a temporary storage, like a backend-local static array.
2. When a WAL record insertion is started and finished, to transfer blocknumbers to more concurrency-protected storage.
3. When the WAL segment is switched, to initiate async transfer of accumulated blocknumbers to durable storage.

When we have accumulated diff blocknumbers for most of the segments we can significantly speed up the method of WAL scanning. If we have blocknumbers for all segments we can skip WAL scanning altogether.

I think that these proposed hooks can enable more efficient backups. What do you think?

Any ideas will be appreciated. This patch is influenced by the code of PTRACK (Yury Zhuravlev and Postgres Professional).

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/20051502087457%40webcorp01e.yandex-team.ru#20051502087...@webcorp01e.yandex-team.ru

0001-hooks-to-watch-for-changed-pages.patch
Description: Binary data
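The lifecycle of the three proposed hooks can be sketched as a self-contained toy model (all names, array sizes, and the synchronous "flush" stand-in below are illustrative assumptions, not the patch's actual code): begin-insert resets a backend-local scratch array, the per-buffer hook appends block numbers into it, finish-insert moves the batch to per-segment storage, and the segment-switch hook hands the accumulated set off for durable storage.

```c
#define XLR_MAX_BLOCK_ID 32     /* cap on blocks per WAL record */
#define SEG_CAPACITY     1024   /* toy per-segment capacity */

/* Backend-local scratch space, reset at the start of each record. */
static unsigned tracked_blocks[XLR_MAX_BLOCK_ID];
static int      ntracked;

/* Accumulated per-segment set, handed off on segment switch. */
static unsigned segment_blocks[SEG_CAPACITY];
static int      nsegment;

/* Hook 2a: a WAL record insertion starts - reset the scratch array. */
static void
track_begin_insert(void)
{
    ntracked = 0;
}

/* Hook 1: a buffer is registered in WAL insertion - remember its block. */
static void
track_register_buffer(unsigned blkno)
{
    if (ntracked < XLR_MAX_BLOCK_ID)
        tracked_blocks[ntracked++] = blkno;
}

/* Hook 2b: record insertion finished - move blocks to segment storage. */
static void
track_finish_insert(void)
{
    for (int i = 0; i < ntracked && nsegment < SEG_CAPACITY; i++)
        segment_blocks[nsegment++] = tracked_blocks[i];
    ntracked = 0;
}

/*
 * Hook 3: WAL segment switched.  A real extension would kick off an async
 * write of segment_blocks to durable storage here; this stand-in just
 * reports the batch size and resets the set.
 */
static int
track_segment_switch(void)
{
    int n = nsegment;

    nsegment = 0;
    return n;
}
```

In the real backend these would hang off function pointers checked at the call sites (hence the branch-prediction argument in the thread), and the per-segment storage would need the concurrency protection the proposal mentions.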