On Wed, Jul 17, 2019 at 10:22 AM Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
>
> On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke
> <jeevan.cha...@enterprisedb.com> wrote:
>
>> Hi Anastasia,
>>
>> On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova
>> <a.lubennik...@postgrespro.ru> wrote:
>>
>>> 23.04.2019 14:08, Anastasia Lubennikova wrote:
>>> > I'm volunteering to write a draft patch or, more likely, set of
>>> > patches, which will allow us to discuss the subject in more detail.
>>> > And to do that I wish we agree on the API and data format (at least
>>> > broadly).
>>> > Looking forward to hearing your thoughts.
>>>
>>> Though the previous discussion stalled,
>>> I still hope that we could agree on basic points such as a map file
>>> format and protocol extension,
>>> which is necessary to start implementing the feature.
>>>
>>
>> It's great that you too come up with the PoC patch. I didn't look at your
>> changes in much details but we at EnterpriseDB too working on this feature
>> and started implementing it.
>>
>> Attached series of patches I had so far... (which needed further
>> optimization and adjustments though)
>>
>> Here is the overall design (as proposed by Robert) we are trying to
>> implement:
>>
>> 1. Extend the BASE_BACKUP command that can be used with replication
>>    connections. Add a new [ LSN 'lsn' ] option.
>>
>> 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to
>>    send the option added to the server in #1.
>>
>> Here are the implementation details when we have a valid LSN
>>
>> sendFile() in basebackup.c is the function which mostly does the thing
>> for us. If the filename looks like a relation file, then we'll need to
>> consider sending only a partial file. The way to do that is probably:
>>
>> A. Read the whole file into memory.
>>
>> B. Check the LSN of each block. Build a bitmap indicating which blocks
>>    have an LSN greater than or equal to the threshold LSN.
>>
>> C. If more than 90% of the bits in the bitmap are set, send the whole
>>    file just as if this were a full backup. This 90% is a constant now;
>>    we might make it a GUC later.
>>
>> D. Otherwise, send a file with .partial added to the name. The .partial
>>    file contains an indication of which blocks were changed at the
>>    beginning, followed by the data blocks. It also includes a
>>    checksum/CRC.
>>    Currently, a .partial file format looks like:
>>    - start with a 4-byte magic number
>>    - then store a 4-byte CRC covering the header
>>    - then a 4-byte count of the number of blocks included in the file
>>    - then the block numbers, each as a 4-byte quantity
>>    - then the data blocks
>>
>>
>> We are also working on combining these incremental back-ups with the full
>> backup and for that, we are planning to add a new utility called
>> pg_combinebackup. Will post the details on that later once we have on the
>> same page for taking backup.
>>
>
> For combining a full backup with one or more incremental backup, we are
> adding a new utility called pg_combinebackup in src/bin.
>
> Here is the overall design as proposed by Robert.
>
> pg_combinebackup starts from the LAST backup specified and work backward.
> It must NOT start with the full backup and work forward. This is important
> both for reasons of efficiency and of correctness. For example, if you
> start by copying over the full backup and then later apply the incremental
> backups on top of it then you'll copy data and later end up overwriting it
> or removing it. Any files that are leftover at the end that aren't in the
> final incremental backup even as .partial files need to be removed, or the
> result is wrong. We should aim for a system where every block in the
> output directory is written exactly once and nothing ever has to be
> created and then removed.
>
> To make that work, we should start by examining the final incremental
> backup. We should proceed with one file at a time. For each file:
>
> 1. If the complete file is present in the incremental backup, then just
>    copy it to the output directory - and move on to the next file.
>
> 2. Otherwise, we have a .partial file. Work backward through the backup
>    chain until we find a complete version of the file. That might happen
>    when we get back to the full backup at the start of the chain, but it
>    might also happen sooner - at which point we do not need to and should
>    not look at earlier backups for that file. During this phase, we should
>    read only the HEADER of each .partial file, building a map of which
>    blocks we're ultimately going to need to read from each backup. We can
>    also compute the offset within each file where that block is stored at
>    this stage, again using the header information.
>
> 3. Now, we can write the output file - reading each block in turn from the
>    correct backup and writing it to the write output file, using the map
>    we constructed in the previous step. We should probably keep all of the
>    input files open over steps 2 and 3 and then close them at the end
>    because repeatedly closing and opening them is going to be expensive.
>    When that's done, go on to the next file and start over at step 1.
>

At what stage will you apply the WAL generated between START and STOP
backup?

> We are already started working on this design.
>
> --
> Jeevan Chalke
> Technical Architect, Product Development
> EnterpriseDB Corporation
>

--
Ibrar Ahmed
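
For reference, the quoted design (steps B-D on the server side, steps 1-3 in
pg_combinebackup) can be sketched roughly as follows. This is illustrative
Python working on in-memory buffers, not the C code from the patches. Several
things are assumptions made only for the sketch: the magic value, little-endian
field layout, zlib's CRC-32 standing in for whatever CRC the patch uses, the
CRC covering the magic, count and block numbers, and all function names.

```python
import struct
import zlib

BLCKSZ = 8192                # PostgreSQL's default block size
MAGIC = 0x50415254           # hypothetical magic number, not taken from the patch
FULL_FILE_THRESHOLD = 0.90   # the hard-coded 90% from step C


def page_lsn(page: bytes) -> int:
    """pd_lsn is the first 8 bytes of a page header, stored as two 32-bit
    halves (high, then low); a little-endian host is assumed here."""
    hi, lo = struct.unpack_from("<II", page, 0)
    return (hi << 32) | lo


def make_incremental(data: bytes, threshold_lsn: int):
    """Steps B-D: return ("full", data) or ("partial", encoded bytes)."""
    nblocks = len(data) // BLCKSZ
    changed = [b for b in range(nblocks)
               if page_lsn(data[b * BLCKSZ:(b + 1) * BLCKSZ]) >= threshold_lsn]
    if nblocks == 0 or len(changed) > FULL_FILE_THRESHOLD * nblocks:
        return "full", data
    magic = struct.pack("<I", MAGIC)
    body = struct.pack("<I", len(changed)) + struct.pack(f"<{len(changed)}I", *changed)
    crc = zlib.crc32(magic + body)   # assumed coverage: header minus the CRC field
    blocks = b"".join(data[b * BLCKSZ:(b + 1) * BLCKSZ] for b in changed)
    return "partial", magic + struct.pack("<I", crc) + body + blocks


def read_partial_header(buf: bytes):
    """Parse only the header; return {block number: offset of its data}."""
    magic, crc, count = struct.unpack_from("<III", buf, 0)
    if magic != MAGIC:
        raise ValueError("not a .partial file")
    header_end = 12 + 4 * count
    if zlib.crc32(buf[0:4] + buf[8:header_end]) != crc:
        raise ValueError("header CRC mismatch")
    blocks = struct.unpack_from(f"<{count}I", buf, 12)
    return {blk: header_end + i * BLCKSZ for i, blk in enumerate(blocks)}


def combine_file(chain):
    """chain: [(kind, bytes), ...] for ONE relation file, oldest backup first.
    Walks backward from the newest backup, as in steps 1-3."""
    source = {}   # block number -> (index into chain, byte offset)
    for idx in range(len(chain) - 1, -1, -1):
        kind, buf = chain[idx]
        if kind == "full":
            for blk in range(len(buf) // BLCKSZ):
                source.setdefault(blk, (idx, blk * BLCKSZ))
            break   # complete copy found: do not look at older backups
        for blk, off in read_partial_header(buf).items():
            source.setdefault(blk, (idx, off))   # newest version wins
    else:
        raise ValueError("no complete copy of the file in the chain")
    out = bytearray()
    for blk in range(max(source) + 1 if source else 0):
        idx, off = source[blk]
        out += chain[idx][1][off:off + BLCKSZ]   # each block written exactly once
    return bytes(out)
```

Calling combine_file([full, inc1, inc2]) with the outputs of make_incremental
for successive versions of a file should give back the newest version. The
sketch does not handle relation truncation, since the header layout listed
above carries no file length, and it does not touch WAL at all, which is the
part the question above is about.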