On Wed, Jul 30, 2014 at 7:00 PM, desmodemone <desmodem...@gmail.com> wrote:
> Hello,
> I think an incremental/differential backup method is very useful; however, the method has two drawbacks:
> 1) In a database, even if the percentage of modified rows is small compared to the total rows, the probability that only some files/tables change is small, because rows are normally not ordered inside a table and the updates are "random". If some tables are static, they are probably lookup tables or something like a registry, and normally these tables are small.
> 2) Every time a file changes, the whole file has to be read. So if point 1 is true, you are probably reading a large part of the database and then sending that part, instead of sending a small part.
>
> In my opinion, to solve these problems we need a different implementation of incremental backup.
> I will try to show my idea about it.
>
> I think we need a bitmap in memory to track the changed "chunks" of each file/table [by "chunk" I mean a number X of tracked pages, so that every tracked file is divided into "chunks"], so we could send only the blocks changed since the last incremental backup (which could be a full backup). The map could have one submap for every tracked file, which keeps it simpler.
>
> So, if we track a chunk of 8 page blocks (64KB) with one bit [a chunk of 8 blocks is only an example], then with one map of 1Mbit (1Mbit is 125KB of memory) we could track a table with a total size of 64GB; we could probably also use a compression algorithm, because the map is made of 1s and 0s. This is a very simple idea, but it shows that the map does not need too much memory if we track groups of blocks, i.e. "chunks". Obviously the problem is more complex, and there are probably better and more robust solutions.
> We probably need more space for the map header, to track information about the file, the last backup and so on.
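
The sizing argument in the quoted paragraph can be checked with a small sketch. This is only an illustration of the one-bit-per-chunk idea, not PostgreSQL code: the ChunkMap class and its method names are hypothetical, the 8 KB page size is PostgreSQL's default block size, and the 8-page chunk is the example value from the text. Sizes use binary units here, so a 64 GiB file needs 2^20 bits (128 KiB); a map of exactly 10^6 bits (125 KB) would cover slightly less, about 61 GiB.

```python
PAGE_SIZE = 8 * 1024                      # PostgreSQL default block size (8 KB)
PAGES_PER_CHUNK = 8                       # example value from the text
CHUNK_SIZE = PAGE_SIZE * PAGES_PER_CHUNK  # 64 KB tracked by each bit


class ChunkMap:
    """Hypothetical in-memory bitmap: one bit per chunk of a tracked file."""

    def __init__(self, file_size_bytes):
        # Round up so a partial trailing chunk still gets its own bit.
        self.nchunks = -(-file_size_bytes // CHUNK_SIZE)
        self.bits = bytearray(-(-self.nchunks // 8))

    def mark_page_dirty(self, page_no):
        # Called by whoever writes the page out; sets the bit of its chunk.
        chunk = page_no // PAGES_PER_CHUNK
        self.bits[chunk // 8] |= 1 << (chunk % 8)

    def dirty_chunks(self):
        # Chunk numbers an incremental backup would have to read and send.
        return [c for c in range(self.nchunks)
                if self.bits[c // 8] & (1 << (c % 8))]

    def clear(self):
        # Reset after a backup has been taken successfully.
        self.bits = bytearray(len(self.bits))
```

For example, ChunkMap(64 * 1024**3) allocates 131072 bytes of bitmap, and marking pages 0, 7 and 8 dirty leaves only chunks 0 and 1 to be sent.
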
>
> I think the map must be updated by the bgwriter, i.e. when it flushes the dirty buffers,

Not only the bgwriter, but the checkpointer and backends as well, as those also flush buffers. Also, there are some writes which are done outside shared buffers; you need to track those separately. Another point is that to track the changes due to hint bit modifications, you need to enable checksums or wal_log_hints, which will lead to either more CPU or more I/O.

> fortunately we don't need this map for the consistency of the database, so we could create and manage it in memory to limit the impact on performance.
> The drawback is that if the db crashes or someone shuts it down, the next incremental backup will be a full one; we could think about flushing the map to disk if PostgreSQL receives a shutdown signal or something similar.
>
> In this way we obtain:
> 1) we read only a small part of the database (the probability of a chunk having changed is less than that of the whole file having changed)
> 2) we do not need to calculate the checksum, saving CPU
> 3) we save I/O in reading and writing (we will send only the blocks changed since the last incremental backup)
> 4) we save network
> 5) we save time during backup: if we read and write less data, we reduce the time to do an incremental backup.
> 6) I think the in-memory bitmap will not impact the performance of the bgwriter too much.
>
> What do you think about it?

I think this method has 3 drawbacks compared to the method proposed:
a. You need to enable either checksums or wal_log_hints, so it will incur extra I/O if you enable wal_log_hints.
b. Backends also need to update the map, which is a small cost, but still ...
c. The map is not crash safe, due to which a full backup is sometimes needed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com