[ http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12316434 ]
Suresh Thalamati commented on DERBY-239:
----------------------------------------

I think providing an online backup mechanism that does not block changes to the database while the backup is in progress will be a useful feature for Derby users, especially in the client/server environment.

This backup mechanism might take more time than the current online backup because of the synchronization overhead required to allow changes to the database while the backup is in progress. At this point I am not sure how much more time it will take, but I think it should not be more than 50% in the worst-case scenario.

The current online backup mechanism (which blocks changes to the database) is supported through system procedures (e.g. SYSCS_UTIL.SYSCS_BACKUP_DATABASE). My plan is to make the existing backup procedures work without blocking changes to the database; no new system procedures are required. If the community thinks both blocking and non-blocking backups are useful, new procedures can be added.

Currently the backup mainly contains the data files (seg0/*) and the transaction log files (log/*) that exist when the backup starts. On restore from the backup, transactions are replayed, similar to crash recovery, to bring the database to a consistent state. The new online backup will work the same way, except that the transaction log must be copied to the backup only after all the data files have been backed up.

I think the current implementation freezes the database (no changes allowed) during backup for the following reasons:

1) Data files will be in a stable state; the backup will not contain partially updated pages on the disk.
2) No data files will be added or deleted on the disk, because create/drop operations are blocked.
3) No transaction will be committed after the backup starts, so all unlogged operations will be rolled back.

If the database is not frozen, the above conditions will not hold, which might lead to backups that are in a corrupted/inconsistent state.
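
To make the copy ordering mentioned above concrete (data files first, transaction log last), here is a minimal sketch using plain java.nio file copies. It is only an illustration, not Derby's backup code: the class BackupOrderSketch and its methods are hypothetical, and it assumes the log files needed for replay are still present when the log copy starts. Only the seg0 and log directory names come from the description above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Derby code: copies seg0/* before log/*.
class BackupOrderSketch {

    /** Copies the regular files directly under src into dst and returns their names in copy order. */
    static List<String> copyDir(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        List<Path> files;
        try (Stream<Path> listing = Files.list(src)) {
            files = listing.filter(p -> Files.isRegularFile(p))
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<String> copied = new ArrayList<>();
        for (Path f : files) {
            Files.copy(f, dst.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            copied.add(src.getFileName() + "/" + f.getFileName());
        }
        return copied;
    }

    /** Backs up the data files first and the transaction log only afterwards. */
    static List<String> backup(Path dbDir, Path backupDir) throws IOException {
        List<String> order = new ArrayList<>();
        // Data files may still change while they are being copied ...
        order.addAll(copyDir(dbDir.resolve("seg0"), backupDir.resolve("seg0")));
        // ... so the log is copied last, to include the records written in the meantime.
        order.addAll(copyDir(dbDir.resolve("log"), backupDir.resolve("log")));
        return order;
    }
}
```

The returned list is there only so the ordering can be inspected; a real backup would not need it.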

I think it is not necessary to freeze the whole database to make a stable backup copy; a stable backup can be made by blocking operations that modify the files on disk for small amounts of time. The following sections explain some of the issues, and possible ways to address them, in providing a real online backup that does not block changes to the database for the whole duration of the backup.

1) Corrupt pages in the backup database:

Backup reads and page cache writes can be interleaved if the database is not frozen, i.e. if page cache writes are not blocked while a page is being read for the backup, it is possible to end up with a page in the backup in which one portion is more up to date than the rest.

To keep the backup process from reading partially written pages, some kind of synchronization mechanism is needed that does not allow a page to be read for writing to the backup while the same page is being written to the disk. This can be implemented by one of the following approaches:

a) Latch on a page key (container id, page number) while writing the page from the cache to disk and while reading the page from the disk/cache to write it to the backup. This approach has the small overhead of acquiring an extra latch during page cache writes when the backup is in progress.

or

b) Read each page into the page cache first and then latch the page in the cache until a temporary copy of it is made. This approach does not have the overhead of extra latches on the page keys during writes, but it will pollute the page cache with pages that are only required by the backup; this might impact user operations, because active user pages may be replaced in the page cache by the backup pages.

or

c) Read pages into the buffer pool and latch them while making a copy, as in the approach above, but somehow make sure that user pages are not kicked out of the buffer pool.
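
As a rough illustration of approach (a), the sketch below serializes a page write and a backup read of the same page through a latch keyed on (container id, page number). Everything in it is hypothetical and is not taken from Derby's store code: the class names, the in-memory map standing in for the disk, and the fixed PAGE_SIZE are assumptions made only for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of approach (a); none of these names are Derby classes.
class PageKeyLatchSketch {

    static final int PAGE_SIZE = 4096; // assumed page size for the example

    /** Identifies a page by (container id, page number). */
    static final class PageKey {
        final long containerId;
        final long pageNumber;

        PageKey(long containerId, long pageNumber) {
            this.containerId = containerId;
            this.pageNumber = pageNumber;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PageKey)) return false;
            PageKey other = (PageKey) o;
            return containerId == other.containerId && pageNumber == other.pageNumber;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(containerId) * 31 + Long.hashCode(pageNumber);
        }
    }

    // One latch per page key. A real implementation would have to reclaim these.
    private final ConcurrentHashMap<PageKey, ReentrantLock> latches = new ConcurrentHashMap<>();
    // Simulated on-disk pages.
    private final ConcurrentHashMap<PageKey, byte[]> disk = new ConcurrentHashMap<>();

    private ReentrantLock latchFor(PageKey key) {
        return latches.computeIfAbsent(key, k -> new ReentrantLock());
    }

    /** Page cache write: holds the page-key latch while the page is overwritten in place. */
    void writePage(PageKey key, byte[] data) {
        if (data.length != PAGE_SIZE) throw new IllegalArgumentException("bad page size");
        ReentrantLock latch = latchFor(key);
        latch.lock();
        try {
            byte[] onDisk = disk.computeIfAbsent(key, k -> new byte[PAGE_SIZE]);
            System.arraycopy(data, 0, onDisk, 0, PAGE_SIZE);
        } finally {
            latch.unlock();
        }
    }

    /** Backup read: holds the same latch, so it never sees a half-written page. */
    byte[] readPageForBackup(PageKey key) {
        ReentrantLock latch = latchFor(key);
        latch.lock();
        try {
            byte[] onDisk = disk.get(key);
            return onDisk == null ? null : onDisk.clone();
        } finally {
            latch.unlock();
        }
    }
}
```

Writes to different pages do not contend with each other here; only a write and a backup read of the same page key wait on one another, which is the small extra cost described for approach (a).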

One optimization that may be made is to copy the file on the disk as is to the backup, but keep track of the pages that get modified while the file is being copied, and rewrite those pages using one of the above latching mechanisms.

2) Committed non-logged operations:

The basic requirement for a consistent database backup is that, after the checkpoint for the backup, all changes to the database are available in the transaction log. But Derby provides some non-logged operations for performance reasons, for example CREATE INDEX, IMPORT into an empty table, etc.

This was not an issue in the old backup mechanism because no operations are committed once the backup starts, so any non-logged operations are rolled back, similar to regular crash recovery.

I can think of three ways to address this issue:

a) Block non-logged operations while the backup is in progress, and also make the backup wait before copying until the non-logged operations already running are complete.
b) Make the backup always wait for the non-logged operations to complete, and retake the backup of those files that were affected by a non-logged operation if they were already backed up.
c) Somehow trigger logging for all operations after the checkpoint for the backup until the backup is complete. This is easy to implement for non-logged operations that are started after the backup, but the tricky case is triggering logging for non-logged operations that started before the backup but are committed during the backup.

3) Drop of a table while the file on the disk is being backed up.

Dropping a table results in deletion of the file on the disk, but the deletion will get errors if the file is open for backup. Some form of synchronization is required to make sure that users do not see weird errors in this case.

4) Creating a table/index after the data files are backed up.

Basically, the recovery system expects that a file exists on the disk before the log records that refer to it are written to the transaction log.
I think roll-forward recovery already handles this case, but it should be tested.

5) Data file growth because of inserts while the file (table/index) is being backed up.

The recovery system expects that a page is allocated on the disk before log records about that page are written to the transaction log, to avoid recovery errors caused by lack of space, except in the case of roll-forward recovery. I think roll-forward recovery already handles this, but I have to make sure it will work in this case also. Test cases should be added.

Some form of synchronization is required to make a stable snapshot of the file if the file is growing while the backup is in progress.

6) Checkpoints while the backup is in progress.

I think it is not necessary to allow checkpoints while the backup is in progress. But if someone thinks otherwise, the following should be addressed:

   1) Make a copy of the log control file for the backup before copying any
   2) Operations that rely on a checkpoint to become consistent should not be allowed, because the backup might have already copied some files when the checkpoint happens.

Any comments/suggestions will be appreciated.

Thanks
-suresh

> Need a online backup feature that does not block update operations when
> online backup is in progress.
> --------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-239
>          URL: http://issues.apache.org/jira/browse/DERBY-239
>      Project: Derby
>         Type: New Feature
>   Components: Store
>     Versions: 10.1.1.0
>     Reporter: Suresh Thalamati
>     Assignee: Suresh Thalamati
>
> Currently Derby allows users to perform online backups using the
> SYSCS_UTIL.SYSCS_BACKUP_DATABASE() procedure, but while the backup is in
> progress, update operations are temporarily blocked, while read operations can
> still proceed.
> Blocking update operations can be a real issue, specifically in client/server
> environments, because user requests will be blocked for a long time if a
> backup is in progress on the server.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira