[ 
http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12316434 ] 

Suresh Thalamati commented on DERBY-239:
----------------------------------------

I think providing an online backup mechanism that does not block changes
to the database while the backup is in progress will be a useful feature for
Derby users, especially in the client/server environment. This backup
mechanism might take more time than the current online backup because of the
synchronization overhead required to allow changes to the database while the
backup is in progress. At this point I am not sure how much more time it will
take, but I think it should not be more than 50% in the worst case.

The current online backup mechanism (which blocks changes to the database) is
supported using system procedures (e.g. SYSCS_UTIL.SYSCS_BACKUP_DATABASE). My
plan is to make the existing backup procedures work without blocking changes
to the database; no new system procedures are required. If the community
thinks both blocking and non-blocking backups are useful, new procedures can
be added.
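
For reference, the existing procedure is invoked through JDBC roughly as
follows (the database name and backup path here are just examples):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class BackupDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
        // SYSCS_BACKUP_DATABASE takes the backup directory as its only
        // argument.
        CallableStatement cs =
            conn.prepareCall("CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE(?)");
        cs.setString(1, "/backups/myDB");
        cs.execute();
        cs.close();
        conn.close();
    }
}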

Currently a backup mainly contains the data files (seg0/*) and the
transaction log files (log/*) that exist when the backup starts. On restore
from the backup, transactions are replayed, similar to crash recovery, to
bring the database to a consistent state. The new online backup will work the
same way, except that the transaction log must be copied to the backup only
after all the data files are backed up.
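
To make that ordering concrete, here is a minimal sketch of the proposed
copy sequence; checkpoint() and copyFile() are hypothetical stand-ins for
existing Derby internals, not real methods:

import java.io.File;
import java.io.IOException;

class OnlineBackupSketch {
    void onlineBackup(File dbDir, File backupDir) throws IOException {
        checkpoint();  // establish the recovery start point for the backup
        // Copy all data files first; they may still be changing.
        for (File dataFile : new File(dbDir, "seg0").listFiles()) {
            copyFile(dataFile, new File(backupDir, "seg0"));
        }
        // Copy the transaction log last, so it covers every change made
        // while the data files above were being copied.
        for (File logFile : new File(dbDir, "log").listFiles()) {
            copyFile(logFile, new File(backupDir, "log"));
        }
    }

    void checkpoint() { /* hypothetical: force a checkpoint */ }
    void copyFile(File src, File destDir) throws IOException { /* ... */ }
}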


I think the current implementation freezes the database (allows no changes)
during backup for the following reasons:
1) Data files will be in a stable state; the backup will not contain
   partially updated pages.
2) No new data files will be added or deleted on disk, because create/drop
   operations are blocked.
3) No transaction will commit after the backup starts, so all unlogged
   operations will be rolled back.

If the database is not frozen, the above conditions will not hold, which
might lead to backups that are corrupted or inconsistent. I think it is not
necessary to freeze the whole database to make a stable backup copy; by
blocking operations that modify the on-disk files for small amounts of time,
a stable backup can be made.


The following sections explain some of the issues and possible ways to
address them, to provide a real online backup that does not block changes to
the database for the whole duration of the backup.

1) Corrupt pages in the backup database:

Backup reads and page cache writes can be interleaved if the database is not
frozen, i.e. it is possible to end up with a page in the backup in which one
portion of the page is more up-to-date than the rest, if page cache writes
are not blocked while a page is being read for the backup.


To prevent the backup process from reading partially written pages, some kind
of synchronization mechanism is needed that does not allow a page to be read
for the backup while the same page is being written to disk. This can be
implemented by one of the following approaches:

a) Latch on a page key (container id, page number) while writing the page
   from the cache to disk, and while reading the page from the disk/cache to
   write to the backup. This approach has the small overhead of acquiring an
   extra latch during page cache writes while the backup is in progress. (A
   minimal sketch of this approach follows the list.)

or 

b) Read each page into the page cache first and then latch the page in the
   cache until a temporary copy of it is made. This approach avoids the extra
   latches on page keys during writes, but it will pollute the page cache
   with pages that are only required by the backup; this might impact user
   operations, because active user pages may be replaced by backup pages in
   the page cache.

or 

c) Read pages into the buffer pool and latch them while making a copy,
   similar to the above approach, but somehow make sure that user pages are
   not kicked out of the buffer pool.
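
Here is a minimal sketch of approach (a), assuming the page cache's write
path and the backup's read path can both be routed through a shared latch
keyed by (container id, page number); diskWrite(), diskRead() and
backupWrite() are hypothetical stand-ins for the real I/O routines:

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

class PageLatchSketch {
    // One latch object per page key; the key is encoded as a string here
    // only to keep the example short.
    private final ConcurrentHashMap<String, Object> latches =
        new ConcurrentHashMap<String, Object>();

    private Object latchFor(long containerId, long pageNumber) {
        return latches.computeIfAbsent(
            containerId + ":" + pageNumber, k -> new Object());
    }

    // Page cache path: flush a dirty page to disk under the latch.
    void writePageToDisk(long cid, long pno, byte[] page) throws IOException {
        synchronized (latchFor(cid, pno)) {
            diskWrite(cid, pno, page);
        }
    }

    // Backup path: read the same page under the same latch, so the copy
    // can never observe a half-written page.
    void copyPageToBackup(long cid, long pno) throws IOException {
        synchronized (latchFor(cid, pno)) {
            backupWrite(cid, pno, diskRead(cid, pno));
        }
    }

    // Hypothetical low-level I/O, omitted for brevity.
    void diskWrite(long cid, long pno, byte[] p) throws IOException { }
    byte[] diskRead(long cid, long pno) throws IOException { return null; }
    void backupWrite(long cid, long pno, byte[] p) throws IOException { }
}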



One optimization that could be made is to copy the file on disk as-is to the
backup, but keep track of pages that get modified while the file is being
copied, and rewrite those pages using one of the above latching mechanisms.
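
A hypothetical sketch of that optimization, assuming the page cache can be
asked to report flushes while a file is being copied (noteFlush() would be
called from the cache's write path; bulkCopy() and recopyPageUnderLatch()
are also hypothetical):

import java.io.File;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RetryCopySketch {
    private final Set<Long> dirtyDuringCopy = ConcurrentHashMap.newKeySet();
    private volatile boolean copying = false;

    // Called by the page cache whenever it flushes a page of this file.
    void noteFlush(long pageNumber) {
        if (copying) {
            dirtyDuringCopy.add(pageNumber);
        }
    }

    void backupContainer(File dataFile, File backupFile) throws IOException {
        dirtyDuringCopy.clear();
        copying = true;
        bulkCopy(dataFile, backupFile);   // fast, unsynchronized copy
        copying = false;
        // Re-copy any pages that changed underneath the bulk copy, this
        // time under the page latch, so they cannot be torn.
        for (long pageNumber : dirtyDuringCopy) {
            recopyPageUnderLatch(pageNumber, backupFile);
        }
    }

    // Hypothetical helpers, omitted for brevity.
    void bulkCopy(File src, File dest) throws IOException { }
    void recopyPageUnderLatch(long pno, File dest) throws IOException { }
}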
 


2) Committed non-logged operations:

  The basic requirement for a consistent database backup is that, after the
  checkpoint for the backup, all changes to the database are available in the
  transaction log. But Derby provides some non-logged operations for
  performance reasons, for example CREATE INDEX, IMPORT into an empty table,
  etc.

  This was not an issue in the old backup mechanism, because no operations
  could commit once the backup started, so any non-logged operations would be
  rolled back, similar to regular crash recovery.


  I can think of three ways to address this issue:

  a) Block non-logged operations while the backup is in progress, and also
     make the backup wait before copying until in-flight non-logged
     operations are complete. (A sketch of this option follows the list.)

  b) Make the backup always wait for non-logged operations to complete, and
     retake the backup of any files that were affected by a non-logged
     operation, if they were already backed up.

  c) Somehow trigger logging for all operations from the checkpoint for the
     backup until the backup is complete. This is easy to implement for
     non-logged operations that start after the backup begins, but the tricky
     case is to trigger logging for non-logged operations that started before
     the backup but are committed during it.
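
A minimal sketch of option (a), assuming unlogged operations and the backup
can share a read-write lock (the real hook points inside Derby would still
need to be identified):

import java.util.concurrent.locks.ReentrantReadWriteLock;

class UnloggedOpGate {
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();

    // Wrapped around CREATE INDEX, IMPORT into an empty table, etc.
    // Many unlogged operations can run concurrently (shared mode).
    void runUnloggedOperation(Runnable op) {
        gate.readLock().lock();   // blocks while a backup holds the gate
        try {
            op.run();
        } finally {
            gate.readLock().unlock();
        }
    }

    // The backup takes the exclusive side: it waits for in-flight
    // unlogged operations to finish and keeps new ones from starting.
    void beginBackup() { gate.writeLock().lock(); }
    void endBackup()   { gate.writeLock().unlock(); }
}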

   
3) Drop of a table while its file on disk is being backed up. Dropping a
   table results in deletion of the file on disk, but the deletion will get
   errors if the file is open for backup.

   Some form of synchronization is required to make sure that users do not
   see weird errors in this case.
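
One possible form of that synchronization, sketched under the assumption
that the dropper and the backup copier can share a per-container latch
(containerLatchFor() and markDroppedDuringBackup() are hypothetical):

import java.io.File;
import java.io.IOException;

class DropDuringBackupSketch {
    void dropContainer(File dataFile) throws IOException {
        // The backup copier holds the same latch while copying this file,
        // so the delete waits for the copy to finish (or runs before it).
        synchronized (containerLatchFor(dataFile)) {
            if (!dataFile.delete()) {
                throw new IOException("could not delete " + dataFile);
            }
            // Tell the backup to skip or discard this container's copy.
            markDroppedDuringBackup(dataFile);
        }
    }

    Object containerLatchFor(File f) { return f; /* hypothetical */ }
    void markDroppedDuringBackup(File f) { }
}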

4) Creating a table/index after the data files are backed up. Basically, the
   recovery system expects that the file exists on disk before log records
   that refer to it are written to the transaction log.

   I think roll-forward recovery already handles this case, but it should be
   tested.

5) Data file growth because of inserts while the file (table/index) is being
   backed up.

   The recovery system expects that a page is allocated on disk before log
   records about that page are written to the transaction log, to avoid
   recovery errors due to space issues, except in the case of roll-forward
   recovery.

   I think roll-forward recovery handles this case already, but we have to
   make sure it also works in this situation. Test cases should be added.

   Some form of synchronization is required to make a stable snapshot of the
   file if the file is growing while the backup is in progress.
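
One way to take that snapshot, sketched under the assumption that page
allocation can be blocked just long enough to read the file length
(allocationLatchFor() is hypothetical); growth past the recorded length is
then recovered from the transaction log by roll-forward recovery:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

class SnapshotCopySketch {
    void copyStableSnapshot(File dataFile, File backupFile)
            throws IOException {
        long snapshotLength;
        synchronized (allocationLatchFor(dataFile)) {
            snapshotLength = dataFile.length();  // freeze the copy length
        }
        FileChannel in = new FileInputStream(dataFile).getChannel();
        FileChannel out = new FileOutputStream(backupFile).getChannel();
        try {
            long copied = 0;
            // Copy exactly snapshotLength bytes, ignoring any growth
            // that happens behind the copy position.
            while (copied < snapshotLength) {
                copied += in.transferTo(copied, snapshotLength - copied, out);
            }
        } finally {
            in.close();
            out.close();
        }
    }

    Object allocationLatchFor(File f) { return f; /* hypothetical */ }
}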

6) Checkpoints while the backup is in progress.

   I think it is not necessary to allow checkpoints while the backup is in
   progress. But if someone thinks otherwise, the following should be
   addressed:

   1) Make a copy of the log control file for the backup before copying
      anything else.
   2) Any operation that relies on a checkpoint to make itself consistent
      should not be allowed, because the backup might already have copied
      some files when the checkpoint happens.
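
The simpler alternative, disallowing checkpoints for the duration of the
backup, could look roughly like this (doCheckpoint() is a hypothetical
stand-in for the existing checkpoint routine):

class CheckpointGateSketch {
    private volatile boolean backupInProgress = false;

    void beginBackup() { backupInProgress = true; }
    void endBackup()   { backupInProgress = false; }

    boolean checkpoint() {
        if (backupInProgress) {
            return false;   // skipped; a later request runs normally
        }
        doCheckpoint();
        return true;
    }

    void doCheckpoint() { /* existing checkpoint logic */ }
}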


Any comments/suggestions will be appreciated. 


Thanks
-suresh

> Need an online backup feature that does not block update operations when
> online backup is in progress.
> --------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-239
>          URL: http://issues.apache.org/jira/browse/DERBY-239
>      Project: Derby
>         Type: New Feature
>   Components: Store
>     Versions: 10.1.1.0
>     Reporter: Suresh Thalamati
>     Assignee: Suresh Thalamati

>
> Currently Derby allows users to perform online backups using the
> SYSCS_UTIL.SYSCS_BACKUP_DATABASE() procedure, but while the backup is in
> progress, update operations are temporarily blocked; read operations can
> still proceed.
> Blocking update operations can be a real issue, specifically in client/server
> environments, because user requests will be blocked for a long time if a
> backup is in progress on the server.

