Hi Øystein,
Thanks for reviewing the patch. My answers are in-line...
Øystein Grøvlen (JIRA) wrote:
[ http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12355620 ]
Øystein Grøvlen commented on DERBY-239:
---------------------------------------
I have looked at the onlinebackup_1.diff patch, and overall the patch
looks good. However, I think there is one severe bug in
RAFContainer.java. I guess that, due to the cut and paste from
writePage(), you are seeking in a different file from the one you
are writing to (fileData.seek(), but backupRaf.write()).
fileData.seek() in the backup routine is wrong; I will fix it. Good catch.
I have a few other comments and questions:
General:
* If I run backup with this patch, it seems like I will run the
new code. Does one not need to change the restore code to be
able to restore such a backup, or does the ordinary recovery
handle that?
The backup structure is not changed, so the current restore code should
work OK. Some of the new issues related to online backup are log-replay
issues, such as redoing CREATE CONTAINER, new page creation, etc.
I think most of these issues are addressed as part of rollforward
recovery. I plan to rescan the restore code and do some more testing
to find any cases that are not handled by the current code.
RawStore.java:
* The BACKUP_FILTER now contains so much, that it would be useful
to have a comment that says what is really left to copy.
I agree, comments will be helpful here. I am not sure BACKUP_FILTER
will be needed any more. Currently the only thing it does not filter
out is the JAR directory, and I plan to copy the JARs as a separate
step after the data segment, as discussed in my previous e-mail.
* Intuitively, it seems wrong to hard-code "seg0", but I see that
this is done all over the code.
Could you please file a JIRA entry for this one? I noticed it too;
hard-coding is definitely not the correct way here. One of us can
clean this up.
Will there always be only one segment?
Yes, at least for now. I think it is hard-wired somewhere in the
access layer.
What is then the purpose of the segment concept?
I think the initial design plan was to support some kind of data
partitioning by storing fragments of tables/indexes on different
segments, or by distributing tables into different segments. Each
segment could then be placed on a different disk, for performance
reasons or to avoid the Derby database size being limited to a single
disk/volume.
* backup(File ...) seems like it would create an endless recursion
if called. Fortunately, it seems like it never will be
called. Why do we need the methods with a File parameter instead
of a String? The system procedures use the String variant.
Maybe we could just remove the File variant?
I think they are leftovers from the old Cloudscape versions; the File
variant is not supported any more, so these can be cleaned up. There
should be some File-variant calls in the access layer also. Could you
please file a JIRA entry for this cleanup, so that these won't be
forgotten again?
* I think it would be helpful with a comment that explained why
disableLogArchiveMode() was made synchronized.
Yes, I definitely need to add comments about this synchronization.
It was part of my attempt to prevent parallel backup actions, which I
did not complete as part of this patch.
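Roughly, the guard I have in mind looks like this (a minimal sketch
with made-up names, not the actual RawStore code):

    // Sketch only: allow at most one backup to run at a time.
    class BackupGuard {
        private boolean backupInProgress = false;

        // Called at the start of a backup; fails if one is already running.
        synchronized void beginBackup() {
            if (backupInProgress) {
                throw new IllegalStateException("backup already in progress");
            }
            backupInProgress = true;
        }

        // Called when the backup finishes, whether it succeeded or failed.
        synchronized void endBackup() {
            backupInProgress = false;
        }
    }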
BaseDataFileFactory.java:
* I do not think basing which files to back up on the contents of the
seg0 directory is very robust. What if someone by accident has
written a file with a name that matches the pattern you are
looking for. Then I would think you may get a very strange
error message that may not be easy to resolve. Could this not
be based on some system catalog?
I think users should not be creating or deleting any files in seg0 to
start with. The pattern matching is just a simple precaution. I
thought of scanning the catalogs to find the containers to back up,
but could not convince myself that it is required, at least for now.
If you think scanning the catalogs is more robust, this enhancement
can easily be done at a later point in time.
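The precaution is essentially just a filename filter, something like
this (a sketch assuming container files follow the c<hex-id>.dat
naming convention; not the actual patch code):

    import java.io.File;
    import java.io.FilenameFilter;

    // Accept only names that look like Derby container files.
    class ContainerFileFilter implements FilenameFilter {
        public boolean accept(File dir, String name) {
            return name.matches("c[0-9a-fA-F]+\\.dat");
        }
    }

    // usage: list only container files under seg0
    // File[] files =
    //     new File(dbDir, "seg0").listFiles(new ContainerFileFilter());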
Another scenario is if someone
by accident deletes a file for a table that is not accessed very
often. A later backup will then not detect that this file is
missing. Since the backup is believed to be successful, the
latest backup of this file may be deleted.
The backup is as good as the database in this case :-) If users
access the table with the deleted file from the backup DB, it will
fail with a 'no container' error, just as it will on the main
database. Even with the catalog approach, backup could only throw a
warning about the deleted container. Once a user deletes a container
file there is no way out; users cannot even drop that table. I don't
think making backup fail forever when this scenario occurs would be
acceptable to users.
FileContainer.java:
* I cannot find anything backup-specific about getPageForBackup(), so
I think a more general name would be better (e.g., getLatchedPage).
Yes, currently this routine does not have anything backup-specific in
it. In the future, when a weight-based cache mechanism is implemented,
I would like to get the page from the cache with a specific weight for
the backup page.
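For reference, all the routine does today is "find the page in the
cache, latch it, return it to the caller, who unlatches it when done".
A toy model of that idea (nothing here is Derby's actual API):

    import java.util.concurrent.locks.ReentrantLock;

    // Toy model only; all names are made up.
    class ToyPage {
        final ReentrantLock latch = new ReentrantLock();
        final byte[] pageData = new byte[4096];

        void unlatch() {
            latch.unlock();
        }
    }

    class ToyContainer {
        private final ToyPage[] pages = new ToyPage[16];

        ToyContainer() {
            for (int i = 0; i < pages.length; i++)
                pages[i] = new ToyPage();
        }

        // Latch the page so it stays stable while the caller copies it;
        // the caller must call unlatch() when done.
        ToyPage getLatchedPage(int pageNumber) {
            ToyPage page = pages[pageNumber];
            page.latch.lock();
            return page;
        }
    }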
RAFContainer.java:
* The changes to this file seem not to be quite in line with some
of the original design philosophies. I am not sure that is
necessarily bad, but it would be nice to hear the arguments for
doing it this way. More specifically:
- While RAFContainer so far has used the
StorageRandomAccessFile/StorageFile abstractions, backup
use RandomAccessFile/File directly. Is there a particular
reason for that?
Yes. Currently there is no API available for users to specify the
type of storage factory to use for the backup. That would be a good
enhancement to add later.
The online backup implementation does what the current backup does,
i.e., it uses the disk I/O calls directly instead of the database's
storage factory. I think one reason to do it this way is that if a
database is using an in-memory storage factory, users can still back
up the database onto a disk. If the storage factory used by the
database is java.io.*, then backup does the same thing without going
through the StorageFactory interface.
- In order to be able to backup a page, new methods have
been added to FileContainer and BasePage/CachedPage, and
the RAFContainer is doing latching of pages. This
increases coupling to other modules, have alternative
designs been considered?
I was considering the following minor changes here:
1) Instead of getting the pageData byte array from the page object,
ask the page to write itself to the backup (something like
page.backup(...)); a rough sketch of this is below.
2) Move the code that backs up and unlatches the page into
FileContainer.java.
I have not thought of a better way to design this; any ideas would be
helpful.
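For 1), I am imagining something along these lines (a sketch with
made-up names, not the actual BasePage code):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch only: the page writes itself at its own offset in the
    // backup container file, so the raw pageData array stays private.
    class SketchPage {
        private final byte[] pageData = new byte[4096];
        private long pageNumber;

        void backup(RandomAccessFile backupRaf) throws IOException {
            backupRaf.seek(pageNumber * pageData.length);
            backupRaf.write(pageData);
        }
    }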
* privBackupContainer():
- Setting 'done=true' at the start of the method is a bit
confusing. I would think another name for this variable
would be better.
I will change it to 'backupComplete = false' and reverse the exit
logic; hopefully that will make it more readable.
- If an exception is thrown while holding a latch, does one
not need to release the latch?
Yes, the latch has to be released. I will make sure no latches are
held when an error occurs; a sketch combining this with the
backupComplete change is below.
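Something like this (getLatchedPage() and writeToBackup() are
placeholders for the real calls, not the patch itself):

    // Sketch: reversed exit logic plus try/finally, so the latch is
    // released on both the normal and the error path.
    void backupAllPages(long lastPageNumber) throws IOException {
        boolean backupComplete = false;
        long pageNumber = 0;
        while (!backupComplete) {
            Page page = getLatchedPage(pageNumber);
            try {
                writeToBackup(page);              // may throw
                pageNumber++;
                backupComplete = (pageNumber > lastPageNumber);
            } finally {
                page.unlatch();                   // always released
            }
        }
    }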
* writeToBackup():
- copies a lot of code from writePage(). One should
consider factoring out common code.
I will move the common code into a separate routine.
- Due to the cut and paste, you are seeking in a different
file from the one you are writing to. (fileData.seek(), but
backupRaf.write())
You are right; I will fix it, and move the shared logic into the
common routine mentioned above.
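The corrected write would look roughly like this (a sketch using the
names from your comments; pageSize and the offset computation are
assumptions):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Seek in the same file that is written to: backupRaf, not fileData.
    void writeToBackup(RandomAccessFile backupRaf, long pageNumber,
                       byte[] pageData, int pageSize) throws IOException {
        long pageOffset = pageNumber * pageSize;
        backupRaf.seek(pageOffset);        // was: fileData.seek(pageOffset)
        backupRaf.write(pageData, 0, pageSize);
    }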
LogToFile.java:
* In getFirstLogNeeded(), why did you need to handle a null
checkpoint? Is it in case you do a backup before any checkpoint
has been performed?
No, there will be a checkpoint before the backup. I added the null
check for testing drop-table stub cases without a checkpoint, and I
also noticed a null pointer error when shutting down in corrupt
database cases if a checkpoint had not occurred by that time, so I
thought there was no harm in adding a null check there.
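The guard is just this (a sketch; the surrounding names are from my
memory of LogToFile.java, not from the patch):

    // Return the first log file still needed, or -1 if there has been
    // no checkpoint yet (e.g., shutdown on a corrupt database).
    private long getFirstLogNeeded(CheckpointOperation checkpoint) {
        if (checkpoint == null)
            return -1;
        return LogCounter.getLogFileNumber(checkpoint.undoLWM());
    }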
* The javadoc for startLogBackup() says that 'log files are copied
after all the data files are backed up', but at the end of the
method the log files are copied.
ReadOnly.java/InputStreamContainer.java:
* I do not think javadoc should just be a copy of the
interface/superclass, but say something about what is particular
to this implementation. In this case, the Javadoc should say
that nothing is done.
typos:
* several occurrences of 'backedup'
* LogToFile.java: 'eventhough'
* RAFContainer.java: 'pahe cache', 'conatiner'
* 'Standard Cloudscape error policy'. Should this be changed to
Derby?
I will correct the comments.
Pet peeves:
* Should have Javadoc on all methods, parameters and return
values.
* Lines longer than 80 chars.
Those sound like good ones to follow. I will add the Javadoc and fix
any long lines.
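For the Javadoc, I am thinking of the usual per-method style; a
made-up example, not from the patch:

    /**
     * Write one page of this container to the backup file.
     *
     * @param backupRaf  the backup container file to write to
     * @param pageNumber the page to back up
     * @exception StandardException if the page cannot be written
     */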
Thanks
-suresh