Hi Øystein,

Thanks for reviewing the patch. My answers are in-line...

Øystein Grøvlen (JIRA) wrote:
[ http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12355620 ]
Øystein Grøvlen commented on DERBY-239:
---------------------------------------

I have looked at the onlinebackup_1.diff patch, and overall the patch
looks good.  However, I think there is one severe bug in
RAFContainer.java.  I guess due to the cut and paste from writePage(),
you are seeking in another file than the one you are writing
to (fileData.seek(), but backupRaf.write()).


fileData.seek() in the backup routine is wrong; I will fix it. Good catch.
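
For clarity, a minimal sketch of the intended fix, using the variable
names from the review (the surrounding method context is elided):

    // Cut-and-paste bug in the backup routine:
    //     fileData.seek(pageOffset);       // seeks in the source file
    //     backupRaf.write(pageData, ...);  // but writes to the backup
    // Fix: seek in the same file that is being written to.
    backupRaf.seek(pageOffset);
    backupRaf.write(pageData, 0, pageData.length);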

I have a few other comments and questions:

General:
    * If I run backup with this patch, it seems like I will run the
      new code.  Does one not need to change the restore code to be
      able to restore such a backup, or does the ordinary recovery
      handle that?


The backup structure is not changed, so the current file restore code should work OK. Some of the new issues related to online backup are log-replay issues, such as redoing CREATE CONTAINER and new page creation. I think most of these issues are already addressed as part of roll-forward recovery. I plan to rescan the restore code and do some more testing to find any cases that are not handled by the current code.

RawStore.java:
    * The BACKUP_FILTER now contains so much that it would be useful
      to have a comment that says what is really left to copy.

I agree, comments would be helpful here. I am not sure BACKUP_FILTER will be needed any more: currently the only thing it leaves unfiltered is the jar directory, and I plan to copy the jars separately after the data segment, as discussed in my previous e-mail.
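
Roughly, what is left after the filter is only the jar directory; a
sketch for illustration (the real filter in RawStore.java is written
differently and its exact contents may differ):

    // Illustrative only: after this patch, the generic copy sweep
    // picks up nothing but the jar directory; data and log files are
    // copied by dedicated code.  Once jars are also copied separately,
    // the filter can go away entirely.
    private static final java.io.FileFilter BACKUP_FILTER =
        new java.io.FileFilter() {
            public boolean accept(java.io.File f) {
                return f.isDirectory() && f.getName().equals("jar");
            }
        };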


    * Intuitively, it seems wrong to hard-code "seg0", but I see that
      this is done all over the code.

Could you please file a JIRA entry for this one? I noticed it too; hard-coding is definitely not the correct approach here. One of us can clean this up.


Will there always be only one segment?

Yes, at least for now. It is hard-wired somewhere in the access layer, I think.


What is then the purpose of the segment concept?

I think the initial design plan was to support some kind of data partitioning by storing fragments of tables/indexes on different segments, or by distributing tables into different segments. Each segment could then be placed on a different disk, for performance reasons or to avoid the Derby database size being limited to a single disk/volume.


    * backup(File ...) seems like it would create an endless recursion
      if called.  Fortunately, it seems like it never will be
      called.  Why do we need the methods with a File parameter instead
      of a String?  The system procedures use the String variant.
      Maybe we could just remove the File variant?

I think they are leftovers from the old Cloudscape versions; the File variant is not supported any more, and these can be cleaned up. There should be some File-variant calls in the access layer as well. Could you please file a JIRA entry for this cleanup, so that it won't be forgotten?


    * I think it would be helpful with a comment that explained why
      disableLogArchiveMode() was made synchronized.

Yes, I definitely need to add comments about this synchronization.
This was part of my attempt to prevent parallel backup actions, which I did not complete as part of this patch.
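
Roughly, the intent is the following; the body shown here is
illustrative, not the actual patch code:

    // Synchronized on the RawStore instance so that disabling log
    // archive mode cannot interleave with a concurrent backup() call
    // that relies on log archiving staying enabled.  The complete
    // parallel-backup protection is not part of this patch.
    public synchronized void disableLogArchiveMode()
            throws StandardException {
        logFactory.disableLogArchiveMode(); // delegate to the log factory
    }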


BaseDataFileFactory.java:
    * I do not think basing which files to back up on the contents of
      the seg0 directory is very robust.  What if someone by accident
      has written a file with a name that matches the pattern you are
      looking for?  Then I would think you may get a very strange
      error message that may not be easy to resolve.  Could not this
      be based on some system catalog?

I think users should not be creating/deleting any files in seg0 to start with. Pattern matching is just a simple precaution. I thought of scanning the catalogs to find the containers to back up, but could not convince myself that it is required, at least for now.

If you think scanning the catalogs is more robust, this enhancement can easily be done at a later point.
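
For illustration, the check involved is roughly the following,
assuming container files in seg0 follow the usual 'c<hex-id>.dat'
naming convention (the actual pattern in the patch may differ):

    // Illustrative only: accept names that look like container files.
    private static final java.util.regex.Pattern CONTAINER_NAME =
        java.util.regex.Pattern.compile("c[0-9a-fA-F]+\\.dat");

    private static boolean looksLikeContainer(String fileName) {
        return CONTAINER_NAME.matcher(fileName).matches();
    }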


    * Another scenario is if someone by accident deletes a file for a
      table that is not accessed very often.  A later backup will then
      not detect that this file is missing.  Since the backup is
      believed to be successful, the latest backup of this file may be
      deleted.


The backup is only as good as the database in this case :-) If users access the table whose file was deleted from the backup DB, it will fail with a 'no container' error, just as it will on the main database.

Even with the catalog approach, backup could only throw a warning
about the deleted container. Once a user deletes a container file,
there is no way out; the user cannot even drop that table. I don't
think making backup fail forever when this scenario occurs would be
acceptable to users.


FileContainer.java:
    * I cannot find anything backup-specific about getPageForBackup(),
      so I think a more general name would be better (e.g., getLatchedPage).


Yes, currently this routine does not have anything backup-specific in it. In the future, when a weight-based cache mechanism is implemented, I would like to get the page from the cache with a specific weight for the backup.


RAFContainer.java:
    * The changes to this file seem not to be quite in line with some
      of the original design philosophies.  I am not sure that is
      necessarily bad, but it would be nice to hear the arguments for
      doing it this way.  More specifically:
          - While RAFContainer so far has used the
            StorageRandomAccessFile/StorageFile abstractions, the
            backup uses RandomAccessFile/File directly.  Is there a
            particular reason for that?

Yes. Currently there is no API available for users to specify the type of storage factory to use for the backup. It would be a good enhancement to add later.

The online backup implementation does what the current backup does,
i.e. it uses the disk I/O calls directly instead of the database storage factory. I think one reason to do it this way is that if a database is using an in-memory storage factory, users can still back up the database onto disk. If the storage factory used by the database is based on java.io.*, then backup does the same thing without going through the storage factory interface.
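
In essence the copy is plain java.io code along these lines; a
simplified sketch, not the actual patch code:

    // Copy a database file to the backup location with direct
    // java.io calls, bypassing the storage factory.  Buffer size and
    // error handling are illustrative.
    static void copyToBackup(java.io.File src, java.io.File dst)
            throws java.io.IOException {
        java.io.FileInputStream in = new java.io.FileInputStream(src);
        try {
            java.io.FileOutputStream out = new java.io.FileOutputStream(dst);
            try {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) > 0; )
                    out.write(buf, 0, n);
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
    }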


          - In order to be able to backup a page, new methods have
            been added to FileContainer and BasePage/CachedPage, and
            the RAFContainer is doing latching of pages.  This
            increases coupling to other modules; have alternative
            designs been considered?


I was considering the following minor changes here:

1) Instead of getting the pageData byte array from the Page object,
ask the page to write itself to the backup (e.g. page.backup(...));
a rough sketch of this follows.

2) Move the page backup and unlatching code into FileContainer.java.

I have not thought of any better way to design this; any ideas would
be helpful.
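
A rough sketch of what (1) could look like; the signature is
hypothetical, not an existing Derby API:

    // Hypothetical shape of alternative (1): the page writes itself,
    // so RAFContainer never has to touch the page's internal byte
    // array directly.
    public interface BackupPage {
        void backup(java.io.RandomAccessFile backupRaf, long pageOffset)
            throws java.io.IOException;
    }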


    * privBackupContainer():
          - Setting 'done=true' at the start of the method is a bit
            confusing.  I would think another name for this variable
            would be better.

I will change it to 'backupComplete = false' and reverse the exit logic; hopefully that will make it more readable.
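
Roughly, the reworked method would look like this (a simplified
sketch; names and parameters are illustrative):

    private boolean privBackupContainer(java.io.RandomAccessFile backupRaf)
            throws java.io.IOException {
        boolean backupComplete = false;
        try {
            // ... copy all pages of the container into backupRaf ...
            backupComplete = true;
        } finally {
            if (!backupComplete)
                backupRaf.close();  // abandon the partial copy
        }
        return backupComplete;
    }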

          - If an exception is thrown while holding a latch, does one
            not need to release the latch?

Yes, the latch has to be released. I will make sure no latches are held when an error occurs.
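
The usual try/finally pattern covers this; a sketch, where
getPageForBackup() is the method discussed above and writePageData()
is a hypothetical helper:

    // Ensure the latch is released even if the write fails.
    BasePage page = getPageForBackup(pageNumber);
    try {
        writePageData(page);   // hypothetical helper
    } finally {
        page.unlatch();        // released even on error
    }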

    * writeToBackup():
          - Copies a lot of code from writePage().  One should
            consider factoring out common code.

I will move the common code into a separate routine.
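
Something like the following shared helper, which writePage() would
call with fileData and writeToBackup() with backupRaf (a sketch; the
real signature may differ):

    // Shared by writePage() and writeToBackup(); by construction it
    // always seeks in the same file that it writes to.
    private void writePageAt(java.io.RandomAccessFile target,
                             long pageOffset, byte[] pageData)
            throws java.io.IOException {
        target.seek(pageOffset);
        target.write(pageData, 0, pageData.length);
    }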

          - Due to the cut and paste, you are seeking in another file
            than the one you are writing to (fileData.seek(), but
            backupRaf.write()).


You are right, I will fix it.

LogToFile.java:
    * Why did you need to change getFirstLogNeeded() to handle a null
      checkpoint?  Is it in case you do backup before any checkpoint
      has been performed?

No, there will be a checkpoint before the backup. I added the null check for testing drop-table stub cases without a checkpoint, and I also noticed a null pointer error when shutting down in corrupt-database cases if a checkpoint had not occurred by that time. So I thought there was no harm in adding a null check there.
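
The guard itself is trivial; roughly (the actual field and method
names in LogToFile may differ):

    // Return -1 when no checkpoint exists yet, so the caller keeps
    // all log files instead of dereferencing a null checkpoint.
    private long getFirstLogNeeded(CheckpointOperation checkpoint) {
        if (checkpoint == null)
            return -1;
        return LogCounter.getLogFileNumber(checkpoint.undoLWM());
    }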


    * The javadoc for startLogBackup() says that 'log files are copied
      after all the data files are backed up', but at the end of the
      method the log files are copied.

ReadOnly.java/InputStreamContainer.java:
    * I do not think javadoc should just be a copy of the
      interface/superclass, but should say something about what is
      particular to this implementation.  In this case, the javadoc
      should say that nothing is done.

typos:
    * several occurrences of 'backedup'
    * LogToFile.java: 'eventhough'
    * RAFContainer.java: 'pahe cache', 'conatiner'
    * 'Standard Cloudscape error policy'.  Should this be changed to
      Derby?


I will correct the comments and fix the typos.



Pet peeves:
    * Should have Javadoc on all methods, parameters and return
      values.
    * Lines longer than 80 chars.


Sounds like good ones to follow. I will add the Javadoc and fix any long lines.

Thanks
-suresh

