Hi Øystein,
Thanks for reviewing the patch. My answers are in-line...
Øystein Grøvlen (JIRA) wrote:
[ http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12355620 ]
Øystein Grøvlen commented on DERBY-239:
---------------------------------------
I have looked at the onlinebackup_1.diff patch, and overall the patch
looks good. However, I think there is one severe bug in
RAFContainer.java. I guess that, due to the cut and paste from
writePage(), you are seeking in a different file from the one you
are writing to (fileData.seek(), but backupRaf.write()).
fileData.seek() in the backup routine is wrong; I will fix it. Good catch.
I have a few other comments and questions:
General:
* If I run backup with this patch, it seems like I will run the
new code. Does one not need to change the restore code to be
able to restore such a backup, or does the ordinary recovery
handle that?
The backup structure is not changed, so the current restore code should
work OK. Some of the new issues related to online backup are log-replay
issues, such as redoing CREATE CONTAINER, new page creation, etc.
I think most of these issues are addressed as part of rollforward
recovery. I plan to rescan the restore code and do some more testing
to find any cases that are not handled by the current code.
RawStore.java:
* The BACKUP_FILTER now contains so much, that it would be useful
to have a comment that says what is really left to copy.
I agree, comments will be helpful here. I am not sure BACKUP_FILTER
will be needed any more. Currently the only thing it does not filter
out is the JAR directory, and I plan to copy the JARs as a separate
step after the data segment, as discussed in my previous e-mail.
* Intuitively, it seems wrong to hard-code "seg0", but I see that
this is done all over the code.
Could you please file a JIRA entry for this one? I noticed it too;
hard-coding is definitely not the correct way here. One of us can
clean this up.
Will there always be only one segment?
Yes, at least for now. I think it is hard-wired somewhere in the
access layer.
What is then the purpose of the segment concept?
I think the initial design plan was to support some kind of data
partitioning by storing fragments of tables/indexes on different
segments, or by distributing tables into different segments. Each
segment could then be placed on a different disk, for performance
reasons or to avoid the Derby database size being limited to a single
disk/volume.
* backup(File ...) seems like it would create an endless recursion
if called. Fortunately, it seems like it never will be
called. Why do we need the methods with a File parameter instead
of a String? The system procedures use the String variant.
Maybe we could just remove the File variant?
I think they are leftovers from the old Cloudscape versions; the File
variant is not supported any more, so these can be cleaned up. There
should be some File-variant calls in the access layer also. Could you
please file a JIRA entry for this cleanup, so that these won't be
forgotten again?
* I think it would be helpful with a comment that explained why
disableLogArchiveMode() was made synchronized.
Yes, I definitely need to add comments about this synchronization.
It was part of my attempt to prevent parallel backup actions, which I
did not complete as part of this patch.
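Roughly, the guard I have in mind looks like this (a minimal sketch
with made-up names, not the actual RawStore code):

    // Sketch only: allow at most one backup to run at a time.
    class BackupGuard {
        private boolean backupInProgress = false;

        // Called at the start of a backup; fails if one is already running.
        synchronized void beginBackup() {
            if (backupInProgress) {
                throw new IllegalStateException("backup already in progress");
            }
            backupInProgress = true;
        }

        // Called when the backup finishes, whether it succeeded or failed.
        synchronized void endBackup() {
            backupInProgress = false;
        }
    }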
BaseDataFileFactory.java:
* I do not think basing which files to back up on the contents of the
seg0 directory is very robust. What if someone by accident has
written a file with a name that matches the pattern you are
looking for. Then I would think you may get a very strange
error message that may not be easy to resolve. Could this not
be based on some system catalog?
I think users should not be creating or deleting any files in seg0 to
start with. The pattern matching is just a simple precaution. I
thought of scanning the catalogs to find the containers to back up,
but could not convince myself that it is required, at least for now.
If you think scanning the catalogs is more robust, this enhancement
can easily be done at a later point in time.
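The precaution is essentially just a filename filter, something like
this (a sketch assuming container files follow the c<hex-id>.dat
naming convention; not the actual patch code):

    import java.io.File;
    import java.io.FilenameFilter;

    // Accept only names that look like Derby container files.
    class ContainerFileFilter implements FilenameFilter {
        public boolean accept(File dir, String name) {
            return name.matches("c[0-9a-fA-F]+\\.dat");
        }
    }

    // usage: list only container files under seg0
    // File[] files =
    //     new File(dbDir, "seg0").listFiles(new ContainerFileFilter());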
Another scenario is if someone
by accident deletes a file for a table that is not accessed very
often. A later backup will then not detect that this file is
missing. Since the backup is believed to be successful, the
latest backup of this file may be deleted.
The backup is as good as the database in this case :-) If users
access the table with the deleted file from the backup DB, it will
fail with a 'no container' error, just as it will on the main
database. Even with the catalog approach, backup could only throw a
warning about the deleted container. Once a user deletes a container
file there is no way out; users cannot even drop that table. I don't
think making backup fail forever when this scenario occurs would be
acceptable to users.
FileContainer.java:
* I cannot find anything backup-specific about getPageForBackup(), so
I think a more general name would be better (e.g., getLatchedPage).
Yes, currently this routine does not have anything backup-specific in
it. In the future, when a weight-based cache mechanism is implemented,
I would like to get the page from the cache with a specific weight for
the backup page.
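For reference, all the routine does today is "find the page in the
cache, latch it, return it to the caller, who unlatches it when done".
A toy model of that idea (nothing here is Derby's actual API):

    import java.util.concurrent.locks.ReentrantLock;

    // Toy model only; all names are made up.
    class ToyPage {
        final ReentrantLock latch = new ReentrantLock();
        final byte[] pageData = new byte[4096];

        void unlatch() {
            latch.unlock();
        }
    }

    class ToyContainer {
        private final ToyPage[] pages = new ToyPage[16];

        ToyContainer() {
            for (int i = 0; i < pages.length; i++)
                pages[i] = new ToyPage();
        }

        // Latch the page so it stays stable while the caller copies it;
        // the caller must call unlatch() when done.
        ToyPage getLatchedPage(int pageNumber) {
            ToyPage page = pages[pageNumber];
            page.latch.lock();
            return page;
        }
    }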
RAFContainer.java:
* The changes to this file seem not to be quite in line with some
of the original design philosophies. I am not sure that is
necessarily bad, but it would be nice to hear the arguments for
doing it this way. More specifically:
- While RAFContainer so far has used the
StorageRandomAccessFile/StorageFile abstractions, backup
use RandomAccessFile/File directly. Is there a particular
reason for that?
Yes. Currently there is no API available for users to specify the
type of storage factory to use for the backup. That would be a good
enhancement to add later.
The online backup implementation does what the current backup does,
i.e., it uses the disk I/O calls directly instead of the database's
storage factory. I think one reason to do it this way is that if a
database is using an in-memory storage factory, users can still back
up the database onto a disk. If the storage factory used by the
database is java.io.*, then backup does the same thing without going
through the StorageFactory interface.
- In order to be able to backup a page, new methods have
been added to FileContainer and BasePage/CachedPage, and
the RAFContainer is doing latching of pages. This
increases coupling to other modules, have alternative
designs been considered?
I was considering the following minor changes here:
1) Instead of getting the pageData byte array from the page object,
ask the page to write itself to the backup (something like
page.backup(...)); a rough sketch of this is below.
2) Move the code that backs up and unlatches the page into
FileContainer.java.
I have not thought of a better way to design this; any ideas would be
helpful.
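For 1), I am imagining something along these lines (a sketch with
made-up names, not the actual BasePage code):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch only: the page writes itself at its own offset in the
    // backup container file, so the raw pageData array stays private.
    class SketchPage {
        private final byte[] pageData = new byte[4096];
        private long pageNumber;

        void backup(RandomAccessFile backupRaf) throws IOException {
            backupRaf.seek(pageNumber * pageData.length);
            backupRaf.write(pageData);
        }
    }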
* privBackupContainer():
- Setting 'done=true' at the start of the method is a bit
confusing. I would think another name for this variable
would be better.
I will change it to 'backupComplete = false' and reverse the exit
logic; hopefully that will make it more readable.
- If an exception is thrown while holding a latch, does one
not need to release the latch?
Yes, the latch has to be released. I will make sure no latches are
held when an error occurs; a sketch combining this with the
backupComplete change is below.
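Something like this (getLatchedPage() and writeToBackup() are
placeholders for the real calls, not the patch itself):

    // Sketch: reversed exit logic plus try/finally, so the latch is
    // released on both the normal and the error path.
    void backupAllPages(long lastPageNumber) throws IOException {
        boolean backupComplete = false;
        long pageNumber = 0;
        while (!backupComplete) {
            Page page = getLatchedPage(pageNumber);
            try {
                writeToBackup(page);              // may throw
                pageNumber++;
                backupComplete = (pageNumber > lastPageNumber);
            } finally {
                page.unlatch();                   // always released
            }
        }
    }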
* writeToBackup():
- copies a lot of code from writePage(). One should
consider factoring out common code.
I will move the common code into a separate routine.
- Due to the cut and paste, you are seeking in a different
file from the one you are writing to. (fileData.seek(), but
backupRaf.write())
You are right; I will fix it, and move the shared logic into the
common routine mentioned above.
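The corrected write would look roughly like this (a sketch using the
names from your comments; pageSize and the offset computation are
assumptions):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Seek in the same file that is written to: backupRaf, not fileData.
    void writeToBackup(RandomAccessFile backupRaf, long pageNumber,
                       byte[] pageData, int pageSize) throws IOException {
        long pageOffset = pageNumber * pageSize;
        backupRaf.seek(pageOffset);        // was: fileData.seek(pageOffset)
        backupRaf.write(pageData, 0, pageSize);
    }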
LogToFile.java:
* In getFirstLogNeeded(), why did you need to handle a null
checkpoint? Is it in case you do a backup before any checkpoint
has been performed?
No, there will be a checkpoint before the backup. I added the null
check for testing drop-table stub cases without a checkpoint, and I
also noticed a null pointer error when shutting down in corrupt
database cases if a checkpoint had not occurred by that time, so I
thought there was no harm in adding a null check there.
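The guard is just this (a sketch; the surrounding names are from my
memory of LogToFile.java, not from the patch):

    // Return the first log file still needed, or -1 if there has been
    // no checkpoint yet (e.g., shutdown on a corrupt database).
    private long getFirstLogNeeded(CheckpointOperation checkpoint) {
        if (checkpoint == null)
            return -1;
        return LogCounter.getLogFileNumber(checkpoint.undoLWM());
    }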
* The javadoc for startLogBackup() says that 'log files are copied
after all the data files are backed up', but at the end of the
method the log files are copied.
ReadOnly.java/InputStreamContainer.java:
* I do not think javadoc should just be a copy of the
interface/superclass, but say something about what is particular
to this implementation. In this case, the Javadoc should say
that nothing is done.
typos:
* several occurrences of 'backedup'
* LogToFile.java: 'eventhough'
* RAFContainer.java: 'pahe cache', 'conatiner'
* 'Standard Cloudscape error policy'. Should this be changed to
Derby?
I will correct the comments.
Pet peeves:
* Should have Javadoc on all methods, parameters and return
values.
* Lines longer than 80 chars.
Those sound like good ones to follow. I will add the Javadoc and fix
any long lines.
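For the Javadoc, I am thinking of the usual per-method style; a
made-up example, not from the patch:

    /**
     * Write one page of this container to the backup file.
     *
     * @param backupRaf  the backup container file to write to
     * @param pageNumber the page to back up
     * @exception StandardException if the page cannot be written
     */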
Thanks
-suresh