thanks, I was writing from memory - I'll check the code. My plan is to
make sure that any exception during that phase of rollforward (reading
the page during redo in order to redo the initialization of the page)
goes through the existing path of just creating a new page, exactly as
if the page did not exist.
Basically for "garbage" I expect either a bad checksum or bad format id
exception. I'll make sure to add these test cases.
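Roughly, the fallback I have in mind looks like this (the container
API and exception names below are placeholders, not Derby's actual
classes):

    // Hypothetical sketch: during redo of a page-init log record, a
    // "garbage" exception falls through to the same create-a-new-page
    // path used when the page does not exist at all.
    public class RedoInitSketch {
        static class BadChecksumException extends Exception {}
        static class BadFormatIdException extends Exception {}

        interface Container {
            byte[] readPage(long pageNum)
                    throws BadChecksumException, BadFormatIdException;
            byte[] createPage(long pageNum);
        }

        static byte[] getPageForRedoInit(Container c, long pageNum) {
            try {
                return c.readPage(pageNum);  // may hit stale, unsynced bits
            } catch (BadChecksumException | BadFormatIdException e) {
                // unsynced allocation left garbage: treat the page as
                // if it had never existed and create it fresh
                return c.createPage(pageNum);
            }
        }
    }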
Your second point I am having a harder time with. If the "random" bits
found on disk happen to map to a valid checksummed page which isn't the
right one, there will be a problem. At this point I think I would
consider it a bug in the OS to return data on a page that we never
wrote, and it seems ok to assume this won't happen. The code will
handle all cases of this bug except when the page of data happens to be
a valid derby page.
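To put a number on how unlikely that is: with a CRC32-style page
checksum, arbitrary garbage validates with probability about 1 in
2^32, and the format id has to match as well. An illustrative check
(the page layout here is invented, not Derby's on-disk format):

    import java.util.zip.CRC32;

    // Illustrative page validation: the checksum over the page body
    // must match the value stored in the page trailer.
    public class PageCheckSketch {
        static final int PAGE_SIZE = 4096;

        static boolean looksLikeValidPage(byte[] page) {
            CRC32 crc = new CRC32();
            crc.update(page, 0, PAGE_SIZE - 8);           // page body
            long stored = readLong(page, PAGE_SIZE - 8);  // stored checksum
            return crc.getValue() == stored;
        }

        static long readLong(byte[] b, int off) {
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (b[off + i] & 0xFF);
            }
            return v;
        }
    }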
Suresh Thalamati wrote:
I am not sure rollforward recovery fixes a page if there is garbage on
the page. I think there is a small difference between rollforward from
backup and the no-sync-on-allocation case you are trying to handle.
During rollforward recovery from the backup, redo is happening on the
container from the backup. During redo, if a page is not found it means
it was not allocated before, so redo just goes and creates a new page
and does a sync. I think rollforward recovery will never see garbage or
an old valid page.
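In other words (with invented names for the container API), the
redo-from-backup path looks something like:

    import java.io.IOException;

    // Sketch of redo during rollforward from backup: a page missing
    // from the backup container was never allocated, so redo creates
    // it fresh and syncs. It never has to interpret garbage bits.
    public class RedoFromBackupSketch {
        interface Container {
            byte[] findPage(long pageNum);   // null if absent
            byte[] createPage(long pageNum);
            void sync() throws IOException;
        }

        static byte[] getPageForRedo(Container c, long pageNum)
                throws IOException {
            byte[] page = c.findPage(pageNum);
            if (page == null) {
                page = c.createPage(pageNum);
                c.sync();   // durable before redo proceeds
            }
            return page;
        }
    }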
Without sync on allocation, how do we make sure recovery never sees a
good old page on a new allocation, if some OS/hardware does not zero
out a newly allocated page whose blocks were used earlier by the same
file or some other file? I think most general-purpose operating systems
will never give the user an old page on a new page allocation to a
file. But I am not sure how it works on small devices with FLASH
memory etc.
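For what it's worth, the JVM documentation leaves this open:
RandomAccessFile.setLength says the contents of an extended portion of
the file "are not defined". A small probe shows what a given platform
actually returns (illustrative only; the result depends on the OS and
filesystem):

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Probe what the platform returns for a file region that was
    // extended but never written. POSIX filesystems return zeros;
    // the Java spec leaves the contents undefined.
    public class ExtendProbe {
        public static void main(String[] args) throws IOException {
            File f = File.createTempFile("extend", ".dat");
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.setLength(4096);        // extend without writing
                byte[] page = new byte[4096];
                raf.readFully(page);
                boolean allZero = true;
                for (byte b : page) {
                    if (b != 0) { allZero = false; break; }
                }
                System.out.println("extended region all zero: " + allZero);
            }
            f.delete();
        }
    }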
Thanks
-suresht
Mike Matrigali wrote:
This first step is not going all the way to buffering with no
file system interaction. The system is still going to request
the space from the file before allowing the insert. In java
the way you do this is to write to the file through the OS. What
I am changing is that we used to sync the empty page. This means
the user will still get the normal immediate feedback if it is his
inserts that cause the filesystem to fill up.
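In code terms, the only difference is whether we sync after the
allocating write (page size and names here are illustrative):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the change: the write still forces the OS to allocate
    // the space (so a full disk fails the insert immediately); only
    // the per-page sync of the empty page is dropped.
    public class AllocateSketch {
        static final int PAGE_SIZE = 4096;

        static void allocatePage(RandomAccessFile raf, long pageNum,
                                 boolean syncOnAlloc) throws IOException {
            raf.seek(pageNum * PAGE_SIZE);
            raf.write(new byte[PAGE_SIZE]);  // request space from the OS
            if (syncOnAlloc) {
                raf.getFD().sync();          // the old behavior, now skipped
            }
        }
    }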
The current system already has the sync-every-8-pages optimization,
rather than every page. (I think it actually syncs every page
until there are 8, and then does 8 at a time - a leftover from
when it was important to conserve disk space for running on small
systems and many apps had tables smaller than 8 pages.) I considered
some sort of dynamic changing of the preallocate size, but it
seemed too complicated - and the bigger the preallocate grew,
the more likely we would grow the file TOO much.
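A sketch of that existing behavior (constants and API invented for
illustration):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the every-8-pages preallocation: one page at a time
    // (with a sync each) until the file reaches 8 pages, then 8 pages
    // per sync after that.
    public class PreallocSketch {
        static final int PAGE_SIZE = 4096;
        static final int PREALLOC_SIZE = 8;

        static void preallocate(RandomAccessFile raf, long currentPages)
                throws IOException {
            int batch = currentPages < PREALLOC_SIZE ? 1 : PREALLOC_SIZE;
            raf.seek(currentPages * PAGE_SIZE);
            for (int i = 0; i < batch; i++) {
                raf.write(new byte[PAGE_SIZE]);
            }
            raf.getFD().sync();   // one sync covers the whole batch
        }
    }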
As you say, the sync every N seconds is like a checkpoint. In
the current system a checkpoint will sync every file, so the sync
will happen at that point no matter what.
We did the sync for 2 reasons. One was that we used to not be
able to handle redo recovery of version 0 of a page if the
system read "garbage" from the disk. This was fixed by the
requirements of the recent rollforward recovery project. The
system can now handle redo where it reads garbage from disk
while creating version 0 of the page, and it also handles the
case where it tries to read a page and finds it needs to extend
the file.
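The extend case can be handled in the same spirit (a sketch with
invented names):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch: if redo asks for a page past the current end of file,
    // extend the container and hand back a zeroed buffer for redo to
    // re-initialize, instead of failing the read.
    public class ReadOrExtendSketch {
        static final int PAGE_SIZE = 4096;

        static byte[] readOrExtend(RandomAccessFile raf, long pageNum)
                throws IOException {
            long offset = pageNum * PAGE_SIZE;
            byte[] page = new byte[PAGE_SIZE];
            if (offset + PAGE_SIZE > raf.length()) {
                raf.setLength(offset + PAGE_SIZE);  // extend the file
                return page;                        // fresh zeroed buffer
            }
            raf.seek(offset);
            raf.readFully(page);
            return page;
        }
    }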
The second is that we guessed that some OS might require the
sync to ensure the space on disk. There is no info one way or
the other in the JVM documentation, so it was just a guess on
our part. My belief is that the write call we are doing will
force the OS to allocate the space to our file, and no other
file will be able to use that space, so the space is ours until
the OS/machine crashes at least.
I think in general users will almost never see out of space
during redo recovery. I think it would take an OS crash,
on an OS with no filesystem logging, and a subsequent process
using up all the disk space on the machine before derby gets
to run redo.
So the upside is inserts go much faster. The downside is that
in some rare (and maybe on some/most OS's never) cases the user
will see an out of disk space message during database boot that
tells him that he has to free up some disk space. The system
will boot once the disk space is available. This error exists
today if the disk is full and undo needs to write some CLRs
(compensation log records) - so the error isn't even new for derby.
Bryan Pendleton wrote:
Mike Matrigali (JIRA) wrote:
... the total time approaches very close to durability=test ...
Wow! This is great; looks like a very big win. Cool!
I had two questions:
1) It seems like a really common scenario would be:
- a single enormous "batch" application is trying to insert
many, many rows into a table.
- there's enough room in memory, so we just buffer up a bunch
of pages in the cache and let the application insert as it pleases.
- the application completes, and commits,
- then we discover there's not enough space on the disk.
Is this the problem you're trying to solve?
If so, I'm a little confused as to what the "external view" of the
system will be -- how will the user know that the disk has become
full and that space needs to be added?
2) Is there any advantage that you can see to have some sort of
intermediate behavior in between the extremes of:
- always sync every freshly allocated page, and
- never sync freshly allocated pages
For example, is there any point in a "sync every N pages", or
"sync every N seconds"? (I guess the latter is sort of like a
checkpoint?)
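For what it's worth, a combined "sync every N pages or N seconds"
policy is easy to sketch (purely illustrative, not a proposal for an
actual Derby API):

    // Sketch of the intermediate policies: sync after every N freshly
    // allocated pages, or after N seconds, whichever comes first.
    public class SyncPolicySketch {
        private final int pagesPerSync;
        private final long millisPerSync;
        private int dirtyPages = 0;
        private long lastSync = System.currentTimeMillis();

        SyncPolicySketch(int pagesPerSync, long millisPerSync) {
            this.pagesPerSync = pagesPerSync;
            this.millisPerSync = millisPerSync;
        }

        // called after each fresh page allocation; returns true when
        // the caller should issue a sync
        boolean pageAllocated() {
            dirtyPages++;
            long now = System.currentTimeMillis();
            if (dirtyPages >= pagesPerSync
                    || now - lastSync >= millisPerSync) {
                dirtyPages = 0;
                lastSync = now;
                return true;
            }
            return false;
        }
    }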
thanks,
bryan