thanks, I was writing from memory - I'll check the code. My plan is to
make sure that any exception during that phase of rollforward (reading
the page during redo in order to redo the initialization of the page)
goes through the existing path of just creating a new page, exactly as
if the page did not exist.
Basically for "garbage" I expect either a bad checksum or bad format id
exception. I'll make sure to add these test cases.
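Roughly, the fallback I have in mind looks like this (the container
API and exception names below are placeholders, not Derby's actual
classes):

    // Hypothetical sketch: during redo of a page-init log record, a
    // "garbage" exception falls through to the same create-a-new-page
    // path used when the page does not exist at all.
    public class RedoInitSketch {
        static class BadChecksumException extends Exception {}
        static class BadFormatIdException extends Exception {}

        interface Container {
            byte[] readPage(long pageNum)
                    throws BadChecksumException, BadFormatIdException;
            byte[] createPage(long pageNum);
        }

        static byte[] getPageForRedoInit(Container c, long pageNum) {
            try {
                return c.readPage(pageNum);  // may hit stale, unsynced bits
            } catch (BadChecksumException | BadFormatIdException e) {
                // unsynced allocation left garbage: treat the page as
                // if it had never existed and create it fresh
                return c.createPage(pageNum);
            }
        }
    }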
Your second point I am having a harder time with. If the "random" bits
found on disk happen to map to a valid checksummed page which isn't the
right one, there will be a problem. At this point I think I would
consider it a bug in the OS to return data on a page that we never
wrote, and it seems ok to assume this won't happen. The code will
handle all cases of this bug except when the page of data happens to be
a valid derby page.
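To put a number on how unlikely that is: with a CRC32-style page
checksum, arbitrary garbage validates with probability about 1 in
2^32, and the format id has to match as well. An illustrative check
(the page layout here is invented, not Derby's on-disk format):

    import java.util.zip.CRC32;

    // Illustrative page validation: the checksum over the page body
    // must match the value stored in the page trailer.
    public class PageCheckSketch {
        static final int PAGE_SIZE = 4096;

        static boolean looksLikeValidPage(byte[] page) {
            CRC32 crc = new CRC32();
            crc.update(page, 0, PAGE_SIZE - 8);           // page body
            long stored = readLong(page, PAGE_SIZE - 8);  // stored checksum
            return crc.getValue() == stored;
        }

        static long readLong(byte[] b, int off) {
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (b[off + i] & 0xFF);
            }
            return v;
        }
    }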
Suresh Thalamati wrote:
I am not sure rollforward recovery fixes a page if there is garbage on
the page. I think there is a small difference between rollforward from
backup and the no-sync-on-allocation case you are trying to handle.
During rollforward recovery from the backup, redo is happening on the
container from the backup. During redo, if a page is not found it means
it was not allocated before, so redo just goes and creates a new page
and does a sync. I think rollforward recovery will never see garbage or
an old valid page.
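In other words (with invented names for the container API), the
redo-from-backup path looks something like:

    import java.io.IOException;

    // Sketch of redo during rollforward from backup: a page missing
    // from the backup container was never allocated, so redo creates
    // it fresh and syncs. It never has to interpret garbage bits.
    public class RedoFromBackupSketch {
        interface Container {
            byte[] findPage(long pageNum);   // null if absent
            byte[] createPage(long pageNum);
            void sync() throws IOException;
        }

        static byte[] getPageForRedo(Container c, long pageNum)
                throws IOException {
            byte[] page = c.findPage(pageNum);
            if (page == null) {
                page = c.createPage(pageNum);
                c.sync();   // durable before redo proceeds
            }
            return page;
        }
    }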
Without sync on allocation, how do we make sure recovery never sees a
good old page on a new allocation, if some OS/hardware does not zero
out a newly allocated page whose blocks were used earlier by the same
file or some other file? I think most general-purpose operating systems
will never give the user an old page on a new page allocation to a
file. But I am not sure how it works on small devices with FLASH
memory etc.
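For what it's worth, the JVM documentation leaves this open:
RandomAccessFile.setLength says the contents of an extended portion of
the file "are not defined". A small probe shows what a given platform
actually returns (illustrative only; the result depends on the OS and
filesystem):

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Probe what the platform returns for a file region that was
    // extended but never written. POSIX filesystems return zeros;
    // the Java spec leaves the contents undefined.
    public class ExtendProbe {
        public static void main(String[] args) throws IOException {
            File f = File.createTempFile("extend", ".dat");
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.setLength(4096);        // extend without writing
                byte[] page = new byte[4096];
                raf.readFully(page);
                boolean allZero = true;
                for (byte b : page) {
                    if (b != 0) { allZero = false; break; }
                }
                System.out.println("extended region all zero: " + allZero);
            }
            f.delete();
        }
    }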
Thanks
-suresht
Mike Matrigali wrote:
This first step is not going all the way to buffering with no
file system interaction. The system is still going to request
the space from the file before allowing the insert. In java
the way you do this is to write to the file through the OS. What
I am changing is that we used to sync the empty page. This means
the user will still get the normal immediate feedback if it is his
inserts that cause the filesystem to fill up.
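In code terms, the only difference is whether we sync after the
allocating write (page size and names here are illustrative):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the change: the write still forces the OS to allocate
    // the space (so a full disk fails the insert immediately); only
    // the per-page sync of the empty page is dropped.
    public class AllocateSketch {
        static final int PAGE_SIZE = 4096;

        static void allocatePage(RandomAccessFile raf, long pageNum,
                                 boolean syncOnAlloc) throws IOException {
            raf.seek(pageNum * PAGE_SIZE);
            raf.write(new byte[PAGE_SIZE]);  // request space from the OS
            if (syncOnAlloc) {
                raf.getFD().sync();          // the old behavior, now skipped
            }
        }
    }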
The current system already has the sync-every-8-pages optimization,
rather than every page. (I think it actually syncs every page
until there are 8, and then does 8 at a time - a leftover from
when it was important to conserve disk space for running on small
systems and many apps had tables smaller than 8 pages.) I considered
some sort of dynamic changing of the preallocate size, but it
seemed too complicated - and the bigger the preallocate grew,
the more likely we would grow the file TOO much.
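A sketch of that existing behavior (constants and API invented for
illustration):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the every-8-pages preallocation: one page at a time
    // (with a sync each) until the file reaches 8 pages, then 8 pages
    // per sync after that.
    public class PreallocSketch {
        static final int PAGE_SIZE = 4096;
        static final int PREALLOC_SIZE = 8;

        static void preallocate(RandomAccessFile raf, long currentPages)
                throws IOException {
            int batch = currentPages < PREALLOC_SIZE ? 1 : PREALLOC_SIZE;
            raf.seek(currentPages * PAGE_SIZE);
            for (int i = 0; i < batch; i++) {
                raf.write(new byte[PAGE_SIZE]);
            }
            raf.getFD().sync();   // one sync covers the whole batch
        }
    }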
As you say, the sync every N seconds is like a checkpoint. In
the current system a checkpoint will sync every file, so the sync
will happen at that point no matter what.
We did the sync for 2 reasons. One was that we used to not be
able to handle redo recovery of version 0 of a page if the
system read "garbage" from the disk. This was fixed by the
requirements of the recent rollforward recovery project. The
system can now handle redo where it reads garbage from disk
while creating version 0 of the page, and it also handles the
case where it tries to read a page and finds it needs to extend
the file.
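The extend case can be handled in the same spirit (a sketch with
invented names):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch: if redo asks for a page past the current end of file,
    // extend the container and hand back a zeroed buffer for redo to
    // re-initialize, instead of failing the read.
    public class ReadOrExtendSketch {
        static final int PAGE_SIZE = 4096;

        static byte[] readOrExtend(RandomAccessFile raf, long pageNum)
                throws IOException {
            long offset = pageNum * PAGE_SIZE;
            byte[] page = new byte[PAGE_SIZE];
            if (offset + PAGE_SIZE > raf.length()) {
                raf.setLength(offset + PAGE_SIZE);  // extend the file
                return page;                        // fresh zeroed buffer
            }
            raf.seek(offset);
            raf.readFully(page);
            return page;
        }
    }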
The second is that we guessed that some OS might require the
sync to ensure the space on disk. There is no info one way or
the other in the JVM documentation, so it was just a guess on
our part. My belief is that the write call we are doing will
force the OS to allocate the space to our file, and no other
file will be able to use that space, so the space is ours until
the OS/machine crashes at least.
I think in general users will almost never see out of space
during redo recovery. I think it would take an OS crash,
on an OS with no filesystem logging, and a subsequent process
using up all the disk space on the machine before derby gets
to run redo.
So the upside is inserts go much faster. The downside is that
in some rare (and maybe on some/most OS's never) cases the user
will see an out of disk space message during database boot that
tells him that he has to free up some disk space. The system
will boot once the disk space is available. This error exists
today if the disk is full and undo needs to write some CLRs
(compensation log records) - so the error isn't even new for derby.
Bryan Pendleton wrote:
Mike Matrigali (JIRA) wrote:
... the total time approaches very close to durability=test ...
Wow! This is great; looks like a very big win. Cool!
I had two questions:
1) It seems like a really common scenario would be:
- a single enormous "batch" application is trying to insert
many, many rows into a table.
- there's enough room in memory, so we just buffer up a bunch
of pages in the cache and let the application insert as it pleases.
- the application completes, and commits,
- then we discover there's not enough space on the disk.
Is this the problem you're trying to solve?
If so, I'm a little confused as to what the "external view" of the
system will be -- how will the user know that the disk has become
full and that space needs to be added?
2) Is there any advantage that you can see to have some sort of
intermediate behavior in between the extremes of:
- always sync every freshly allocated page, and
- never sync freshly allocated pages
For example, is there any point in a "sync every N pages", or
"sync every N seconds"? (I guess the latter is sort of like a
checkpoint?)
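For what it's worth, a combined "sync every N pages or N seconds"
policy is easy to sketch (purely illustrative, not a proposal for an
actual Derby API):

    // Sketch of the intermediate policies: sync after every N freshly
    // allocated pages, or after N seconds, whichever comes first.
    public class SyncPolicySketch {
        private final int pagesPerSync;
        private final long millisPerSync;
        private int dirtyPages = 0;
        private long lastSync = System.currentTimeMillis();

        SyncPolicySketch(int pagesPerSync, long millisPerSync) {
            this.pagesPerSync = pagesPerSync;
            this.millisPerSync = millisPerSync;
        }

        // called after each fresh page allocation; returns true when
        // the caller should issue a sync
        boolean pageAllocated() {
            dirtyPages++;
            long now = System.currentTimeMillis();
            if (dirtyPages >= pagesPerSync
                    || now - lastSync >= millisPerSync) {
                dirtyPages = 0;
                lastSync = now;
                return true;
            }
            return false;
        }
    }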
thanks,
bryan