Hi Marc,

>> Unfortunately not. This would be the only change which *really* allows
>> to address a number of performance issues with the embedded HSQLDB.
>> Amongst others, closing data views or forms becomes unacceptably slow
>> (IMO) if the .odb exceeds a certain (relatively small) size limit. Also,
>> opening the connection becomes slower as the database and thus the .odb
>> grows. The only change to overcome this would be the single-file
>> backend, but there has been no progress at this.
> 
> Will a single file make such a big difference? And why?

Because with the ZIP file architecture, every commit/write to the ZIP
file (the .odb) requires rewriting the complete package. Technically,
this is "solved" (not really) by working on a copy of all the streams
in the ZIP package, and only re-packing them when the document as a
whole is finally saved.

That is the reason for some oddities: for instance, if a form is saved,
then the changes you made to the form are saved to the copy of the
form's stream. Only when you then save the database document is that
copy merged back into the .odb file.

This approach was dictated by the fact that, in the medium term, the .odb
format should be standardized at OASIS as well, and this means doing
things the way the other applications/formats do.

Unfortunately, as real life shows, it doesn't scale at all. Database
documents, by their very nature, grow much faster than texts or
spreadsheets, which means that saving the whole thing gets slower much
faster (what a statement).

The only solution here is a format that gives you random access to all
parts, so you can change single sub-streams, or even single bytes, in
constant time (as opposed to time that grows linearly or worse with the
file size).
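
A tiny sketch of the contrast, using a plain RandomAccessFile (this is
not the proposed back-end format, just an illustration of the access
pattern): patching a few bytes at a known offset costs the same no
matter how large the file has grown.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    class PatchInPlace {
        // Overwrite a few bytes at a known offset - the cost does not
        // depend on the total file size.
        static void patch(String path, long offset, byte[] newBytes) throws IOException {
            try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
                file.seek(offset);       // jump directly to the region to change
                file.write(newBytes);    // rewrite only those bytes
            }
        }
    }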

Also, this would automatically solve the problem of data changes not
surviving a crash: currently, when you enter data in, say, the table
data view, it is immediately (well, with a configurable delay) written
by the HSQL engine into the underlying files. This is how every
reasonable database engine behaves - it means that if you pull the plug
just after changing the data, it will most probably still be there the
next time you look at it (again, not accounting for possible write
caches of the operating system).
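
For reference, that "configurable delay" is HSQLDB's WRITE_DELAY
setting. A hedged sketch against a stand-alone HSQLDB 1.8 file database
(the table, paths and data are made up for the example, and the exact
statement syntax differs between HSQLDB versions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    class WriteDelayDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.hsqldb.jdbcDriver");   // needed for the old 1.8 driver
            // File-based HSQLDB database; "sa" with empty password is the default account.
            try (Connection con = DriverManager.getConnection("jdbc:hsqldb:file:/tmp/demo", "sa", "");
                 Statement stmt = con.createStatement()) {
                // Flush committed changes to the .log/.script files at most 1 second later.
                stmt.execute("SET WRITE_DELAY 1");
                stmt.execute("CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(50))");
                stmt.execute("INSERT INTO customers VALUES (1, 'original')");
                stmt.execute("UPDATE customers SET name = 'changed' WHERE id = 1");
                // If the process dies shortly after this point, the change is normally
                // already on disk - which is exactly what the embedded setup loses,
                // because there HSQL only ever writes to a temporary copy of the
                // streams inside the .odb.
            }
        }
    }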

Unfortunately, the file which HSQL writes to is, like the form above,
only a copy of the stream in the .odb file, and that copy lives in some
temporary place.

With a single-file back-end (which, whenever I say it, implies a file
with random access to it; other back-end file formats are useless for a
DB engine), this would change, too.

> I could easily think of lowering the workload when serializing the
> database to disc by having some sort of background task preparing the
> physical save by e.g. building up the DOM-model of the data or the like.

I'd suppose this is overkill, and would run into performance problems
only a little later, but still soon enough. The problem imposed by the
ZIP format remains: to change a single byte in the file (say you
changed a single letter in a table row), the complete file has to be
re-packaged and re-written. This bottleneck can IMO only be removed by
changing the file format, away from ZIP, towards a random-access format.

Ciao
Frank

-- 
- Frank Schönheit, Software Engineer         [EMAIL PROTECTED] -
- Sun Microsystems                      http://www.sun.com/staroffice -
- OpenOffice.org Base                       http://dba.openoffice.org -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
