On Wednesday, 19.09.2007 at 09:56 +0200, Frank Schönheit - Sun
Microsystems Germany wrote:
> Hi Marc,
> 
> >>> An idea here could be to re-package the .odb using the database part as
> >>> a single item with compression -0, but I think zip doesn't support this.
> >> It does. The meta information in all of OOo's document is uncompressed,
> >> for quicker access. However, still the complete package has to be
> >> written when you only change a single byte.
> > 
> > At least one possibility. If a scheme of temporarily writing to a random
> > access file and adding the (closed) ra-file to the zip when closing
> > the .odb would be introduced.
> 
> Something like this is already in place. HSQL's data files (which you
> find inside the "database" folder of an .odb) are RA files, since
> they're HSQL's normal backend format, which of course is RA for
> non-trivial data sizes. Also, every file in the ZIP package is extracted
> as one temporary file. So effectively, the normal work already happens
> on RA files.
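
To make the "uncompressed entries" point concrete, here is a minimal
Java sketch (class and entry names are my invention, not OOo code) of
writing one stored, i.e. compression level 0, entry into a ZIP package.
STORED entries must declare their size and CRC before the data is
written:

```java
import java.io.*;
import java.util.zip.*;

public class StoredZipDemo {
    // Write one entry uncompressed (STORED) into a ZIP package,
    // similar to how the metadata streams in an OOo document package
    // are stored uncompressed for quicker access.
    static void addStoredEntry(ZipOutputStream zos, String name, byte[] data)
            throws IOException {
        ZipEntry e = new ZipEntry(name);
        e.setMethod(ZipEntry.STORED);   // no compression
        e.setSize(data.length);         // STORED requires the size ...
        CRC32 crc = new CRC32();
        crc.update(data);
        e.setCrc(crc.getValue());       // ... and the CRC up front
        zos.putNextEntry(e);
        zos.write(data);
        zos.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".zip");
        byte[] payload = "binary database page".getBytes("UTF-8");
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f))) {
            addStoredEntry(zos, "database/data", payload);
        }
        try (ZipFile zf = new ZipFile(f)) {
            ZipEntry e = zf.getEntry("database/data");
            System.out.println(e.getMethod() == ZipEntry.STORED);
            System.out.println(e.getSize() == payload.length);
        }
        f.delete();
    }
}
```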

I think I'm beginning to understand. With that knowledge in mind, it
makes much more sense to swap the order of compression and packaging:

A file format (for .odb) acting as a container usable as a VFS that can
hold compressed items (files in the virtual space) would be better. It
would shift the compression workload to another spot, but since it could
be done on single items (e.g. when storing one table), the work would be
spread out in time. That could speed up the long-running actions at the
cost of slowing down those single actions. Deciding whether this is a
usable approach really needs some testing beforehand.

VFS implementations capable of storing compressed files should be
available in many incarnations - HDF is one.
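
As a rough illustration of the "container as VFS" idea, here is a Java
sketch using the JDK's ZIP file system provider as a stand-in for a
container like HDF5 (all paths and names are made up; HDF5 itself would
need an external library and offers per-item compression natively):

```java
import java.net.URI;
import java.nio.file.*;
import java.util.HashMap;
import java.util.Map;

public class ZipVfsDemo {
    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("container", ".zip");
        Files.delete(zip); // create=true expects the file not to exist yet

        // Open the container as a virtual file system and store one
        // item in the virtual space, e.g. a single table's data.
        Map<String, String> env = new HashMap<>();
        env.put("create", "true");
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, env)) {
            Path table = fs.getPath("/database/table1");
            Files.createDirectories(table.getParent());
            Files.write(table, "row data".getBytes("UTF-8"));
        }

        // Re-open the container and read the single item back.
        try (FileSystem fs = FileSystems.newFileSystem(
                URI.create("jar:" + zip.toUri()), new HashMap<String, String>())) {
            byte[] back = Files.readAllBytes(fs.getPath("/database/table1"));
            System.out.println(new String(back, "UTF-8"));
        }
        Files.delete(zip);
    }
}
```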

> However, there still is the problem of pulled plugs (or, since rumors
> say this sometimes happens, of OOo crashing): You'll lose all your data
> since the last committing of the temporary files to the .odb. So, if we
> commit too seldom, you lose "too much" data (don't shoot me, I know that
> every single byte is "too much" in this context). If we commit too
> often, your work flow suffers, since every commit takes longer the
> larger the DB is.

So that is what happens during file recovery after a crash: looking for
the temporary files and rebuilding the .odb from them.

> Which, hmm, perhaps leads back to your idea of doing the committing in
> the background ...

Not really the best solution. But if the file format has to stay
unchanged, it is one possible technique to address the packaging
performance problem ...
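
A background commit could look roughly like this (the commit body is
just a placeholder counter; the real work would be repackaging the
temporary files into the .odb without blocking the user's edits):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BackgroundCommitSketch {
    public static void main(String[] args) throws Exception {
        AtomicInteger commits = new AtomicInteger();
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Periodically commit the temporary RA files into the package.
        // Placeholder only: a real commit would snapshot the temp files
        // and write them into the .odb.
        Runnable commit = () -> commits.incrementAndGet();
        scheduler.scheduleWithFixedDelay(commit, 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(200); // simulate the user working for a while
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(commits.get() >= 2); // several commits happened
    }
}
```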

> > That's an interesting thought, if you say the ODF file only will contain
> > the metadata and the real data is stored in an (not completely, see
> > above) external high performance container file.
> 
> Well, for all DBs except embedded HSQL this is the case - e.g., if you
> connect to a MySQL or PostgreSQL server :)

I had the idea of using an external HSQL server instance owned by the
quickstarter. ;)

> > Since it is my personal pet for storing hierarchical measurement data I
> > could imagine HDF5 as a potential candidate.
> > ...
> 
> Interesting. I'm not sure what our OASIS guys would say if I suggest
> using a file format which, though open, is under the control of a single
> company, but I will read a little.

I never noticed that fact; HDF has been around for many, many years. But
if that is so, the difference from OO.o or Java is not that big, is it?

> >>> <shouting "Jehova" mode>
> >>> Other databases using binary files having a jdbc driver may fit this
> >>> requirement, too. Firebird would be a candidate.
> >>> </shouting "Jehova" mode>
> >> Sure. Do you volunteer to write the driver/integration for FB? :)
> > 
> > No, definitely not. That's why I tagged this remark with some sort of
> > "humor sign". ;)
> 
> Oh. Should I have said "Are there any women here?" to show I
> understood it? :)

Beards, lovely beards!

> > Although having drivers for Firebird, DB2 and others would be nice,
> > this is far beyond the scope of my time. How much work would it
> > be to write a driver?
> 
> Depends. Usually, if you have a good API to connect to, this shouldn't
> be too much work (a few man weeks, I'd say). However, dedicated drivers
> exist because generic (ODBC/JDBC/ADO) drivers do not care for all
> specialities of the specific DB. Depending on how many of those you
> want to address, it can take arbitrarily longer. Look, Joerg
> Budischewski has been enhancing his PostgreSQL driver for *years* now ;)

Far too much work, and since there are JDBC drivers it may be a better
idea to improve those drivers than to write new ones. Btw., is there a
list of features a JDBC driver needs to offer to work flawlessly
with OO.o?
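
For a rough idea of the kind of capability calls involved, here is a
small reflective probe over java.sql.DatabaseMetaData. The selection of
methods is purely my guess at what a frontend like OO.o relies on, not
an official requirements list, and the probe only inspects the
interface, so no database server is needed:

```java
import java.sql.DatabaseMetaData;
import java.util.Arrays;

public class JdbcCapabilityProbe {
    // Metadata calls a driver should answer truthfully for a database
    // frontend to behave well (my assumption, not an OO.o spec).
    static final String[] NEEDED = {
        "supportsTransactions",
        "supportsMixedCaseQuotedIdentifiers",
        "getTables",
        "getPrimaryKeys",
        "getIdentifierQuoteString",
    };

    // Check that java.sql.DatabaseMetaData declares the given method.
    static boolean declares(String name) {
        return Arrays.stream(DatabaseMetaData.class.getMethods())
                     .anyMatch(m -> m.getName().equals(name));
    }

    public static void main(String[] args) {
        for (String name : NEEDED) {
            System.out.println(name + ": " + declares(name));
        }
    }
}
```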

Marc

