Earlier I implemented a class that does aside and extend.
Yes - fantastic, thanks for sharing it!
From the growth of the aside file, it looks like the aside/extend method at this point is really only useful if you change just a small part of the data.
It seems that if you have a database that changes often, it may actually be more efficient to simply copy the file, prevent others from writing, and make all the changes there and copy it back (having some method to manage the copy part for the readers).
The logic of commit-aside is only partly implemented. The idea behind commit-aside is that it would save only differences, which would make it a quick way to save small changes. Unfortunately, the project for which this work was done was cancelled, so I decided to bring out the mechanism, but stopped short of actually implementing binary diff logic. As a result, commit-aside stores complete columns for now (the file format is ready for storing changes, it's just that each modified column is saved as one "big change", i.e. in full).
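To make the difference between "one big change" and real diffs concrete, here is a rough sketch in plain Python. It is not Metakit code and says nothing about the actual file format: the function names are made up for illustration, and it assumes equal-length columns just to stay short.

```python
def byte_diff(old: bytes, new: bytes):
    """Return (offset, replacement) runs where `new` differs from `old`.

    Assumption for this sketch: both columns have the same length.
    """
    if len(old) != len(new):
        raise ValueError("sketch only handles equal-length columns")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            if start is None:
                start = i          # a run of differing bytes begins here
        elif start is not None:
            runs.append((start, new[start:i]))
            start = None
    if start is not None:          # run extends to the end of the column
        runs.append((start, new[start:]))
    return runs


def full_column_change(new: bytes):
    """Roughly what is stored today: a single run covering the whole column."""
    return [(0, new)]


def apply_diff(old: bytes, runs):
    """Rebuild the modified column from the original plus the stored runs."""
    buf = bytearray(old)
    for offset, data in runs:
        buf[offset:offset + len(data)] = data
    return bytes(buf)
```

Both representations rebuild the same column through apply_diff; the only difference is how many bytes have to be written to the aside file.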
Commit extend does indeed grow the file rather quickly. This is a consequence of MK's column-wise storage. The goal of c-extend and c-aside is to be able to combine them. At that point, one can store changes with multiple readers and a single writer without any contention, hence no need for locking. This goal is still on my list - in fact I've made some progress in getting closer to it very recently - but it hasn't happened, and I can't predict when it will, at this stage.
There is one case where the disk use is not as extreme: with blocked views, changes tend to be written in much smaller chunks. In this case, both c-extend and c-aside will be somewhat less disk-hungry.
Having said that, commit extend has been in use in production code for some time now. The scenario where its disk usage is not too extreme is when there is just one writer and usually at most one reader. In that approach, one can compact the file regularly, and keep the total file size within reasonable limits. That does imply copying to a new file - so yes, your observation is accurate: with today's c-aside not taking advantage of the diff-storage design, copying to a new file and replacing the original is sensible. The one thing you gain with c-extend and c-aside is that the time when this is done can be picked independently (say every night, at times of low activity).
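The copy-and-replace step could look roughly like the following, using only the Python standard library. The helper name and the `modify` callback are hypothetical; `modify` stands in for whatever opens the copy, applies the changes and commits. Keeping other writers out is not handled here and would need to be arranged separately.

```python
import os
import shutil
import tempfile


def rewrite_and_replace(path, modify):
    """Copy `path`, let `modify(tmp_path)` change the copy, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(path, tmp_path)
        modify(tmp_path)
        os.replace(tmp_path, path)   # swap the new file in under the old name
    except BaseException:
        # On any failure, leave the original untouched and clean up the copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On POSIX systems a reader that already has the old file open should keep seeing the old contents until it reopens; on Windows the replace may fail while another process holds the file open, so the swap would need a retry or a quiet moment there.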
I'm still quite keen on getting c-aside to work with diffs. The benefits are contention-free multi-user access and the ability to "change" read-only datafiles, e.g. those stored on CD-ROM. The use of a differential second file means one can ship one version of data, and then "apply" modifications by simply opening a second file. Lots of deployment options can be added (sending starkit updates in the Tcl world, for example).
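As a conceptual illustration of "changing" a read-only datafile through a second file, here is a small model in plain Python. It uses dictionaries rather than Metakit views, and the class name and the deletion marker are invented for this sketch; the real mechanism works on columns in the datafile, not on keys.

```python
from types import MappingProxyType

_DELETED = object()   # marker recorded in the aside for removed entries


class AsideOverlay:
    """Read-only base data plus a separate set of changes layered on top."""

    def __init__(self, base, aside=None):
        self._base = MappingProxyType(base)      # base is never written to
        self.aside = {} if aside is None else aside

    def __getitem__(self, key):
        if key in self.aside:
            value = self.aside[key]
            if value is _DELETED:
                raise KeyError(key)
            return value
        return self._base[key]

    def __setitem__(self, key, value):
        self.aside[key] = value                  # changes go to the aside only

    def __delitem__(self, key):
        self[key]                                # raises KeyError if absent
        self.aside[key] = _DELETED

    def keys(self):
        merged = set(self._base) | set(self.aside)
        return [k for k in merged if self.aside.get(k) is not _DELETED]
```

Shipping an update then amounts to shipping a new aside: opening the same base with a different aside yields a different "version" of the data, while the base itself stays untouched.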
-jcw
_____________________________________________ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
