Be forewarned - this is probably a very long post, and I'm just a mere mortal (i.e. an admin) who doesn't write copious amounts of C code. Take the following post and its suggestions with a grain of salt.

So I've been seeing/hearing all of the hoopla over vertical databases (column stores), and how they'll not only slice bread but also make toast, etc. I've done some quick searches for past articles on "C-Store", "Vertica", "Column Store", and "Vertical Database", and have seen little discussion on this. And then a funny thought occurs to me - when I look at the directory structure and file layout of a PostgreSQL database, I see that each OID corresponds to a table, which corresponds to (generally) a single file. Then I have a second funny thought - what if there was a low-friction, low-cost-of-implementation way to bring similar advantages to PostgreSQL without major alterations, recoding, etc? Finally it occurs to me that PostgreSQL already does something similar but it could do it so much better, with only one language change and minor changes to the storage layout. So here's my plum-crazy proposal (and I've made some before - see http://archives.postgresql.org/pgsql-odbc/2006-10/msg00040.php - and they not only made it into production, but they are in active use by me on a weekly basis - Thanks Hiroshi!!!), bear with me...

Make one very small syntactic change to "CREATE TABLE" that adds a new keyword, "COLUMN-STORE" or something similar. I don't care where it appears as long as it's after the "CREATE TABLE". You would not have to break any existing SQL conventions, PostgreSQL would remain SQL compliant, and given the odd wording, I highly doubt that the folks who maintain the SQL standard will ever claim that keyword. If adding COLUMN-STORE is objectionable because it would "cloud the compliance of the language", then simply move the functionality into tablespaces. In hindsight, it might even belong there instead: rather than specifying it per table, create a tablespace that has a "Column Storage" attribute set to active. When inactive, it uses the traditional one-file-per-table layout.
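For concreteness, the hypothetical syntax might look something like the following sketch. The keyword, the tablespace option, and all names here are invented for illustration - nothing like this exists in PostgreSQL today:

```sql
-- Per-table variant (hypothetical keyword):
CREATE TABLE sales (
    sale_id  integer,
    region   text,
    amount   numeric
) COLUMN-STORE;

-- Tablespace variant (hypothetical "column_storage" option):
CREATE TABLESPACE olap_space LOCATION '/data/olap'
    WITH (column_storage = true);
CREATE TABLE sales (
    sale_id  integer,
    region   text,
    amount   numeric
) TABLESPACE olap_space;
```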

Make each table column capable of receiving an OID. This will be critical for the following steps...

If a table is created with "COLUMN-STORE" as an option, it will continue to behave the way it always has, but the storage will be different: each column in the table will be represented by a single file, with the file name being (naturally) the OID. INSERT/UPDATE/DELETE would function as they always have; nothing would change except how the data is stored. The existing TOAST mechanisms continue to work, because the engine would treat each file as a single-column table! One additional "column" would be added to the store - an invisible one that tracks both the OID for the "rows" in this type of setup and the state of each row. Let's call this the "Control Column". Given that the metadata size for a row in the Control Column would be fixed/constant, we won't have to worry about what is in the other "columns" and "rows"; they can be any size. BTW, the Control Column would be just another column from the storage engine's point of view - it just happens to be one that no one can see, other than the database (and maybe the administrator).
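To make the layout concrete, here is a toy model of the idea in Python - one file per column, named by a fake OID, plus a hidden "Control Column" file of fixed-size metadata. The class, the OID numbering, and pickle-based files are all my own invention for illustration; they are not how PostgreSQL actually stores heap data:

```python
import os
import pickle
import tempfile

class ColumnStoreTable:
    """Toy model of the proposal: one file per column, named by a fake
    OID, plus a hidden "Control Column" file holding fixed-size row
    metadata (row OID + live flag)."""

    def __init__(self, directory, columns):
        self.directory = directory
        self.columns = columns
        # Fake OIDs; in PostgreSQL the catalog would assign these.
        self.oids = {c: 16384 + i for i, c in enumerate(columns)}
        self.control_oid = 16384 + len(columns)  # the hidden column
        self.next_row_oid = 1
        for oid in [*self.oids.values(), self.control_oid]:
            self._save(oid, [])

    def _path(self, oid):
        return os.path.join(self.directory, str(oid))

    def _load(self, oid):
        with open(self._path(oid), "rb") as f:
            return pickle.load(f)

    def _save(self, oid, values):
        with open(self._path(oid), "wb") as f:
            pickle.dump(values, f)

    def insert(self, row):
        # The row is split into one append per column file; the control
        # column gets a matching entry to keep every file in step.
        for c in self.columns:
            vals = self._load(self.oids[c])
            vals.append(row[c])
            self._save(self.oids[c], vals)
        control = self._load(self.control_oid)
        control.append({"row_oid": self.next_row_oid, "live": True})
        self._save(self.control_oid, control)
        self.next_row_oid += 1

    def scan(self, column):
        # An OLAP-style scan of one column touches only two files:
        # the column itself and the control column (to skip dead rows).
        control = self._load(self.control_oid)
        vals = self._load(self.oids[column])
        return [v for v, meta in zip(vals, control) if meta["live"]]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t = ColumnStoreTable(d, ["region", "amount"])
        t.insert({"region": "east", "amount": 10})
        t.insert({"region": "west", "amount": 25})
        print(t.scan("amount"))       # one column read, two files touched
        print(sorted(os.listdir(d)))  # one file per column + the control column
```

The point of the sketch is only the file layout: a query that aggregates one column never has to read the files for the others.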

When you go to VACUUM a table, you would treat each column as a single-column table, so if a row is a candidate for VACUUM reclamation, each "column" is adjusted by an equal amount. Under no circumstances would the columns be "out of sync": if a record goes, its entry in every column goes with it. This sounds disk-intensive at first, until you realize that the admin will have made a conscious decision to use this format, and understands the advantages/drawbacks of the method. So it takes a little longer to VACUUM - I don't care, because as an admin I will have specified this layout for a reason: to do OLAP, not OLTP. Which means I rarely VACUUM it. Add to this the efficiency you would gain by packing more records into each buffer per read, and the losses you take in re-reading data would not amount to nearly as much as you might think.
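The lockstep behavior can be sketched in a few lines, assuming an in-memory stand-in for the per-column files (plain lists instead of files, with the control-column entries as dicts - all names invented for illustration). Every column is compacted by exactly the same row positions, driven solely by the live/dead flags in the control column, so the columns can never drift apart:

```python
def vacuum(columns, control):
    """Toy VACUUM pass for the proposed layout.
    columns: dict mapping column name -> list of values (one "file" each).
    control: list of {"row_oid": int, "live": bool}, same length as each column.
    Returns compacted (columns, control) with dead rows reclaimed."""
    # One scan of the control column decides which positions survive...
    keep = [i for i, meta in enumerate(control) if meta["live"]]
    # ...and every column is compacted by exactly those positions.
    new_columns = {name: [vals[i] for i in keep]
                   for name, vals in columns.items()}
    new_control = [control[i] for i in keep]
    return new_columns, new_control

if __name__ == "__main__":
    cols = {"region": ["east", "west", "north"], "amount": [10, 25, 7]}
    ctl = [{"row_oid": 1, "live": True},
           {"row_oid": 2, "live": False},  # deleted row awaiting reclamation
           {"row_oid": 3, "live": True}]
    cols, ctl = vacuum(cols, ctl)
    print(cols)  # row 2's entry is gone from every column at once
```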

DELETE would simply mark a row as deleted in the "Control Column". If the storage engine needed to reclaim a row, it would not have to look any further than the Control Column to find an empty spot where it could overwrite data.
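A minimal sketch of that, again assuming an in-memory stand-in for the control column (a list of invented {"row_oid", "live"} entries). Note that neither operation touches any per-column data:

```python
def delete(control, row_oid):
    """DELETE just flips the live flag in the control column."""
    for meta in control:
        if meta["row_oid"] == row_oid and meta["live"]:
            meta["live"] = False
            return True
    return False  # no such live row

def find_reusable_slot(control):
    """Reclaiming space only requires scanning the control column for a
    dead entry; no per-column file needs to be read to locate the spot."""
    for i, meta in enumerate(control):
        if not meta["live"]:
            return i
    return None
```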

INSERT/UPDATE continue to work as they always have. The storage engine would perceive each "column" as a single-column table, meaning that the existing TOAST mechanisms continue to work! Nothing needs to change there. The real change is that each row write would be "split up" into individual per-column updates, with the "Control Column" keeping all of the records in sync.
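Here is how an UPDATE might be "split up", as a toy sketch assuming PostgreSQL's usual MVCC behaviour (write a new row version, mark the old one dead) and the same kind of in-memory stand-in for the per-column files and control column (all names invented):

```python
def update(columns, control, row_oid, changes, new_row_oid):
    """Toy MVCC-style UPDATE for the proposed layout.
    columns: dict mapping column name -> list of values (one "file" each).
    control: list of {"row_oid": int, "live": bool} entries.
    changes: dict of column name -> new value for the changed columns."""
    # Find the live version of the row via the control column.
    idx = next(i for i, meta in enumerate(control)
               if meta["row_oid"] == row_oid and meta["live"])
    control[idx]["live"] = False            # old version becomes dead
    for name, vals in columns.items():      # one append per column file:
        vals.append(changes.get(name, vals[idx]))  # new or carried-over value
    control.append({"row_oid": new_row_oid, "live": True})
```

The single control-column entry at the end is what ties the per-column appends back together into one row version.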

Why bother with this? Because, when all is said and done, you will find yourself with a rough equivalent of a column-store database, with all of the OLAP goodness that people are looking for. You have little if any impact on the admins' or users' perception, other than a flag that was checked somewhere and forgotten about. From the storage engine's perspective, you have many, many small 1-column tables to take care of, and they all update at the same "place" at the same "time" to keep the records in sync when you recompose a row. TOAST and large-object storage work the same as before; nothing changes, and that's as it should be.

All with what would be (hopefully) a minor change to the storage backend. We're not talking about brain surgery on existing, tested code, but rather a new feature that uses existing features in place. As TOAST improves, so does this feature. As caching improves, so does the feature again. And so on.

I've been in a bit of a hurry to blurt all of this out, and I'm sure I've forgotten something along the way, so if you find something missing, please be patient - I had to write all of this in about 20 minutes and didn't have a lot of time.
