[HACKERS] Follow-Up to A Silly Idea for Vertically-Oriented Databases

Avery Payne Fri, 07 Sep 2007 14:59:08 -0700

In hindsight, I did miss quite a bit in my last post. Here's a summary that might clear it up:

Add a single keyword that specifies that the storage format changes slightly. The keyword should not affect SQL compliance while still extending functionality. It can be specified either as part of the CREATE TABLE statement or as part of the tablespace mechanism.

When a table is created with this setting, all columns in a record are split vertically into individual, 1-column-wide tables, and each column in the table is assigned an OID. Each OID corresponds to one of our "1-wide" tables. An additional control column will be created that is only visible to the database and the administrator. This column stores a single logical (boolean) value indicating whether the record is allocated or not. You might even be able to create a special bitmap index that is hidden, and just use existing bitmap functions in the index code. In essence, this column helps keep all of the other columns in sync when dealing with rows.

When writing data to the table, each individual column will update, but the engine invisibly wraps together all of the columns into a single transaction. That is, each row insert is still atomic and behaves like it normally would - either the insert succeeds or it doesn't. Because the updates are handled by the engine as many separate tables, no special changes are required, and existing storage mechanisms (TOAST) continue to function as they always did. This could be written as a super-function of sorts, one that would combine all of the smaller steps together and use the existing mechanisms.
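To make the write path a little more concrete, here is a rough sketch in Python. It is only a toy model under my own assumptions, not backend code: the class name ColumnSplitTable, the per-column lists, and the "allocated" list standing in for the hidden control column are all made up for illustration. Atomicity is faked by validating the whole row before any column is touched, which stands in for the single wrapping transaction described above.

```python
class ColumnSplitTable:
    """Toy model: one list per column plus a hidden allocation column."""

    def __init__(self, column_names):
        # Each column is stored separately, like a "1-wide" table.
        self.columns = {name: [] for name in column_names}
        # Hidden control column: True if the row slot is in use.
        self.allocated = []

    def insert(self, row):
        """Write one row across all column stores, all or nothing."""
        # Validate first so a bad row leaves every column untouched.
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the table's columns")
        slot = len(self.allocated)
        for name, values in self.columns.items():
            values.append(row[name])
        self.allocated.append(True)
        return slot

    def read_column(self, name):
        """Scan a single column, skipping unallocated slots."""
        return [v for v, live in zip(self.columns[name], self.allocated) if live]


table = ColumnSplitTable(["id", "name"])
table.insert({"id": 1, "name": "alpha"})
table.insert({"id": 2, "name": "beta"})
print(table.read_column("name"))
```

The point of read_column is that a scan of one column never touches the others, which is where the read-speed gain for OLAP-style queries would come from.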

Updates are performed in the same manner, with each "column" being rolled up into a single invisible mini-transaction for the given record.

Deletes are performed by marking not only the columns as deleted but also the control column as having that row available for overwrite. I'm simplifying quite a bit but I think the general idea is understood. Yes, a delete will have significant overhead compared to an insert or update, but this is a known tradeoff that the administrator is willing to make in order to gain faster read speeds - i.e. they want an OLAP-oriented store, not an OLTP-oriented store.

The control column would be used to quickly locate records that can be overwritten. When a record is deleted, the control column's bitmap is adjusted to indicate that a free space is available. The engine would then co-ordinate the write as it did above, but it can "cheat" - instead of trying to figure things out for each table, the offset to write to is already known, so the update proceeds as listed above, except that each part of the little mini-transaction writes to the same "offset" (i.e. each column in the record will have the same "hole", so when you go to write the record out, write it to the same "record spot"). This is where the control column not only coordinates deletes but also inserts that re-use space from deleted records.
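The delete-and-reuse behaviour can be sketched the same way. Again, this is a toy model in Python under my own assumptions: SlotReusingTable, the "allocated" list playing the part of the control bitmap, and the linear scan for a free slot are illustrative names and choices, not anything that exists in the backend. A real bitmap index would find the hole far faster than this scan does.

```python
class SlotReusingTable:
    """Toy model: per-column lists with a control column that tracks free slots."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        # Control column: True means the slot holds a live row.
        self.allocated = []

    def _first_free_slot(self):
        # Stand-in for a bitmap lookup: find the first "hole", if any.
        for slot, live in enumerate(self.allocated):
            if not live:
                return slot
        return None

    def insert(self, row):
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the table's columns")
        slot = self._first_free_slot()
        if slot is None:
            # No hole available: grow every column by one empty slot.
            slot = len(self.allocated)
            for values in self.columns.values():
                values.append(None)
            self.allocated.append(False)
        # Every column writes to the same offset.
        for name, values in self.columns.items():
            values[slot] = row[name]
        self.allocated[slot] = True
        return slot

    def delete(self, slot):
        if not (0 <= slot < len(self.allocated)) or not self.allocated[slot]:
            raise KeyError(slot)
        # Clear the value in every column, then mark the slot as reusable.
        for values in self.columns.values():
            values[slot] = None
        self.allocated[slot] = False


table = SlotReusingTable(["id", "name"])
first = table.insert({"id": 1, "name": "alpha"})
table.insert({"id": 2, "name": "beta"})
table.delete(first)
reused = table.insert({"id": 3, "name": "gamma"})
print(first, reused)
```

Note that delete has to touch every column plus the control column, which is the overhead mentioned earlier, while the following insert lands in the same offset in every column without any per-table searching.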


Hopefully that makes it a little clearer.



