This is a proposal for the introduction of *pseudo-tables* into Hypertable. The idea came about while we were looking for an inexpensive way to discover large rows in a table. We zeroed in on the CellStore indexes because they contain information that can be used to estimate large rows cheaply. The next question was how to provide access to the CellStore indexes through the API. Instead of adding a special-purpose *ReadCellStoreIndexes* API, I propose that we use the existing API as-is and surface the CellStore index information via a *pseudo-table*. A pseudo-table is a virtual table with no real table behind it. When a query comes in for the CellStore index pseudo-table, the CellStore indexes will be read directly to satisfy the query. This approach is analogous to the /proc filesystem in Linux <http://www.ibm.com/developerworks/library/l-proc/index.html>.
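To make the estimation idea concrete, here is a rough Python sketch (not Hypertable code) of how per-block index entries could be aggregated into per-row size estimates. It assumes each index entry yields a row key and the uncompressed size of the block it points to. The `IndexEntry` type and the `estimate_large_rows` function are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical view of one CellStore index block entry."""
    row_key: str  # row key recorded in the index entry for the block
    size: int     # uncompressed size of the block, in bytes (assumed)


def estimate_large_rows(entries: Iterable[IndexEntry],
                        threshold: int) -> Dict[str, int]:
    """Sum block sizes per row key and keep rows whose total exceeds threshold."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry.row_key] += entry.size
    return {row: total for row, total in totals.items() if total > threshold}
```

Because a block may contain cells from more than one row, the per-row totals are only approximate, which is acceptable here since the goal is a cheap estimate rather than an exact size.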
The pseudo-table that represents the CellStore indexes for a given table, *foo*, would have the name *foo*^.cellstore.index and the following schema:

    create table foo^.cellstore.index (
      Size,
      CompressedSize,
      KeyCount
    );

For each column family, there would be one qualified column for each block in the CellStore indexes. The column qualifier would have the format:

    <filename>:<hex-offset>

The row key would be the same as the row key in the CellStore index entries (we assume that's what most people will want to aggregate this info on). For example, the CellStore index block entry for file 2/2/default/ZwmE_ShYJKgim-IL/cs103 at offset 0x28A61 might generate the following keys and values:

    [email protected]  Size:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61            171728
    [email protected]  CompressedSize:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61  65231
    [email protected]  KeyCount:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61        281

To query the cellstore.index pseudo-table for table *foo* to find an estimate of large rows, you would issue a query along the lines of the following:

    SELECT sum(Size) FROM foo^.cellstore.index WHERE sum(Size) > 100000000;

Please respond with feedback or any questions you have.

Thanks!

- Doug
