This is a proposal for the introduction of *pseudo-tables* into Hypertable.
This idea came about when trying to come up with an inexpensive way to
discover large rows in a table.  We zeroed in on the CellStore indexes
because they contain information that can be used to estimate large rows
cheaply.  The next question was how to provide access to the CellStore
indexes through the API.  Instead of adding a special-purpose
*ReadCellStoreIndexes* API, I propose that we use the existing API as-is
and surface the CellStore index information via a *pseudo-table*.  A
pseudo-table is a virtual table with no real table behind it.  When a query
comes in for the CellStore index pseudo-table, the CellStore indexes will
be read directly to satisfy the query.  This approach is analogous
to the /proc filesystem in Linux
<http://www.ibm.com/developerworks/library/l-proc/index.html>.

The pseudo-table that represents the CellStore indexes for a given table,
*foo*, would have the name *foo*^.cellstore.index and the following schema:

create table foo^.cellstore.index (
  Size,
  CompressedSize,
  KeyCount
);
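
To make the naming convention concrete, here is a minimal Python sketch of how a
request for a pseudo-table might be told apart from a request for a regular
table.  The function name split_pseudo_table is hypothetical and not part of
Hypertable; the only thing taken from the proposal is the '^' separator in
*foo*^.cellstore.index.

```python
def split_pseudo_table(name):
    """Split a table name into (base_table, pseudo_table).

    'foo^.cellstore.index' -> ('foo', 'cellstore.index')
    'foo'                  -> ('foo', None)

    Hypothetical helper: only the '^' separator comes from the proposal.
    """
    base, sep, suffix = name.partition("^")
    if not sep:
        # No '^' present, so this is an ordinary table name.
        return name, None
    # Drop the leading '.' that follows the '^' in the proposed naming.
    return base, suffix.lstrip(".")
```

A server-side dispatcher could use the second element to decide whether to
scan real table data or to read the CellStore indexes instead.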

For each column family, there would be one qualified column for each block
in the CellStore indexes.  The column qualifier would have the format:
<filename>:<hex-offset>.  Also, the row key would be the same as the row
key in the CellStore index entries (we assume that's what most people will
want to aggregate this info on).  So for example, the CellStore index block
entry for file 2/2/default/ZwmE_ShYJKgim-IL/cs103 at offset 0x28A61 might
generate the following keys:

[email protected]
 Size:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61    171728
[email protected]
 CompressedSize:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61  65231
[email protected]
 KeyCount:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61        281
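
As a rough illustration of the mapping above, the following Python sketch turns
one CellStore index block entry into the three pseudo-table cells.  The function
name and its parameters are hypothetical, and the 16-digit, zero-padded,
uppercase hex offset is an assumption inferred from the example keys rather than
a stated part of the format.

```python
def index_block_cells(row_key, filename, offset, size, compressed_size, key_count):
    """Return (row, column, value) tuples for one CellStore index block entry.

    Hypothetical sketch: the column qualifier is '<filename>:<hex-offset>',
    with the offset rendered as 16 uppercase hex digits (assumed width).
    """
    qualifier = f"{filename}:{offset:016X}"
    return [
        (row_key, f"Size:{qualifier}", str(size)),
        (row_key, f"CompressedSize:{qualifier}", str(compressed_size)),
        (row_key, f"KeyCount:{qualifier}", str(key_count)),
    ]
```

For instance, calling it with filename "2/2/default/ZwmE_ShYJKgim-IL/cs103" and
offset 0x28A61 produces the qualifier
2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61 used in the keys above.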

To query the cellstore.index pseudo-table for table *foo* and estimate
which rows are large, you would issue a query along the lines of the
following:

SELECT sum(Size) FROM foo^.cellstore.index WHERE sum(Size) > 100000000;
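
If the aggregation were done on the client instead, the same estimate could be
computed from a plain scan of the pseudo-table.  The Python sketch below assumes
the scan yields (row, column, value) tuples with the column written as
"Family:qualifier"; the function name large_rows and that tuple layout are
hypothetical, and the threshold simply mirrors the query above.

```python
from collections import defaultdict


def large_rows(cells, threshold=100_000_000):
    """Sum the Size cells per row key and keep rows above the threshold.

    cells: iterable of (row_key, column, value) tuples, where column looks
    like 'Size:<filename>:<hex-offset>' (assumed layout, see lead-in).
    """
    totals = defaultdict(int)
    for row_key, column, value in cells:
        # The column family is everything before the first ':'.
        family = column.split(":", 1)[0]
        if family == "Size":
            totals[row_key] += int(value)
    return {row: total for row, total in totals.items() if total > threshold}
```

Because each value describes an index block rather than an individual row, the
totals are estimates, which is all the proposal aims for.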

Please respond with feedback or if you have any questions.  Thanks!

- Doug
