[hypertable-dev] Re: [hypertable-user] Re: Pseudo-table proposal

Doug Judd Tue, 05 Mar 2013 14:51:19 -0800

Yes to both questions.  When it comes to API change proposals, it's good to
get feedback from people on both lists.  Aggregate functions, like sum(),
will be evaluated as far down as possible to avoid unnecessarily passing
data around.
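
To make "as far down as possible" concrete, here is a minimal sketch of two-level aggregation. It is only an illustration, not the actual implementation: the function names and the sample data are hypothetical. Each server sums what it holds, and the client only merges the small per-row subtotals.

```python
from collections import defaultdict

def partial_sums(cells):
    """Sum values per row key on one (hypothetical) range server."""
    totals = defaultdict(int)
    for row, value in cells:
        totals[row] += value
    return dict(totals)

def merge_partials(partials):
    """Combine the per-server subtotals on the client side."""
    merged = defaultdict(int)
    for partial in partials:
        for row, subtotal in partial.items():
            merged[row] += subtotal
    return dict(merged)

# Made-up (row, value) pairs held by two servers.
server_a = [("row1", 100), ("row2", 50), ("row1", 25)]
server_b = [("row2", 70), ("row3", 10)]

result = merge_partials([partial_sums(server_a), partial_sums(server_b)])
```

Only one subtotal per row crosses the network from each server, rather than every cell.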


- Doug




On Tue, Mar 5, 2013 at 2:41 PM, ddorian <[email protected]> wrote:

>
>    - Looks like when I reply (and you too), it gets posted to both
>    hypertable-user and hypertable-dev (intentional?).
>    - I hope that sum() and the other new operations happen first at the
>    RangeServer and then (if necessary) at the Thrift client.
>
>
> On Tuesday, March 5, 2013 4:14:45 PM UTC+1, Doug Judd wrote:
>
>> This is a proposal for the introduction of *pseudo-tables* into
>> Hypertable.  This idea came about when trying to come up with an
>> inexpensive way to discover large rows in a table.  We zeroed in on the
>> CellStore indexes because they contain information that can be used to
>> estimate large rows cheaply.  However, the next question was how do we
>> provide access to the CellStore indexes through the API?  Instead of
>> adding some special-purpose *ReadCellStoreIndexes* API, I propose that
>> we use the existing API as-is and surface the CellStore index information
>> via a *pseudo-table*.  A pseudo-table is a virtual table with no real
>> table behind it.  When a query comes in for the CellStore index
>> pseudo-table, the CellStore indexes will get read directly to satisfy the
>> query.  This approach is exactly analogous to the /proc filesystem in
>> Linux <http://www.ibm.com/developerworks/library/l-proc/index.html>.
>>
>> The pseudo-table that represents the CellStore indexes for a given table,
>> *foo*, would have the name *foo*^.cellstore.index and the following
>> schema:
>>
>> create table foo^.cellstore.index (
>>   Size,
>>   CompressedSize,
>>   KeyCount
>> );
>>
>> For each column family, there would be one qualified column for each
>> block in the CellStore indexes.  The column qualifier would have the
>> format:  <filename>:<hex-offset>.  Also, the row key would be the same
>> as the row key in the CellStore index entries (we assume that's what most
>> people will want to aggregate this info on).  So for example, the CellStore
>> index block entry for file 2/2/default/ZwmE_ShYJKgim-IL/cs103 at
>> offset 0x28A61 might generate the following keys:
>>
>> [email protected]    Size:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61            171728
>> [email protected]    CompressedSize:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61   65231
>> [email protected]    KeyCount:2/2/default/ZwmE_ShYJKgim-IL/cs103:0000000000028A61           281
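
As a rough illustration of the `<filename>:<hex-offset>` qualifier format described above, the sketch below builds and parses such a qualifier. The helper names are hypothetical, and the 16-digit zero-padded uppercase hex width is an assumption inferred from the example keys, not something the proposal specifies.

```python
def make_qualifier(filename, offset):
    """Build '<filename>:<hex-offset>' (assumed: 16 uppercase hex digits)."""
    return f"{filename}:{offset:016X}"

def parse_qualifier(qualifier):
    """Split a qualifier into (filename, integer offset).

    The split is on the last colon, so a filename that itself
    contained a colon would still parse.
    """
    filename, hex_offset = qualifier.rsplit(":", 1)
    return filename, int(hex_offset, 16)

cs_file = "2/2/default/ZwmE_ShYJKgim-IL/cs103"
qualifier = make_qualifier(cs_file, 0x28A61)
parsed = parse_qualifier(qualifier)
```

Zero-padding the offset means the qualifiers for one file sort in block order when compared as strings.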
>>
>> To query the cellstore.index pseudo-table for table *foo* to find an
>> estimate of large rows, you would issue a query along the lines of the
>> following:
>>
>> SELECT sum(Size) FROM foo^.cellstore.index WHERE sum(Size) > 100000000;
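
The intended effect of the query above can be sketched in a few lines: add up the Size cells for each row key and keep the rows whose total exceeds the threshold. This is only a model of the semantics, not how the query would be executed. The cell tuples, row keys, and file names are made up, and the totals are estimates, as the proposal notes.

```python
from collections import defaultdict

def large_rows(cells, threshold):
    """Return {row: total Size} for rows whose summed Size exceeds threshold."""
    totals = defaultdict(int)
    for row, family, _qualifier, value in cells:
        if family == "Size":  # ignore CompressedSize and KeyCount cells
            totals[row] += value
    return {row: total for row, total in totals.items() if total > threshold}

# Hypothetical cells: (row key, column family, qualifier, value).
cells = [
    ("row-a", "Size", "cs1:0000000000000000", 60_000_000),
    ("row-a", "Size", "cs1:0000000000010000", 55_000_000),
    ("row-a", "KeyCount", "cs1:0000000000000000", 281),
    ("row-b", "Size", "cs2:0000000000000000", 171_728),
]

big = large_rows(cells, 100_000_000)
```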
>>
>> Please respond with feedback or if you have any questions.  Thanks!
>>
>> - Doug
>>
>>   --
> You received this message because you are subscribed to the Google Groups
> "Hypertable User" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/hypertable-user?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>



-- 
Doug Judd
CEO, Hypertable Inc.

