Re: [hypertable-dev] Re: row set scan spec

Doug Judd Tue, 07 Dec 2010 13:32:35 -0800

HQL currently supports queries like the following:

SELECT * FROM test WHERE (ROW = 'a' or ROW = 'c' or ROW = 'g');


This query is handled via the row_intervals member of the ScanSpec.
You are not logically adding any new functionality; you're just
optimizing a specific case.  What I'm suggesting is that the system
should transparently optimize each query without exposing the details
to the user.
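
To illustrate how a row set maps onto row_intervals, here is a minimal
sketch. The RowInterval struct and the intervals_for_rows helper are
simplified, hypothetical stand-ins written for this example, not the
actual Hypertable types; the idea is only that each requested row key
becomes the closed interval [key, key].

```cpp
#include <string>
#include <vector>

// Simplified stand-in for a ScanSpec row interval (an assumption for
// illustration, not the real Hypertable definition).
struct RowInterval {
  std::string start;
  bool start_inclusive;
  std::string end;
  bool end_inclusive;
};

// Hypothetical helper: express each requested row key as the closed
// interval [key, key], one interval per key, in input order.
std::vector<RowInterval> intervals_for_rows(const std::vector<std::string> &keys) {
  std::vector<RowInterval> out;
  out.reserve(keys.size());
  for (const auto &k : keys)
    out.push_back(RowInterval{k, true, k, true});
  return out;
}
```

Under these assumptions, intervals_for_rows({"a", "c", "g"}) yields three
intervals whose start and end are equal, which is the shape the query
above would take in the ScanSpec.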

- Doug

On Tue, Dec 7, 2010 at 1:14 PM, Andy <[email protected]> wrote:

> Regarding your concerns about plumbing the change through the ThriftBroker
> and HQL:
>
> - HQL: here I see two options:
>  a) consider the row set scan a non-HQL requirement, or
>  b) add a new row_predicate, which will be required anyway, maybe
>     something like ROW IN '(' row_key [, row_key] ')'; this is
>     independent of whether the rowset member or the existing
>     row_intervals is used
>
> - ThriftBroker: Thrift does support a set type (not for PHP) which
>  could be used, so there are no difficulties: just add a new line
>  13:optional set<string> rowset or, with PHP support,
>  13:optional list<string>. In addition, two other lines in
>  ThriftBroker.cc convert_scan_spec would be required, which is also
>  not really an issue.
>
> I see the following disadvantages in using the existing row_interval
> member:
> - the row interval data structure obviously does not fit a row set
>  well
> - an additional loop in TableScanner would be required to separate
>  row intervals from single rows
> - IntervalScanner is limited to 0 or 1 row interval at the moment,
>  which we would have to break with the scan_and_filter_rows flag
> - IntervalScanner needs to maintain the row set as a set data
>  structure as well as the row_intervals, which need to be rebuilt
>  for each range (before each create_scanner call); in addition, the
>  row interval for the first/last row in the actual row set must be
>  squeezed in
> - ScanContext needs to rebuild the row set from the row intervals
>  again, and another sanity check will be required, which means many
>  additional comparisons per range
> - the amount of data to transfer to the range server will be more than
>  double that of the rowset approach
> - overall I think the maintainability of IntervalScanner and
>  ScanContext will be much better with the rowset member approach,
>  because we do not mix up the meaning of the row_interval; rowset
>  scans will be better isolated
>
> Where do you see the advantages of using the existing row_interval
> member? Or where do you see trouble in adding a "rowset" member to the
> ScanSpec?
>
> -Andy
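
For reference, the "additional loop" mentioned in the quoted list
(separating single rows from true row intervals) could look roughly
like the sketch below. RowInterval, SplitResult, is_single_row and
split_intervals are all hypothetical names invented for this example;
this is not the TableScanner code, only an illustration of the
technique under those assumptions.

```cpp
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for a ScanSpec row interval (an assumption for
// illustration, not the real Hypertable definition).
struct RowInterval {
  std::string start;
  bool start_inclusive;
  std::string end;
  bool end_inclusive;
};

// Hypothetical result type: single rows collected into a set, all
// other intervals kept as they are.
struct SplitResult {
  std::set<std::string> rows;
  std::vector<RowInterval> ranges;
};

// An interval names exactly one row when both ends are inclusive and
// the start and end keys are equal.
bool is_single_row(const RowInterval &ri) {
  return ri.start_inclusive && ri.end_inclusive && ri.start == ri.end;
}

// One pass over the intervals: single-row intervals go into the row
// set, everything else stays a range.
SplitResult split_intervals(const std::vector<RowInterval> &intervals) {
  SplitResult result;
  for (const auto &ri : intervals) {
    if (is_single_row(ri))
      result.rows.insert(ri.start);
    else
      result.ranges.push_back(ri);
  }
  return result;
}
```

The pass is linear in the number of intervals plus the cost of the set
inserts; whether that overhead matters relative to a dedicated rowset
member is exactly the trade-off under discussion.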
>
> --
> You received this message because you are subscribed to the Google Groups
> "Hypertable Development" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected]<hypertable-dev%[email protected]>
> .
> For more options, visit this group at
> http://groups.google.com/group/hypertable-dev?hl=en.
>
>
