HQL currently supports queries like the following:

    SELECT * FROM test WHERE (ROW = 'a' or ROW = 'c' or ROW = 'g');

This query is handled via the row_intervals member of the ScanSpec. You are not logically adding any new functionality; you're just optimizing a specific case. What I'm suggesting is that the system should transparently optimize each query without exposing the details to the user.

- Doug

On Tue, Dec 7, 2010 at 1:14 PM, Andy <[email protected]> wrote:

> Regarding your concerns about plumbing the change through the
> ThriftBroker and HQL:
>
> - HQL: here I see two options:
>   a) consider the row set scan a non-HQL requirement, or
>   b) a new row_predicate will be required anyway, maybe something like
>        ROW IN '(' row_key [, row_key] ')'
>      which is independent of whether we use the rowset member or the
>      existing row_intervals
>
> - ThriftBroker: Thrift does support a set type (not for PHP) which
>   could be used, so no difficulties; just add a new line
>        13:optional set<string> rowset
>   or, with PHP support,
>        13:optional list<string>;
>   Two other lines in ThriftBroker.cc convert_scan_spec would be
>   required in addition, which is also not really an issue.
>
> I see the following disadvantages to using the existing row_interval
> member:
> - the row interval data structure obviously does not fit a row set
>   well
> - an additional loop in TableScanner would be required to split row
>   intervals from single rows
> - IntervalScanner is limited to 0 or 1 row interval at the moment,
>   which we have to break with the scan_and_filter_rows flag
> - IntervalScanner needs to maintain the row set both as a set data
>   structure and as row_intervals, which need to be rebuilt for each
>   range (in front of each create_scanner call), and in addition the
>   row interval for the first/last row in the actual row set must be
>   squeezed in
> - ScanContext needs to rebuild the row set from the row intervals
>   again; another sanity check will be required, with many additional
>   comparisons per range
> - the amount of data to transfer to the range server will be more
>   than double in size compared to the rowset approach
> - overall I think the maintainability of IntervalScanner and
>   ScanContext will be much better using the rowset member approach,
>   because we do not mix up the meaning of the row_interval; rowset
>   scans will be better isolated using the rowset member approach
>
> Where do you see the advantages of using the existing row_interval
> member? Or where do you see trouble with adding a "rowset" member to
> the ScanSpec?
>
> -Andy
>
> --
> You received this message because you are subscribed to the Google
> Groups "Hypertable Development" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at
> http://groups.google.com/group/hypertable-dev?hl=en.
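
To make the trade-off in this thread a bit more concrete, here is a rough sketch of what "transparently optimizing" could look like: a row set is carried as point intervals (start == end, both ends inclusive), and the scanner side checks whether every interval is such a point and, if so, recovers the set. The RowInterval struct and both function names are hypothetical stand-ins for illustration only; they are not the actual Hypertable ScanSpec types or API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row interval of a scan spec.
// The real Hypertable structure is not reproduced here.
struct RowInterval {
    std::string start;
    bool start_inclusive;
    std::string end;
    bool end_inclusive;

    // True when the interval selects exactly one row key.
    bool is_point() const {
        return start_inclusive && end_inclusive && start == end;
    }
};

// Express a set of row keys as one point interval per key, in key order.
std::vector<RowInterval> to_point_intervals(const std::set<std::string>& rows) {
    std::vector<RowInterval> intervals;
    intervals.reserve(rows.size());
    for (const auto& row : rows)
        intervals.push_back(RowInterval{row, true, row, true});
    assert(intervals.size() == rows.size());
    return intervals;
}

// Recover the row set when every interval is a single point.
// Returns false and leaves `rows` empty if the list is empty or
// contains at least one true range.
bool extract_row_set(const std::vector<RowInterval>& intervals,
                     std::set<std::string>& rows) {
    rows.clear();
    if (intervals.empty())
        return false;
    for (const auto& ri : intervals) {
        if (!ri.is_point()) {
            rows.clear();
            return false;
        }
        rows.insert(ri.start);
    }
    return true;
}
```

With something along these lines, the query above (ROW = 'a' or ROW = 'c' or ROW = 'g') would arrive as three point intervals, and extract_row_set would hand back {a, c, g} for a set-based scan. The extra pass over the intervals and the duplicated start/end keys are exactly the costs Andy lists against reusing row_intervals.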
