Andy (Thalmann) recently sent me the following commit. It solves a problem he's seeing, and he wanted to know if I'd be willing to pull it in. He's allowed me to respond here on the mailing list so that others can benefit and/or offer an opinion.
https://github.com/andysoftdev/hypertable/commit/d69f4ad274bccc8c9971e70a31ec705b2b1867d5

In a nutshell, this commit is an optimization for the case where you want to query a large set of individual rows, say 10,000 or 100,000. It minimizes the number of network round trips the client needs to make (currently one per row) by doing a full table scan, passing the entire set of rows to each range scan, and keeping only the matching rows. This reduces the number of network round trips to the number of ranges in the table.

I think the ideal solution would do the following:

1. Only involve the set of ranges required to cover the row set being queried.

2. For each range involved in the query, either do a set of random lookups or scan the entire range, depending on how much coverage the set of queries has over the range. For example, if a particular range (potentially) contains only a single row of the set, then it would be inefficient to scan the entire range just to pick out that one row.

This would involve a fair amount of work on the client side (TableScanner and IntervalScanner) as well as some work in the RangeServer. If this is too much work for you right now, we could probably do something more along the lines of your existing commit, but in a way that is more in line with the ideal solution.

The concern I have with your existing commit is the added "rowset" member of the ScanSpec. I think it would be better to use the existing "row_intervals" member and add a "scan_and_filter_rows" boolean. If you don't want to handle the interval case right now, then when the scan_and_filter_rows flag is set, you could sanity check the row_intervals to make sure they include exact row matches only. Using the existing "row_intervals" member would also make it a lot easier to plumb the change through the ThriftBroker and HQL.

- Doug
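To make the two suggestions above a bit more concrete, here is a rough sketch of the sanity check for the scan_and_filter_rows flag and of a per-range choice between random lookups and scan-and-filter. Everything in it is an assumption made for illustration: the RowInterval and ScanSpec structs are simplified stand-ins rather than the real Hypertable classes, and validate(), choose_strategy(), the Strategy enum, and the 1% coverage threshold are hypothetical names and values, not existing code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT the actual
// Hypertable RowInterval/ScanSpec definitions.
struct RowInterval {
  std::string start;
  bool start_inclusive;
  std::string end;
  bool end_inclusive;
};

struct ScanSpec {
  std::vector<RowInterval> row_intervals;
  bool scan_and_filter_rows = false;  // the proposed flag
};

// An interval is an exact row match if it is the closed interval [row, row].
bool is_exact_row(const RowInterval &ri) {
  return ri.start_inclusive && ri.end_inclusive && ri.start == ri.end;
}

// Sanity check: when scan_and_filter_rows is set, every row interval
// must be an exact row match. Without the flag, anything is allowed.
bool validate(const ScanSpec &spec) {
  if (!spec.scan_and_filter_rows)
    return true;
  for (const auto &ri : spec.row_intervals)
    if (!is_exact_row(ri))
      return false;
  return true;
}

enum class Strategy { Skip, RandomLookups, ScanAndFilter };

// Hypothetical per-range heuristic: scan the whole range only if the
// requested rows make up at least min_coverage of its estimated rows.
// The 1% default is an arbitrary placeholder, not a measured value.
Strategy choose_strategy(std::size_t requested_rows,
                         std::size_t estimated_range_rows,
                         double min_coverage = 0.01) {
  assert(min_coverage > 0.0);
  if (requested_rows == 0)
    return Strategy::Skip;           // range not needed for this query
  if (estimated_range_rows == 0)
    return Strategy::RandomLookups;  // no estimate available; stay cheap
  double coverage = static_cast<double>(requested_rows) /
                    static_cast<double>(estimated_range_rows);
  return coverage >= min_coverage ? Strategy::ScanAndFilter
                                  : Strategy::RandomLookups;
}
```

With these placeholder numbers, a range expected to hold one requested row out of a million would get a random lookup, while a range holding 5,000 requested rows out of 100,000 would be scanned and filtered. Where the row-count estimate would come from, and what threshold makes sense, are open questions.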
