Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )
Change subject: KUDU-2566: fix two todo list ...................................................................... Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@7 PS2, Line 7: fix two todo list Could you make the subject more descriptive? Howabout "[tablet] KUDU-2566: Enhance rowset tree pruning"? http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@10 PS2, Line 10: support open-ended intervals Following good commit message guidelines, could you explain what the previous behavior was, and how this patch changes it? Maybe: Previously, the rowset tree only could only compute overlap with a closed and bounded interval. This changeset adds the ability to find overlap for unbounded intervals as well. As a result, tablet iterators with a single primary key bound are more efficient because they do not iterate over rowsets that don't overlap with the key bound. http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@11 PS2, Line 11: end up fetching one more rowset for the upper bound which is exclusive Again, let's fill out the explanation here a bit. My suggestion (adding on to the previous suggestion): This changeset also adds the ability to query the rowset tree using an exclusive upper bound, whereas previously only inclusive intervals were supported. This makes scans more efficient since primary key upper bounds are passed as exclusive bounds, so now rowsets that overlap only in the upper bound are excluded. http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h File src/kudu/tablet/rowset_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88 PS2, Line 88: void FindRowSetsIntersectingInterval(const Slice &lower_bound, : const Slice &upper_bound, : std::vector<RowSet *> *rowsets) const; : void FindRowSetsIntersectingIntervalGE(const Slice& lower_bound, : std::vector<RowSet*> *rowsets) const; : void FindRowSetsIntersectingIntervalLT(const Slice& upper_bound, : std::vector<RowSet*> *rowsets) const; > Rather than introducing new FindRowSetsIntersectingInterval... methods, cou +1. We could also accept boost::none for both arguments and short-circuit return all rowsets, which would allow the logic in tablet.cc to be very simple (because it's pushed into the rowset tree). http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h File src/kudu/util/interval_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140 PS2, Line 140: // Greater than or equal. : // The 'query' is the lower_bound. : template<class QueryPointType> : void FindIntersectingIntervalGE(const QueryPointType& query, : IntervalVector *results) const; : // Less than or equal. : // The 'query' is the upper_bound. : template<class QueryPointType> : void FindIntersectingIntervalLT(const QueryPointType &query, : IntervalVector *results) const; : > Rather than introduce these new methods, would it be possible to modify int +1 to the idea though I think it might suffice to add an additional QueryIntervalType analogous to QueryPointType, and have the QueryIntervalType be able to be unbounded. The interval tree stores objects of type interval_type, so generalizing this type to support open-ended ranges means the interval tree needs to support storing open ended intervals, which is not necessary for the rowset tree. In particular, the rowset tree's interval_type is RowsetWithBounds, which appears ot have been optimized for the rowset compaction algorithm, so there's other perf implications for changing it to be able to describe open-ended intervals.. -- To view, visit http://gerrit.cloudera.org:8080/11381 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79 Gerrit-Change-Number: 11381 Gerrit-PatchSet: 2 Gerrit-Owner: helifu <hzhel...@corp.netease.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Wed, 05 Sep 2018 15:30:40 +0000 Gerrit-HasComments: Yes