helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/11381 )
Change subject: KUDU-2566: Enhance rowset tree pruning and stop copying strings ...................................................................... Patch Set 9: (47 comments) thanks to Adar and Will. http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@7 PS2, Line 7: Enhance rowset tr > Could you make the subject more descriptive? Howabout "[tablet] KUDU-2566: Done http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@10 PS2, Line 10: Support open-ended intervals > Following good commit message guidelines, could you explain what the previo Done http://gerrit.cloudera.org:8080/#/c/11381/2//COMMIT_MSG@11 PS2, Line 11: Previously, the rowset tree could only compute overlap with a > Again, let's fill out the explanation here a bit. My suggestion (adding on Done http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG Commit Message: PS5: > Thanks for rewriting the commit message. It's really nice now. Done http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG@7 PS5, Line 7: stop copying strings > nit: "stop copying strings" Done http://gerrit.cloudera.org:8080/#/c/11381/5//COMMIT_MSG@11 PS5, Line 11: > nit: Could you wrap commit message lines at 72 characters? Done http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11381/6//COMMIT_MSG@8 PS6, Line 8: > I think git only uses the first line as the subject so inserting a line bre Done http://gerrit.cloudera.org:8080/#/c/11381/4/src/kudu/tablet/rowset_tree-test.cc File src/kudu/tablet/rowset_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/4/src/kudu/tablet/rowset_tree-test.cc@18 PS4, Line 18: #include <algorithm> > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc File src/kudu/tablet/rowset_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@211 PS5, Line 211: domized > By the convention of the Google style guide, we use references arguments on Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@218 PS5, Line 218: > I think we want coverage of the degenerate case when s1 == s2 (so the inter Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree-test.cc@217 PS5, Line 217: UND_EQUAL : }; : const auto& GetStringPair = [] (const BoundOperator op) -> std::pair<string, string> { : while (true) { : string > You can simplify this a little bit to Done http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc File src/kudu/tablet/rowset_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc@251 PS6, Line 251: vector<RowSet*> out; > Don't need to SeedRandom() twice in a test. Done http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree-test.cc@283 PS6, Line 283: testing::Values(10, 100, 250, 500), > Should either fix this test case, or remove it. Done http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree.h File src/kudu/tablet/rowset_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/rowset_tree.h@96 PS6, Line 96: // [lower_bound, upper_bound) > Nit: got a tab here. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h File src/kudu/tablet/rowset_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@90 PS5, Line 90: en 'lower_bound' is boost::none > Can you add a comment explaining what it means to pass boost::none for 'low Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@90 PS5, Line 90: > style nit: We put the &'s and *'s for refs and pointers by the type, not th Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.h@91 PS5, Line 91: > Likewise. Done http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h File src/kudu/tablet/rowset_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88 PS2, Line 88: const std::function<void(RowSet*, int)>& cb) const; : : // When 'lower_bound' is boost::none, it means negative infinity. : // When 'upper_bound' is boost::none, it means positive infinity. : // So the query interval can be one of below: : // [-OO, +OO) : // [-OO, upper_bound) > +1. We could also accept boost::none for both arguments and short-circuit r some tests, example TestTablet.TestRowIteratorSimple in tablet-test.cc, will rely on rowsets's sequence, so we need keep the original logic. http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/tablet/rowset_tree.h@88 PS2, Line 88: const std::function<void(RowSet*, int)>& cb) const; : : // When 'lower_bound' is boost::none, it means negative infinity. : // When 'upper_bound' is boost::none, it means positive infinity. : // So the query interval can be one of below: : // [-OO, +OO) : // [-OO, upper_bound) > Rather than introducing new FindRowSetsIntersectingInterval... methods, cou Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc File src/kudu/tablet/rowset_tree.cc: http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@107 PS5, Line 107: & > Likewise with the & placement nit. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@108 PS5, Line 108: & > Likewise with the & placement nit. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@109 PS5, Line 109: const > I don't think the const specifier is meaningful on a bool passed by value. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@117 PS5, Line 117: & > Likewise with the & placement nit. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@118 PS5, Line 118: & > Likewise with the & placement nit. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@119 PS5, Line 119: const > Ditto on the const specifier. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@194 PS5, Line 194: & > Likewise with the & placement nit. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@195 PS5, Line 195: & > Likewise with the & placement nit. Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/tablet/rowset_tree.cc@196 PS5, Line 196: * > * next to type, not parameter name. Done http://gerrit.cloudera.org:8080/#/c/11381/3/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/11381/3/src/kudu/tablet/tablet.cc@1773 PS3, Line 1773: : // Grab the memrowset iterator. : gscoped_ptr<RowwiseIterator> ms_iter; : RETURN_NOT_OK(components_->memrowset->NewRowIterator(opts, &ms_iter)); : ret.emplace_back(ms_iter.release()); : : opts.io_context = io_context; : should check spec != nullptr first, because some test cases's spec is nullptr. http://gerrit.cloudera.org:8080/#/c/11381/3/src/kudu/tablet/tablet.cc@1782 PS3, Line 1782: if (spec != nullptr && (spec->lower_bound_key() || spec->exclusive_upper_bound_key())) { : boost::optional<Slice> lower_bound = spec->lower_bound_key() ? \ : boost::optional<Slice>(spec->lower_bound_key()->encoded_key()) : boost::none; : boost::optional<Slice> upper_bound = spec->exclusive_upper_bound_key() ? \ : boost::optional<Slice>(spec->exclusive_upper_bound_key()->encoded_key()) : boost::none; : vector<RowSet*> interval_sets; : components_->rowsets->FindRowSetsIntersectingInterval(lower_bound, upper_bound, &interval_sets); : some tests, example TestTablet.TestRowIteratorSimple in tablet-test.cc, will rely on rowsets's sequence, so we need keep the original logic. http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/tablet/tablet.cc@1774 PS6, Line 1774: // Grab the memrowset iterator. : gscoped_ptr<RowwiseIterator> ms_iter; : RETURN_NOT_OK(components_->memrowset->NewRowIterator(opts, &ms_iter)); : ret.emplace_back(ms_iter.release()); : : opts.io_context = io_context; : : // > Nit: combine using ternaries: Done http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc@1793 PS7, Line 1793: rs->ToString())); : ret.emplace_back(row_it.release()); > Can remove this. Done http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/tablet/tablet.cc@2362 PS7, Line 2362: > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h File src/kudu/util/interval_tree-inl.h: http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h@402 PS6, Line 402: interval > Nit: 'intervals' was correct; this deals with multiple intervals. "interval ... is ... " | "intervals ... are ... ", maybe the former is simple :) http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-inl.h@402 PS6, Line 402: interval > I think 'interval' can be singular or plural as long as the rest of the sen Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h File src/kudu/util/interval_tree-inl.h: http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h@402 PS5, Line 402: interval > nit: interval Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-inl.h@409 PS5, Line 409: POSI > To make this easier to understand locally, could you do one of two things: Done http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc File src/kudu/util/interval_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@66 PS1, Line 66: // | | > warning: redundant boolean literal in conditional return statement [readabi Done http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@71 PS1, Line 71: // [-OO, upper) > warning: redundant boolean literal in conditional return statement [readabi Done http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@223 PS1, Line 223: tree.FindIntersectingInterval(lower, upper, &results); > warning: the const qualified parameter 'all_intervals' is copied for each i Done http://gerrit.cloudera.org:8080/#/c/11381/1/src/kudu/util/interval_tree-test.cc@238 PS1, Line 238: } > warning: the const qualified parameter 'all_intervals' is copied for each i Done http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-test.cc File src/kudu/util/interval_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/5/src/kudu/util/interval_tree-test.cc@64 PS5, Line 64: wer == boost::none & > I'm getting confused now about which intervals are inclusive of the right e Done http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-test.cc File src/kudu/util/interval_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree-test.cc@59 PS6, Line 59: // boost::none means infinity. : // [left, right] is closed interval. : // [lower, upper) is half-open interval, so the upper is exclusive. > This and the pictures are great. Thanks for adding them! Done http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/util/interval_tree-test.cc File src/kudu/util/interval_tree-test.cc: http://gerrit.cloudera.org:8080/#/c/11381/7/src/kudu/util/interval_tree-test.cc@187 PS7, Line 187: & > nit: Here and below, & with type. Done http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree.h File src/kudu/util/interval_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/6/src/kudu/util/interval_tree.h@145 PS6, Line 145: void FindIntersectingInterval(const QueryPointType& lower_bound, : const QueryPointType& upper_bound, > Nit: const QueryPointType& Done http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h File src/kudu/util/interval_tree.h: http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140 PS2, Line 140: : // Find all intervals in the tree which intersect the given interval. : // The resulting intervals are added to the 'results' vector. : // The vector is not cleared first. : template<class QueryPointType> : void FindIntersectingInterval(const QueryPointType& lower_bound, : const QueryPointType& upper_bound, : IntervalVector* results) const; : private: : static void Partition(const IntervalVector &in, : > Rather than introduce these new methods, would it be possible to modify int Done http://gerrit.cloudera.org:8080/#/c/11381/2/src/kudu/util/interval_tree.h@140 PS2, Line 140: : // Find all intervals in the tree which intersect the given interval. : // The resulting intervals are added to the 'results' vector. : // The vector is not cleared first. : template<class QueryPointType> : void FindIntersectingInterval(const QueryPointType& lower_bound, : const QueryPointType& upper_bound, : IntervalVector* results) const; : private: : static void Partition(const IntervalVector &in, : > +1 to the idea though I think it might suffice to add an additional QueryIn Done -- To view, visit http://gerrit.cloudera.org:8080/11381 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e34b4169f93f3519f694c9bf86ca5295c318e79 Gerrit-Change-Number: 11381 Gerrit-PatchSet: 9 Gerrit-Owner: helifu <hzhel...@corp.netease.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: helifu <hzhel...@corp.netease.com> Gerrit-Comment-Date: Wed, 31 Oct 2018 07:23:34 +0000 Gerrit-HasComments: Yes