Todd Lipcon has posted comments on this change. Change subject: Integrate ColumnPredicate into client and server ......................................................................
Patch Set 3: (16 comments) still have quite a bit more of this to look at, but figured i'd post these thoughts before heading home http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 1050: // Find the first tablet. comment doesn't seem quite accurate Line 1054: VLOG(1) << "Short circuting scan " << ToString(); typo Line 1114: if (data_->short_circuit_) { why is this necessary? shouldn't next_req_.scanner_id() be empty in the case that we didn't need to scan at all? maybe better to just move the CHECK(data_->proxy_) down into the 'if' below http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/client/scan_predicate-internal.h File src/kudu/client/scan_predicate-internal.h: Line 79: ComparisonPredicateData* Clone() const override { it seems like there is a potential bug lurking here: AddToScanSpec std::moves col_ into the ScanSpec, but Clone() tries to copy col_. Is there a restriction that Clone() cannot be called after AddToScanSpec? Can we document that and add a DCHECK? http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/client/scan_predicate.cc File src/kudu/client/scan_predicate.cc: Line 73: auto pred = ColumnPredicate::InclusiveRange(move(col_), nullptr, val_void, arena); can you expand out this auto type? wasn't obvious from context to me what it is. (boost::optional, right? the *pred made me think it was some kind of pointer) Line 74: if (pred) { I think it's worth a note explaining that InclusiveRange will return none if the range actually spans the whole value space http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: Line 64: pool_(), no need for these empty initializers Line 100: void ColumnPredicateIntoPB(const ColumnPredicate& predicate, style: add space above http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/common/generic_iterators.h File src/kudu/common/generic_iterators.h: Line 170: std::vector<std::tuple<int32_t, ColumnPredicate>> predicates_; maybe colidx_to_predicate_ or something? Line 173: std::vector<size_t> non_predicates_; I think better to rename this to non_predicate_col_indexes or somesuch -- we've been bitten too many times in the past by mixing up index + id http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/common/scan_spec.cc File src/kudu/common/scan_spec.cc: Line 57: bool ScanSpec::CanShortCircuit() { hrm, possible thought: would it make more sense to have the 'Optimize()' step evaluate whether it can be short-circuited, and if so, remove all the predicates and set a 'short_circuited_' flag or somesuch? maybe not. Line 106: preds.push_back(predicate.second.ToString()); nit: indent Line 116: // Don't bother if we can already short circuit the scan. This also let's us typo: lets http://gerrit.cloudera.org:8080/#/c/2138/3/src/kudu/tserver/tserver.proto File src/kudu/tserver/tserver.proto: Line 213: optional bytes upper = 2; let's name this exclusive_upper or excl_upper, or even 'xupper' if you want to use that abbreviation consistently. Otherwise I think it's too easy to get confused (given we still have to handle the deprecated [inclusive] code path for a while, etc) Line 243: // DEPRECATED: use column_predicates field. in other places, we've renamed the field to DEPRECATED_foo -- that way we make sure that we've caught all the cases that were using it, and it's clear in the code that it's a deprecation compat path. Line 249: repeated ColumnPredicatePB column_predicates = 13; This protocol change will allow old clients to continue to talk to new servers, but if new clients send a predicate to an old server, it'll be silently ignored. That seems a little problematic. I think it's OK in beta to not support this compatibility yet, but we should have some safeguard so it fails fast instead of silently ignoring all the predicates. For example, we could add a bitfield to the response like 'supported_features', and then the client can check that the response has a bit set like COLUMN_PREDICATE_PB? We could also do this more generically in the RPC system... worth thinking about a little. I pinged the #grpc IRC channel to see what they normally consider best practices for changes like this. -- To view, visit http://gerrit.cloudera.org:8080/2138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ife6852680b7f59fddee688e5702c1a70944f7622 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
