Todd Lipcon has posted comments on this change. Change subject: KUDU-1235. Add Get API ......................................................................
Patch Set 1: (10 comments) didn't go through the java client yet or really look carefully at all the code. I think there are some potential bugs, though, that are worth solving before looking carefully. I'm also interested to know where the main performance difference is, here. Do you have a comparison flame-graph of the tserver CPU usage handling a YCSB workload of get vs scan? http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 888: ASSERT_NO_FATAL_FAILURE(InsertTestRows( can you add a couple more tests here? I think there might be bugs with the case where a row actually has existed in multiple different RowSets, and maybe bugs where the row has been deleted. For example: - insert test rows - delete the rows - verify that Get() of the row returns expected NotFound status - force the tablet to Flush() - verify that it still returns NotFound - insert the rows again with new values - verify that the correct value is returned Can you also modify fuzz-itest.cc so that the GetRow() function there will randomly pick either using the new 'Get' API or a scan? I think that test will actually give you good coverage of the above scenario as well, but in a more thorough/random fashion. http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/client.h File src/kudu/client/client.h: Line 1073: Status SetProjectedColumns(const std::vector<std::string>& col_names) WARN_UNUSED_RESULT; rename to SetProjectedColumnNames (to match the non-deprecated API in KuduScanner) http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/client/getter-internal.cc File src/kudu/client/getter-internal.cc: Line 47: arena_(1024, 1024*1024), this arena doesn't seem to be used Line 61: Status KuduGetter::Data::SetProjectedColumns(const vector<string>& col_names) { this function is copy-pasted from KuduScanner -- can we share the code somehow in a utility function that both of them call? Line 93: RETURN_NOT_OK(SchemaToColumnPBs(GetCurrentProjection(), request_.mutable_projected_columns())); performance-wise, you could do this lazily only when the projection changes -- it's fairly expensive to construct the PB for a wide table. mind adding a TODO? Line 111: KuduClient::LEADER_ONLY, why restrict to leader only? I think this API should mirror the Scanner API where the caller can pick which consistency guarantees are required http://gerrit.cloudera.org:8080/#/c/2519/1/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 304: if (row_key_util::IncrementKey(&new_row, &arena)) { this is likely to conflict a lot with the stuff Dan has been working on. Dan, do you have any suggestions on how this could be easier on top of your ColumnPredicate work? Line 315: return iter->NextBlock(dst); I'm not sure this is correct -- it's allowed for an iterator to return HasNext() but then not return any selected rows (eg due to deletions). see my comment on test coverage. Line 1551: vector<shared_ptr<RowwiseIterator> > *iters) const { I don't think we need shared_ptr here -- this could be a vector<unique_ptr<...>> Line 1567: RETURN_NOT_OK_PREPEND(rs->NewRowIterator(projection, snap, &row_it), seems like we are leaving a lot of potential optimization on the table by not checking bloom filters here. Is that something you plan to do as a follow-up? -- To view, visit http://gerrit.cloudera.org:8080/2519 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id054a198373da04a3eca5b244516acc378e1e37c Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Binglin Chang <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
