Hello Internal Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/1562
to look at the new patch set (#9).
Change subject: KUDU-1259: new scanner API with an encapsulated Batch object
......................................................................
KUDU-1259: new scanner API with an encapsulated Batch object
This adds a new API for scanner results which encapsulates the result batch,
allowing the caller to access the rows one at a time rather than
constructing a vector<KuduRowResult>. This matters when the result rows are
small or empty (for example an empty projection, or a scan of a single int8
column). In those cases, a single batch may contain millions or even tens of
millions of rows, and the vector<KuduRowResult> alone was taking up tens or
hundreds of MBs of memory.
The KuduRowResult class itself is renamed to KuduScanBatch::RowPtr, since that
makes it more obvious that the row's lifetime is tied to the batch that it came
from. The old name is preserved via a typedef, which keeps the API
source-compatible for most users, though the ABI does break since the
implementation symbols are renamed. Given our beta status, it doesn't seem
necessary to bump the soversion for this ABI change.
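A minimal sketch of the typedef approach described above, kept separate from
the real Kudu headers. AliasBatchSketch, RowResultSketch and ReadWithOldName
are hypothetical names invented for illustration; only the nested RowPtr name
comes from this change. The alias keeps old source compiling, but mangled
symbol names are derived from the underlying class, which is why binaries
built against the old name would no longer link.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the batch class; not the real Kudu header.
class AliasBatchSketch {
 public:
  // Lightweight handle to one row; the name signals that it points into
  // a batch rather than owning its data.
  class RowPtr {
   public:
    explicit RowPtr(const int32_t* cell) : cell_(cell) {}
    int32_t GetInt32() const { return *cell_; }

   private:
    const int32_t* cell_;  // borrowed from the batch, not owned
  };
};

// The old name survives as an alias, so existing source still compiles.
typedef AliasBatchSketch::RowPtr RowResultSketch;

// Both names refer to exactly the same type.
static_assert(
    std::is_same<RowResultSketch, AliasBatchSketch::RowPtr>::value,
    "alias and nested class must be the same type");

// Code written against the old name needs no source changes.
inline int32_t ReadWithOldName(const RowResultSketch& row) {
  return row.GetInt32();
}
```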
This refactoring ends up transferring the RpcController into the returned
KuduScanBatch object, so it will actually be feasible to use this to avoid
copying strings in Impala -- we can simply attach the KuduScanBatch
to the Impala RowBatch to tie the lifecycle of indirect data to the
lifecycle of the rows in Impala.
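The lifetime idea above can be sketched roughly as follows. This is not
Impala or Kudu code: OwnedBatchSketch, StringCell, ConsumerBatchSketch and
BuildConsumerBatch are hypothetical names, and shared ownership via
std::shared_ptr is an assumption chosen for illustration. The point is only
that the consumer holds non-owning views into the batch's buffer and keeps
the batch itself alive instead of copying the strings.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: owns the raw bytes that string cells point into.
struct OwnedBatchSketch {
  std::string indirect_data;
};

// Non-owning view of a string cell (pointer + length into a batch buffer).
struct StringCell {
  const char* data;
  size_t size;
};

// Hypothetical downstream row batch: holds views plus the batches backing them.
class ConsumerBatchSketch {
 public:
  // Keeps the scan batch alive for as long as this consumer batch lives.
  void AttachBatch(std::shared_ptr<OwnedBatchSketch> batch) {
    attached_.push_back(std::move(batch));
  }
  void AddCell(StringCell cell) { cells_.push_back(cell); }
  std::string CellAsString(size_t i) const {
    return std::string(cells_[i].data, cells_[i].size);
  }

 private:
  std::vector<StringCell> cells_;
  std::vector<std::shared_ptr<OwnedBatchSketch>> attached_;
};

// Builds a consumer batch whose cells stay valid after the local handle to
// the scan batch is gone, because ownership was attached rather than copied.
inline ConsumerBatchSketch BuildConsumerBatch() {
  auto scan = std::make_shared<OwnedBatchSketch>();
  scan->indirect_data = "helloworld";
  ConsumerBatchSketch out;
  out.AddCell({scan->indirect_data.data(), 5});
  out.AddCell({scan->indirect_data.data() + 5, 5});
  out.AttachBatch(std::move(scan));
  return out;
}
```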
I made the appropriate small change in Impala to use the new API and
verified that a SELECT COUNT(*) query which used to take 40+ GB of RAM
per server now uses only a few MB. Performance also improved by about 10%
for this query, likely due to less allocator pressure and fewer page faults.
The new KuduScanBatch class fits the C++ "iterable sequence" concept,
and thus works with the C++11 range-for loop. Unfortunately it doesn't
seem to work directly with BOOST_FOREACH.
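As a rough illustration of the "iterable sequence" shape, the standalone
sketch below creates row handles on demand instead of materializing a vector
of them. IterableBatchSketch, its members and SumRows are hypothetical names
for this example only and are not the actual KuduScanBatch implementation;
the range-for loop needs nothing more than begin(), end(), and an iterator
with operator*, operator++ and operator!=.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical batch of single-int8-column rows; not the real Kudu class.
class IterableBatchSketch {
 public:
  // Handle to one row; valid only while the batch is alive.
  class RowPtr {
   public:
    RowPtr(const IterableBatchSketch* batch, size_t idx)
        : batch_(batch), idx_(idx) {}
    int8_t GetInt8() const { return batch_->data_[idx_]; }

   private:
    const IterableBatchSketch* batch_;
    size_t idx_;
  };

  // Minimal forward iterator that produces a RowPtr per position.
  class const_iterator {
   public:
    const_iterator(const IterableBatchSketch* batch, size_t idx)
        : batch_(batch), idx_(idx) {}
    RowPtr operator*() const { return batch_->Row(idx_); }
    const_iterator& operator++() { ++idx_; return *this; }
    bool operator==(const const_iterator& o) const { return idx_ == o.idx_; }
    bool operator!=(const const_iterator& o) const { return idx_ != o.idx_; }

   private:
    const IterableBatchSketch* batch_;
    size_t idx_;
  };

  explicit IterableBatchSketch(std::vector<int8_t> data)
      : data_(std::move(data)) {}

  size_t NumRows() const { return data_.size(); }
  RowPtr Row(size_t idx) const {
    assert(idx < data_.size());
    return RowPtr(this, idx);
  }
  const_iterator begin() const { return const_iterator(this, 0); }
  const_iterator end() const { return const_iterator(this, data_.size()); }

 private:
  std::vector<int8_t> data_;
};

// Walks the batch with a C++11 range-for loop, one row handle at a time.
inline int64_t SumRows(const IterableBatchSketch& batch) {
  int64_t sum = 0;
  for (IterableBatchSketch::RowPtr row : batch) {
    sum += row.GetInt8();
  }
  return sum;
}
```

Each RowPtr here is only a pointer and an index built on demand, so the
per-batch overhead does not grow with the number of rows.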
Change-Id: I29fd4fbb8b906ffa591853ab625ac4b089da4bc9
---
M src/kudu/client/CMakeLists.txt
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
D src/kudu/client/row_result.cc
M src/kudu/client/row_result.h
A src/kudu/client/scan_batch.cc
A src/kudu/client/scan_batch.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/rpc/rpc_controller.cc
M src/kudu/rpc/rpc_controller.h
M src/kudu/tools/ts-cli.cc
13 files changed, 783 insertions(+), 527 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/1562/9
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I29fd4fbb8b906ffa591853ab625ac4b089da4bc9
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Binglin Chang <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Martin Grund <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>