Omid Shahidi has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18798 )
Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ buffer is
created to hold the compressed tuple data for an outbound row batch. We
would like this tuple data to be tracked, as it is responsible for a
significant portion of the untracked memory from the KRPC data stream
sender. By using a free pool, we are able to allocate the tuple data and
compression scratch buffers and account for them in the memory tracker of
the KrpcDataStreamSender.

This solution creates an RAII class responsible for memory allocation and
changes the existing code to use a char buffer pointed to by a char*
tuple_data_ instead of the previously used std::string tuple_data_. The
Thrift implementation is left unchanged and the protobuf implementation
is separated.

Testing:
- Passed core tests.
- Ran a single node benchmark which shows no regression.
- Updated row-batch-serialize-test and row-batch-serialize-benchmark to
  test the row-batch serialization used by KRPC.
- Manually collected query profiles, heap growth, and memory usage logs
  showing that untracked memory decreased by half.
- Added an end-to-end unit test to verify the new counters in the runtime
  profile.

serialize:
Func                            10%ile  50%ile  90%ile  10%ile  50%ile  90%ile
                                                         (rel)   (rel)   (rel)
------------------------------------------------------------------------------
ser_no_dups_baseline              8.36     8.6     8.7      1X      1X      1X
ser_no_dups                       6.73    6.85    6.93  0.804X  0.796X  0.796X
ser_no_dups_full                  5.28    5.38    5.55  0.631X  0.625X  0.637X

ser_adjacent_dups_baseline        12.9    13.2    13.4      1X      1X      1X
ser_adjacent_dups                 23.2    23.7    24.1    1.8X    1.8X    1.8X
ser_adjacent_dups_full            19.9    20.3    20.7   1.54X   1.54X   1.55X

ser_dups_baseline                 9.17    9.54    9.72      1X      1X      1X
ser_dups                          7.45    7.69    7.86  0.812X  0.806X  0.809X
ser_dups_full                     14.6      15    15.3    1.6X   1.57X   1.57X

deserialize:
Func                            10%ile  50%ile  90%ile  10%ile  50%ile  90%ile
                                                         (rel)   (rel)   (rel)
------------------------------------------------------------------------------
deser_no_dups_baseline            32.6    33.5      34      1X      1X      1X
deser_no_dups                     32.5    33.1    33.7  0.999X   0.99X  0.992X

deser_adjacent_dups_baseline      53.1      54    54.7      1X      1X      1X
deser_adjacent_dups               80.3    81.6    82.5   1.51X   1.51X   1.51X

deser_dups_baseline               52.4      54    54.7      1X      1X      1X
deser_dups                        86.8    88.4    89.7   1.66X   1.64X   1.64X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
9 files changed, 656 insertions(+), 214 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/10
--
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <omid.shahidi.2...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <omid.shahidi.2...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
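
For readers unfamiliar with the pattern, the RAII idea in the commit message (a char buffer whose allocation is charged to a memory tracker and released automatically) might look roughly like the sketch below. This is not the code from the patch: ToyMemTracker and TrackedBuffer are hypothetical names invented for illustration, and the sketch allocates with plain malloc/free rather than Impala's free pool.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a memory tracker: it only counts bytes.
class ToyMemTracker {
 public:
  void Consume(int64_t bytes) { consumption_ += bytes; }
  void Release(int64_t bytes) { consumption_ -= bytes; }
  int64_t consumption() const { return consumption_; }

 private:
  int64_t consumption_ = 0;
};

// Hypothetical RAII owner of a char buffer. Every byte it holds is charged
// to the tracker and is released again when the buffer is destroyed.
class TrackedBuffer {
 public:
  explicit TrackedBuffer(ToyMemTracker* tracker) : tracker_(tracker) {
    assert(tracker != nullptr);
  }
  ~TrackedBuffer() { Reset(); }

  // The buffer owns raw memory, so copying is disabled.
  TrackedBuffer(const TrackedBuffer&) = delete;
  TrackedBuffer& operator=(const TrackedBuffer&) = delete;

  // Makes the buffer 'size' bytes long, reallocating only when the current
  // capacity is too small. Old contents are not preserved on reallocation.
  // Returns false if 'size' is negative or the allocation fails.
  bool Resize(int64_t size) {
    if (size < 0) return false;
    if (size <= capacity_) {
      size_ = size;
      return true;
    }
    char* new_data = static_cast<char*>(std::malloc(static_cast<size_t>(size)));
    if (new_data == nullptr) return false;
    Reset();  // Free the old allocation and un-charge it from the tracker.
    tracker_->Consume(size);
    data_ = new_data;
    capacity_ = size;
    size_ = size;
    return true;
  }

  // Frees the buffer and returns its bytes to the tracker.
  void Reset() {
    if (data_ == nullptr) return;
    std::free(data_);
    tracker_->Release(capacity_);
    data_ = nullptr;
    capacity_ = 0;
    size_ = 0;
  }

  char* data() { return data_; }
  int64_t size() const { return size_; }

 private:
  ToyMemTracker* tracker_;
  char* data_ = nullptr;
  int64_t capacity_ = 0;
  int64_t size_ = 0;
};
```

In this sketch, a sender would keep one TrackedBuffer per outbound batch and call Resize() before writing compressed tuple data into data(). Because the destructor calls Reset(), the tracker's consumption returns to its previous value when the buffer goes out of scope.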