Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
......................................................................

Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS1, Line 215: MemTracker* decoder_mem_tracker() { return decoder_mem_tracker_.get(); }
The dictionary is only used for Parquet, so I think the MemPool should not be in the ExecNode, as that is used for a large number of other things. Look into moving this down to HdfsParquetScanner (or reuse dictionary_pool_).

http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS1, Line 213: dict_decoder_(new MemPool(parent->scan_node_->decoder_mem_tracker())),
It is important not to create objects on a per-column basis unless truly necessary. I think it makes more sense to have a single MemPool up at the HdfsParquetScanner level. Look at how dictionary_pool_ works. I think it might be better to reuse that pool.

http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

PS1, Line 234: *val_ptr = *dict_[index];
dict_ needs to return T's directly rather than T*'s. This is a performance critical path, and an extra dereference is too expensive.

--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-HasComments: Yes