Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5522: Use tracked memory for DictDecoder and DictEncoder
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS1, Line 215:  MemTracker* decoder_mem_tracker() { return decoder_mem_tracker_.get(); }
The dictionary is only used for Parquet, so I don't think the MemPool belongs
in ExecNode, which is used for a large number of other things. Look into
moving this down to HdfsParquetScanner (or reusing dictionary_pool_).


http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS1, Line 213: dict_decoder_(new MemPool(parent->scan_node_->decoder_mem_tracker())),
We should avoid creating objects on a per-column basis unless truly
necessary. I think it makes more sense to have a single MemPool up at the
HdfsParquetScanner level. Look at how dictionary_pool_ works; it might be
better to reuse that pool.
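
To illustrate the shape I have in mind, here is a minimal standalone sketch in which the scanner owns one pool and every column reader borrows it. ToyPool, ToyColumnReader and ToyScanner are hypothetical stand-ins made up for this example; they are not Impala's MemPool, column reader or HdfsParquetScanner classes, and the real dictionary_pool_ may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a tracked memory pool (not Impala's MemPool).
class ToyPool {
 public:
  // Allocates 'bytes' and records them in the running total.
  uint8_t* Allocate(size_t bytes) {
    chunks_.push_back(std::make_unique<uint8_t[]>(bytes));
    total_bytes_ += bytes;
    return chunks_.back().get();
  }
  size_t total_bytes() const { return total_bytes_; }

 private:
  std::vector<std::unique_ptr<uint8_t[]>> chunks_;
  size_t total_bytes_ = 0;
};

// Hypothetical column reader that borrows the scanner's pool instead of
// creating one of its own.
class ToyColumnReader {
 public:
  explicit ToyColumnReader(ToyPool* pool) : pool_(pool) {
    assert(pool != nullptr);
  }
  void LoadDictionary(size_t bytes) { dict_ = pool_->Allocate(bytes); }

 private:
  ToyPool* pool_;            // Not owned; lives in the scanner.
  uint8_t* dict_ = nullptr;  // Points into memory owned by the pool.
};

// Hypothetical scanner: one pool shared by all of its column readers.
class ToyScanner {
 public:
  explicit ToyScanner(int num_columns) {
    for (int i = 0; i < num_columns; ++i) {
      readers_.emplace_back(&dictionary_pool_);
    }
  }
  // Readers hold a pointer to dictionary_pool_, so copying is disabled.
  ToyScanner(const ToyScanner&) = delete;
  ToyScanner& operator=(const ToyScanner&) = delete;

  ToyColumnReader& reader(size_t i) { return readers_[i]; }
  size_t dictionary_bytes() const { return dictionary_pool_.total_bytes(); }

 private:
  ToyPool dictionary_pool_;  // Declared first so it outlives readers_.
  std::vector<ToyColumnReader> readers_;
};
```

With this layout the number of pools does not grow with the number of columns, and all dictionary memory for a scanner is accounted for in one place.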


http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

PS1, Line 234: *val_ptr = *dict_[index];
dict_[index] needs to return a T directly rather than a T*. This is a
performance-critical path, and an extra dereference is too expensive.
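
A rough sketch of the difference, using made-up types rather than the real DictDecoder: InlineDict and PointerDict below are hypothetical names, and std::vector is only assumed here as the backing store for illustration. Both return the same values, but the pointer version has to load the pointer and then follow it on every lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical dictionary that stores values inline in contiguous memory.
template <typename T>
class InlineDict {
 public:
  void Add(const T& value) { dict_.push_back(value); }

  // One indexed load: dict_[index] is the value itself.
  bool GetValue(uint32_t index, T* val_ptr) const {
    assert(val_ptr != nullptr);
    if (index >= dict_.size()) return false;
    *val_ptr = dict_[index];
    return true;
  }

 private:
  std::vector<T> dict_;
};

// Hypothetical dictionary that stores pointers to separately allocated values.
template <typename T>
class PointerDict {
 public:
  void Add(const T& value) { dict_.push_back(std::make_unique<T>(value)); }

  // Two dependent loads: fetch the pointer, then dereference it.
  bool GetValue(uint32_t index, T* val_ptr) const {
    assert(val_ptr != nullptr);
    if (index >= dict_.size()) return false;
    *val_ptr = *dict_[index];
    return true;
  }

 private:
  std::vector<std::unique_ptr<T>> dict_;
};
```

The sketch shows only the access pattern; how much the extra load costs in practice would have to be measured on the actual decode path.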


-- 
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-HasComments: Yes
