Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )
Change subject: IMPALA-10791 Add batching reading for remote temporary files ...................................................................... Patch Set 7: Code-Review+1 (6 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc@424 PS5, Line 424: read_buffer_bloc, MemBlockStatus::DISABLED, dstl, &read_buffer_lock)) { > I think the mechanism is like this, any read(pin) who wants the block memor Done http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc@430 PS5, Line 430: RN_IF_ERROR( > Tried a way like closing the file handle if two blocks have used. The perfo Looking into it in near future sounds like a good plan and it probably should be given a high priority. In realty, say with the HJ builder use case, the overflow is very much instance specific and so is the tmp file itself. Therefore, only the reader(s) of that tmp file contend for the file handle. Per this doc, hdfs lib API is thread safe: https://hadoop.apache.org/docs/r1.2.1/libhdfs.html. http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@412 PS6, Line 412: ist should be int64_t? http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@411 PS6, Line 411: tmp_dirs_remote_ctrl_.tmp_file_pool_->dequeue_timer_metric_ = : metrics->RegisterMetric(new HistogramMetric( : MetricDefs::Get(TMP_FILE_BUFF_POOL_DEQUEUE_DURATI > We will go "read by page" instead in this case. Because the "read by page" Okay I see. Done. http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@426 PS6, Line 426: nique_i nit: reduced http://gerrit.cloudera.org:8080/#/c/17979/7/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/7/be/src/runtime/tmp-file-mgr.cc@1014 PS7, Line 1014: if (read_buffer_file->IsReadBufferBlockStatus(read_buffer_block, : io::MemBlockStatus::UNINIT, read_file_lock, &mem_bloc_lock)) { nit. Please add comment here this block of code which does several things. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 7 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Sat, 13 Nov 2021 00:31:15 +0000 Gerrit-HasComments: Yes