Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )
Change subject: IMPALA-10791 Add batch reading for remote temporary files ...................................................................... Patch Set 21: (8 comments) Thanks Qifan for the review. http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG@32 PS20, Line 32: and disable this block, because it may be good to avoid : duplicated reads from the remote fs for the same content. : > nit. suggest to present the positive case first. That is, moving it before Done http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2183 PS20, Line 2183: WaitForDiskFileStatus(file1->DiskFile(), DiskFileStatus::PERSISTED); > nit. 'we use' -> 'we did use'. Done http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2200 PS20, Line 2200: SSERT_OK(file_group.Read(handl > To improve the readability, it looks like various IF tests inside this bloc Added a helper function SetReadBufferSizeHelper(). http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1070 PS13, Line 1070: > nit exists Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1078 PS13, Line 1078: [read_buffer_block, tmp_file = this](const Status& fetch_status) { > nit. Can this line be moved above 1074? That is, if remote batch reading is The function firstly check whether the local buffer exists (even batch reading is disabled, we could still have the disk buffer), if not, and batch reading enabled, we will try to see if there is an available buffer in memory. So we are not able to move the line ahead. Also, this function is changed a lot in part 2, it may be better to keep optimizing in part 2 if there is no safety issue here. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1104 PS13, Line 1104: ile* TmpFileRemote::GetR > Looks like this function does more than what the function name implies. You Added a comment in the tmp-file-mgr.cc. Already a comment in the header of IncrementReadPageCount() in the tmp-file-mgr-internal.h. I think it is more efficient, because otherwise we need to fetch the lock to double check whether all the pages are read every time after the increment. Also it ensures that only one thread knows the "all_read" status and does the buffer removing job. http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc File be/src/util/mem-info.cc: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc@265 PS20, Line 265: (process_avail_mem) *process_a > may just say Done http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h File be/src/util/metrics.h: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h@272 PS20, Line 272: /// The hwm value is also updated atomically. > May add "Also return the updated current value." here. Done -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Tue, 13 Sep 2022 17:57:55 +0000 Gerrit-HasComments: Yes