Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
......................................................................


Patch Set 21:

(8 comments)

Thanks Qifan for the review.

http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG@32
PS20, Line 32: and disable this block, because it may be good to avoid
             : duplicated reads from the remote fs for the same content.
             :
> nit. suggest to present the positive case first. That is, moving it before
Done


http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2183
PS20, Line 2183:   WaitForDiskFileStatus(file1->DiskFile(), 
DiskFileStatus::PERSISTED);
> nit. 'we use' -> 'we did use'.
Done


http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2200
PS20, Line 2200: SSERT_OK(file_group.Read(handl
> To improve the readability, it looks like various IF tests inside this bloc
Added a helper function SetReadBufferSizeHelper().


http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1070
PS13, Line 1070:
> nit exists
Done


http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1078
PS13, Line 1078:           [read_buffer_block, tmp_file = this](const Status& 
fetch_status) {
> nit. Can this line be moved above 1074? That is, if remote batch reading is
The function firstly check whether the local buffer exists (even batch reading 
is disabled, we could still have the disk buffer), if not, and batch reading 
enabled, we will try to see if there is an available buffer in memory. So we 
are not able to move the line ahead.
Also, this function is changed a lot in part 2, it may be better to keep 
optimizing in part 2 if there is no safety issue here.


http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1104
PS13, Line 1104: ile* TmpFileRemote::GetR
> Looks like this function does more than what the function name implies. You
Added a comment in the tmp-file-mgr.cc. Already a comment in the header of 
IncrementReadPageCount() in the tmp-file-mgr-internal.h. I think it is more 
efficient, because otherwise we need to fetch the lock to double check whether 
all the pages are read every time after the increment. Also it ensures that 
only one thread knows the "all_read" status and does the buffer removing job.


http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc
File be/src/util/mem-info.cc:

http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc@265
PS20, Line 265: (process_avail_mem) *process_a
> may just say
Done


http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h
File be/src/util/metrics.h:

http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h@272
PS20, Line 272:   /// The hwm value is also updated atomically.
> May add "Also return the updated current value." here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 21
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Sep 2022 17:57:55 +0000
Gerrit-HasComments: Yes

Reply via email to