Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
......................................................................


Patch Set 12:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55
PS12, Line 55: class MemBlock {
> This seems like a fairly leaky abstraction. It mostly exists to handle the
It is quite a simple class working closely with the DiskFile and TmpFile stuff, 
may rely on the TmpFile stuff to reserve the memory (which relies on a global 
variable in TmpFileMgr) for it before having the right to allocate, and rely on 
the IO functions to write and read directly on the data_ under certain block 
status. Think for a while for a better abstraction, but dont have much idea for 
now while guarantee the efficiency and safety without too much complexity. I 
think one of the good thing here is the status transition is clear and one-way 
down, what the user needs to do, in most cases, is to lock on the block 
spinlock, do something, change the status if needed.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220
PS12, Line 220:     std::unique_ptr<int64_t[]> page_cnts_per_block_;
> This seems a little funny, but I think I see why it's necessary. A const ve
Maybe personal preference. Prefer a fixed-size array if the size is fixed and 
type is simple. Looks more clear.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446
PS12, Line 446:   // Helper function to check the status of a read buffe block.
> nit: buffe -> buffer
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458
PS12, Line 458:   // Helper function to delete the read buffe block.
> nit: buffe -> buffer
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516
PS12, Line 516:   /// The lock also protects the memory blocks from 
destruction, if the disk file has.
> "if the disk file has" seems like an incomplete sentence.
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99
PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) {
> Some unit tests around MemBlock transitions might be useful.
Added a unit testcase "MemBlockTest" should have covered the MemBlock status 
transitions.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152
PS12, Line 152:   RequestRange(RequestType::type request_type, int disk_id = 
-1, int64_t offset = -1)
> How does an offset of -1 differ from 0?
I think the change is to allow assigning the offset in the constructor 
function. -1 is an invalid value for the offset, that means when we need to set 
the offset of the range before using the range, there are some assertion about 
it like "DCHECK_GE(offset, 0);" when using it.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156
PS12, Line 156:     if (!use_mem_buffer) {
> Since we asserted above that use_local_buff implies !use_mem_buffer, this c
Added some comments. The logic here is to set the correct file reader if it 
involves reading from the file system (could be local or remote file). If it 
uses memory buffer, it doesn't need to set this. There is a case if 
use_mem_buffer and use_local_buff are all false, this case we will use the 
original file_reader_ to get the range.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291
PS12, Line 291:       // 1. If it is the local buffer file is not 
deleted(evicted) yet.
> Change to "If the local buffer file ..."
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835
PS12, Line 1835:   int64_t file_size = 512 * 1024 * 1024;
> Why was this changed?
The maximum allowed file size for one remote file increases to 512MB in this 
patch defined in MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB in tmp-file-mgr.cc. The 
reason for the increase is a bigger file size may have a better upload 
performance. By default we are still using 256MB, but may give users a chance 
if they would like to try a bigger size.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117
PS12, Line 117:     "Set if the system uses batch reading for the remote 
temporary files.");
> This description could be more descriptive. Maybe "Set to prefetch addition
Added some description. But the prefetch is only in part 2, will add the 
prefetch description in part 2.



--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 12
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Tue, 28 Jun 2022 00:46:03 +0000
Gerrit-HasComments: Yes

Reply via email to