Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
......................................................................


Patch Set 12:

(19 comments)

A mix of questions and suggestions. I didn't see anything clearly wrong.

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55
PS12, Line 55: class MemBlock {
This seems like a fairly leaky abstraction. It mostly exists to handle the 
lifecycle around data_, but also relies on external operations to do lots of 
the actual management. Not sure I have specific improvements to suggest however.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@60
PS12, Line 60:     DCHECK_EQ(static_cast<int>(status_), 
static_cast<int>(MemBlockStatus::DISABLED));
Can we also verify no memory leaks by checking `data_`?


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220
PS12, Line 220:     std::unique_ptr<int64_t[]> page_cnts_per_block_;
This seems a little funny, but I think I see why it's necessary. A const vector 
wouldn't let you update element values, and a non-const vector could 
potentially be resized. This is the only way to get a runtime fixed-size array.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@386
PS12, Line 386:   bool DoReadFromReadBuffer(
This function doesn't take any action, so I'm not really sure why it's called 
"DoRead". I'd opt for Can, Should, or Must (depending on expectations here).


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446
PS12, Line 446:   // Helper function to check the status of a read buffe block.
nit: buffe -> buffer


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458
PS12, Line 458:   // Helper function to delete the read buffe block.
nit: buffe -> buffer


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516
PS12, Line 516:   /// The lock also protects the memory blocks from 
destruction, if the disk file has.
"if the disk file has" seems like an incomplete sentence.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99
PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) {
Some unit tests around MemBlock transitions might be useful.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@116
PS12, Line 116:       SetStatusLocked(lock, MemBlockStatus::DISABLED);
It would be nice to have a DCHECK(data_ == nullptr) here. It would be true 
immediately after freeing ALLOC memory, or in other states (because no memory 
should have been allocated).


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc@463
PS12, Line 463:     // If there was an error during reading, keep the old 
status.
Might be helpful to log errors that we can't return.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152
PS12, Line 152:   RequestRange(RequestType::type request_type, int disk_id = 
-1, int64_t offset = -1)
How does an offset of -1 differ from 0?


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156
PS12, Line 156:     if (!use_mem_buffer) {
Since we asserted above that use_local_buff implies !use_mem_buffer, this check 
is redundant.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291
PS12, Line 291:       // 1. If it is the local buffer file is not 
deleted(evicted) yet.
Change to "If the local buffer file ..."


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835
PS12, Line 1835:   int64_t file_size = 512 * 1024 * 1024;
Why was this changed?


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@108
PS12, Line 108:     "files. Only valid when the remote_batch_read is true.");
"Only valid when remote_batch_read is true." seems more consistent with how we 
talk about other flags.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117
PS12, Line 117:     "Set if the system uses batch reading for the remote 
temporary files.");
This description could be more descriptive. Maybe "Set to prefetch additional 
blocks when reading from remote temporary files." Unless prefetch is the wrong 
term; it's not clear what's being batched without digging into the source code 
a lot.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@259
PS12, Line 259:       LOG(WARNING) << "Disable the read buffer for the remote 
temporary files "
"Disabled the read buffer for remote temporary files due to errors in read 
buffer parameters: "


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@1116
PS12, Line 1116:     // ahead of the real allocation, we need to decrease it if 
the block is reversed
reversed -> reserved


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@1132
PS12, Line 1132:   // Set a flag to notify other threads which are hold the 
file lock to release,
nit: hold -> holding



--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 12
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Mon, 20 Jun 2022 21:43:22 +0000
Gerrit-HasComments: Yes

Reply via email to