Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batching reading for remote temporary files
......................................................................


Patch Set 1:

(37 comments)

http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG@38
PS1, Line 38: now the system
> You mean Impala or OS?. Is it a guarantee?
changed, should be the impala process memory. The mem_limit should be 80% by 
default.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@47
PS1, Line 47: DONE
> nit. WRITTEN probably is better.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@55
PS1, Line 55: DISABLED
> nit. Could be UNINIT too?
I am a little worried about the memory leak.
So the final state of the memory block needs to be MemBlockStatus::DISABLED, 
and every block needs to be called Delete() before destroyed to prevent any 
memory not released.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@62
PS1, Line 62: poool
> nit.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@70
PS1, Line 70:  bool IsDisabled() {
            :     std::unique_lock<SpinLock> l(lock_);
            :     return IsDisabledLocked(l);
            :   }
            :
            :   bool IsDisabledLocked(const std::unique_lock<SpinLock>& lock) {
            :     DCHECK(lock.mutex() == &lock_ && lock.owns_lock());
            :     return status_ == MemBlockStatus::DISABLED;
            :   }
> Can this pair of methods to be simplified as follows?
I would like to keep the caller away from passing the enum, therefore design it 
this way, the benefit is that the caller doesn't need to pass the enum.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@193
PS1, Line 193: lock_
> A little bit confusing when read code which applies lock_. since there are
Changed the name to memory_block_lock_.

I think the benefit of using the memory block is that it may reduce potential 
contention. For example, if using disk file lock, which is a spinlock, only one 
block out of four (or more) can be written at a time, and also the status 
changing.

The rule now is the disk lock is to protect the members actual_file_size_ and 
read_buffer_ctrl_ (exclude the memory blocks). The members of memory blocks are 
protected by their own locks.

If the caller worries the memory block to be destructed while using, the caller 
can first hold the physical_file_lock_, which is a shared lock, to protect the 
while file from destruction.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@292
PS1, Line 292:   std::lock_guard<SpinLock> l(lock_);
             :     return actual_file_size_;
> actual_file_size_ is AtomicInt64. Do we need to lock with lock_ here first?
Changed the actual_file_size_ to ordinary int64_t. Just think maybe it is 
better to be managed by the lock_ like others.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@379
PS1, Line 379: num_of_read_buffer
> nit. num_of_read_buffers.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@381
PS1, Line 381: equals
> nit equal
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@391
PS1, Line 391: DCHECK_LT(buffer_idx, num_of_read_buffer());
             :     DCHECK_GE(buffer_idx, 0);
> Should call DCheckReadBufferIdx().
Thanks, done.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@393
PS1, Line 393:  std::lock_guard<SpinLock> lock(lock_);
> should we lock it first before check?
It should be okay without the lock, because the check is to use the index to 
compare with constant variables.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@415
PS1, Line 415:  DCheckReadBufferIdx(buffer_idx);
             :     std::lock_guard<SpinLock> lock(lock_);
> Lock before check?
same as above.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@509
PS1, Line 509: lock_
> Suggest to rename as disk_file_lock_.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc@123
PS1, Line 123:  *allocated = true;
> If the memory is deleted, the state in its control block should become rese
Added some comments for MemBlock in disk-file.h.
The MemBlock must be in MemBlockStatus::DISABLED before destruction, and the 
Delete() would be needed to call before destruction and it will set the state 
to MemBlockStatus::DISABLED for all cases.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr-internal.h@234
PS2, Line 234: is
> nit if
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr-internal.h@339
PS2, Line 339:   // Return the
> nit. incomplete.
Thanks. Forgot this one. Done.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h@233
PS1, Line 233: of
> nit off.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h@234
PS1, Line 234: is
> nit if.
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h@283
PS1, Line 283: SetToDelete
> nit. May name as SetToDeleteFlag() since the argument can be false which in
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@146
PS2, Line 146: block
> nit memory block?
changed to use "read buffer (memory block)" to describe the block.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@149
PS2, Line 149: The number of read buffer files
> nit. Should it be number of buffer blocks, since we are dividing file size
change to use "read buffer (memory block)" to describe the block.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@151
PS2, Line 151: remote_tmp_file_read_buffer_num_
> nit. may be num_remote_read_buffers_?
changed to remote_num_read_buffers_per_file_.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@158
PS2, Line 158: remote_read_memory_buffer_size_
> nit maybe max_remote_read_buffer_size_?
Changed to remote_max_total_read_buffer_size_. Keep the remote as prefix.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@166
PS2, Line 166: remote_batching_read_
> nit. remote_batching_read_enabled_
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@220
PS2, Line 220: GetRemoteTmpFileReadBuffNum
> GetNumRemoteReadBuffers()
Changed to GetRemoteNumReadBuffersPerFile(), want to keep the Remote as a 
prefix, so that we can easily to tell the function is working for the remote fs.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@225
PS2, Line 225: GetRemoteReadMemoryBuffSize
> Buffer instead Buff?
changed to GetRemoteMaxTotalReadBufferSize.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.h@292
PS2, Line 292: helps to release
> releases
Done


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr.h@245
PS1, Line 245:  Return if batching read for remote temporary files is enabled.
> nit. Indicate ...
Changed to IsRemoteBatchingReadEnabled().


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@107
PS2, Line 107: read memory buffer
> read buffers
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@144
PS2, Line 144:
> the maximal percentage of the system memory that can be utilized
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@257
PS2, Line 257:
             :   tmp_dirs_remote_ctrl_.remote_batching_read_ = FLAGS_re
> Should we consider a special case here when the remote temp file size is sm
Yes. That is the case in the new testcase, 
TmpFileMgrTest::TestBatchingReadFromRemote.

The testcase is using 1KB each file size, and therefore each block is 256B. But 
the page is 512B. Therefore, there are only two blocks with holes for each file 
like below:
blocks       [512B]  []   [512B]  []
start offset   0    256   512   768

It should be working fine.

But since it is a very rare case, the max page size of Impala is 2MB, unless we 
set the file size lower than 8MB, we should not have this case.

In current production system, we are using 256MB for each file.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@260
PS2, Line 260: rs_remote_ctrl_.remote_read_buffer_block_size_ =
> Should create a constant for it as the number in bytes is computed twice.
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@257
PS2, Line 257:
             :   tmp_dirs_remote_ctrl_.remote_batching_read_ = 
FLAGS_remote_batching_read;
             :   if (tmp_dirs_remote_ctrl_.remote_batching_read_) {
             :     tmp_dirs_remote_ctrl_.remote_read_buffer_block_size_ =
             :         tmp_dirs_remote_ctrl_.remote_tmp_file_size_ / 4;
             :     if (tmp_dirs_remote_ctrl_.remote_read_buffer_block_size_
             :         > MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_MB * 1024 * 1024) {
             :       tmp_dirs_remote_ctrl_.remote_read_buffer_block_size_ =
             :           MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_MB * 1024 * 1024;
             :     }
             :     tmp_dirs_remote_ctrl_.remote_tmp_file_read_buffer_num_ =
             :         
static_cast<int>(tmp_dirs_remote_ctrl_.remote_tmp_file_size_
             :             / 
tmp_dirs_remote_ctrl_.remote_read_buffer_block_size_);
             :     tmp_dirs_remote_ctrl_.remote_read_memory_buffer_size_ =
             :         
ParseUtil::ParseMemSpec(FLAGS_remote_read_memory_buffer_size, &is_percent, 0);
             :     i
> This entire block of code is a good candidate for a member function to popu
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@278
PS2, Line 278: 4_t sys_bytes_limit
> Should check max_allow_bytes > 0
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@954
PS2, Line 954: _, file_group_->tmp_file_mgr_->GetRemoteTmpFileSi
> We should check disk_read_page_cnts_.get() is not null.
Added a DCHECK.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@1039
PS2, Line 1039: }
> Do we revive this buffer block when the memory pressure is back to normal?
Nope, in this version, we disable the buffer block if fails to reserve the 
space, so the pages belongs to the block will be read by page. But because 
there are other blocks, they can still use the buffer block when the memory 
pressure is back to normal.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@1074
PS2, Line 1074:
> IncrementReadPageCount() since we increase the count by 1.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Oct 2021 02:34:37 +0000
Gerrit-HasComments: Yes

Reply via email to