Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 27:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG@77
PS27, Line 77:    The first available local directory is used for the local 
buffer
> Can you give an example of how you might set this up in practice to allow b
Done


http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG@91
PS27, Line 91: * Unit Tests added to tmp-file-mgr-test/disk-io-mgr-test.
> I think it'd be good to add a buffer pool test that exercises the remote sp
Added BufferPoolTest::Multi8RandomSpillToRemoteMix and 
BufferPoolTest::Multi8RandomSpillToRemote.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h@137
PS27, Line 137:   void SetActualFileSize(int64_t size) {
> This is only called once, right? Can we check that invariant:
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h@138
PS27, Line 138: DCHECK
> nit: use DCHECK_LE - it automatically prints the two values if it fails so
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc@252
PS27, Line 252:     is_full_ = written_bytes != 0 && written_bytes == 
disk_file_->actual_file_size();
> Add
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc@329
PS27, Line 329:     int bytes = (file_size - offset < buffer_size) ? (file_size 
- offset) : buffer_size;
> Maybe use min() instead?
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/request-context.cc@243
PS27, Line 243:     (static_cast<WriteRange*>(range))->callback()(status);
> nit: I think the parentheses around static_cast here and on l246 are not ne
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr-internal.h@326
PS27, Line 326:   /// The number of the tmp files in the pool. For debug and 
test.
> We don't seem to use this variable any more
Removed.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h@194
PS27, Line 194:     std::unique_ptr<TmpFileBufferPool> tmp_file_pool_;
> When is this null or non-null?
The pool only works for remote spilling. It would be null when there is no 
remote directory. Otherwise, the tmp_file_pool_ is non-null. Added a function 
TmpFileMgr::HasRemoteDir(), if true, tmp_file_pool_ must be non-null. Will add 
comments.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@512
PS27, Line 512:   if (tmp_dirs_remote_ctrl_.tmp_file_pool_ == nullptr) {
> What [
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@917
PS27, Line 917:     if (tmp_file_mgr_->tmp_dirs_remote_ctrl_.tmp_file_pool_ != 
nullptr) {
> Which case is this? Can we add a boolean helper method to TmpDirsRemoteCtrl
The tmp_file_pool should be non-null if there is a remote scratch space 
directory registered, because the pool is only working for buffer of remote 
files. Added a HasRemoteDir() function to TmpFileMgr.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@967
PS27, Line 967:   if (!tmp_files_remote_.empty()) {
> It would be helpful to add a couple of comments to explain the code flow. E
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1019
PS27, Line 1019:     return Status("Page size can't be larger than the 
temporary file size.");
> I think this error isn't accurate any more?
Thanks for pointing this out.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1073
PS27, Line 1073:       if ((*tmp_file)->is_blacklisted()) {
> nit: there seems to be some unnecessary formatting changes here that it wou
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1112
PS27, Line 1112:     if (!alloc_full || !status.ok()) return status;
> nit: maybe reverse the condition so status.ok() comes first. It's basically
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1705
PS27, Line 1705:     if (cur_write_range_ == nullptr) {
               :       DCHECK(shut_down_);
               :       return;
               :     }
> Couldn't we just return on line 1699 in this case?
Done


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1753
PS27, Line 1753:             auto key_range_it = find(write_ranges_.begin(), 
write_ranges_.end(), range);
> I think this ends up being O(n^2) in the number of write ranges, which isn'
Only the first range of a file is in the "write_ranges_" and can reach this 
logic, so instead of O(n^2), I think it should be O(m+n) to remove all the 
ranges of a file from the pool (m is the number of files waiting for 
reservation). Added a map "write_ranges_iterator_" to store the "range to 
iterator" relationship and be the index.


http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1862
PS27, Line 1862: DCHECK
> nit: DCHECK_EQ
Done



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 27
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Tue, 19 Jan 2021 16:54:33 +0000
Gerrit-HasComments: Yes

Reply via email to