Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 27: (18 comments) http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG@77 PS27, Line 77: The first available local directory is used for the local buffer > Can you give an example of how you might set this up in practice to allow b Done http://gerrit.cloudera.org:8080/#/c/16318/27//COMMIT_MSG@91 PS27, Line 91: * Unit Tests added to tmp-file-mgr-test/disk-io-mgr-test. > I think it'd be good to add a buffer pool test that exercises the remote sp Added BufferPoolTest::Multi8RandomSpillToRemoteMix and BufferPoolTest::Multi8RandomSpillToRemote. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h@137 PS27, Line 137: void SetActualFileSize(int64_t size) { > This is only called once, right? Can we check that invariant: Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-file.h@138 PS27, Line 138: DCHECK > nit: use DCHECK_LE - it automatically prints the two values if it fails so Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc@252 PS27, Line 252: is_full_ = written_bytes != 0 && written_bytes == disk_file_->actual_file_size(); > Add Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/disk-io-mgr.cc@329 PS27, Line 329: int bytes = (file_size - offset < buffer_size) ? (file_size - offset) : buffer_size; > Maybe use min() instead? Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/request-context.cc File be/src/runtime/io/request-context.cc: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/io/request-context.cc@243 PS27, Line 243: (static_cast<WriteRange*>(range))->callback()(status); > nit: I think the parentheses around static_cast here and on l246 are not ne Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr-internal.h@326 PS27, Line 326: /// The number of the tmp files in the pool. For debug and test. > We don't seem to use this variable any more Removed. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.h@194 PS27, Line 194: std::unique_ptr<TmpFileBufferPool> tmp_file_pool_; > When is this null or non-null? The pool only works for remote spilling. It would be null when there is no remote directory. Otherwise, the tmp_file_pool_ is non-null. Added a function TmpFileMgr::HasRemoteDir(), if true, tmp_file_pool_ must be non-null. Will add comments. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@512 PS27, Line 512: if (tmp_dirs_remote_ctrl_.tmp_file_pool_ == nullptr) { > What [ Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@917 PS27, Line 917: if (tmp_file_mgr_->tmp_dirs_remote_ctrl_.tmp_file_pool_ != nullptr) { > Which case is this? Can we add a boolean helper method to TmpDirsRemoteCtrl The tmp_file_pool should be non-null if there is a remote scratch space directory registered, because the pool is only working for buffer of remote files. Added a HasRemoteDir() function to TmpFileMgr. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@967 PS27, Line 967: if (!tmp_files_remote_.empty()) { > It would be helpful to add a couple of comments to explain the code flow. E Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1019 PS27, Line 1019: return Status("Page size can't be larger than the temporary file size."); > I think this error isn't accurate any more? Thanks for pointing this out. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1073 PS27, Line 1073: if ((*tmp_file)->is_blacklisted()) { > nit: there seems to be some unnecessary formatting changes here that it wou Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1112 PS27, Line 1112: if (!alloc_full || !status.ok()) return status; > nit: maybe reverse the condition so status.ok() comes first. It's basically Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1705 PS27, Line 1705: if (cur_write_range_ == nullptr) { : DCHECK(shut_down_); : return; : } > Couldn't we just return on line 1699 in this case? Done http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1753 PS27, Line 1753: auto key_range_it = find(write_ranges_.begin(), write_ranges_.end(), range); > I think this ends up being O(n^2) in the number of write ranges, which isn' Only the first range of a file is in the "write_ranges_" and can reach this logic, so instead of O(n^2), I think it should be O(m+n) to remove all the ranges of a file from the pool (m is the number of files waiting for reservation). Added a map "write_ranges_iterator_" to store the "range to iterator" relationship and be the index. http://gerrit.cloudera.org:8080/#/c/16318/27/be/src/runtime/tmp-file-mgr.cc@1862 PS27, Line 1862: DCHECK > nit: DCHECK_EQ Done -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 27 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Tue, 19 Jan 2021 16:54:33 +0000 Gerrit-HasComments: Yes