Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )
Change subject: IMPALA-10791 Add batch reading for remote temporary files ...................................................................... Patch Set 12: (19 comments) A mix of questions and suggestions. I didn't see anything clearly wrong. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55 PS12, Line 55: class MemBlock { This seems like a fairly leaky abstraction. It mostly exists to handle the lifecycle around data_, but also relies on external operations to do lots of the actual management. Not sure I have specific improvements to suggest however. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@60 PS12, Line 60: DCHECK_EQ(static_cast<int>(status_), static_cast<int>(MemBlockStatus::DISABLED)); Can we also verify no memory leaks by checking `data_`? http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220 PS12, Line 220: std::unique_ptr<int64_t[]> page_cnts_per_block_; This seems a little funny, but I think I see why it's necessary. A const vector wouldn't let you update element values, and a non-const vector could potentially be resized. This is the only way to get a runtime fixed-size array. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@386 PS12, Line 386: bool DoReadFromReadBuffer( This function doesn't take any action, so I'm not really sure why it's called "DoRead". I'd opt for Can, Should, or Must (depending on expectations here). http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446 PS12, Line 446: // Helper function to check the status of a read buffe block. nit: buffe -> buffer http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458 PS12, Line 458: // Helper function to delete the read buffe block. nit: buffe -> buffer http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516 PS12, Line 516: /// The lock also protects the memory blocks from destruction, if the disk file has. "if the disk file has" seems like an incomplete sentence. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99 PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) { Some unit tests around MemBlock transitions might be useful. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@116 PS12, Line 116: SetStatusLocked(lock, MemBlockStatus::DISABLED); It would be nice to have a DCHECK(data_ == nullptr) here. It would be true immediately after freeing ALLOC memory, or in other states (because no memory should have been allocated). http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc@463 PS12, Line 463: // If there was an error during reading, keep the old status. Might be helpful to log errors that we can't return. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152 PS12, Line 152: RequestRange(RequestType::type request_type, int disk_id = -1, int64_t offset = -1) How does an offset of -1 differ from 0? http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156 PS12, Line 156: if (!use_mem_buffer) { Since we asserted above that use_local_buff implies !use_mem_buffer, this check is redundant. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291 PS12, Line 291: // 1. If it is the local buffer file is not deleted(evicted) yet. Change to "If the local buffer file ..." http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835 PS12, Line 1835: int64_t file_size = 512 * 1024 * 1024; Why was this changed? http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@108 PS12, Line 108: "files. Only valid when the remote_batch_read is true."); "Only valid when remote_batch_read is true." seems more consistent with how we talk about other flags. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117 PS12, Line 117: "Set if the system uses batch reading for the remote temporary files."); This description could be more descriptive. Maybe "Set to prefetch additional blocks when reading from remote temporary files." Unless prefetch is the wrong term; it's not clear what's being batched without digging into the source code a lot. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@259 PS12, Line 259: LOG(WARNING) << "Disable the read buffer for the remote temporary files " "Disabled the read buffer for remote temporary files due to errors in read buffer parameters: " http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@1116 PS12, Line 1116: // ahead of the real allocation, we need to decrease it if the block is reversed reversed -> reserved http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@1132 PS12, Line 1132: // Set a flag to notify other threads which are hold the file lock to release, nit: hold -> holding -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 12 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Mon, 20 Jun 2022 21:43:22 +0000 Gerrit-HasComments: Yes