Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )
Change subject: IMPALA-10791 Add batching reading for remote temporary files ...................................................................... Patch Set 6: (6 comments) We have at most 10% of the process memory for the remote batching reading, theoretically this memory consumption should be okay, but maybe later we can test more fierce cases http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc@424 PS5, Line 424: read_buffer_bloc, MemBlockStatus::DISABLED, dstl, &read_buffer_lock)) { > Well, if we are here, it means we want to a memory block to hold some data, I think there are two types of memory may be involved. For the memory of the block content, we don't need to allocate all of them of a file during construction. Each block will reserve or allocate its own memory separately when a page in the block is going to be read(pinned). Right now the memory reuse process is, we will free the memory of a DISABLED block and return the reservation to the global, then other block can get the reservation and allocate itself. Maybe later we can have some memory pool, but compared to the rate of network transmission, it should be not that bad to have one system free and malloc during reuse. For the memory of block structures(MemBlock), we will have all of them since construction of the TmpFile till destruction. For example, if the file is 256M, block is 16M, we will have 16 block structures created since construction of the TmpFile. But consider that, the size of each MemBlock structure is 128 bytes, if we are going to spill 1TB data, the total memory needs for the block structures is 1T/16M*128B=8M, the number is not that bad. http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc@430 PS5, Line 430: RN_IF_ERROR( > This could be a performance hit to open/close a file to read one data block It is a good point, could be a potential performance improvement. I modified the code and did some tests, looks like the reuse of the hdfs file handle doesn't work that good. Firstly, the hdfsFile seems not allow multithreading use, therefore, there needs to be an exclusive lock among multiple blocks in the same file, the performance is affected. Secondly, it could require more memory because probably we need to keep all the file handle till query ends, during the tests, I have met hanging on the the calls to hdfs interface after lots of file handle have been created, I assume it could be related. Therefore, to keep the file open, may need more efforts, could be good to put it in the next round of improvement. http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc@445 PS5, Line 445: queue->read_size()->Update(local_file_size); : disk_file_dst_->SetReadBufferBlockStatus( > See my comment above. Basically, I have a good mem block and there is a rea It depends on how the caller deal with the error. Right now, if fails to fetch the block, we simply return the block memory to global(the block structure is still there, but content memory is released), therefore some other block can use. The one who fails to fetch a block will change the policy to read by page for this page. If read by page fails again, the query will fail. http://gerrit.cloudera.org:8080/#/c/17979/5/be/src/runtime/io/disk-io-mgr.cc@452 PS5, Line 452: > This code could mask out a read error set at line 445. Better to return th Done http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@402 PS6, Line 402: TmpFileMgr::SetUpReadBufferParams > Since the work in this method is done for tmp_dirs_remote_ctrl_, suggest to Done http://gerrit.cloudera.org:8080/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@411 PS6, Line 411: tmp_dirs_remote_ctrl_.num_read_buffer_blocks_per_file_ = : static_cast<int>(tmp_dirs_remote_ctrl_.remote_tmp_file_size_ : / tmp_dirs_remote_ctrl_.read_buffer_block_size_); > This assumes that there is enough memory to hold the entire file. When it i It doesn't need the memory for the entire file. During the initialization, we only create the structure of the memory blocks, not reserve or allocate the memory for the block content. Each block will reserve or allocate its own memory separately. We have a global limitation on the sum of the size of all the blocks, which is the tmp_dirs_remote_ctrl_.max_read_buffer_size_. The global usage of the read buffer memory is tracked by current_read_memory_allocated_ in https://gerrit.cloudera.org/#/c/17979/6/be/src/runtime/tmp-file-mgr.h@400. If the current global usage is below the limitation, the block will reserve the memory successfully. The memory reservation for a block is done in https://gerrit.cloudera.org/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@1024, when someone wants to pin a page. Then with the reservation, the block can allocate the memory in https://gerrit.cloudera.org/#/c/17979/6/be/src/runtime/io/disk-io-mgr.cc@430. The data will be read from the remote file to the block memory which was just allocated. If error happens or query ends, the memory of the block will be released according to the status of the block in https://gerrit.cloudera.org/#/c/17979/6/be/src/runtime/tmp-file-mgr.cc@1107. By decreasing the reservation tracker current_read_memory_allocated_ by one block size, other blocks could be able to reserve the memory then allocate. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 6 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Fri, 12 Nov 2021 21:14:41 +0000 Gerrit-HasComments: Yes