[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 23: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 23 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 28 Sep 2022 09:02:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch read from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page. If yes, the block will be stored in the memory until all of the pages in the block are read or the query ends. If not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. One challenge of the read buffer is where to get the extra memory for it, because when impala starts to spill data, it means the process lacks of memory to use. By default, impala process will reserve 20% of the total system memory as unused memory, and here we will use this unused memory for the read buffer because it is reasonable to use it for the emergency case like spilling and the memory of the read buffer will be returned immediately after the use. For system reliability consideration, we set a restriction that, the maximum bytes of the read buffer memory are no more than 10% of the total system memory and 50% of the unused memory. Also, if the unused memory is less than 5% of the total system memory, the read buffer will be disabled. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number is also restricted by the process memory limit, which can not exceed 10% of the process memory limit. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Reviewed-on: http://gerrit.cloudera.org:8080/17979 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/mem-info.cc M be/src/util/mem-info.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 19 files changed, 1,666 insertions(+), 156 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 24 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 23: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8629/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 23 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 28 Sep 2022 03:50:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 23: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 23 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 28 Sep 2022 03:50:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 22: The dry run should hit "IMPALA-11508 Iceberg test test_expire_snapshots is flaky" -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 22 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 28 Sep 2022 03:14:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 22: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8599/ -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 22 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 22 Sep 2022 21:21:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 22 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 22 Sep 2022 16:10:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8599/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 22 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 22 Sep 2022 16:10:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 21: Code-Review+2 +2 to carry Michael and Abhishek's vote. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 21 Sep 2022 14:10:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 21: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 16 Sep 2022 22:09:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 21: Code-Review+1 (1 comment) Looks great! Thanks Yida for the careful rework. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1078 PS13, Line 1078: [read_buffer_block, tmp_file = this](const Status& fetch_status) { > The function firstly check whether the local buffer exists (even batch read Done -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 14 Sep 2022 13:13:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG@45 PS20, Line 45: the maximum bytes of the read buffer memory are no more than 10% > nit. May mention that the memory borrowed from the reserved memory will be Done -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 13 Sep 2022 17:58:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 21: (8 comments) Thanks Qifan for the review. http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG@32 PS20, Line 32: and disable this block, because it may be good to avoid : duplicated reads from the remote fs for the same content. : > nit. suggest to present the positive case first. That is, moving it before Done http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2183 PS20, Line 2183: WaitForDiskFileStatus(file1->DiskFile(), DiskFileStatus::PERSISTED); > nit. 'we use' -> 'we did use'. Done http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2200 PS20, Line 2200: SSERT_OK(file_group.Read(handl > To improve the readability, it looks like various IF tests inside this bloc Added a helper function SetReadBufferSizeHelper(). http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1070 PS13, Line 1070: > nit exists Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1078 PS13, Line 1078: [read_buffer_block, tmp_file = this](const Status& fetch_status) { > nit. Can this line be moved above 1074? That is, if remote batch reading is The function firstly check whether the local buffer exists (even batch reading is disabled, we could still have the disk buffer), if not, and batch reading enabled, we will try to see if there is an available buffer in memory. So we are not able to move the line ahead. Also, this function is changed a lot in part 2, it may be better to keep optimizing in part 2 if there is no safety issue here. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1104 PS13, Line 1104: ile* TmpFileRemote::GetR > Looks like this function does more than what the function name implies. You Added a comment in the tmp-file-mgr.cc. Already a comment in the header of IncrementReadPageCount() in the tmp-file-mgr-internal.h. I think it is more efficient, because otherwise we need to fetch the lock to double check whether all the pages are read every time after the increment. Also it ensures that only one thread knows the "all_read" status and does the buffer removing job. http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc File be/src/util/mem-info.cc: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc@265 PS20, Line 265: (process_avail_mem) *process_a > may just say Done http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h File be/src/util/metrics.h: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h@272 PS20, Line 272: /// The hwm value is also updated atomically. > May add "Also return the updated current value." here. Done -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 13 Sep 2022 17:57:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11347/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 13 Sep 2022 17:55:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch read from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page. If yes, the block will be stored in the memory until all of the pages in the block are read or the query ends. If not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. One challenge of the read buffer is where to get the extra memory for it, because when impala starts to spill data, it means the process lacks of memory to use. By default, impala process will reserve 20% of the total system memory as unused memory, and here we will use this unused memory for the read buffer because it is reasonable to use it for the emergency case like spilling and the memory of the read buffer will be returned immediately after the use. For system reliability consideration, we set a restriction that, the maximum bytes of the read buffer memory are no more than 10% of the total system memory and 50% of the unused memory. Also, if the unused memory is less than 5% of the total system memory, the read buffer will be disabled. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number is also restricted by the process memory limit, which can not exceed 10% of the process memory limit. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/mem-info.cc M be/src/util/mem-info.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 19 files changed, 1,666 insertions(+), 156 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/21 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 21 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 20: (9 comments) Looks great! http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG@32 PS20, Line 32: If the system decides to fetch a block, the block will be : stored in the memory until all of the pages in the block are read : or the query ends. nit. suggest to present the positive case first. That is, moving it before "if not, we will go" at line 29. http://gerrit.cloudera.org:8080/#/c/17979/20//COMMIT_MSG@45 PS20, Line 45: the read buffer will be disabled. nit. May mention that the memory borrowed from the reserved memory will be returned immediately after use. http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2183 PS20, Line 2183: // Check the metrics that we use the read buffer for reading. nit. 'we use' -> 'we did use'. http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/runtime/tmp-file-mgr-test.cc@2200 PS20, Line 2200: f (buffer_size == small_size) To improve the readability, it looks like various IF tests inside this block can be avoided if a helper function testCapacityForReadBuffer(int64_t buffer_size) can be written that tests one buffer size only. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1070 PS13, Line 1070: nit exists http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1078 PS13, Line 1078: [read_buffer_block, tmp_file = this](const Status& fetch_status) { nit. Can this line be moved above 1074? That is, if remote batch reading is disabled, return null immediately. No need to do the work from 1074 to 1077. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1104 PS13, Line 1104: ile* TmpFileRemote::GetR Looks like this function does more than what the function name implies. You may need to add a comment at least. Is there a strong need to do this check? http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc File be/src/util/mem-info.cc: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/mem-info.cc@265 PS20, Line 265: (process_avail_mem != nullptr) may just say if (process_avail_mem) *process_avail_mem = avail_mem; http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h File be/src/util/metrics.h: http://gerrit.cloudera.org:8080/#/c/17979/20/be/src/util/metrics.h@272 PS20, Line 272: /// The hwm value is also updated atomically. May add "Also return the updated current value." here. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 20 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 13 Sep 2022 14:21:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 20: Rebased and refined the limits for the read buffer based on Abhishek's comments. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 20 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 12 Sep 2022 17:17:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11337/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 20 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 12 Sep 2022 17:12:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch read from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of the read buffer is where to get the extra memory for it, because when impala starts to spill data, it means the process lacks of memory to use. By default, impala process will reserve 20% of the total system memory as unused memory, and here we will use this unused memory for the read buffer because it is reasonable to use it for the emergency case like spilling. The maximum bytes of the read buffer memory are no more than 10% of the total system memory and 50% of the unused memory. Also, if the unused memory is less than 5% of the total system memory, the read buffer will be disabled. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number is also restricted by the process memory limit, which can not exceed 10% of the process memory limit. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/mem-info.cc M be/src/util/mem-info.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 19 files changed, 1,648 insertions(+), 156 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/20 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 20 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 18: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 18 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 29 Aug 2022 17:16:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 17: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8491/ -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 17 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 24 Aug 2022 00:43:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 18: (3 comments) http://gerrit.cloudera.org:8080/#/c/17979/15/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/15/be/src/runtime/tmp-file-mgr.cc@604 PS15, Line 604: > I believe this is the memory limit for query processing, which is what I wo Thanks for pointing this out. Made a change on the max allowed bytes for read buffer. Now is 10% of the process bytes limit. Also modified the commit message. http://gerrit.cloudera.org:8080/#/c/17979/15/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/17979/15/common/thrift/metrics.json@2424 PS15, Line 2424: The current total read memory buffer bytes for all scr > Maybe re-word this as follows to reflect this is current usage? Done http://gerrit.cloudera.org:8080/#/c/17979/15/common/thrift/metrics.json@2434 PS15, Line 2434: The high water mark for read memory buffer bytes of all scratch direct > Better wording to include the unit/bytes: Done -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 18 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 23 Aug 2022 23:19:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11214/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 18 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 23 Aug 2022 23:15:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch read from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of the read buffer is where to get the extra memory for it, because when impala starts to spill data, it means the process lacks of memory to use. By default, impala process will reserve 20% of the total system memory as unused memory, and here we decide to use this unused memory for the read buffer because it is reasonable to use it for the emergency case like spilling. The maximum bytes of the read buffer memory are 10% of the process memory limit, which is a conservative number compared to 20% of the total system memory. The read buffer size plus process memory limit should not exceed the total system memory. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number is also restricted by the process memory limit, which can not exceed 10% of the process memory limit. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/mem-info.cc M be/src/util/mem-info.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 19 files changed, 1,587 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/18 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 18 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8491/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 17 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 23 Aug 2022 22:03:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 16: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/11213/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 16 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 23 Aug 2022 21:24:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch read from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of the read buffer is where to get the extra memory for it, because when impala starts to spill data, it means the process lacks of memory to use. By default, impala process will reserve 20% of the total system memory as unused memory, and here we decide to use this unused memory for the read buffer because it is reasonable to use it for the emergency case like spilling. The maximum bytes of the read buffer memory are 10% of the process memory limit, which is a conservative number compared to 20% of the total system memory. The read buffer size plus process memory limit should not exceed the total system memory. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number is also restricted by the process memory limit, which can not exceed 10% of the process memory limit. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/mem-info.cc M be/src/util/mem-info.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 19 files changed, 1,587 insertions(+), 161 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/16 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 16 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 15: (9 comments) Looks good to me. Some initial comments. http://gerrit.cloudera.org:8080/#/c/17979/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/12//COMMIT_MSG@10 PS12, Line 10: file in order to improve the reading performance for the spilled nit: reading -> read http://gerrit.cloudera.org:8080/#/c/17979/12//COMMIT_MSG@13 PS12, Line 13: Originally, the design is to use the local disk file as the buffer nit: is -> was http://gerrit.cloudera.org:8080/#/c/17979/14/be/src/runtime/io/disk-file-test.cc File be/src/runtime/io/disk-file-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/14/be/src/runtime/io/disk-file-test.cc@30 PS14, Line 30: MemBlockStatus new_status, bool expect_suc); 'expect_success' is probably better for reading. http://gerrit.cloudera.org:8080/#/c/17979/14/be/src/runtime/io/disk-file-test.cc@33 PS14, Line 33: // last_status is MemBlock's last status is going to reach other than missing "it" in the description. last_status is MemBlock's last status 'it' is going to reach other than http://gerrit.cloudera.org:8080/#/c/17979/14/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/14/be/src/runtime/io/disk-file.h@45 PS14, Line 45: /// UNINIT is the default status, indicates the block is not uninitialized. typo? not uninitialized. => not initialized. http://gerrit.cloudera.org:8080/#/c/17979/14/be/src/runtime/io/disk-file.h@72 PS14, Line 72: // Allcate the memory for the MemBlock. typo "Allcate" http://gerrit.cloudera.org:8080/#/c/17979/15/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/15/be/src/runtime/tmp-file-mgr.cc@604 PS15, Line 604: admit_mem_limit I believe this is the memory limit for query processing, which is what I would expect we use and not use the 20% emergency memory. But, then this contradicts the commit message. http://gerrit.cloudera.org:8080/#/c/17979/15/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/17979/15/common/thrift/metrics.json@2424 PS15, Line 2424: The used read memory buffer of all scratch directories Maybe re-word this as follows to reflect this is current usage? "The current total read memory buffer bytes for all scratch directories" http://gerrit.cloudera.org:8080/#/c/17979/15/common/thrift/metrics.json@2434 PS15, Line 2434: The high water mark for read memory buffer of all scratch directories. Better wording to include the unit/bytes: The high water mark for read memory buffer bytes of all scratch directories. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 15 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 09 Aug 2022 23:25:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 15: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 15 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 03 Aug 2022 16:42:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/11077/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 15 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 02 Aug 2022 23:27:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 15: Fixed a testcase failure in TSAN build, could be related to an old issue IMPALA-4249. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 15 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 02 Aug 2022 23:08:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch reading from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of using the memory for the buffer is that, when the system is lacking of memory when it needs to spill the data. So we make a restriction to limit the percentage of the memory for the read buffer to 10% of the total, because right now the impala process will reserve 20% memory as unused memory by default, using 10% for the emergency case like spilling is reasonable. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number also restricted by the total system memory, which can not exceed 10% of the total system memory. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 17 files changed, 1,572 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/15 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 15 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 14: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 14 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 19 Jul 2022 16:56:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 14: (11 comments) http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc File be/src/runtime/io/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc@191 PS13, Line 191: protected: > Naming this "ValidateMemBlockStatus" seems more consistent with naming test Done. Moved to DiskFileTest. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc@2494 PS13, Line 2494: > ObjectPool is useful when you need to generate a bunch or variable number o Done. Moved to DiskFileTest. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc@2525 PS13, Line 2525: ASSERT_EQ(0, status.code()); > Does disk-io-mgr-test contain a bunch of unit tests for DiskFile too? This Added some negative tests. Moved to DiskFileTest. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@264 PS13, Line 264: disk_file_->UpdateReadBufferMetaDataIfNeeded(written_bytes - len_); > The condition check is unnecessary because UpdateReadBufferMetaData returns Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@353 PS13, Line 353: if (!status.ok()) goto end; > nit: line length limit is 90, this doesn't need to wrap. Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@407 PS13, Line 407: // Get the shared lock to prevent the physical files from deletion during the fetching. > Storing the result of `c_str()` is usually a bad idea, since it has no guar Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@411 PS13, Line 411: shared_lock srcl(disk_file_src_->physical_file_lock_); > Initializing this up-front is a very old-school C idiom. Compilers can ofte Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@447 PS13, Line 447: read_buffer_bloc, MemBlockStatus::WRITTEN, dstl, _buffer_lock); > If we wanted to get fancy we could use a lambda to return the read result. Done http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr-test.cc@2101 PS13, Line 2101: TEST_F(TmpFileMgrTest, TestBatchReadingFromRemote) { > Can we add a test case where the remote file is deleted while DoUpload is i Added a testcase DiskIoMgrTest::WriteToRemoteFileDeleted, but it is hard to delete the remote physical file in the middle of the uploading because the detection using hdfsExists seems only after the file is closed. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1019 PS13, Line 1019: // waiting for the fetch, so that it won't block the current page reading. > So if MemBlock is reserved/allocated, that means another process has alread Added comments. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1022 PS13, Line 1022: if (read_buffer_file->IsReadBufferBlockStatus(read_buffer_block, > Ok, took me a sec to see why these checks make sense even though the only c This part is changed a lot in part 2. Maybe we can optimize the structure in part 2. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 14 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 11 Jul 2022 15:53:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10945/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 14 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 11 Jul 2022 15:44:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch reading from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of using the memory for the buffer is that, when the system is lacking of memory when it needs to spill the data. So we make a restriction to limit the percentage of the memory for the read buffer to 10% of the total, because right now the impala process will reserve 20% memory as unused memory by default, using 10% for the emergency case like spilling is reasonable. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number also restricted by the total system memory, which can not exceed 10% of the total system memory. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/disk-file-test.cc M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 17 files changed, 1,571 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/14 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 14 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 13: (14 comments) http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55 PS12, Line 55: /// The caller may need to maintain the status of the MemBlock and make sure the block is > It is quite a simple class working closely with the DiskFile and TmpFile st Ack http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220 PS12, Line 220: SpinLock read_buffer_ctrl_lock_; > Maybe personal preference. Prefer a fixed-size array if the size is fixed a Ack http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc File be/src/runtime/io/disk-io-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc@191 PS13, Line 191: void TestMemBlockStatus(MemBlockStatus last_status); Naming this "ValidateMemBlockStatus" seems more consistent with naming test helper functions. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc@2494 PS13, Line 2494: ObjectPool pool; ObjectPool is useful when you need to generate a bunch or variable number of objects that need to persist passed the creation scope. That's not the case here, you could use a local variable, as in MemBlock block(block_id); http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr-test.cc@2525 PS13, Line 2525: TEST_F(DiskIoMgrTest, MemBlockTest) { Does disk-io-mgr-test contain a bunch of unit tests for DiskFile too? This would make more sense to me in a new file: disk-file-test.cc. I'd also be interested in negative tests, but it looks like we only enforce order via DCHECK so the negative tests wouldn't really work. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@264 PS13, Line 264: if (disk_file_->IsBatchReadEnabled()) { The condition check is unnecessary because UpdateReadBufferMetaData returns immediately when !IsBatchReadEnabled. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@353 PS13, Line 353: // is_to_delete flag. nit: line length limit is 90, this doesn't need to wrap. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@407 PS13, Line 407: const char* remote_file_path = disk_file_src_->path().c_str(); Storing the result of `c_str()` is usually a bad idea, since it has no guaranteed lifetime. In this case it looks like it's fine, but also unnecessary. I'd personally get the string& instead and use that for later calls. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@411 PS13, Line 411: int ret = 0; Initializing this up-front is a very old-school C idiom. Compilers can often optimize better if you declare the variable in the scope where it's used, so I'd prefer to move this to the else block at line 444 where it's actually needed. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/io/disk-io-mgr.cc@447 PS13, Line 447: ret = hdfsPreadFully(hdfs_conn, remote_hdfs_file, remote_file_offset, If we wanted to get fancy we could use a lambda to return the read result. I think that adds no additional overhead, but also entirely unnecessary. int ret = [&]() { ScopedHistogramTimer read_timer(queue->read_latency()); return hdfsPreadFully(hdfs_conn, remote_hdfs_file, remote_file_offset, read_buffer_bloc->data(), local_file_size); }(); http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156 PS12, Line 156: // Set the correct reader to read the range if the memory buffer is not available. > Added some comments. The logic here is to set the correct file reader if it Ack http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr-test.cc@2101 PS13, Line 2101: TEST_F(TmpFileMgrTest, TestBatchReadingFromRemote) { Can we add a test case where the remote file is deleted while DoUpload is in progress, and verify it stops early? Not sure if that belongs here or in disk-io-mgr-test.cc. http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/13/be/src/runtime/tmp-file-mgr.cc@1019 PS13, Line 1019: // For other status, will be treated as disabled. So if MemBlock is
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 12: (11 comments) http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55 PS12, Line 55: class MemBlock { > This seems like a fairly leaky abstraction. It mostly exists to handle the It is quite a simple class working closely with the DiskFile and TmpFile stuff, may rely on the TmpFile stuff to reserve the memory (which relies on a global variable in TmpFileMgr) for it before having the right to allocate, and rely on the IO functions to write and read directly on the data_ under certain block status. Think for a while for a better abstraction, but dont have much idea for now while guarantee the efficiency and safety without too much complexity. I think one of the good thing here is the status transition is clear and one-way down, what the user needs to do, in most cases, is to lock on the block spinlock, do something, change the status if needed. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220 PS12, Line 220: std::unique_ptr page_cnts_per_block_; > This seems a little funny, but I think I see why it's necessary. A const ve Maybe personal preference. Prefer a fixed-size array if the size is fixed and type is simple. Looks more clear. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446 PS12, Line 446: // Helper function to check the status of a read buffe block. > nit: buffe -> buffer Done http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458 PS12, Line 458: // Helper function to delete the read buffe block. > nit: buffe -> buffer Done http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516 PS12, Line 516: /// The lock also protects the memory blocks from destruction, if the disk file has. > "if the disk file has" seems like an incomplete sentence. Done http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99 PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) { > Some unit tests around MemBlock transitions might be useful. Added a unit testcase "MemBlockTest" should have covered the MemBlock status transitions. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152 PS12, Line 152: RequestRange(RequestType::type request_type, int disk_id = -1, int64_t offset = -1) > How does an offset of -1 differ from 0? I think the change is to allow assigning the offset in the constructor function. -1 is an invalid value for the offset, that means when we need to set the offset of the range before using the range, there are some assertion about it like "DCHECK_GE(offset, 0);" when using it. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156 PS12, Line 156: if (!use_mem_buffer) { > Since we asserted above that use_local_buff implies !use_mem_buffer, this c Added some comments. The logic here is to set the correct file reader if it involves reading from the file system (could be local or remote file). If it uses memory buffer, it doesn't need to set this. There is a case if use_mem_buffer and use_local_buff are all false, this case we will use the original file_reader_ to get the range. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291 PS12, Line 291: // 1. If it is the local buffer file is not deleted(evicted) yet. > Change to "If the local buffer file ..." Done http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835 PS12, Line 1835: int64_t file_size = 512 * 1024 * 1024; > Why was this changed? The maximum allowed file size for one remote file increases to 512MB in this patch defined in MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB in tmp-file-mgr.cc. The reason for the increase is a bigger file size may have a better upload performance. By default we are still using 256MB, but may give users a chance if they would like to try a bigger size. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117 PS12, Line 117: "Set if
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10889/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 28 Jun 2022 00:27:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch reading from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of using the memory for the buffer is that, when the system is lacking of memory when it needs to spill the data. So we make a restriction to limit the percentage of the memory for the read buffer to 10% of the total, because right now the impala process will reserve 20% memory as unused memory by default, using 10% for the emergency case like spilling is reasonable. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number also restricted by the total system memory, which can not exceed 10% of the total system memory. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 15 files changed, 1,340 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/13 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 13 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 12: (19 comments) A mix of questions and suggestions. I didn't see anything clearly wrong. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55 PS12, Line 55: class MemBlock { This seems like a fairly leaky abstraction. It mostly exists to handle the lifecycle around data_, but also relies on external operations to do lots of the actual management. Not sure I have specific improvements to suggest however. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@60 PS12, Line 60: DCHECK_EQ(static_cast(status_), static_cast(MemBlockStatus::DISABLED)); Can we also verify no memory leaks by checking `data_`? http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220 PS12, Line 220: std::unique_ptr page_cnts_per_block_; This seems a little funny, but I think I see why it's necessary. A const vector wouldn't let you update element values, and a non-const vector could potentially be resized. This is the only way to get a runtime fixed-size array. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@386 PS12, Line 386: bool DoReadFromReadBuffer( This function doesn't take any action, so I'm not really sure why it's called "DoRead". I'd opt for Can, Should, or Must (depending on expectations here). http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446 PS12, Line 446: // Helper function to check the status of a read buffe block. nit: buffe -> buffer http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458 PS12, Line 458: // Helper function to delete the read buffe block. nit: buffe -> buffer http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516 PS12, Line 516: /// The lock also protects the memory blocks from destruction, if the disk file has. "if the disk file has" seems like an incomplete sentence. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99 PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) { Some unit tests around MemBlock transitions might be useful. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@116 PS12, Line 116: SetStatusLocked(lock, MemBlockStatus::DISABLED); It would be nice to have a DCHECK(data_ == nullptr) here. It would be true immediately after freeing ALLOC memory, or in other states (because no memory should have been allocated). http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-io-mgr.cc@463 PS12, Line 463: // If there was an error during reading, keep the old status. Might be helpful to log errors that we can't return. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152 PS12, Line 152: RequestRange(RequestType::type request_type, int disk_id = -1, int64_t offset = -1) How does an offset of -1 differ from 0? http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156 PS12, Line 156: if (!use_mem_buffer) { Since we asserted above that use_local_buff implies !use_mem_buffer, this check is redundant. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291 PS12, Line 291: // 1. If it is the local buffer file is not deleted(evicted) yet. Change to "If the local buffer file ..." http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835 PS12, Line 1835: int64_t file_size = 512 * 1024 * 1024; Why was this changed? http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@108 PS12, Line 108: "files. Only valid when the remote_batch_read is true."); "Only valid when remote_batch_read is true." seems more consistent with how we talk about other flags. http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117 PS12, Line 117: "Set if the system uses batch reading for
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10293/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 12 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 15 Mar 2022 21:49:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. Patch Set 12: Patch12 changes the name of the option from remote_batching_read to remote_batch_read to shorten the name, and also related naming. -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 12 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 15 Mar 2022 21:32:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files
Yida Wu has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17979 ) Change subject: IMPALA-10791 Add batch reading for remote temporary files .. IMPALA-10791 Add batch reading for remote temporary files The patch adds a feature to batch read from a remote temporary file in order to improve the reading performance for the spilled remote data. Originally, the design is to use the local disk file as the buffer for batch reading from the remote file. But in practice, it doesn't help to improve the performance. Therefore, the design is changed to use the memory as the read buffer. Currently, each TmpFileRemote has two DiskFile, one is for the remote, and one is for the local buffer. The patch adds MemBlocks to the local buffer file. Each local buffer file is divided into several MemBlocks evenly. Moreover, in order to guarantee a single page not being cut into two parts in different blocks, the block size could be a little different to each other in practice. The default block size is the minimum value between the default file size and MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB. When pinning a page, the system will detect if there is enough memory for the block that holds the page, if not, we will go reading the page directly and disable this block, because it may be good to avoid duplicated reads from the remote fs for the same content. If the system decides to fetch a block, the block will be stored in the memory until all of the pages in the block are read or the query ends. One challenge of using the memory for the buffer is that, when the system is lacking of memory when it needs to spill the data. So we make a restriction to limit the percentage of the memory for the read buffer to 10% of the total, because right now the impala process will reserve 20% memory as unused memory by default, using 10% for the emergency case like spilling is reasonable. Two start options have been added for the new feature. 1. remote_batch_read. Default is false. If set true, the batch read is enabled. 2. remote_read_memory_buffer_size. Default is 1G. The maximum memory that can be used by the read buffer. The number also restricted by the total system memory, which can not exceed 10% of the total system memory. Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/ ScratchBytesReadUseLocalDisk to the query profile. The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB from 256 to 512. Tests: Ran core and exhaustive tests. Added and ran TmpFileMgrTest::TestBatchReadingFromRemote. Added e2e test test_scratch_dirs_batch_reading. Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b --- M be/src/runtime/io/disk-file.cc M be/src/runtime/io/disk-file.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_scratch_disk.py 14 files changed, 1,279 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/12 -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 12 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Yida Wu