Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batching reading for remote temporary files
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@195
PS1, Line 195:
> Can we use MemBlockState state_ here?
Because the naming in DiskFile is "file_status_", maybe just keep them the 
same, otherwise it may be good to change all of them, including the interface 
names, but should be some work.


http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/request-context.cc@201
PS3, Line 201: unstarted_remote_file_op_ranges_;
> Maybe named to unstarted_remote_file_op_ranges_?
Done


http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/request-ranges.h@118
PS3, Line 118: WRITE,
> May need to explain what it is.
Done


http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/request-ranges.h@702
PS3, Line 702:
> nit. upload the file to a remote location.
Done


http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/request-ranges.h@708
PS3, Line 708:
> nit. the fetch file operation from a remote site.
Done


http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/scan-range.cc@171
PS3, Line 171:  the range
             :     // is supposed to be read in one round.
> Suggest to remove as there is no guarantee.
Changed to "supposed".


http://gerrit.cloudera.org:8080/#/c/17979/3/be/src/runtime/io/scan-range.cc@178
PS3, Line 178: read_status
> need to check the status.
Done


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@257
PS2, Line 257: Status setup_read_buffer_status = SetUpReadBufferParams();
             :     if (!setup_read_buffer_status.ok()) {
> If handling the rare case is simple task, I feel we should do so.
Changed. If the file size is smaller than the max block size, set the block 
size as file size. Otherwise block size is the max block size, which is 16MB.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@1039
PS2, Line 1039:       read_buffer_block->NotifyAllWaits();
> In practice, the read buffer memory is always full during the big queries (
Have a simple test today (15x tpcds, q67, c5d.4xlarge 16u32g, 1G read buffer).

1. Disabled is set: (Time: 135s) (Data Read: 13.8GB)
2. Disabled is not set: (Time: 150s) (Data Read: 17.7GB)

As expected, if the disabled is not set, performance is worse because more data 
is read (more duplicated read). It could be a little different for other 
queries, but if the read buffer is not available (full) for most of the time, 
which is quite likely when spilling large amount of data, disabling the file 
from batching read when failing to reserve space could be a better solution.

I think the next optimization is to make the read buffer more available, maybe 
using a better eviction policy.



--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Nov 2021 01:05:56 +0000
Gerrit-HasComments: Yes

Reply via email to