Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 32:

(4 comments)

Few comments related to the hang

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr-internal.h@206
PS32, Line 206:   TmpFileRemote(TmpFileGroup* file_group, TmpFileMgr::DeviceId 
device_id,
Can we add a destructor with a DCHECK that will ensure that the resources have 
been returned to TmpFileBufferPool correctly?

This might relate to the shared_ptr comment I made elsewhere - if we use 
shared_ptr consistently then we'll know that the destructor runs exactly when 
the object becomes unreachable.


http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.h@184
PS32, Line 184:     /// A map to keep the TmpFile shared pointer alive by 
keeping a reference of the
Why do we need to do this again? Can't we enqueue the shared_ptr into the tmp 
file pool and make ownership explicit, instead of doing this shared_ptr/raw 
pointer dance. I'm finding it hard to understand the ownership of these TmpFile 
objects while looking to see if there's a resource leak.


http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc@576
PS32, Line 576:   if 
(tmp_dirs_remote_ctrl_.local_buff_dir_bytes_high_water_mark_.Add(file_size)
I think this code might be vulnerable to a race where a thread ends up blocking 
indefinitely in DequeueTmpFilesPool, e.g.

Assume bytes_limit is 2x and each file is x bytes. Start with x bytes allocated.

Thread 1: increment, win race, 2x allocated
Thread 2: increment, lose race, 3x allocated > bytes limit
Thread 3: decrement, 2x
Thread 4: increment, fails, 3x allocated
Thread 4: decrement, 2x
Thread 2: decrement, 1x

In that case thread 4 could block in DequeueTmpFilesPool even though there's 
free capacity.

I think the basic flaw is that one of the conditions for a thread blocking on 
tmp_files_available_cv_ is that it should have checked for buffer availabity 
before doing so, but it's not part of the condition variable loop condition.

Using the atomics is ok in other places because we just return an error when 
there's a race.

I think it'd be best to protect it behind a lock and also signal the condition 
variable when it's decremented from >= bytes_limit to < bytes_limit.


http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc@1835
PS32, Line 1835: void TmpFileBufferPool::EnqueueTmpFilesPool(TmpFile* tmp_file, 
bool front) {
I'm suspicious about using the raw pointer here



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 32
Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 23:51:19 +0000
Gerrit-HasComments: Yes

Reply via email to