Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16318 )
Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 ...................................................................... Patch Set 32: (4 comments) Few comments related to the hang http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr-internal.h@206 PS32, Line 206: TmpFileRemote(TmpFileGroup* file_group, TmpFileMgr::DeviceId device_id, Can we add a destructor with a DCHECK that will ensure that the resources have been returned to TmpFileBufferPool correctly? This might relate to the shared_ptr comment I made elsewhere - if we use shared_ptr consistently then we'll know that the destructor runs exactly when the object becomes unreachable. http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.h@184 PS32, Line 184: /// A map to keep the TmpFile shared pointer alive by keeping a reference of the Why do we need to do this again? Can't we enqueue the shared_ptr into the tmp file pool and make ownership explicit, instead of doing this shared_ptr/raw pointer dance. I'm finding it hard to understand the ownership of these TmpFile objects while looking to see if there's a resource leak. http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc@576 PS32, Line 576: if (tmp_dirs_remote_ctrl_.local_buff_dir_bytes_high_water_mark_.Add(file_size) I think this code might be vulnerable to a race where a thread ends up blocking indefinitely in DequeueTmpFilesPool, e.g. Assume bytes_limit is 2x and each file is x bytes. Start with x bytes allocated. Thread 1: increment, win race, 2x allocated Thread 2: increment, lose race, 3x allocated > bytes limit Thread 3: decrement, 2x Thread 4: increment, fails, 3x allocated Thread 4: decrement, 2x Thread 2: decrement, 1x In that case thread 4 could block in DequeueTmpFilesPool even though there's free capacity. I think the basic flaw is that one of the conditions for a thread blocking on tmp_files_available_cv_ is that it should have checked for buffer availabity before doing so, but it's not part of the condition variable loop condition. Using the atomics is ok in other places because we just return an error when there's a race. I think it'd be best to protect it behind a lock and also signal the condition variable when it's decremented from >= bytes_limit to < bytes_limit. http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc@1835 PS32, Line 1835: void TmpFileBufferPool::EnqueueTmpFilesPool(TmpFile* tmp_file, bool front) { I'm suspicious about using the raw pointer here -- To view, visit http://gerrit.cloudera.org:8080/16318 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89 Gerrit-Change-Number: 16318 Gerrit-PatchSet: 32 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com> Gerrit-Comment-Date: Thu, 21 Jan 2021 23:51:19 +0000 Gerrit-HasComments: Yes