Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13178 )
Change subject: IMPALA-4658: Potential race if compiler reorders ReachedLimit() usage. ...................................................................... Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/13178/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13178/4//COMMIT_MSG@9 PS4, Line 9: The ExecNode::num_rows_returned_ member is shared by the scan node : (thread) and the related scanner threads. There is a potential race here : since the compiler is free to reorder the load/store of : num_rows_returned_ member. Also, num_rows_returned_ is not accessed in : a thread safe manner. > Can this actually cause problems? I think that all scanners read num_rows_r It's hard to reason about the implications. It would be valid for the load instruction to get pulled out of a loop and potentially have a non-terminating loop. I don't think we have code where this would happen now, but who knows - the point is more that the pattern is unsafe and you can't reason about the behaviour of the functional locally. It also is going to generate warnings in TSAN, which we'd prefer to avoid. http://gerrit.cloudera.org:8080/#/c/13178/4/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/13178/4/be/src/exec/hdfs-scan-node.cc@134 PS4, Line 134: // Update the number of materialized rows now instead of when they are materialized. : // This means that scanners might process and queue up more rows than are necessary : // for the limit case but we want to avoid the synchronized writes to : // num_rows_returned_. : IncrementNumRowsReturnedShared(row_batch->num_rows()); > The comment + the Shared function seem weird together - the comment says th Yeah the comment is a little weird now in that it's justifying using the standard pattern for num_rows_returned_ instead of an alternative design. The alternative design would actually probably violate invariants of ExecNode by incrementing num_rows_returned_ before returning the actual nodes. So it doesn't make a lot of sense in context. We could just delete the comment I think. Or something short describing the consequences like // Note that the scanner threads may have processed and queued up extra rows before this thread incremented the rows returned. -- To view, visit http://gerrit.cloudera.org:8080/13178 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cbbfad80f7ab87dd6f192a24e2c68f7c66b047e Gerrit-Change-Number: 13178 Gerrit-PatchSet: 4 Gerrit-Owner: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Thu, 02 May 2019 20:27:39 +0000 Gerrit-HasComments: Yes