Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14677 )
Change subject: IMPALA-9137, IMPALA-9138: Mark failed RPCs as retryable and update blacklist ...................................................................... Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/14677/3/common/thrift/Common.thrift File common/thrift/Common.thrift: http://gerrit.cloudera.org:8080/#/c/14677/3/common/thrift/Common.thrift@1 PS3, Line 1: // Licensed to the Apache Software Foundation (ASF) under one > Types.thrift contains a bunch of thrift structs that aren't relevant to Sta Not really sure I see any benefit - I guess it very slightly reduces recompilation time when people make changes to Types.thrift that don't affect Status.thrift, though how frequent is that really, and its not really any cleaner imo since now we have to answer the question of what belongs in Common vs. Types, which isn't clear - and its nice to avoid unnecessary code churn, but obviously not that big of a deal. Imo, if anything the refactoring that we need is to make the thrift and protobuf definitions of this stuff more consistent, i.e. add a status.proto instead of defining it in common.proto, but again not necessarily worth the code churn. http://gerrit.cloudera.org:8080/#/c/14677/3/common/thrift/Status.thrift File common/thrift/Status.thrift: http://gerrit.cloudera.org:8080/#/c/14677/3/common/thrift/Status.thrift@48 PS3, Line 48: > Agree those use cases are important. I think the current mechanism will wor Agreed that a string->string map was a bad suggestion, just throwing stuff out there. Idk about this RetryInfo approach, though. I'm concerned that it doesn't really map to the existing semantics of what a Status is (a small object that indicates success/failure with a message about the cause of failure), and that as a result going this direction is awkward and potentially error-prone. For example, you seem to be assuming that at any random point in the code we can construct a Status object with a RetryInfo as part of it, and that will be bubbled up to become the ultimate query status that is returned to the coordinator. However, its currently a common pattern for code to call some function, get an error status back, and then instead of actually returning that Status object it constructs a new Status object, usually with an error message that contains the error message of the original error Status. In a case like this, if the original Status has a RetryInfo it was probably lost when the new Status was created. Of course, we can go through and make sure that all of the code paths where we generate RetryInfos do in fact bubble that Status up to become the overall query status, but again that seems error prone and means that the semantics of Status objects are sort of different in different places, i.e. sometimes a Status is just a potential indicator of some error that can be handled and discarded by the calling code, sometimes its an <Important_Query_Status> that must be preserved and returned to the coordinator. -- To view, visit http://gerrit.cloudera.org:8080/14677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c Gerrit-Change-Number: 14677 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Mon, 25 Nov 2019 23:09:37 +0000 Gerrit-HasComments: Yes