[ https://issues.apache.org/jira/browse/IMPALA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161529#comment-17161529 ]
Thomas Tauber-Marshall commented on IMPALA-9180: ------------------------------------------------ Some examples of things that can be removed as part of this: - In BackendDescriptorPB there's an 'address' and a 'krpc_address', see: https://github.com/apache/impala/blob/master/common/protobuf/statestore_service.proto#L63 The port that's part of 'address' is no longer in use. However, we need both the hostname (currently in 'address') and the IP address (currently in 'krpc_address') in order to do things like call RpcMgr::GetProxy. Probably the easiest thing to do is to replace 'TNetworkAddress address' with 'string hostname' - In TQueryCtx, 'coord_address' is no longer in use and can be removed. - The files be/src/service/impala-internal-service.(h/cc) are no longer in use and can be removed. - The flag 'be_port' is no longer used and can be made a REMOVED_FLAG and all infrastructure around it can be cleaned up (eg. in tests/common/impala_service.py) When removing these things, what to replace them with will need some thought: - In cases where we've keyed maps on a TNetworkAddress of a backend, probably we should change it to key off of the UniqueIdPB backend id, eg. in ExecutorBlacklist I think we can refactor 'executor_list_' to no longer be a map<address -> list<backend>> and instead make it a map<backend_id, backend>, which will simply the logic in ExecutorBlacklist a lot. - In cases where we're using the address in logging or error output, replacing it with the backend id has the advantage that its more precise (eg. if you're examining logs to try to track down the cause of an issue that occurred around the time an impalad crashed and a new one was restarted at the same address it will allow you to differentiate things that happened on each impalad) but the disadvantage of being less human friendly than hostnames. Including both might be appropriate sometimes, but might also make things too verbose. Possibly a good rule of thumb would be: if we're logging it, include as much info as possible (since its okay if logs get a little verbose) but if its going in an error message that will be returned to a client just use the hostname (since human readability is the most important here) This work will probably involve touching a lot of code, so it probably makes sense to break it up into a series of patches to keep reviewing simple, eg. the changes suggested above to ExecutorBlacklist are self-contained and could be submitted in a patch by themselves before we even actually remove BackendDescriptorPB::address. > Remove legacy ImpalaInternalService > ----------------------------------- > > Key: IMPALA-9180 > URL: https://issues.apache.org/jira/browse/IMPALA-9180 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Affects Versions: Impala 3.4.0 > Reporter: Michael Ho > Assignee: Wenzhe Zhou > Priority: Minor > > Now that IMPALA-7984 is done, the legacy Thrift based Impala internal service > can now be removed. The port 22000 can also be freed up. In addition to code > change, the doc probably needs to be updated to reflect the fact that 22000 > is no longer in use. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org