[ 
https://issues.apache.org/jira/browse/IMPALA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161529#comment-17161529
 ] 

Thomas Tauber-Marshall commented on IMPALA-9180:
------------------------------------------------

Some examples of things that can be removed as part of this:
- In BackendDescriptorPB there's an 'address' and a 'krpc_address', see: 
https://github.com/apache/impala/blob/master/common/protobuf/statestore_service.proto#L63
 The port that's part of 'address' is no longer in use. However, we need both 
the hostname (currently in 'address') and the IP address (currently in 
'krpc_address') in order to do things like call RpcMgr::GetProxy. Probably the 
easiest thing to do is to replace 'TNetworkAddress address' with 'string 
hostname'
- In TQueryCtx, 'coord_address' is no longer in use and can be removed.
- The files be/src/service/impala-internal-service.(h/cc) are no longer in use 
and can be removed.
- The flag 'be_port' is no longer used and can be made a REMOVED_FLAG and all 
infrastructure around it can be cleaned up (eg. in 
tests/common/impala_service.py)

When removing these things, what to replace them with will need some thought:
- In cases where we've keyed maps on a TNetworkAddress of a backend, probably 
we should change it to key off of the UniqueIdPB backend id, eg. in 
ExecutorBlacklist I think we can refactor 'executor_list_' to no longer be a 
map<address -> list<backend>> and instead make it a map<backend_id, backend>, 
which will simply the logic in ExecutorBlacklist a lot.
- In cases where we're using the address in logging or error output, replacing 
it with the backend id has the advantage that its more precise (eg. if you're 
examining logs to try to track down the cause of an issue that occurred around 
the time an impalad crashed and a new one was restarted at the same address it 
will allow you to differentiate things that happened on each impalad) but the 
disadvantage of being less human friendly than hostnames. Including both might 
be appropriate sometimes, but might also make things too verbose. Possibly a 
good rule of thumb would be: if we're logging it, include as much info as 
possible (since its okay if logs get a little verbose) but if its going in an 
error message that will be returned to a client just use the hostname (since 
human readability is the most important here)

This work will probably involve touching a lot of code, so it probably makes 
sense to break it up into a series of patches to keep reviewing simple, eg. the 
changes suggested above to ExecutorBlacklist are self-contained and could be 
submitted in a patch by themselves before we even actually remove 
BackendDescriptorPB::address.

> Remove legacy ImpalaInternalService
> -----------------------------------
>
>                 Key: IMPALA-9180
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9180
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Michael Ho
>            Assignee: Wenzhe Zhou
>            Priority: Minor
>
> Now that IMPALA-7984 is done, the legacy Thrift based Impala internal service 
> can now be removed. The port 22000 can also be freed up. In addition to code 
> change, the doc probably needs to be updated to reflect the fact that 22000 
> is no longer in use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to