potiuk commented on PR #35748:
URL: https://github.com/apache/airflow/pull/35748#issuecomment-1818945067

   A shared process namespace in this case kind of violates the whole
assumption that the separate kerberos container was introduced for. The whole
idea is that "airflow" containers do not see the Kerberos keytab used to
refresh the short-lived ticket. Only the kerberos container/refreshing process
should ever be able to get access to the keytab.
   
   Keytabs are long-lived and provide full access to the Kerberos service. They
use symmetric encryption, and once you get hold of one, you are able to
communicate with the Kerberos server and obtain the short-lived ticket to do
the job you are supposed to do, so a keytab should be a very strongly guarded
asset.
   
   Generally, Airflow components should never be able to see or obtain keytab
files (only the `airflow kerberos` refreshing container should have access to
them). Instead, all the components should only access the short-lived ticket.
This is the basic assumption that the whole separation of the containers is
based on: the two containers share only the filesystem where the ticket is
refreshed and nothing else.
   
   See https://www.fortinet.com/resources/cyberglossary/kerberos-authentication 
for example.
   
   If airflow components (specifically the worker) get access to the keytab,
someone could write a DAG to, for example, send the keytab to a remote system,
and once this happens the keytab can be used by anyone to do anything. When the
same happens with a short-lived ticket, the attacker is limited to only what
the service allows the ticket to do, and the ticket is time-limited, so the
exposure from such a breach is potentially much more limited.
   
   Of course, using a shared process namespace (`shareProcessNamespace`) does
not explicitly violate this assumption. It does not give the airflow component
direct access to the keytab. So far so good. But it opens up other ways for
malicious actors to obtain the keytab. For example, you could connect to the
kerberos process with gdb, dump its memory, and retrieve the keytab from the
memory of that process. And this is only one of the ways you could retrieve
that information; there are many others. Some of them you might be able to
protect against better, but generally speaking, it significantly decreases the
isolation that the container mechanism introduces (which is the reason why the
two are run in separate containers).
   
   So while this solution does not directly give access to the keytab, it does
decrease isolation and introduce security risks. I am quite sure that if we
introduce it, this will be flagged as a security issue and we will have to fix
it.
   
   I think there are two options:
   
   1) We should figure out a different way to communicate with the kerberos
refreshing process and stop it. One option is to let the running process be
stoppable by sending a "shutdown" message over a TCP connection, exposing the
sidecar container's port to the main containers. Another option (probably
simpler to implement) is to keep some kind of "shutdown" lock file in the same
filesystem where the ticket is stored, and use it to signal the refreshing
process that it should exit. It could be based on the lock mechanisms available
natively in Python.
   
   2) If we limit the solution to only Kubernetes 1.28+, we have the option of
using Native Sidecar Containers and marking the kerberos sidecar as such:
https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/
   
   Likely the best approach is a combination of these: 1) for k8s < 1.28 and
2) for k8s >= 1.28.
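
   To make the file-based variant of option 1) more concrete, here is a minimal
sketch. It is only an illustration under stated assumptions, not what the PR
or `airflow kerberos` implements: the names `SHUTDOWN_FILE_NAME`,
`request_shutdown`, `run_refresher` and the `renew_ticket` callback are all
hypothetical, and it uses a plain marker file polled by the refresher rather
than an actual file lock. The only thing the two containers share is the
directory where the ticket is already stored.

```python
import time
from pathlib import Path
from typing import Callable

# Hypothetical marker name; it would live next to the ticket cache on the
# volume that both containers already share.
SHUTDOWN_FILE_NAME = "kerberos.shutdown"


def request_shutdown(shared_dir: Path) -> None:
    """Called from the main container once its work is finished."""
    (shared_dir / SHUTDOWN_FILE_NAME).touch()


def run_refresher(
    shared_dir: Path,
    renew_ticket: Callable[[], None],
    refresh_interval: float = 3600.0,
    poll_interval: float = 1.0,
) -> int:
    """Renew the ticket periodically until the shutdown marker appears.

    Returns the number of renewals performed (handy for testing).
    """
    marker = shared_dir / SHUTDOWN_FILE_NAME
    renewals = 0
    while not marker.exists():
        renew_ticket()  # stand-in for the real ticket renewal
        renewals += 1
        # Wait for the next refresh, but keep polling for the marker so the
        # sidecar exits promptly instead of sleeping the whole interval.
        deadline = time.monotonic() + refresh_interval
        while time.monotonic() < deadline:
            if marker.exists():
                return renewals
            time.sleep(poll_interval)
    return renewals
```

   With something like this, the main container would call
`request_shutdown()` on the shared directory when it is done, and the sidecar
would exit on its own without any process-level visibility between the two
containers.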
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

