alexdrydew commented on issue #38745:
URL: https://github.com/apache/airflow/issues/38745#issuecomment-2041073791

   My concern was primarily about processing untrusted data from external 
sources in DAGs: it seems that malicious data can be used to steal secrets in 
some cases:
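   ```python
   @dag(...)
   def pipeline():
       # Untrusted input: its content is controlled by whoever wrote the data.
       data = download_parameters_from_s3()
       # Each dict in `data` becomes the kwargs of one mapped `transform` task.
       transformed_data = transform.expand_kwargs(data)
       upload_to_s3(transformed_data)
   ```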
   In this case, the author of the data could include a 
`{{ var.value.get('SOME_SECRET') }}` template and gain access to the variable 
if they can read the target storage. I understand that this case is probably 
outside the scope of the Airflow security model, but the way plain 
TaskFlow-style tasks communicate through XCom makes it possible to process 
untrusted data in this way.
   
   But so as not to shift the focus: my main concern is that even if we never 
return the processed untrusted data to a potentially malicious user, we still 
need to sanitize inputs specifically for `expand_kwargs`, so that tasks do not 
fail when processing data that happens to contain template-like syntax (e.g. a 
parsed web page).
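
To illustrate what such sanitizing could look like, here is a minimal sketch. It assumes the mapped values are rendered with Jinja's default delimiters (`{{`, `{%`, `{#`). The helper names `escape_jinja` and `escape_templates` are hypothetical and not part of Airflow. Each opening delimiter is rewritten as a Jinja string-literal expression, so rendering should give back the original text instead of evaluating it.

```python
import re
from typing import Any

# Jinja's default opening delimiters for expressions, statements and comments.
# Assumption: the rendering environment has not customised these delimiters.
_JINJA_OPENERS = re.compile(r"\{\{|\{%|\{#")


def escape_jinja(text: str) -> str:
    """Rewrite each opening delimiter as an expression printing it literally.

    For example, "{{" becomes "{{ '{{' }}", which renders back to "{{".
    The substitution is a single pass, so inserted text is never re-escaped.
    """
    return _JINJA_OPENERS.sub(lambda m: "{{ '" + m.group(0) + "' }}", text)


def escape_templates(value: Any) -> Any:
    """Recursively escape every string found in dicts, lists and tuples."""
    if isinstance(value, str):
        return escape_jinja(value)
    if isinstance(value, dict):
        return {key: escape_templates(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(escape_templates(item) for item in value)
    # Numbers, None and other objects are passed through unchanged.
    return value


if __name__ == "__main__":
    untrusted = [{"text": "{{ var.value.get('SOME_SECRET') }}", "retries": 3}]
    print(escape_templates(untrusted))
```

In the pipeline above, this would run on `data` before it is passed to `expand_kwargs`. The escaping is only appropriate for values that really are rendered afterwards: a value that is never templated would keep the escaped text verbatim.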


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.