martinbikandi opened a new issue, #56295:
URL: https://github.com/apache/airflow/issues/56295

   ### Apache Airflow version
   
   2.11.0
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   I define two functions to be run in the operator:
   
   ```python
   def run_kwarg(**kwargs):
       print(kwargs)
   
   def run_arg(context):
       print(context)
   ```
   
   Then **the task running the `run_kwarg` function fails**, while **the other works fine**. **When `system_site_packages=False`, both work fine**.
   
   ```python
   # fails
   task_run_kwarg = PythonVirtualenvOperator(
       task_id="run_kwarg",
       python_callable=run_kwarg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=True
   )
   
   # works
   task_run_arg = PythonVirtualenvOperator(
       task_id="run_arg",
       python_callable=run_arg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=True
   )
   
   # works
   run_kwarg_no_site_packages = PythonVirtualenvOperator(
       task_id="run_kwarg_no-site-packages",
       python_callable=run_kwarg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=False,
   )
   
   # works
   run_arg_no_site_packages = PythonVirtualenvOperator(
       task_id="run_arg_no-site-packages",
       python_callable=run_arg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=False,
   )
   ```
   
   Additionally, when `outlets` is specified with a dataset, the task running `run_kwarg` fails whether `system_site_packages` is `True` or `False`, but the task running `run_arg` runs properly.
   
   ```python
   # fails
   run_kwarg_outlet = PythonVirtualenvOperator(
       task_id="run_kwarg_outlet",
       python_callable=run_kwarg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=True,
       outlets=[Dataset('dataset://run_kwarg_outlet')]
   )
   
   # works
   run_arg_outlet = PythonVirtualenvOperator(
       task_id="run_arg_outlet",
       python_callable=run_arg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=True,
       outlets=[Dataset('dataset://run_arg_outlet')]
   )
   
   # fails
   run_kwarg_no_site_packages_outlet = PythonVirtualenvOperator(
       task_id="run_kwarg_no-site-packages_outlet",
       python_callable=run_kwarg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=False,
       outlets=[Dataset('dataset://run_kwarg_no-site-packages_outlet')]
   )
   
   # works
   run_arg_no_site_packages_outlet = PythonVirtualenvOperator(
       task_id="run_arg_no-site-packages_outlet",
       python_callable=run_arg,
       op_kwargs={'context': "{{dag_run.conf}}"},
       system_site_packages=False,
       outlets=[Dataset('dataset://run_arg_no-site-packages_outlet')]
   )
   ```
   
   On the webUI I see the following:
   
   <img width="260" height="317" alt="Image" src="https://github.com/user-attachments/assets/8e1abe48-50c6-45a1-bf86-69abe2fc8299" />
   
   ### What you think should happen instead?
   
   **All defined tasks should work properly**, or at least fail consistently when `system_site_packages=True`.
   
   ### How to reproduce
   
   Fresh python=3.11.8 environment (I used miniconda)
   `conda create -n test-environment python=3.11.8`
   
   Install airflow and virtualenv in the environment:
   `pip install apache-airflow==2.11.0 virtualenv==20.34.0`
   
   Run airflow standalone, disable example dags and put the following dag into 
the dags folder:
   
   ```python
   from airflow import models
   from airflow.operators.python import PythonVirtualenvOperator
   
   from airflow.datasets import Dataset
   
   def run_kwarg(**kwargs):
       print(kwargs)
   
   def run_arg(context):
       print(context)
   
   with models.DAG("dag_test",
                   schedule=None,
                   catchup=False) as dag:
   
       # fails
       task_run_kwarg = PythonVirtualenvOperator(
           task_id="run_kwarg",
           python_callable=run_kwarg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=True,
           # outlets=[dummy_dataset]
       )
   
       # works
       task_run_arg = PythonVirtualenvOperator(
           task_id="run_arg",
           python_callable=run_arg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=True,
           # outlets=[dummy_dataset]
       )
   
       # works
       run_kwarg_no_site_packages = PythonVirtualenvOperator(
           task_id="run_kwarg_no-site-packages",
           python_callable=run_kwarg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=False,
           # outlets=[dummy_dataset]
       )
   
       # works
       run_arg_no_site_packages = PythonVirtualenvOperator(
           task_id="run_arg_no-site-packages",
           python_callable=run_arg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=False,
           # outlets=[dummy_dataset]
       )
   
       # fails
       run_kwarg_outlet = PythonVirtualenvOperator(
           task_id="run_kwarg_outlet",
           python_callable=run_kwarg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=True,
           outlets=[Dataset('dataset://run_kwarg_outlet')]
       )
   
       # works
       run_arg_outlet = PythonVirtualenvOperator(
           task_id="run_arg_outlet",
           python_callable=run_arg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=True,
           outlets=[Dataset('dataset://run_arg_outlet')]
       )
   
       # fails
       run_kwarg_no_site_packages_outlet = PythonVirtualenvOperator(
           task_id="run_kwarg_no-site-packages_outlet",
           python_callable=run_kwarg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=False,
           outlets=[Dataset('dataset://run_kwarg_no-site-packages_outlet')]
       )
   
       # works
       run_arg_no_site_packages_outlet = PythonVirtualenvOperator(
           task_id="run_arg_no-site-packages_outlet",
           python_callable=run_arg,
           op_kwargs={'context': "{{dag_run.conf}}"},
           system_site_packages=False,
           outlets=[Dataset('dataset://run_arg_no-site-packages_outlet')]
       )
   ```
   
   ### Operating System
   
   Windows 10 / WSL2 (Ubuntu 24.04)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   ```bash
   AIRFLOW_HOME=$(pwd)
   conda activate airflow-tests
   airflow standalone
   ```
   
   I ran the dag with the webUI.
   
   ### Anything else?
   
   The error when `system_site_packages=True` is:
   
   `TypeError: cannot pickle 'module' object`
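
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the operator may pass every available context entry to a callable that accepts `**kwargs`, but only the matching names to a callable with explicit parameters. If one of those entries is a module, pickling the arguments fails with exactly this message. The sketch below reproduces the mechanism in plain Python. `select_kwargs`, `try_pickle`, and the `available` dict are hypothetical stand-ins and not Airflow APIs.

```python
import inspect
import pickle
import types


def select_kwargs(func, available):
    """Hypothetical helper: choose which entries a callable would receive."""
    params = list(inspect.signature(func).parameters.values())
    # A callable with **kwargs accepts every available entry.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return dict(available)
    # Otherwise only entries matching a named parameter are passed.
    names = {p.name for p in params}
    return {k: v for k, v in available.items() if k in names}


def run_kwarg(**kwargs):
    print(kwargs)


def run_arg(context):
    print(context)


# Assumed stand-in for the values offered to the callable; the "macros"
# entry being a module is an assumption about the real context.
available = {"context": "{}", "macros": types.ModuleType("macros")}


def try_pickle(func):
    """Return "ok" if the selected arguments pickle, else the error text."""
    try:
        pickle.dumps(select_kwargs(func, available))
        return "ok"
    except TypeError as exc:
        return str(exc)


print(try_pickle(run_arg))    # only "context" is selected
print(try_pickle(run_kwarg))  # the module is selected too
```

If this is what happens, it would explain why `run_arg` is unaffected: its signature filters out everything except `context`.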
   
   The error when `system_site_packages=False` and the outlet is specified is:
   
    `ModuleNotFoundError: No module named 'airflow'`
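
This second error would be consistent with a pickled object whose class lives in a package that the virtualenv cannot import, for example a `Dataset` instance being unpickled where `airflow` is not installed. That reading is an assumption. The sketch below shows only the general pickle behaviour, using a throwaway module named `fake_airflow_pkg` (a hypothetical name) in place of `airflow`.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "fake_airflow_pkg"  # hypothetical stand-in for "airflow"
MODULE_SOURCE = (
    "class Dataset:\n"
    "    def __init__(self, uri):\n"
    "        self.uri = uri\n"
)


def demo_missing_module():
    """Pickle an object, make its module unimportable, then unpickle it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module(MODULE_NAME)
            # Pickle stores a reference to the class (module + name),
            # not the class definition itself.
            payload = pickle.dumps(mod.Dataset("dataset://example"))
        finally:
            # Simulate an environment where the package is not installed.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return None


print(demo_missing_module())
```

Unpickling has to import the defining module first, so it fails in any interpreter where that module is missing.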
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
