martinbikandi opened a new issue, #56295:
URL: https://github.com/apache/airflow/issues/56295
### Apache Airflow version
2.11.0
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
I define two functions to be run by `PythonVirtualenvOperator`:
```python
def run_kwarg(**kwargs):
    print(kwargs)


def run_arg(context):
    print(context)
```
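The two callables differ only in their signatures. Here is a minimal sketch of signature-based argument selection. It assumes, without checking Airflow's source, that `PythonVirtualenvOperator` behaves like `PythonOperator`: a callable that declares `**kwargs` gets every available context entry, and any other callable gets only the entries it names. `select_kwargs` is a hypothetical helper, and `available` is made-up sample data.

```python
import inspect
from typing import Any, Callable


def select_kwargs(func: Callable[..., Any], available: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical model of which entries a callable would receive."""
    params = inspect.signature(func).parameters.values()
    # A **kwargs parameter (VAR_KEYWORD) accepts every entry.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return dict(available)
    # Otherwise only entries matching a declared parameter name are passed.
    names = {p.name for p in params}
    return {k: v for k, v in available.items() if k in names}


def run_kwarg(**kwargs):
    print(kwargs)


def run_arg(context):
    print(context)


# Sample data: the user's op_kwargs entry plus two made-up context entries.
available = {"context": "{}", "ds": "2024-01-01", "run_id": "manual__1"}

print(sorted(select_kwargs(run_kwarg, available)))  # ['context', 'ds', 'run_id']
print(sorted(select_kwargs(run_arg, available)))    # ['context']
```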
With `system_site_packages=True`, **the task running `run_kwarg` fails**, while **the task running `run_arg` works fine**. **With `system_site_packages=False`, both work fine**.
```python
# fails
task_run_kwarg = PythonVirtualenvOperator(
    task_id="run_kwarg",
    python_callable=run_kwarg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=True,
)
# works
task_run_arg = PythonVirtualenvOperator(
    task_id="run_arg",
    python_callable=run_arg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=True,
)
# works
run_kwarg_no_site_packages = PythonVirtualenvOperator(
    task_id="run_kwarg_no-site-packages",
    python_callable=run_kwarg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=False,
)
# works
run_arg_no_site_packages = PythonVirtualenvOperator(
    task_id="run_arg_no-site-packages",
    python_callable=run_arg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=False,
)
```
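One possible explanation, stated as an assumption and not checked against Airflow's source: the operator forwards only the context entries it expects the virtualenv to be able to deserialize. It may treat `system_site_packages=True` as a sign that Airflow is importable there and therefore also send Airflow-specific entries such as `macros`, which is a module. Because `run_arg` asks only for `context`, only the `**kwargs` callable would receive those extra entries. The sketch below models this. `BASE_KEYS`, `AIRFLOW_KEYS`, and `forwarded_keys` are hypothetical names, and the key sets are illustrative rather than Airflow's actual lists.

```python
# Illustrative key sets: an assumption, not Airflow's actual lists.
BASE_KEYS = frozenset({"ds", "run_id", "outlets"})
AIRFLOW_KEYS = frozenset({"macros", "dag_run", "params"})


def forwarded_keys(system_site_packages: bool) -> frozenset[str]:
    """Hypothetical model of which context keys are sent into the venv."""
    if system_site_packages:
        # Airflow is assumed importable inside the venv, so richer Airflow
        # objects (including the `macros` module) are forwarded as well.
        return BASE_KEYS | AIRFLOW_KEYS
    return BASE_KEYS


print(sorted(forwarded_keys(False)))  # ['ds', 'outlets', 'run_id']
print(sorted(forwarded_keys(True)))
```

In this model, `outlets` is forwarded in both cases. That would also fit the outlet failures described next.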
Additionally, when `outlets` is set to a dataset, the task running `run_kwarg` fails whether `system_site_packages` is `True` or `False`, while the task running `run_arg` runs properly.
```python
# fails
run_kwarg_outlet = PythonVirtualenvOperator(
    task_id="run_kwarg_outlet",
    python_callable=run_kwarg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=True,
    outlets=[Dataset('dataset://run_kwarg_outlet')],
)
# works
run_arg_outlet = PythonVirtualenvOperator(
    task_id="run_arg_outlet",
    python_callable=run_arg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=True,
    outlets=[Dataset('dataset://run_arg_outlet')],
)
# fails
run_kwarg_no_site_packages_outlet = PythonVirtualenvOperator(
    task_id="run_kwarg_no-site-packages_outlet",
    python_callable=run_kwarg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=False,
    outlets=[Dataset('dataset://run_kwarg_no-site-packages_outlet')],
)
# works
run_arg_no_site_packages_outlet = PythonVirtualenvOperator(
    task_id="run_arg_no-site-packages_outlet",
    python_callable=run_arg,
    op_kwargs={'context': "{{dag_run.conf}}"},
    system_site_packages=False,
    outlets=[Dataset('dataset://run_arg_no-site-packages_outlet')],
)
```
In the web UI, I see the following:
<img width="260" height="317" alt="Image"
src="https://github.com/user-attachments/assets/8e1abe48-50c6-45a1-bf86-69abe2fc8299"
/>
### What you think should happen instead?
**All defined tasks should succeed**, or at least behave consistently (for example, all fail) when `system_site_packages=True`.
### How to reproduce
Create a fresh Python 3.11.8 environment (I used Miniconda):
`conda create -n test-environment python=3.11.8`
Install Airflow and virtualenv in the environment:
`pip install apache-airflow==2.11.0 virtualenv==20.34.0`
Run `airflow standalone`, disable the example DAGs, and put the following DAG into the DAGs folder:
```python
from airflow import models
from airflow.operators.python import PythonVirtualenvOperator
from airflow.datasets import Dataset


def run_kwarg(**kwargs):
    print(kwargs)


def run_arg(context):
    print(context)


with models.DAG("dag_test",
                schedule=None,
                catchup=False) as dag:
    # fails
    task_run_kwarg = PythonVirtualenvOperator(
        task_id="run_kwarg",
        python_callable=run_kwarg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=True,
        # outlets=[dummy_dataset]
    )
    # works
    task_run_arg = PythonVirtualenvOperator(
        task_id="run_arg",
        python_callable=run_arg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=True,
        # outlets=[dummy_dataset]
    )
    # works
    run_kwarg_no_site_packages = PythonVirtualenvOperator(
        task_id="run_kwarg_no-site-packages",
        python_callable=run_kwarg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=False,
        # outlets=[dummy_dataset]
    )
    # works
    run_arg_no_site_packages = PythonVirtualenvOperator(
        task_id="run_arg_no-site-packages",
        python_callable=run_arg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=False,
        # outlets=[dummy_dataset]
    )
    # fails
    run_kwarg_outlet = PythonVirtualenvOperator(
        task_id="run_kwarg_outlet",
        python_callable=run_kwarg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=True,
        outlets=[Dataset('dataset://run_kwarg_outlet')],
    )
    # works
    run_arg_outlet = PythonVirtualenvOperator(
        task_id="run_arg_outlet",
        python_callable=run_arg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=True,
        outlets=[Dataset('dataset://run_arg_outlet')],
    )
    # fails
    run_kwarg_no_site_packages_outlet = PythonVirtualenvOperator(
        task_id="run_kwarg_no-site-packages_outlet",
        python_callable=run_kwarg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=False,
        outlets=[Dataset('dataset://run_kwarg_no-site-packages_outlet')],
    )
    # works
    run_arg_no_site_packages_outlet = PythonVirtualenvOperator(
        task_id="run_arg_no-site-packages_outlet",
        python_callable=run_arg,
        op_kwargs={'context': "{{dag_run.conf}}"},
        system_site_packages=False,
        outlets=[Dataset('dataset://run_arg_no-site-packages_outlet')],
    )
```
### Operating System
Windows 10 / WSL2 (Ubuntu 24.04)
### Versions of Apache Airflow Providers
_No response_
### Deployment
Virtualenv installation
### Deployment details
```bash
export AIRFLOW_HOME=$(pwd)
conda activate airflow-tests
airflow standalone
```
I triggered the DAG from the web UI.
### Anything else?
The error when `system_site_packages=True` is:
`TypeError: cannot pickle 'module' object`
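Python's `pickle` raises this message for any module object. The failure can be reproduced without Airflow under two assumptions: the callable's arguments are serialized with the default `pickle`-based serializer before being handed to the virtualenv, and a `**kwargs` callable receives a module-valued context entry such as `macros`. `fake_macros` and `try_pickle` are hypothetical names used only for illustration.

```python
import pickle
import types

# Stand-in for a module-valued context entry such as `macros` (assumption).
fake_macros = types.ModuleType("macros")


def try_pickle(kwargs: dict) -> str:
    """Return 'ok' if kwargs pickles, else the TypeError message."""
    try:
        pickle.dumps(kwargs)
        return "ok"
    except TypeError as exc:
        return str(exc)


# A named-parameter callable only gets what it asked for, so this pickles.
print(try_pickle({"context": "{}"}))  # ok
# A **kwargs callable may also receive the module, so pickling fails.
print(try_pickle({"context": "{}", "macros": fake_macros}))
```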
The error when `system_site_packages=False` and `outlets` is specified is:
`ModuleNotFoundError: No module named 'airflow'`
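Unpickling raises `ModuleNotFoundError` when the pickle references a class whose module cannot be imported. Assume the forwarded `outlets` entry carries `Dataset` instances defined in the `airflow` package, and that the isolated virtualenv has no Airflow installed. The sketch below mimics that situation with a throwaway module named `fake_airflow` (hypothetical), which is removed from `sys.modules` before loading.

```python
import pickle
import sys
import types

# Throwaway module standing in for the `airflow` package (hypothetical).
fake_airflow = types.ModuleType("fake_airflow")


class Dataset:
    """Minimal Dataset-like class, for illustration only."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


# Make pickle record the class as fake_airflow.Dataset.
Dataset.__module__ = "fake_airflow"
fake_airflow.Dataset = Dataset
sys.modules["fake_airflow"] = fake_airflow

payload = pickle.dumps({"outlets": [Dataset("dataset://run_kwarg_outlet")]})

# Simulate a virtualenv without the package: forget the module.
del sys.modules["fake_airflow"]

try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'fake_airflow'
```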
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)