roach231428 opened a new issue, #50756:
URL: https://github.com/apache/airflow/issues/50756

   ### Apache Airflow Provider(s)
   
   apache-hdfs
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-apache-hdfs==4.9.0
   
   ### Apache Airflow version
   
   3.0.1
   
   ### Operating System
   
   Ubuntu 22.04
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   I deployed airflow-apiserver, airflow-worker, and the other services on the same machine. I built my Airflow image from a Dockerfile that runs `RUN uv pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /tmp/requirements.txt` to install the providers. The base image is `apache/airflow:3.0.1`.
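
   For context, the relevant build step looks roughly like the sketch below. Only the `RUN uv pip install ...` line is taken from my actual Dockerfile; the `ARG`, the `COPY` step, and the `requirements.txt` contents mentioned in the comment are illustrative.
   ```dockerfile
   FROM apache/airflow:3.0.1

   # Illustrative: AIRFLOW_VERSION and the COPY step are assumptions;
   # requirements.txt pins at least apache-airflow-providers-apache-hdfs==4.9.0
   ARG AIRFLOW_VERSION=3.0.1
   COPY requirements.txt /tmp/requirements.txt

   RUN uv pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /tmp/requirements.txt
   ```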
   
   ### What happened
   
   When I run a DAG that uses `WebHDFSHook`, it raises the error 
`AttributeError: 'Connection' object has no attribute 'get_password'`. However, 
I can execute the same code line by line within the worker container without 
any issues. I'm completely puzzled as to why this error occurs during DAG 
execution.
   
   The full error log:
   ```bash
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 838 in run
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1130 in _execute_task
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 408 in wrapper
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 408 in wrapper
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 212 in execute
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 235 in execute_callable
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
   File "/opt/airflow/dags/test/test_hdfs.py", line 22 in list_hdfs_directory
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 154 in check_for_path
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 72 in get_conn
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/apache/hdfs/hooks/webhdfs.py", line 91 in _find_valid_server
   ```
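
   To narrow this down, a small diagnostic task like the sketch below (hypothetical, not part of the failing DAG; names are illustrative) can print which `Connection` class the hook receives at runtime and whether it exposes `get_password`. The traceback above suggests the connection object resolved during task execution does not have that method.
   ```python
   from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook
   from airflow.sdk import dag, task


   @dag()  # hypothetical diagnostic DAG
   def debug_webhdfs_connection():

       @task()
       def inspect_connection():
           # Fetch the same connection object the hook resolves internally.
           conn = WebHDFSHook.get_connection("webhdfs_default")
           print(type(conn))
           # The frame at _find_valid_server ends up calling get_password();
           # if this prints False, that call raises the AttributeError above.
           print("has get_password:", hasattr(conn, "get_password"))
           print("has password attribute:", hasattr(conn, "password"))

       inspect_connection()


   debug_webhdfs_connection()
   ```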
   
   ### What you think should happen instead
   
   `check_for_path` should return normally; the hook should not raise `AttributeError` while resolving the connection.
   
   ### How to reproduce
   
   1. Use the official [docker-compose.yaml](https://airflow.apache.org/docs/apache-airflow/3.0.1/docker-compose.yaml) file and modify the `x-airflow-common` section:
   ```yaml
   # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:3.0.1}
     build: .
   ```
   
   2. Create a `Dockerfile`
   ```dockerfile
   FROM apache/airflow:3.0.1
   
   USER root
   
   # Install Kerberos and build tools
   RUN apt-get update && \
       apt-get install -y gcc g++ libkrb5-dev krb5-user && \
       apt-get clean && \
       rm -rf /var/lib/apt/lists/*
   
   USER airflow
   
   # Install HDFS provider
   RUN pip install apache-airflow-providers-apache-hdfs==4.9.0
   ```
   
   3. Run `docker compose build` to build images
   4. Add a `test_hdfs.py` file under `dags/`
   ```python
   from datetime import datetime
   from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook
   from airflow.sdk import dag, task
   
   # Define the default arguments for the DAG
   default_args = {
       'owner': 'airflow',
       'start_date': datetime(2024, 9, 1),
       'retries': 1,
   }
   
   # Instantiate the DAG
   @dag(default_args=default_args, start_date=datetime(2025, 1, 1))
   def test_hdfs():
   
       @task()
       def list_hdfs_directory():
           # Initialize the WebHDFS Hook
           hook = WebHDFSHook(webhdfs_conn_id='webhdfs_default')
   
           # Get directory info
           res = hook.check_for_path('/airflow')
   
           # Print the result
           print(res)
   
       # Set the task order
       list_hdfs_task = list_hdfs_directory()
   
   dag1 = test_hdfs()
   ```
   
   5. Run `docker compose up -d`
   6. Add a `webhdfs` connection named `webhdfs_default` with host, login, password, and port under Admin > Connections.
   7. Trigger the `test_hdfs` DAG; the `list_hdfs_directory` task fails with the error above.
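
   As a stopgap, the task from step 4 can bypass `WebHDFSHook.get_conn()` and talk to WebHDFS through the `hdfs` client library the provider builds on, reading the connection fields directly. This is only a sketch: it assumes an unsecured (non-Kerberos) cluster and that host, port, and login are set on `webhdfs_default`.
   ```python
   from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook
   from airflow.sdk import task
   from hdfs import InsecureClient


   @task()
   def list_hdfs_directory_workaround():
       # Read the connection fields directly instead of calling get_conn(),
       # which currently fails inside _find_valid_server.
       conn = WebHDFSHook.get_connection("webhdfs_default")
       client = InsecureClient(f"http://{conn.host}:{conn.port}", user=conn.login)
       # status(..., strict=False) returns None when the path is missing,
       # mirroring what check_for_path('/airflow') would report.
       print(client.status("/airflow", strict=False) is not None)
   ```
   Swapping this in for `list_hdfs_directory` lets the rest of the DAG run while the hook issue is open.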
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

