GitHub user hanleybrand closed a discussion: Issue troubleshooting an s3 
connection for remote logging to Min.io

I'm trying to configure a Kubernetes deployment of Airflow 2.4.1 to use remote 
logging, but I can't create a working Amazon Web Services connection in the 
Airflow UI (or at least I get an error when I test the connection).
 
### TL;DR

Searching and hacking around seems to have solved part of the issue (or at 
least fixed some other issues I hadn't been aware of?), but when I use either 
the UI or the Python route to `test_connection` I get an error, even though 
every other way I connect seems to work.

The error I get when testing the connection is: 

> **An error occurred (InvalidParameterValue) when calling the 
> GetCallerIdentity operation: Unsupported action GetCallerIdentity** 

The biggest issue, of course, is that when I deploy/redeploy to the Kubernetes 
cluster, logs are not being written to the s3 bucket, which makes it difficult 
to check the logs to see what's going on 😉

My guess at the problem, looking at the code that I think is generating that 
error 
([AwsGenericHook](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/base_aws.py#L631)?), 
is that the `test_connection` function checks the connection using STS 
(Security Token Service), which I don't have configured (I don't know if 
min.io supports STS, actually; it looks like it supports 
`AssumeRoleWithWebIdentity`).

### My configuration:

Anonymizing my config particulars, let's say Airflow is deployed to 
`https://airflow.k8s.example.com`

In Admin -> Connections I have a conn_id 'amazon_s3' of type 'Amazon Web 
Services' with the correct AWS Access Key ID & Secret Access Key set, plus 
Extra set to:

```
{ "aws_access_key_id": "LAWGIN",
  "aws_secret_access_key": "PASSWOID",
  "endpoint_url": "https://minio01.k8s.example.com:9000",
  "region_name": "us-east-1",
  "encrypted": "yes",
  "verify": "False" }
```
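
(Note: whatever goes in Extra has to parse as valid JSON. A quick stdlib-only check, using the same placeholder values:)

```python
import json

extra = """
{ "aws_access_key_id": "LAWGIN",
  "aws_secret_access_key": "PASSWOID",
  "endpoint_url": "https://minio01.k8s.example.com:9000",
  "region_name": "us-east-1",
  "encrypted": "yes",
  "verify": "False" }
"""

# json.loads raises json.JSONDecodeError if the Extra string is malformed,
# e.g. a missing comma between keys.
parsed = json.loads(extra)
print(sorted(parsed))
```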

**airflow.cfg logging section**

```cfg
[logging]
colored_console_log = False
encrypt_s3_logs = False
remote_base_log_folder = s3://airflow-logs-dev
remote_log_conn_id = amazon_s3
remote_logging = True
```
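
The same logging settings can alternatively be injected as environment variables in the pod spec (Airflow maps `AIRFLOW__<SECTION>__<KEY>` onto config options), which is sometimes easier to manage in Kubernetes:

```shell
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://airflow-logs-dev
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=amazon_s3
export AIRFLOW__LOGGING__ENCRYPT_S3_LOGS=False
```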

**Notes**:  

- Setting "encrypted" to "no" or "yes" does not change the error I get.  
- "verify" can likewise be set to "False", "None" or "True" -- the only thing 
that changes the error is removing the key/value pair for "verify" entirely.  
- I can also remove the login/password key/value pairs from the Extra JSON 
string; it doesn't seem to matter. I leave them in because some of the 
discussions implied that they needed to be in the extra string.

With "verify" key included I get the following error:

> **- SSL validation failed for https://minio01.k8s.example.com:9000/ [Errno 2] 
> No such file or directory**

If I remove the "verify" key entirely, I instead get this error:

> **An error occurred (InvalidParameterValue) when calling the 
> GetCallerIdentity operation: Unsupported action GetCallerIdentity** 

### History, stuff I've tried, etc.

I had started with what the docs say, which was the simpler Extra set to 
`{ "aws_access_key_id": "LAWGIN", "aws_secret_access_key": "PASSWOID", 
"host": "https://minio01.k8s.example.com:9000" }`, but I got errors about the 
region not being set correctly.

I have confirmed with the aws-cli and min.io mc clients that the 
aws_access_key_id, aws_secret_access_key and endpoint_url parameters above are 
correct, e.g.:

```shell
aws --endpoint-url https://minio01.k8s.example.com:9000 s3 ls s3://airflow-logs-dev
```
and
```shell
mc alias set aa https://minio01.k8s.example.com:9000 LAWGIN PASSWOID
mc ls aa/airflow-logs-dev
```
both allow me to interact with the Min.io instance, so I am certain the issue 
is not a typo in my login/password, etc. (I copy/pasted values out of my config 
to test with aws/mc)

Additionally, I can execute a shell in the webserver pod and using python in 
the shell:

```python
>>> import boto3
>>> client = boto3.client(
...     's3',
...     aws_access_key_id='LAWGIN',
...     aws_secret_access_key='PASSWOID',
...     region_name='us-east-1',
...     endpoint_url='https://minio01.k8s.example.com:9000',
... )
>>> 
>>> client.list_buckets()
{'ResponseMetadata': {'RequestId': '171CD39BF1DA4102', 'HostId': '', 
'HTTPStatusCode': 200, 'HTTPHeaders': {'accept-ranges': 'bytes', 
'content-length': '461', 'content-security-policy': 'block-all-mixed-content', 
'content-type': 'application/xml', 'server': 'MinIO', 
'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 
'Origin, Accept-Encoding', 'x-amz-bucket-region': 'us-east-1', 
'x-amz-request-id': '171CD39BF1DA4102', 'x-content-type-options': 'nosniff', 
'x-xss-protection': '1; mode=block', 'date': 'Mon, 10 Oct 2022 21:50:28 GMT'}, 
'RetryAttempts': 0}, 'Buckets': [{'Name': 'airflow-logs-dev', 'CreationDate': 
datetime.datetime(2022, 10, 3, 16, 26, 7, 938000, tzinfo=tzlocal())},....#etc

```

There was a clue that it might be certificate related, so I tried adding `RUN 
cp $(python -m certifi) /etc/ssl/certs/` to my Dockerfile, and while it seems 
to help (I no longer get a cert error with `S3Hook.check_for_bucket()`), I'm 
back to getting the InvalidParameterValue/GetCallerIdentity error:
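
On the certificate front, this stdlib-only snippet shows where the pod's OpenSSL looks for trusted CAs by default (which is what the certifi copy above changes); botocore can also be pointed at a specific bundle via the `AWS_CA_BUNDLE` environment variable or the connection's "verify" option:

```python
import ssl

# Default locations this interpreter's OpenSSL consults for CA certificates.
paths = ssl.get_default_verify_paths()
print(paths.cafile)   # may be None if unset
print(paths.capath)
```

(The "[Errno 2] No such file or directory" wording earlier also makes me wonder whether the string "False" in the Extra's "verify" is being treated as a path to a CA bundle rather than as a boolean -- just a guess.)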

```python
>>> from airflow.providers.amazon.aws.hooks.s3 import S3Hook
>>> s3h = S3Hook('amazon_s3')
>>> b = 'airflow-logs-dev'
>>> s3h.check_for_bucket(b)
[2022-10-11T02:39:41.134+0000] {base.py:71} INFO - Using connection ID 
'amazon_s3' for task execution.
[2022-10-11T02:39:41.137+0000] {connection_wrapper.py:303} INFO - AWS 
Connection (conn_id='amazon_s3', conn_type='aws') credentials retrieved from 
login and password.
True
# that's good
>>> s3h.test_connection()
(False, 'An error occurred (InvalidParameterValue) when calling the 
GetCallerIdentity operation: Unsupported action GetCallerIdentity')
# not as good
```

Is anyone familiar with setting up this kind of connection/logging config on 
the more recent versions of Airflow? As far as I can tell the error is coming 
from the 
[AwsGenericHook](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/base_aws.py#L631) 
class, but it looks like an error involving STS session tokens, which isn't 
what I'm using in my config, so it's confusing.

edit: edited for clarity, hopefully

GitHub link: https://github.com/apache/airflow/discussions/26979
