GitHub user hanleybrand closed a discussion: Issue troubleshooting an s3 connection for remote logging to Min.io
I'm trying to configure a Kubernetes deployment of Airflow 2.4.1 to use remote logging, but I can't create a working Amazon Web Services connection in the Airflow UI (or at least I get an error when I test the connection).

### TL;DR

Searching and hacking around seems to have solved part of the issue (or at least fixed some other issues I hadn't been aware of?), but when I use either the UI or the Python route to `test_connection` I get an error, even though every other way I connect seems to work. The error I get when testing the connection is:

> **An error occurred (InvalidParameterValue) when calling the GetCallerIdentity operation: Unsupported action GetCallerIdentity**

The biggest issue, of course, is that when I deploy/redeploy to the Kubernetes cluster, logs are not being written to the s3 bucket, which makes it difficult to check the logs to see what's going on 😉

My guess at the problem, when I look at the code ([AwsGenericHook](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/base_aws.py#L631)?) that I think is generating that error, is that the `test_connection` function checks the connection using STS (session tags), which I don't have configured (I don't know if Min.io supports STS, actually; it looks like it supports AssumeRoleWithWebIdentity).

### My configuration

Anonymizing my config particulars, let's say Airflow is deployed to `https://airflow.k8s.example.com`.

In Admin -> Connections I have a conn_id 'amazon_s3' of type 'Amazon Web Services' with the correct AWS Access Key ID & Secret Access Key set, plus Extra set to:

```
{
  "aws_access_key_id": "LAWGIN",
  "aws_secret_access_key": "PASSWOID",
  "endpoint_url": "https://minio01.k8s.example.com:9000",
  "region_name": "us-east-1",
  "encrypted": "yes",
  "verify": "False"
}
```

**airflow.cfg logging section**

```cfg
[logging]
colored_console_log = False
encrypt_s3_logs = False
remote_base_log_folder = s3://airflow-logs-dev
remote_log_conn_id = amazon_s3
remote_logging = True
```

**Notes**:

- "encrypted" set to "no" or "yes" does not change what error I get.
- "verify" can likewise be set to "False", "None" or "True" -- the only thing that changes the error is removing the key/value pair for verify entirely.
- I can also remove the login/password key/value pairs from the Extra JSON string; it doesn't seem to matter. I leave them in because some of the discussions implied that they needed to be in the extra string.

With the "verify" key included I get the following error:

> **SSL validation failed for https://minio01.k8s.example.com:9000/ [Errno 2] No such file or directory**

If I remove the "verify" key entirely, I instead get this error:
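As a sanity check on the Extra field itself: it has to be valid JSON, and the *type* of "verify" may matter. My assumption (not confirmed) is that botocore treats a string value for `verify` as a path to a CA bundle, which would explain the "[Errno 2] No such file or directory" above -- it would be trying to open a file literally named "False". A quick check in plain Python:

```python
import json

# The Extra field must be valid JSON; json.loads catches typos such as a
# missing comma between entries.
extra = json.loads("""
{
  "aws_access_key_id": "LAWGIN",
  "aws_secret_access_key": "PASSWOID",
  "endpoint_url": "https://minio01.k8s.example.com:9000",
  "region_name": "us-east-1",
  "verify": "False"
}
""")

# "False" here is a string, not the JSON boolean false. If botocore
# receives a string for verify, it treats it as a CA-bundle *path*
# (my assumption), which would match the [Errno 2] error above.
print(type(extra["verify"]))                      # <class 'str'>
print(json.loads('{"verify": false}')["verify"])  # False (a real boolean)
```

So `"verify": false` (unquoted) may behave differently from `"verify": "False"`.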
> **An error occurred (InvalidParameterValue) when calling the GetCallerIdentity operation: Unsupported action GetCallerIdentity**

### History, stuff I've tried, etc.

I had started with what the docs say, which was the simpler Extra set to:

```
{
  "aws_access_key_id": "LAWGIN",
  "aws_secret_access_key": "PASSWOID",
  "host": "https://minio01.k8s.example.com:9000"
}
```

but I got errors regarding the region not being set correctly.

I have confirmed with the aws-cli and Min.io `mc` clients that the `aws_access_key_id`, `aws_secret_access_key` and `endpoint_url` parameters above are correct, e.g.:

```shell
aws --endpoint-url https://minio01.k8s.example.com:9000 s3 ls s3://airflow-logs-dev
```

and

```shell
mc alias set aa https://minio01.k8s.example.com:9000 LAWGIN PASSWOID
mc ls aa/airflow-logs-dev
```

both allow me to interact with the Min.io instance, so I am certain the issue is not a typo in my login/password, etc. (I copy/pasted the values out of my config to test with aws/mc.)

Additionally, I can execute a shell in the webserver pod and use Python there:

```python
>>> import boto3
>>> client = boto3.client(
...     's3',
...     aws_access_key_id='LAWGIN',
...     aws_secret_access_key='PASSWOID',
...     region_name='us-east-1',
...     endpoint_url='https://minio01.k8s.example.com:9000',
... )
>>> client.list_buckets()
{'ResponseMetadata': {'RequestId': '171CD39BF1DA4102', 'HostId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'accept-ranges': 'bytes', 'content-length': '461', 'content-security-policy': 'block-all-mixed-content', 'content-type': 'application/xml', 'server': 'MinIO', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-bucket-region': 'us-east-1', 'x-amz-request-id': '171CD39BF1DA4102', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block', 'date': 'Mon, 10 Oct 2022 21:50:28 GMT'}, 'RetryAttempts': 0}, 'Buckets': [{'Name': 'airflow-logs-dev', 'CreationDate': datetime.datetime(2022, 10, 3, 16, 26, 7, 938000, tzinfo=tzlocal())}, ... ]}  # etc.
```

There was a clue that it might be certificate related, so I tried adding `RUN cp $(python -m certifi) /etc/ssl/certs/` to my Dockerfile, and while it seems to help (I no longer get a cert error with `S3Hook.check_for_bucket()`), I'm back to getting the InvalidParameterValue/GetCallerIdentity error:

```python
>>> from airflow.providers.amazon.aws.hooks.s3 import S3Hook
>>> s3h = S3Hook('amazon_s3')
>>> b = 'airflow-logs-dev'
>>> s3h.check_for_bucket(b)
[2022-10-11T02:39:41.134+0000] {base.py:71} INFO - Using connection ID 'amazon_s3' for task execution.
[2022-10-11T02:39:41.137+0000] {connection_wrapper.py:303} INFO - AWS Connection (conn_id='amazon_s3', conn_type='aws') credentials retrieved from login and password.
True  # that's good
>>> s3h.test_connection()
(False, 'An error occurred (InvalidParameterValue) when calling the GetCallerIdentity operation: Unsupported action GetCallerIdentity')  # not as good
```

Is anyone familiar with setting this kind of connection/logging config up on the more recent versions of Airflow?
As far as I can tell the error is coming from the [AwsGenericHook](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/base_aws.py#L631) class, but it seems to be an error with session tokens (STS), which isn't what I'm using in my config, so it's confusing.

edit: edited for clarity, hopefully

GitHub link: https://github.com/apache/airflow/discussions/26979
