keeed opened a new issue, #43238:
URL: https://github.com/apache/airflow/issues/43238

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   
[apache-airflow-providers-amazon](https://airflow.apache.org/docs/apache-airflow-providers-amazon/8.28.0)
   - version 8.28.0
   
   ### Apache Airflow version
   
   2.10.1
   
   ### Operating System
   
   MWAA
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   Vanilla Deployment
   
   ### What happened
   
   The current GlueCatalogHook does not pass the CatalogId parameter in its
   boto3 calls, as can be seen in the source:
   
   
[GlueCatalogHook](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_modules/airflow/providers/amazon/aws/hooks/glue_catalog.html)
   
   ### What you think should happen instead
   
   There should be a way to pass the CatalogId, since some users need to target
   a catalog other than the one tied to their default AWS AccountId.
   - This happened in my use case at work.
   
   ### How to reproduce
   
   Target a Glue database and table whose CatalogId is not the default AWS
   AccountId. All operations will fail.
   
   ### Anything else
   
   I worked around this by copying the implementation of the actual
   GlueCatalogHook into an ExtendedGlueCatalogHook that adds the CatalogId to
   the calls, and changing our sensors to use it. Example:
   
   ```
   def get_partitions(
       self,
       catalog_id: str,
       database_name: str,
       table_name: str,
       expression: str = "",
       page_size: int | None = None,
       max_items: int | None = None,
   ) -> set[tuple]:
       ...

       response = paginator.paginate(
           # CatalogId should be added as an optional parameter.
           CatalogId=catalog_id,
           DatabaseName=database_name,
           TableName=table_name,
           Expression=expression,
           PaginationConfig=config,
       )

       partitions = set()
       for page in response:
           for partition in page["Partitions"]:
               partitions.add(tuple(partition["Values"]))

       return partitions
   ...
   ```
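
To make the "optional parameter" idea concrete, here is a minimal sketch of how the keyword arguments could be assembled so that CatalogId is only sent when the caller provides one. The helper names `build_paginate_kwargs` and `collect_partitions` are hypothetical and not part of the provider; the shape of `PaginationConfig` is assumed from the snippet above.

```python
from __future__ import annotations

from typing import Any, Iterable


def build_paginate_kwargs(
    database_name: str,
    table_name: str,
    expression: str = "",
    catalog_id: str | None = None,  # hypothetical optional parameter
    page_size: int | None = None,
    max_items: int | None = None,
) -> dict[str, Any]:
    """Build keyword arguments for the Glue ``get_partitions`` paginator."""
    kwargs: dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Expression": expression,
        "PaginationConfig": {"PageSize": page_size, "MaxItems": max_items},
    }
    # Only send CatalogId when the caller supplied one; when it is omitted,
    # Glue uses the caller's own AWS account ID as the catalog.
    if catalog_id is not None:
        kwargs["CatalogId"] = catalog_id
    return kwargs


def collect_partitions(pages: Iterable[dict[str, Any]]) -> set[tuple]:
    """Flatten paginated responses into a set of partition-value tuples."""
    return {
        tuple(partition["Values"])
        for page in pages
        for partition in page["Partitions"]
    }
```

With a boto3 Glue client, the result would presumably be used as `client.get_paginator("get_partitions").paginate(**build_paginate_kwargs(...))`. Defaulting `catalog_id` to `None` keeps existing callers unaffected, which is one possible way to make the change backward compatible.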
   
   If anyone from the AWS team is going to work on this one, I'm also part of
   Amazon. You can reach me at @keds and I can show you what we did here.
   
   Thanks!
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
