romibuzi opened a new issue, #27592:
URL: https://github.com/apache/airflow/issues/27592

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==6.0.0
   
   ### Apache Airflow version
   
   2.2.5
   
   ### Operating System
   
   Linux Ubuntu
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   Airflow deployed on ec2 instance
   
   ### What happened
   
   `GlueJobOperator` from the Amazon provider does not update the job 
configuration (for example its arguments or number of workers) if the job 
already exists on AWS, even when the operator's configuration has changed:
   
   ```python
    def get_or_create_glue_job(self) -> str:
           """
           Creates (or just returns) the Glue job and returns its name.
           :return: Name of the Job
           """
           glue_client = self.get_conn()
           try:
               get_job_response = glue_client.get_job(JobName=self.job_name)
               self.log.info("Job already exists. Returning the name of the job")
               return get_job_response['Job']['Name']
   
           except glue_client.exceptions.EntityNotFoundException:
           except glue_client.exceptions.EntityNotFoundException:
               self.log.info("Job doesn't exist. Now creating and running AWS Glue Job")
   
           ...
   ```
   
   Is there a particular reason for not doing this, or was it simply not done 
during the implementation of the operator?
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   Create a `GlueJobOperator` with a simple configuration:
   
   ```python
   from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
   
   submit_glue_job = GlueJobOperator(
       task_id='submit_glue_job',
       job_name='test_glue_job',
       job_desc='test glue job',
       script_location='s3://bucket/path/to/the/script/file',
       script_args={},
       s3_bucket='bucket',
       concurrent_run_limit=1,
       retry_limit=0,
       num_of_dpus=5,
       wait_for_completion=False
   )
   ```
   
   Then update one of the initial configuration values, for example 
`num_of_dpus=10`, and observe that the operator does not update the Glue job 
configuration on AWS when it is rerun.
   
   ### Anything else
   
   There is a similar operator, `GlueCrawlerOperator`, which already does 
this:
   
   ```python
       def execute(self, context: Context):
           """
           Executes AWS Glue Crawler from Airflow
   
           :return: the name of the current glue crawler.
           """
           crawler_name = self.config['Name']
           if self.hook.has_crawler(crawler_name):
               self.hook.update_crawler(**self.config)
           else:
               self.hook.create_crawler(**self.config)
           ...
   ```
   
   This behavior could be reproduced in `GlueJobOperator` if we agree to do 
it.
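   
   A minimal sketch of what the update-or-create logic could look like, 
assuming access to the raw boto3 Glue client (as in the hook snippet above). 
The helper name `create_or_update_glue_job` and the shape of `job_config` 
are hypothetical, not part of the provider's current API:
   
   ```python
   def create_or_update_glue_job(glue_client, job_name, job_config):
       """Create the Glue job if it is missing; otherwise push the latest config.

       job_config is assumed to be the dict normally passed to create_job
       (everything except the Name key), which matches the JobUpdate shape
       accepted by update_job.
       """
       try:
           glue_client.get_job(JobName=job_name)
           # Job exists: sync its definition with the operator's configuration.
           glue_client.update_job(JobName=job_name, JobUpdate=job_config)
       except glue_client.exceptions.EntityNotFoundException:
           # Job doesn't exist yet: create it from scratch.
           glue_client.create_job(Name=job_name, **job_config)
       return job_name
   ```
   
   This mirrors the `has_crawler`/`update_crawler`/`create_crawler` branch in 
`GlueCrawlerOperator.execute` shown below, just expressed against the boto3 
client directly.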
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

