[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631885153 I think maybe it's the right time to raise that as a separate thread on the devlist so that other people can state their opinion as well? I agree with @ashb and @kaxil that creating a custom variable implementation in this form might be super confusing and in the form proposed it's -1 from me. Regarding the case you mentioned - using variables for sharing data between tasks - I believe that's not what it is intended for and it should be strongly discouraged. Variables should be relatively static, changeable externally. Using variables for sharing data between tasks breaks the idempotency mechanism built-in Airflow via Xcom (separate value stored for each dag run). If you want to use variable and maintain idempotency you have to reproduce it and make your variable key contain the dag_run_id which I think many people would not do that or even think about if they use a convenient variable mechanism. It would have to create multiple variables with a similar name + dag_run_id. That would result in the number of variables growing continuously over time - I think this is a very, very bad use of variables in this case - and having variables effectively read-only with secrets is actually a nice way to avoid this behavior. In the example from (a bit related but not the same as mentioned by @ashb ) the discussion about hooks (https://lists.apache.org/thread.html/re2bd54b0682e14fac6c767895311baf411ea10b18685474f7683a2f5%40%3Cdev.airflow.apache.org%3E). I created a similar example where Secret backend could be used for sharing data between tasks. But then it would be custom implementation, and we can encourage using pattern where dag_run is part of the secret key name, or even better - you can generate and keep the name of the secret in the XCOM. This is the "standard" Airflow pattern for sharing data between tasks and once we have separate hooks for all secret backend, it is something that should be used rather than variables if you would like to keep the shared data "secret". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631620756 > > What's the use-case of writing Variables to an external backend? > > Say we are in a migration process to move DAGs from an old airflow cluster to new airflow cluster. And many DAGs shared the same airflow variable, some of them reads variable and some of them writes the variable. We need a way to double write the variable to both cluster during the migration, hence the PR to support that. Why not writing and running it as custom code? it does seem like your specific case and not a common feature that people would be using. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631483938 > Of course there is more than one project that does the same thing called Vault. Of course. > > Even with Azure Vault not being granular enough, this feature is still a -1 from me. I'd be happy to see the ability to CUD connections/secrets via the existing Secrets backends -- this is already our interface to a "custom" backend of Variables. We should not have another. Yep. Fully agree on that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631473133 It would be great to have an understanding how much we can rely on granularity being available. But I believe (at least currently) readonly/write access for whole vault is the common denominator. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631471582 Not in Azure's Vault for example (and I think this it might be one of the most popular in Corporate world): https://docs.microsoft.com/en-us/azure/key-vault/general/secure-your-key-vault#key-vault-access-policies Key Vault access policies grant permissions separately to keys, secrets, and certificate. You can grant a user access only to keys and not to secrets. Access permissions for keys, secrets, and certificates are at the vault level. Key Vault access policies don't support granular, object-level permissions like a specific key, secret, or certificate. To set access policies for a key vault, use the Azure portal, the Azure CLI, Azure PowerShell, or the Key Vault Management REST APIs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631438618 @ashb -> Understood. Agree. And I understand the concerns. Ans I agree that if the configuration for Airlfow is going to be managed by the UI (And readonly set fo False), then it's OK to use existing Secret Backend to do writes. I can easily imagine both cases a) readonly = True - where all the secrets are managed externally b) readonly = False - where the Airflow UI is used to do it I think current vault Implementation (correct me if I am wrong) are "all-or-nothing" so you can't configure read/write separately for different keys/groups of keys so I think this is the biggest granularity you can do (readonly flag for whole installation). But I think it's good enough. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation
potiuk commented on pull request #8889: URL: https://github.com/apache/airflow/pull/8889#issuecomment-631359757 @milton0825 - see the discussion I started few days ago about Secret Hooks. https://lists.apache.org/thread.html/re2bd54b0682e14fac6c767895311baf411ea10b18685474f7683a2f5%40%3Cdev.airflow.apache.org%3E I think Secret Backend is good in its support to read connections and variables - because it is there to keep the airflow configuration. What I think you want to achieve can be done via Secret Hooks - so the usual airflow mechanism to talk to external services. You could very easily write custom operators that will be using hooks to read/write secrets to secret backends. And what's best here - you will be able to use different secret backends to keep Airflow configuration and different to store such "secret" values that should be writeable. I think this is a perfect setup from the security point of view - you do not want to give write access to the secret backend that keeps Airflow configuration to all the jobs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org