[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631885153


   I think maybe it's the right time to raise that as a separate thread on the 
devlist so that other people can state their opinion as well?
   
   I agree with @ashb  and @kaxil that creating a custom variable 
implementation in this form might be super confusing and in the form proposed 
it's -1 from me.
   
   Regarding the case you mentioned - using variables for sharing data between 
tasks - I believe that's not what it is intended for and it should be strongly 
discouraged. Variables should be relatively static, changeable externally. 
Using variables for sharing data between tasks breaks the idempotency mechanism 
built-in Airflow via Xcom (separate value stored for each dag run). 
   
   If you want to use variable and maintain idempotency you have to reproduce 
it and make your variable key contain the dag_run_id which I think many people 
would not do that or even think about if they use a convenient variable 
mechanism. It would have to create multiple variables with a similar name + 
dag_run_id. That would result in the number of variables growing continuously 
over time - I think this is a very, very bad use of variables in this case - 
and having variables effectively read-only with secrets is actually a nice way 
to avoid this behavior.
   
   In the example from (a bit related but not the same as mentioned by @ashb ) 
the discussion about hooks 
(https://lists.apache.org/thread.html/re2bd54b0682e14fac6c767895311baf411ea10b18685474f7683a2f5%40%3Cdev.airflow.apache.org%3E).
 I created a similar example where Secret backend could be used for sharing 
data between tasks.  But then it would be custom implementation, and we can 
encourage using pattern where dag_run is part of the secret key name, or even 
better - you can generate and keep the name of the secret in the XCOM. This is 
the "standard" Airflow pattern for sharing data between tasks and once we have 
separate hooks for all secret backend, it is something that should be used 
rather than variables if you would like to keep the shared data "secret".
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631620756


   > > What's the use-case of writing Variables to an external backend?
   > 
   > Say we are in a migration process to move DAGs from an old airflow cluster 
to new airflow cluster. And many DAGs shared the same airflow variable, some of 
them reads variable and some of them writes the variable. We need a way to 
double write the variable to both cluster during the migration, hence the PR to 
support that.
   
   Why not writing and running it as custom code? it does seem like your 
specific case and not a common feature that people would be using.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631483938


   > Of course there is more than one project that does the same thing called 
Vault. Of course.
   > 
   > Even with Azure Vault not being granular enough, this feature is still a 
-1 from me. I'd be happy to see the ability to CUD connections/secrets via the 
existing Secrets backends -- this is already our interface to a "custom" 
backend of Variables. We should not have another.
   
   Yep. Fully agree on that.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631473133


   It would be great to have an understanding how much we can rely on 
granularity being available. But I believe (at least currently) readonly/write 
access for whole vault is the common denominator.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631471582


   Not in Azure's Vault for example (and I think this it might be one of the 
most popular in Corporate world): 
   
   
https://docs.microsoft.com/en-us/azure/key-vault/general/secure-your-key-vault#key-vault-access-policies
   
   Key Vault access policies grant permissions separately to keys, secrets, and 
certificate. You can grant a user access only to keys and not to secrets. 
Access permissions for keys, secrets, and certificates are at the vault level. 
Key Vault access policies don't support granular, object-level permissions like 
a specific key, secret, or certificate. To set access policies for a key vault, 
use the Azure portal, the Azure CLI, Azure PowerShell, or the Key Vault 
Management REST APIs.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631438618


   @ashb -> Understood. Agree. And I understand the concerns. Ans I agree that 
if the configuration for Airlfow is going to be managed by the UI (And readonly 
set fo False), then it's OK to use existing Secret Backend to do writes. 
   
   I can easily imagine both cases
   
   a) readonly = True - where all the secrets are managed externally
   b) readonly = False - where the Airflow UI is used to do it
   
   I think current vault Implementation (correct me if I am wrong) are 
"all-or-nothing" so you can't configure read/write separately for different 
keys/groups of keys so I think this is the biggest granularity you can do 
(readonly flag for whole installation). But I think it's good enough.
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #8889: Support custom Variable implementation

2020-05-20 Thread GitBox


potiuk commented on pull request #8889:
URL: https://github.com/apache/airflow/pull/8889#issuecomment-631359757


   @milton0825 - see the discussion I started few days ago about Secret Hooks. 
https://lists.apache.org/thread.html/re2bd54b0682e14fac6c767895311baf411ea10b18685474f7683a2f5%40%3Cdev.airflow.apache.org%3E
   
   I think Secret Backend is good in its support to read connections and 
variables - because it is there to keep the airflow configuration. What I think 
you want to achieve can be done via Secret Hooks - so the usual airflow 
mechanism to talk to external services. You could very easily write custom 
operators that will be using hooks to read/write secrets to secret backends. 
And what's best here - you will be able to use different secret backends to 
keep Airflow configuration and different to store such "secret" values that 
should be writeable. I think this is a perfect setup from the security point of 
view - you do not want to give write access to the secret backend that keeps 
Airflow configuration to all the jobs.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org