Re: Triggering behavior around DAG state changes

2018-11-08 Thread Brian Greene
In what you describe it doesn’t sound like a state change as much as just a dag that has as its first operator an http operator, and another at the end that communicates final state? Sent from a device with less than stellar autocorrect > On Nov 7, 2018, at 11:12 AM, Aleks Shulman wrote:

Re: Duplicate key unique constraint error

2018-10-30 Thread Brian Greene
How do you trigger it externally? We have several custom operators that trigger other jobs and we had to be really careful or we’d get duplicate keys for the dag run and it would fail to kick off. One scheduler, but we saw it repeatedly and have it noted as a thing to watch out for. Brian
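
One way to guard against the collision described above is to give every externally triggered run an identifier that cannot repeat. The following is only a sketch under assumptions: it supposes the duplicate key comes from two triggers producing the same run identifier, and `make_run_id` is a hypothetical helper, not part of Airflow.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_run_id(prefix: str, now: Optional[datetime] = None) -> str:
    """Build a run identifier that stays unique even for same-instant triggers.

    Hypothetical helper: the timestamp keeps ids readable and sortable,
    and the random suffix separates triggers that fire at the same moment.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S.%f")
    suffix = uuid.uuid4().hex[:12]
    return f"{prefix}__{stamp}__{suffix}"


if __name__ == "__main__":
    # Two triggers at the same instant still get different identifiers.
    instant = datetime(2018, 10, 30, 12, 0, 0, tzinfo=timezone.utc)
    print(make_run_id("trig", instant))
    print(make_run_id("trig", instant))
```

The resulting string would be passed as the run identifier by whatever custom operator does the triggering. If the constraint being violated is on the execution date rather than the run identifier, the same idea applies to keeping that value distinct per trigger.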

Re: Manual validation operator

2018-10-05 Thread Brian Greene
My first thought was this, but my understanding is that if you had a large number of dags “waiting”, the sensor would consume all the concurrency. And what if the user doesn’t approve? How about having the dag write the status to an api/db as its last step? Then 2 other dags (or one with

Re: Solved: suppress PendingDeprecationWarning messages in airflow logs

2018-09-29 Thread Brian Greene
The operator is a thin wrapper around their SDK. The Qubole SDK makes heavy use of **kwargs, and the operator just passes them through. Short of writing our own operator with named params to keep the base operator happy and then delegating to the Qubole SDK, there’s no other way to silence the

Re: execution_date - can we stop the confusion?

2018-09-27 Thread Brian Greene
and start from: >> >> A lot of new users get confused by how "execution_date" works. >> >> I recognize that some of this is a learning curve issue and some of this is >> a mindset issue but it begs the question: do enough users benefit from the >> cu

Re: execution_date - can we stop the confusion?

2018-09-26 Thread Brian Greene
It took a minute to grok, but in the larger context of how Airflow works it makes perfect sense the way it is. A change that fundamentally breaks every dag in existence should bring a comparable benefit. Beyond avoiding teaching a concept you disagree with, what benefits does the

Re: Fundamental change - Separate DAG name and id.

2018-09-20 Thread Brian Greene
Prior to using airflow for much, on first inspection, I think I may have agreed with you. After a bit of use I’d agree with Fokko and others - this isn’t really a problem, and separating them seems to do more harm than good related to deployment. I was gonna stop there, but why? You can

Re: S3keysonsor

2018-05-21 Thread Brian Greene
I suggest it’ll work for your needs. > On May 21, 2018, at 10:16 AM, purna pradeep wrote: > > Hi, > > I’m trying to evaluate airflow to see if it suits my needs. > > Basically I can have below steps in a DAG > >

Re: How to know the DAG is starting to run

2018-05-11 Thread Brian Greene
Okay I’ll bite... WT* does that mean? One of the best things about airflow is how easy it is to connect disparate systems... some would even say that’s much of the reason it exists. It records reasonable metadata into an RDBMS (I suppose you could argue for other designs, but it’s pretty

Re: About the project support in Airflow

2018-04-25 Thread Brian Greene
+1 > On Apr 25, 2018, at 12:04 PM, James Meickle wrote: > > Another reason you would want separated infrastructure is that there are a > lot of ways to exhaust Airflow resources or otherwise cause contention - >

Re: Slot pools correct usage

2018-04-07 Thread Brian Greene
un in parallel. > I think I have setup the tasks correctly to use pool but may be missing the > priority_weight setting correctly. Appreciate if you could run by your > configs just to see if I am not missing any simple point. > > thanks much, > Manish > >

Re: Slot pools correct usage

2018-04-06 Thread Brian Greene
To be clear, you’re hoping that setting the slots to 1 will cause the tasks across distinct dags to run in order based on the assumption that they’ll queue up and then execute off the pool? I don’t think it will quite work that way - there’s no guarantee the scheduler will execute your tasks

Re: RBAC Update

2018-03-30 Thread Brian Greene
I’d think we’d have privileges like ‘can_view’, etc., and then a join table (priv) <-> (dagid) <-> (user/group). Then it’s a simple query to get the perms for a given dag (as you list in option 2 below). It also makes “secure by default” easy - a lack of entries in that table for a dag can mean
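
The join-table idea above can be sketched with an in-memory SQLite database. This is a minimal illustration, not Airflow's RBAC schema: the table name `dag_permission`, the column names, the sample rows, and both helper functions are hypothetical, and treating a missing row as "no access" is an assumed reading of "secure by default".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (privilege, dag, user/group) grant; all names are illustrative.
conn.execute(
    """
    CREATE TABLE dag_permission (
        privilege TEXT NOT NULL,
        dag_id    TEXT NOT NULL,
        principal TEXT NOT NULL,
        PRIMARY KEY (privilege, dag_id, principal)
    )
    """
)
conn.executemany(
    "INSERT INTO dag_permission (privilege, dag_id, principal) VALUES (?, ?, ?)",
    [
        ("can_view", "etl_daily", "analysts"),
        ("can_view", "etl_daily", "data_eng"),
        ("can_edit", "etl_daily", "data_eng"),
    ],
)


def perms_for(conn, dag_id, principal):
    """Return the privileges a user/group holds on one dag, sorted by name."""
    rows = conn.execute(
        "SELECT privilege FROM dag_permission "
        "WHERE dag_id = ? AND principal = ? ORDER BY privilege",
        (dag_id, principal),
    )
    return [row[0] for row in rows]


def is_allowed(conn, privilege, dag_id, principal):
    """Deny unless an explicit grant exists (assumed secure-by-default rule)."""
    return privilege in perms_for(conn, dag_id, principal)
```

With this layout a dag that has no rows at all is simply invisible to everyone, so a newly deployed dag needs an explicit grant before anyone can see it.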

Re: How to have dynamic downstream tasks that depend on data generated upstream

2018-03-15 Thread Brian Greene
My $.02 - posted to SO as well. I fought this use case for a long time. In short, a dag that’s built based on the state of a changing resource, especially a db table, doesn’t fly so well in airflow. My solution was to write a small custom operator that’s a subclass of TriggerDagRunOperator, it

how to pass parameters to subdag operator from triggered dag

2018-01-25 Thread Brian Greene
Good evening, Here's the setup - I'm following a "trigger dag -> called(processing) dag" kind of structure. Then I set timespans, catchups, etc on the triggering dag. This allows me to separate the scheduling from the execution, and so far is working great. The triggered dag uses parameters