Re: programmatically creating and airflow quirks

2018-11-22 Thread Alex Guziel
Yup.

On Thu, Nov 22, 2018 at 3:16 PM soma dhavala  wrote:

>
>
> On Nov 23, 2018, at 3:28 AM, Alex Guziel  wrote:
>
> It’s because of this
>
> “When searching for DAGs, Airflow will only consider files where the
> string “airflow” and “DAG” both appear in the contents of the .py file.”
>
>
> I had not noticed that. From airflow/models.py, in process_file (both in
> 1.9 and 1.10):
> ..
> if not all([s in content for s in (b'DAG', b'airflow')]):
> ..
> is looking for those strings and if they are not found, it is returning
> without loading the DAGs.
>
>
> So having the dummy strings “airflow” and “DAG” placed somewhere will make
> it work?
>
>
> On Thu, Nov 22, 2018 at 2:27 AM soma dhavala 
> wrote:
>
>>
>>
>> On Nov 22, 2018, at 3:37 PM, Alex Guziel  wrote:
>>
>> I think this is what is going on. The DAGs are picked up via local
>> variables, i.e. if you do
>> dag = Dag(...)
>> dag = Dag(...)
>>
>>
>> from my_module import create_dag
>>
>> for file in yaml_files:
>>     dag = create_dag(file)
>>     globals()[dag.dag_id] = dag
>>
>> You notice that create_dag is in a different module. If it is in the
>> same scope (file), it will be fine.
>>
>>
>>
>> Only the second dag will be picked up.
>>
>> On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala 
>> wrote:
>>
>>> Hey AirFlow Devs:
>>> In our organization, we build a Machine Learning WorkBench with AirFlow
>>> as
>>> an orchestrator of ML workflows, and have wrapped Airflow Python
>>> operators to customize the behaviour. These workflows are specified in
>>> YAML.
>>>
>>> We drop a DAG loader (written in Python) in the default location where
>>> airflow expects the DAG files. This DAG loader reads the specified YAML
>>> files and converts them into airflow DAG objects. Essentially, we are
>>> programmatically creating the DAG objects. In order to support multiple
>>> parsers (YAML, JSON, etc.), we separated the DAG creation from loading.
>>> But when a DAG is created (in a separate module) and made available to
>>> the DAG loaders, airflow does not pick it up. As an example, consider
>>> that I created a DAG, pickled it, and will simply unpickle the DAG and
>>> give it to airflow.
>>>
>>> However, in the current avatar of airflow, the very creation of the DAG
>>> has to happen in the loader itself. As far as I am concerned, airflow
>>> should not care where and how the DAG object is created, so long as it
>>> is a valid DAG object. The workaround for us is to mix parser and loader
>>> in the same file and drop it in the airflow default dags folder. During
>>> dag_bag creation, this file is loaded up with the import_modules utility
>>> and shows up in the UI. While this is a solution, it is not clean.
>>>
>>> What do devs think about a solution to this problem? Will saving the DAG
>>> to the db and reading it from the db work? Or do some core changes need
>>> to happen in the dag_bag creation? Can dag_bag take a bunch of "created"
>>> DAGs?
>>>
>>> thanks,
>>> -soma
>>>
>>
>>
>


Re: programmatically creating and airflow quirks

2018-11-22 Thread Alex Guziel
It’s because of this

“When searching for DAGs, Airflow will only consider files where the string
“airflow” and “DAG” both appear in the contents of the .py file.”
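That heuristic is easy to reproduce. A minimal sketch mirroring the quoted process_file check; the function name `might_contain_dag` is illustrative, not Airflow's API:

```python
# Minimal reproduction of the DagBag content heuristic quoted above from
# airflow/models.py process_file; the function name is illustrative.

def might_contain_dag(content: bytes) -> bool:
    """True only if both b'DAG' and b'airflow' occur in the file contents."""
    return all(s in content for s in (b"DAG", b"airflow"))

# A loader file that only imports a factory is skipped by the heuristic:
loader_without_hint = b"from my_module import create_dag\n"

# Adding a comment containing both strings is enough to make it parsed:
loader_with_hint = b"# airflow DAG\nfrom my_module import create_dag\n"
```

So, to answer the question in the thread: placing the strings "airflow" and "DAG" anywhere in the file, even in a comment, is enough for the file to be considered.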

On Thu, Nov 22, 2018 at 2:27 AM soma dhavala  wrote:

>
>
> On Nov 22, 2018, at 3:37 PM, Alex Guziel  wrote:
>
> I think this is what is going on. The DAGs are picked up via local
> variables, i.e. if you do
> dag = Dag(...)
> dag = Dag(...)
>
>
> from my_module import create_dag
>
> for file in yaml_files:
>     dag = create_dag(file)
>     globals()[dag.dag_id] = dag
>
> You notice that create_dag is in a different module. If it is in the same
> scope (file), it will be fine.
>
>
>
> Only the second dag will be picked up.
>
> On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala 
> wrote:
>
>> Hey AirFlow Devs:
>> In our organization, we build a Machine Learning WorkBench with AirFlow as
>> an orchestrator of ML workflows, and have wrapped Airflow Python
>> operators to customize the behaviour. These workflows are specified in
>> YAML.
>>
>> We drop a DAG loader (written in Python) in the default location where
>> airflow expects the DAG files. This DAG loader reads the specified YAML
>> files and converts them into airflow DAG objects. Essentially, we are
>> programmatically creating the DAG objects. In order to support multiple
>> parsers (YAML, JSON, etc.), we separated the DAG creation from loading.
>> But when a DAG is created (in a separate module) and made available to
>> the DAG loaders, airflow does not pick it up. As an example, consider
>> that I created a DAG, pickled it, and will simply unpickle the DAG and
>> give it to airflow.
>>
>> However, in the current avatar of airflow, the very creation of the DAG
>> has to happen in the loader itself. As far as I am concerned, airflow
>> should not care where and how the DAG object is created, so long as it is
>> a valid DAG object. The workaround for us is to mix parser and loader in
>> the same file and drop it in the airflow default dags folder. During
>> dag_bag creation, this file is loaded up with the import_modules utility
>> and shows up in the UI. While this is a solution, it is not clean.
>>
>> What do devs think about a solution to this problem? Will saving the DAG
>> to the db and reading it from the db work? Or do some core changes need
>> to happen in the dag_bag creation? Can dag_bag take a bunch of "created"
>> DAGs?
>>
>> thanks,
>> -soma
>>
>
>


Re: programmatically creating and airflow quirks

2018-11-22 Thread Alex Guziel
I think this is what is going on. The DAGs are picked up via local
variables, i.e. if you do
dag = Dag(...)
dag = Dag(...)

Only the second dag will be picked up.
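A minimal sketch of the shadowing behaviour described above; `Dag` here is a stand-in class for illustration, not the real airflow.models.DAG:

```python
# Stand-in class for illustration only; not the real airflow DAG model.
class Dag:
    def __init__(self, dag_id):
        self.dag_id = dag_id

# Rebinding the same name: only the last object stays reachable at module
# level, so a discovery pass over the module's globals misses the first one.
dag = Dag("first")
dag = Dag("second")

# The workaround used by generated loaders: bind each DAG to a distinct
# module-level name so all of them remain discoverable.
for dag_id in ("first", "second"):
    globals()[dag_id] = Dag(dag_id)
```

This is exactly why the `globals()[dag.dag_id] = dag` idiom appears in the loader snippet elsewhere in the thread.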

On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala 
wrote:

> Hey AirFlow Devs:
> In our organization, we build a Machine Learning WorkBench with AirFlow as
> an orchestrator of ML workflows, and have wrapped Airflow Python
> operators to customize the behaviour. These workflows are specified in
> YAML.
>
> We drop a DAG loader (written in Python) in the default location where
> airflow expects the DAG files. This DAG loader reads the specified YAML
> files and converts them into airflow DAG objects. Essentially, we are
> programmatically creating the DAG objects. In order to support multiple
> parsers (YAML, JSON, etc.), we separated the DAG creation from loading.
> But when a DAG is created (in a separate module) and made available to
> the DAG loaders, airflow does not pick it up. As an example, consider
> that I created a DAG, pickled it, and will simply unpickle the DAG and
> give it to airflow.
>
> However, in the current avatar of airflow, the very creation of the DAG
> has to happen in the loader itself. As far as I am concerned, airflow
> should not care where and how the DAG object is created, so long as it is
> a valid DAG object. The workaround for us is to mix parser and loader in
> the same file and drop it in the airflow default dags folder. During
> dag_bag creation, this file is loaded up with the import_modules utility
> and shows up in the UI. While this is a solution, it is not clean.
>
> What do devs think about a solution to this problem? Will saving the DAG
> to the db and reading it from the db work? Or do some core changes need
> to happen in the dag_bag creation? Can dag_bag take a bunch of "created"
> DAGs?
>
> thanks,
> -soma
>


Re: Pinning dependencies for Apache Airflow

2018-10-04 Thread Alex Guziel
FWIW, there's some value in using virtualenv with Docker to isolate
yourself from your system's Python.

It's worth noting that requirements files can include other requirements
files, so that would make groups easier. But note that pip in a single run
has no guarantee of transitive dependencies not conflicting or overriding.
You need `pip check` for that, or use --no-deps.
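A hedged sketch of the install flow being described (the package name apache-airflow and the requirements file are illustrative; `--no-deps` and `pip check` are standard pip features):

```shell
# Install a fully pinned requirement set, stop pip from re-resolving
# transitive dependencies, then verify the environment is consistent.
pip install -r requirements.txt       # pinned (==) requirements
pip install --no-deps apache-airflow  # skip dependency resolution
pip check                             # reports conflicting/missing deps
```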

On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko 
wrote:

> Hi Jarek,
>
> Thanks for bringing this up. I missed the discussion on Slack since I'm on
> holiday, but I saw the thread and it was way too interesting, and therefore
> this email :)
>
> This is actually something that we need to address asap. Like you mention,
> we saw earlier that specific transitive dependencies are not compatible
> and then we end up with a breaking CI, or even worse, a broken release.
> Earlier we had in the setup.py the fixed versions (==) and in a separate
> requirements.txt the requirements for the CI. This was also far from
> optimal since we had two versions of the requirements.
>
> I like the idea that you are proposing. Maybe we can do an experiment with
> it, because of the nature of Airflow (orchestrating different systems), we
> have a huge list of dependencies. To not install everything, we've created
> groups. For example specific libraries when you're using the Google Cloud,
> Elastic, Druid, etc. So I'm curious how it will work with the `
> extras_require` of Airflow
>
> Regarding pipenv: I don't use pipenv/virtualenv anymore. For me,
> Docker is much easier to work with. I'm also working on a PR to get rid of
> tox for the testing, and move to a more Docker-idiomatic test pipeline.
> Curious what your thoughts are on that.
>
> Cheers, Fokko
>
> Op do 4 okt. 2018 om 15:39 schreef Arthur Wiedmer <
> arthur.wied...@gmail.com
> >:
>
> > Thanks Jakob!
> >
> > I think that this is a huge risk of Slack.
> > I am not against Slack as a support channel, but it is a slippery slope
> to
> > have more and more decisions/conversations happening there, contrary to
> > what we hope to achieve with the ASF.
> >
> > When we are starting to discuss issues of development, extensions and
> > improvements, it is important for the discussion to happen in the mailing
> > list.
> >
> > Jarek, I wouldn't worry too much, we are still in the process of learning
> > as a community. Welcome and thank you for your contribution!
> >
> > Best,
> > Arthur.
> >
> > On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk 
> > wrote:
> >
> > > Thanks for pointing it out Jakob.
> > >
> > > I am still very fresh in the ASF community and learning the ropes and
> > > etiquette and code of conduct. Apologies for my ignorance.
> > > I re-read the conduct and FAQ now again - with more understanding and
> > will
> > > pay more attention to wording in the future. As you mentioned it's more
> > the
> > > wording than intentions, but since it was in TL;DR; it has stronger
> > > consequences.
> > >
> > > BTW. Thanks for actually following the code of conduct and pointing it
> > > out in a respectful manner. I really appreciate it.
> > >
> > > J.
> > >
> > > Principal Software Engineer
> > > Phone: +48660796129
> > >
> > > On Thu, 4 Oct 2018, 20:41 Jakob Homan,  wrote:
> > >
> > > > > TL;DR; A change is coming in the way how dependencies/requirements
> > are
> > > > > specified for Apache Airflow - they will be fixed rather than
> > flexible
> > > > (==
> > > > > rather than >=).
> > > >
> > > > > This is follow up after Slack discussion we had with Ash and Kaxil
> -
> > > > > summarising what we propose we'll do.
> > > >
> > > > Hey all.  It's great that we're moving this discussion back from
> Slack
> > > > to the mailing list.  But I've gotta point out that the wording needs
> > > > a small but critical fix up:
> > > >
> > > > "A change *is* coming... they *will* be fixed"
> > > >
> > > > needs to be
> > > >
> > > > "We'd like to propose a change... We would like to make them fixed."
> > > >
> > > > The first says that this decision has been made and the result of the
> > > > decision, which was made on Slack, is being reported back to the
> > > > mailing list.  The second is more accurate to the rest of 

Re: Pinning dependencies for Apache Airflow

2018-10-04 Thread Alex Guziel
You should run `pip check` to ensure no conflicts. Pip does not do this on
its own.

On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk 
wrote:

> Great that this discussion already happened :). Lots of useful things in
> it. And yes - it means pinning in requirements.txt - this is how pip-tools
> works.
>
> J.
>
> Principal Software Engineer
> Phone: +48660796129
>
> On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, 
> wrote:
>
> > Hi Jarek,
> >
> > I will +1 the discussion Dan is referring to and George's advice.
> >
> > I just want to double check we are talking about pinning in
> > requirements.txt only.
> >
> > This offers the ability to
> > pip install -r requirements.txt
> > pip install --no-deps airflow
> > For a guaranteed install which works.
> >
> > Several different requirement files can be provided for specific use
> cases,
> > like a stable dev one for instance for people wanting to work on
> operators
> > and non-core functions.
> >
> > However, I think we should proactively test in CI against unpinned
> > dependencies (though it might be a separate case in the matrix), so that
> > we get advance warning if possible that things will break.
> > CI downtime is not a bad thing here, it actually caught a problem :)
> >
> > We should unpin as much as possible in setup.py to maintain only the
> > minimum required compatibility. The process of pinning in setup.py is
> > extremely detrimental when you have a large number of Python libraries
> > installed with different pinned versions.
> >
> > Best,
> > Arthur
> >
> > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov  >
> > wrote:
> >
> > > Relevant discussion about this:
> > >
> > >
> >
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > >
> > > On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk  >
> > > wrote:
> > >
> > > > TL;DR; A change is coming in the way how dependencies/requirements
> are
> > > > specified for Apache Airflow - they will be fixed rather than
> flexible
> > > (==
> > > > rather than >=).
> > > >
> > > > This is follow up after Slack discussion we had with Ash and Kaxil -
> > > > summarising what we propose we'll do.
> > > >
> > > > *Problem:*
> > > > During last few weeks we experienced quite a few downtimes of
> TravisCI
> > > > builds (for all PRs/branches including master) as some of the
> > > > transitive dependencies were automatically upgraded. This is because
> > > > in a number of dependencies we have >= rather than == constraints.
> > > >
> > > > Whenever there is a new release of such dependency, it might cause
> > chain
> > > > reaction with upgrade of transitive dependencies which might get into
> > > > conflict.
> > > >
> > > > An example was Flask-AppBuilder vs flask-login transitive dependency
> > with
> > > > click. They started to conflict once AppBuilder has released version
> > > > 1.12.0.
> > > >
> > > > *Diagnosis:*
> > > > Transitive dependencies with "flexible" versions (where >= is used
> > > instead
> > > > of ==) is a reason for "dependency hell". We will sooner or later hit
> > > other
> > > > cases where not fixed dependencies cause similar problems with other
> > > > transitive dependencies. We need to pin them to fixed versions. This
> > > > causes problems both for released versions (because they stop
> > > > working!) and for development (because they break master builds in
> > > > TravisCI and prevent people from installing the development
> > > > environment from scratch).
> > > >
> > > > *Solution:*
> > > >
> > > >- Following the old-but-good post
> > > >https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > > > pinned
> > > >dependencies to specific versions (so basically all dependencies
> are
> > > >"fixed").
> > > >- We will introduce mechanism to be able to upgrade dependencies
> > with
> > > >pip-tools (https://github.com/jazzband/pip-tools). We might also
> > > take a
> > > >look at pipenv: https://pipenv.readthedocs.io/en/latest/
> > > >- People who would like to upgrade some dependencies for their PRs
> > > will
> > > >still be able to do it - but such upgrades will be in their PR
> thus
> > > they
> > > >will go through TravisCI tests and they will also have to be
> > specified
> > > > with
> > > >pinned fixed versions (==). This should be part of review process
> to
> > > > make
> > > >sure new/changed requirements are pinned.
> > > >- In release process there will be a point where an upgrade will
> be
> > > >attempted for all requirements (using pip-tools) so that we are
> not
> > > > stuck
> > > >with older releases. This will be in controlled PR environment
> where
> > > > there
> > > >will be time to fix all dependencies without impacting others and
> > > likely
> > > >enough time to "vet" such changes (this can be done for alpha/beta
> > > > releases
> > > >for example).
> > > >- As a side effect dependencies specification will become far
> > simpler
> > > >and straightforward.
> > > >
> > > > Happy to

Re: Fundamental change - Separate DAG name and id.

2018-09-24 Thread Alex Guziel
I think decoupling dag_id and display name could be confusing and
cumbersome. As for readme, DAG already has a field called description which
I think is close to what Alex is describing (I believe it is displayed by
the UI).

On Mon, Sep 24, 2018 at 3:12 PM Alex Tronchin-James 949-412-7220 <
alex.n.ja...@gmail.com> wrote:

> Re: [Brian Greene] "How does filename matter?  Frankly I wish the filename
> was REQUIRED to be the dag name so people would quit confusing themselves
> by mismatching them !"
>
> FWIW in the Facebook predecessor to airflow, the file path/name WAS the dag
> name. E.g. if your dag resided in best_team/new_project/sweet_dag.py then
> the dag name would be best_team.new_project.sweet_dag
> All tasks were identified by their variable name after that prefix: E.g. if
> best_team.new_project.sweet_dag defines an operator in a variable named
> task1, then the respective task_id is
> best_team.new_project.sweet_dag.task1.
>
> Airflow provides additional flexibility to specify DAG and task names to
> avoid the sometimes annoyingly long task names this resulted in and allow
> DAG/task names without forcing a code directory structure and python's
> variable naming restrictions, and I think this is a Good Thing.
>
> It seems like airflowuser is trying to provide additional metadata beyond
> the DAG/task names (so far, a DAG 'title' distinct from the ID). I've
> provided this through a README.md included in the DAG source directory, but
> maybe it would be a win to instead add a DAG parameter named 'readme' of
> string type which can include a docstring or even markdown to provide any
> desired additional metadata? This could then be displayed by the UI to
> simplify access to any such provided DAG documentation.
>
> 🍿
>
>
>
> On Thu, Sep 20, 2018 at 10:45 PM Brian Greene <
> br...@heisenbergwoodworking.com> wrote:
>
> > Prior to using airflow for much, on first inspection, I think I may have
> > agreed with you.
> >
> > After a bit of use I’d agree with Fokko and others - this isn’t really a
> > problem, and separating them seems to do more harm than good related to
> > deployment.
> >
> > I was gonna stop there, but why?
> >
> > You can add a task to a dag that’s deployed and has run and still view
> > history.  The “new” task shows up as white squares in the old DAG runs.
> > Nobody said you’re required to also rename the dag when you do this.  If
> > your process or desire or design determines you need to rename it, well
> > then by definition... isn’t it a new thing without a history?  Airflow is
> > implementing exactly that.
> >
> > One could argue that renaming to reflect exact purpose is good practice.
> > Yes, I’d agree, but again following that logic if it’s a small enough
> > change to “slip in” then the name likely shouldn’t change.  If it’s big
> > enough I want to change the name then it’s a big enough change that I’m
> > functionally running something “new”, and I expect to need to account for
> > that.  Airflow is enforcing that logic by coupling the name to the
> > deployment of what you said was a new process.
> >
> > One might put forth that changing the name to be more descriptive in the
> > UI makes it easier for support staff.  I think perhaps if that’s your
> > challenge, it’s not airflow that’s the problem.  DAGs are of course
> > documented elsewhere besides their name, right?  Yeah, it’s
> > self-documenting (and the graphs are cool), but I have to assume there’s
> > something besides the NAME to tell people what it does.  Additionally,
> > far more than the name is
> > required for even an operator or monitor watcher to take action - you
> don’t
> > expect them to know which tasks to rerun or how to troubleshoot failures
> > just based on your “now most descriptive name in the UI” do you?
> >
> > I spent time in an Informatica shop where all the jobs were numbered.
> > Numbered.  Let’s be more exact... their NAMES were NUMBERS like 56709.
> > Terrible, but 100% worked, because while a descriptive name would have
> been
> > useful, the name is the thing that’s supposed to NOT CHANGE (see code of
> > Abibarshim), and all the other information can attach to that in places
> > where you write... other information.  People would curse a number “F’ing
> > 6291 failed again” - everyone knew what they were talking about.. I
> digress.
> >
> >  You might decide to document “dag ID 12” or just “12” on your wiki - I’m
> > going to document “daily_sales_import”.  And when things start failing at
> > 3am it’s not my dag “56” that’s failing, it’s the sales_export dag.  But
> if
> > you document “12”, that’s still it’s name, and it’d better be 12 in all
> > your environments and documents.  This also means the actual db IDs from
> > your proposal are almost certainly NOT the same across your environments,
> > making “12” the unchangeable name!
> >
> > There are lots of languages (most of them) where the name of a thing is
> > important and hard to change.  It’s not a bad thing, and I’d assume th

Re: Broken DAG message won't go away in webserver

2018-08-09 Thread Alex Guziel
IIRC the scheduler sets these messages in the error table in the db.
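A sketch of that mechanism against a throwaway SQLite database; the `import_error` table name comes from the thread, while the minimal schema here is assumed rather than Airflow's exact one:

```python
import sqlite3

# Throwaway DB standing in for Airflow's metadata database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE import_error ("
    " id INTEGER PRIMARY KEY, filename TEXT, stacktrace TEXT)"
)

# The scheduler writes a row per broken DAG file...
conn.execute(
    "INSERT INTO import_error (filename, stacktrace) VALUES (?, ?)",
    ("/dags/broken_dag.py", "SyntaxError: invalid syntax"),
)

# ...and the webserver renders every row as a "Broken DAG" banner.
banners = [row[0] for row in conn.execute("SELECT filename FROM import_error")]

# Manual cleanup for a stale row (what users in the thread resorted to):
conn.execute(
    "DELETE FROM import_error WHERE filename = ?", ("/dags/broken_dag.py",)
)
remaining = conn.execute("SELECT COUNT(*) FROM import_error").fetchone()[0]
```

Deleting the stale row (or fixing and re-syncing the DAG so the scheduler clears it on the next parse) is what makes the banner disappear.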

On Thu, Aug 9, 2018 at 2:13 PM, Ben Laird  wrote:

> The messages persist even after restarting the webserver. I've verified
> with other airflow users in the office that they'd have to manually delete
> records from the 'import_error' table.
>
> When you say 'sync your DAGs', what do you mean exactly? When we fix a DAG,
> we'd normally kill the webserver process, push a zip containing our dag
> directory (with the fixed code), unzip and restart the webserver.
>
> Thanks
>
> On Thu, Aug 9, 2018 at 4:43 PM, Taylor Edmiston 
> wrote:
>
> > Yeah, you definitely shouldn't need to do a resetdb for that.
> >
> > Did you try restarting the webserver?
> >
> > How do you sync your DAGs to the webserver?  Is it possible the fixed DAG
> > didn't get synced there?
> >
> > For me, IIRC, the error stops persisting once the DAG is fixed and
> synced.
> >
> > *Taylor Edmiston*
> >
> >
> > On Thu, Aug 9, 2018 at 3:35 PM, Ben Laird  wrote:
> >
> > > Hello -
> > >
> > > I've noticed this several times and not sure what the solution is. If I
> > > have a DAG error at some point, I'll see message in the webserver that
> > says
> > > "Broken DAG: [Error]". However, after fixing the code, restarting the
> > > webserver, etc, the error persists. After closing it out, it will just
> > pop
> > > up again after reloading.
> > >
> > > The only way I was able to delete was by doing a `airflow resetdb`. I'd
> > > like to avoid manually deleting records from the DB, as now in prod we
> > > cannot just kill the DB state.
> > >
> > > Any suggestions?
> > >
> > > Thanks,
> > > Ben Laird
> > >
> >
>


Re: PSA: Make sure your Airflow instance isn't public and isn't Google indexed

2018-06-05 Thread Alex Guziel
I suggest reading the section on password complexity here
https://pages.nist.gov/800-63-3/sp800-63b.html which recommends just a
minimum length and a check against a list of the most common passwords.
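That recommendation boils down to a very small check. A sketch with an illustrative (tiny) blocklist:

```python
# NIST SP 800-63B-style screening: a minimum length plus a blocklist of
# common passwords, instead of composition rules. The set below is a tiny
# illustrative sample; a real deployment would load a large breach corpus.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein", "airflow"}

def password_acceptable(password: str, min_length: int = 8) -> bool:
    """Accept only passwords that are long enough and not commonly used."""
    return len(password) >= min_length and password.lower() not in COMMON_PASSWORDS
```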

On Tue, Jun 5, 2018 at 3:14 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Agreed, secured by default is ideal. Though I wouldn't want people to get
> an unreasonable sense of safety and open their instance to the web.
>
> I like the idea of generating a temporary key/token and exposing it in the
> console where the process was started. Other option is to use the
> database/password mechanism by default and add a `airflow create-user
> --admin`  CLI command to generate a user. With the level of cluelessness
> we're observing we should probably force a certain password complexity
> level.
>
> We should also state clearly in the docs that Airflow is not regularly
> pen-tested and should not be exposed to the Internet.
>
> For the record we had Airflow pen-tested at Airbnb by a third party in 2016
> (or was it 2017?) and found/resolved half a dozen or so vulnerabilities or
> so. Note that there's no recurring process in place, or any mechanisms to
> prevent regressions beyond code review. Also note that the new [beta in
> 1.10] UI has not been pen tested (to my knowledge).
>
> Max
>
> On Tue, Jun 5, 2018 at 2:48 PM Bolke de Bruin  wrote:
>
> > Tbh I like to go to a setup where it is secure by default. Airflow is
> > getting more and more used so it also increases the attack surface. If
> you
> > run “initdb” or “resetdb” it is easy to provide a generated password.
> >
> > I don’t see a reason anymore for having a unsecured version.
> >
> > B.
> >
> > Verstuurd vanaf mijn iPad
> >
> > > Op 5 jun. 2018 om 23:11 heeft Christopher Bockman <
> ch...@fathomhealth.co>
> > het volgende geschreven:
> > >
> > > +1 to being able to disable--we have authentication in place, but use a
> > > separate solution that (probably?) Airflow won't realize is enabled, so
> > > having a continuous giant warning banner would be rather unfortunate.
> > >
> > >> On Tue, Jun 5, 2018 at 2:05 PM, Alek Storm 
> > wrote:
> > >>
> > >> This is a great idea, but we'd appreciate a setting that disables the
> > >> banner even if those conditions aren't met - our instance is deployed
> > >> without authentication, but is only accessible via our intranet.
> > >>
> > >> Alek
> > >>
> > >>
> > >> On Tue, Jun 5, 2018, 3:35 PM James Meickle 
> > >> wrote:
> > >>
> > >>> I think that a banner notification would be a fair penalty if you
> > access
> > >>> Airflow without authentication, or have API authentication turned
> off,
> > or
> > >>> are accessing via http:// with a non-localhost `Host:`. (Are there
> any
> > >>> other circumstances to think of?)
> > >>>
> > >>> I would also suggest serving a default robots.txt to mitigate
> > accidental
> > >>> indexing of public instances (as most public instances will be
> > >> accidentally
> > >>> public, statistically speaking). If you truly want your Airflow
> > instance
> > >>> public and indexed, you should have to go out of your way to permit
> > that.
> > >>>
> > >>> On Tue, Jun 5, 2018 at 1:51 PM, Maxime Beauchemin <
> > >>> maximebeauche...@gmail.com> wrote:
> > >>>
> >  What about a clear alert on the UI showing when auth is off?
> Perhaps a
> >  large red triangle-exclamation icon on the navbar with a tooltip
> >  "Authentication is off, this Airflow instance in not secure." and
> > >>> clicking
> >  take you to the doc's security page.
> > 
> >  Well and then of course people should make sure their infra isn't
> open
> > >> to
> >  the Internet. We really shouldn't have to tell people to keep their
> >  infrastructure behind a firewall. In most environments you have to
> do
> > >>> quite
> >  a bit of work to open any resource up to the Internet (SSL certs,
> > >> special
> >  security groups for load balancers/proxies, ...). Now I'm curious to
> >  understand how UMG managed to do this by mistake...
> > 
> >  Also a quick reminder to use the Connection abstraction to store
> > >> secrets,
> >  ideally using the environment variable feature.
> > 
> >  Max
> > 
> >  On Tue, Jun 5, 2018 at 10:02 AM Taylor Edmiston <
> tedmis...@gmail.com>
> >  wrote:
> > 
> > > One of our engineers wrote a blog post about the UMG mistakes as
> > >> well.
> > >
> > > https://www.astronomer.io/blog/universal-music-group-airflow-leak/
> > >
> > > I know that best practices are well known here, but I second James'
> > > suggestion that we add some docs, code, or config so that the
> > >> framework
> > > optimizes for being (nearly) production-ready by default and not
> just
> >  easy
> > > to start with for local dev.  Admittedly this takes some work to
> not
> > >>> add
> > > friction to the local onboarding experience.
> > >
> > > Do most people keep separate airflow.cfg files p

Re: Introducing a "LAUNCHED" state into airflow

2017-12-01 Thread Alex Guziel
The task instance audit is pretty good for debugging but maybe not as
useful to pure users.

The crashing sqlalchemy thing is bad in terms of just being bad practice
but handling it better wouldn’t be much better than increasing
innodb_lock_wait_timeout in practice.
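The caller-side retry Daniel suggests below could be sketched like this; RuntimeError stands in for sqlalchemy.exc.OperationalError so the sketch has no external dependencies, and all names are illustrative:

```python
import time

def update_with_retry(do_update, retries=3, backoff=0.01):
    """Run do_update(); on a lock-wait error, back off exponentially and retry."""
    for attempt in range(retries):
        try:
            return do_update()
        except RuntimeError:  # stand-in for sqlalchemy.exc.OperationalError
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(backoff * (2 ** attempt))

# Simulated UPDATE that hits a lock twice before succeeding.
calls = []
def flaky_update():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("Lock wait timeout exceeded")
    return "updated"

result = update_with_retry(flaky_update)
```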

On Fri, Dec 1, 2017 at 12:56 AM Daniel (Daniel Lamblin) [BDP - Seoul] <
lamb...@coupang.com> wrote:

> On this tangent, our scheduler occasionally crashes when the db tells
> SQLAlchemy that there's a lock on the task it's trying to set as queued or
> running.
> Some (update query) retry logic in the (many) callers seems to be in order.
>
> On 12/1/17, 2:19 PM, "Maxime Beauchemin" 
> wrote:
>
> Taking a tangent here:
>
> I like the idea of logging every state change to another table.
> Mutating
> task_instance from many places results in things that are hard to
> debug in
> some cases.
>
> As we need similar history-tracking of mutations on task_instances
> around
> retries, we may want keep track of history for anything that touches
> task_instance.
>
> It may be easy-ish to maintain this table using SQLAlchemy after-update
> hooks on the model, where we'd systematically insert into a
> task_instance_history table. Just a thought.
>
> Max
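The after-update hook Max floats above can be sketched with SQLAlchemy mapper events. The model and history-table names follow the discussion, but the minimal schema is a stand-in for Airflow's real task_instance model (assumes SQLAlchemy 1.4+):

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class TaskInstance(Base):
    __tablename__ = "task_instance"
    id = Column(Integer, primary_key=True)
    state = Column(String)

class TaskInstanceHistory(Base):
    __tablename__ = "task_instance_history"
    id = Column(Integer, primary_key=True)
    ti_id = Column(Integer)
    state = Column(String)

@event.listens_for(TaskInstance, "after_update")
def record_state_change(mapper, connection, target):
    # Runs in the same transaction as the UPDATE, so history stays in sync.
    connection.execute(
        TaskInstanceHistory.__table__.insert().values(
            ti_id=target.id, state=target.state
        )
    )

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    ti = TaskInstance(state="queued")
    session.add(ti)
    session.commit()  # INSERT: no history row (this is an after_update hook)

    ti.state = "running"
    session.commit()  # UPDATE: the hook appends one history row

    history = [(row.ti_id, row.state)
               for row in session.query(TaskInstanceHistory)]
    ti_id = ti.id
```

Because the hook fires during flush, every mutation path through the ORM is captured without touching the (many) call sites that set task state.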
>
> On Thu, Nov 30, 2017 at 11:26 AM, Grant Nicholas <
> grantnicholas2...@u.northwestern.edu> wrote:
>
> > Thanks, I see why that should work, I just know that from testing
> this
> > myself that I had to manually clear out old QUEUED task instances to
> get
> > them to reschedule. I'll do some more testing to confirm, it's
> totally
> > possible I did something wrong in our test suite setup.
> >
> > On Thu, Nov 30, 2017 at 1:18 PM, Alex Guziel  .
> > invalid
> > > wrote:
> >
> > > See reset_state_for_orphaned_tasks in jobs.py
> > >
> > > On Thu, Nov 30, 2017 at 11:17 AM, Alex Guziel <
> alex.guz...@airbnb.com>
> > > wrote:
> > >
> > > > Right now the scheduler re-launches all QUEUED tasks on restart
> (there
> > > are
> > > > safeguards for duplicates).
> > > >
> > > > On Thu, Nov 30, 2017 at 11:13 AM, Grant Nicholas
>  > > > northwestern.edu> wrote:
> > > >
> > > >> @Alex
> > > >> I agree setting the RUNNING state immediately when `airflow run`
> > starts
> > > up
> > > >> would be useful on its own, but it doesn't solve all the
> problems.
> > What
> > > >> happens if you have a task in the QUEUED state (that may or may
> not
> > have
> > > >> been launched) and your scheduler crashes. What does the
> scheduler do
> > on
> > > >> startup, does it launch all QUEUED tasks again (knowing that
> there may
> > > be
> > > >> duplicate tasks) or does it not launch the QUEUED tasks again
> (knowing
> > > >> that
> > > >> the task may be stuck in the QUEUED state forever). Right now,
> airflow
> > > >> does
> > > >> the latter which I think is not correct, as you can potentially
> have
> > > tasks
> > > >> stuck in the QUEUED state forever.
> > > >>
> > > >> Using a LAUNCHED state would explicitly keep track of whether
> tasks
> > were
> > > >> submitted for execution or not. At that point it's up to your
> > messaging
> > > >> system/queueing system/etc to be crash safe, and that is
> something you
> > > get
> > > >> for free with kubernetes and it's something you can tune with
> celery
> > > >> persistent queues.
> > > >>
> > > >> Note: Another option is to NOT introduce a new state but have
> airflow
> > > >> launch QUEUED tasks again on startup of the executor. This
> would mean
> > > that
> > > >> we may launch duplicate tasks, but this should not be an issue
> since
> > we
> > > >> have built in protections on worker startup to avoid having two
> > RUNNING
> > > >> task instances at once.
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Nov 30, 2017 at 12:40 PM, Alex Guziel <
> > > >> alex.guz...@airbnb.com

Re: Introducing a "LAUNCHED" state into airflow

2017-11-30 Thread Alex Guziel
See reset_state_for_orphaned_tasks in jobs.py
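[Archive note: for readers without the source handy, a minimal sketch of what such an orphan-reset pass does on scheduler startup. This is illustrative only, not the actual jobs.py code; the row and executor representations are hypothetical stand-ins.]

```python
def reset_orphaned_tasks(task_instances, known_to_executor):
    """Reset tasks the DB thinks are queued/scheduled but the executor
    has no record of, so the scheduler can pick them up again.

    task_instances: list of (task_id, state) pairs standing in for DB rows.
    known_to_executor: set of task_ids the executor is still tracking.
    """
    result = []
    for task_id, state in task_instances:
        if state in ("queued", "scheduled") and task_id not in known_to_executor:
            # Orphan: the submitted state survived a crash but the
            # executor lost the task; a None state lets it be rescheduled.
            result.append((task_id, None))
        else:
            result.append((task_id, state))
    return result
```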

On Thu, Nov 30, 2017 at 11:17 AM, Alex Guziel 
wrote:

> Right now the scheduler re-launches all QUEUED tasks on restart (there are
> safeguards for duplicates).
>
> On Thu, Nov 30, 2017 at 11:13 AM, Grant Nicholas <grantnicholas2...@u.northwestern.edu> wrote:
>
>> @Alex
>> I agree setting the RUNNING state immediately when `airflow run` starts up
>> would be useful on its own, but it doesn't solve all the problems. What
>> happens if you have a task in the QUEUED state (that may or may not have
>> been launched) and your scheduler crashes. What does the scheduler do on
>> startup, does it launch all QUEUED tasks again (knowing that there may be
>> duplicate tasks) or does it not launch the QUEUED tasks again (knowing
>> that
>> the task may be stuck in the QUEUED state forever). Right now, airflow
>> does
>> the latter which I think is not correct, as you can potentially have tasks
>> stuck in the QUEUED state forever.
>>
>> Using a LAUNCHED state would explicitly keep track of whether tasks were
>> submitted for execution or not. At that point it's up to your messaging
>> system/queueing system/etc to be crash safe, and that is something you get
>> for free with kubernetes and it's something you can tune with celery
>> persistent queues.
>>
>> Note: Another option is to NOT introduce a new state but have airflow
>> launch QUEUED tasks again on startup of the executor. This would mean that
>> we may launch duplicate tasks, but this should not be an issue since we
>> have built in protections on worker startup to avoid having two RUNNING
>> task instances at once.
>>
>>
>>
>> On Thu, Nov 30, 2017 at 12:40 PM, Alex Guziel <
>> alex.guz...@airbnb.com.invalid> wrote:
>>
>> > I think the more sensible thing here is to just set the state to
>> RUNNING
>> > immediately in the airflow run process. I don't think the distinction
>> > between launched and running adds much value.
>> >
>> > On Thu, Nov 30, 2017 at 10:36 AM, Daniel Imberman <
>> > daniel.imber...@gmail.com
>> > > wrote:
>> >
>> > > @Alex
>> > >
>> > > That could potentially work since if you have the same task launched
>> > twice
>> > > then the second time would die due to the "already running
>> dependency".
>> > > Still less ideal than not launching that task at all since it still
>> > allows
>> > > for race conditions. @grant thoughts on this?
>> > >
>> > > On Wed, Nov 29, 2017 at 11:00 AM Alex Guziel > > > invalid>
>> > > wrote:
>> > >
>> > > > It might be good enough to have RUNNING set immediately on the
>> process
>> > > run
>> > > > and not being dependent on the dag file being parsed. It is annoying
>> > here
>> > > > too when dags parse on the scheduler but not the worker, since
>> queued
>> > > tasks
>> > > > that don't heartbeat will not get retried, while running tasks will.
>> > > >
>> > > > On Wed, Nov 29, 2017 at 10:04 AM, Grant Nicholas <
>> > > > grantnicholas2...@u.northwestern.edu> wrote:
>> > > >
>> > > > > ---Opening up this conversation to the whole mailing list, as
>> > suggested
>> > > > by
>> > > > > Bolke---
>> > > > >
>> > > > >
>> > > > > A "launched" state has been suggested in the past (see here
>> > > > > <https://github.com/apache/incubator-airflow/blob/master/
>> > > > > airflow/utils/state.py#L31>)
>> > > > > but never implemented for reasons unknown to us. Does anyone have
>> > more
>> > > > > details about why?
>> > > > >
>> > > > > There are two big reasons why adding a new "launched" state to
>> > airflow
>> > > > > would be useful:
>> > > > >
>> > > > > 1. A "launched" state would be useful for crash safety of the
>> > > scheduler.
>> > > > If
>> > > > > the scheduler crashes in between the scheduler launching the task
>> and
>> > > the
>> > > > > task process starting up then we lose information about whether
>> that
>> > > task
>> > > > > was launched or not. By moving the state of the task 

Re: Introducing a "LAUNCHED" state into airflow

2017-11-30 Thread Alex Guziel
Right now the scheduler re-launches all QUEUED tasks on restart (there are
safeguards for duplicates).
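[Archive note: the duplicate safeguard mentioned above can be sketched as a worker-side guard, where a freshly launched copy of a task bails out if another copy is RUNNING and still heartbeating. Names and the grace threshold are hypothetical, not Airflow's actual implementation.]

```python
import time

HEARTBEAT_GRACE = 30.0  # seconds without a heartbeat before a task is presumed dead


def should_run(current_state, last_heartbeat, now=None):
    """Return False when a live duplicate already owns this task instance.

    A stale heartbeat means the earlier copy died, so the new launch
    may take over instead of exiting.
    """
    now = time.time() if now is None else now
    if current_state == "running" and (now - last_heartbeat) < HEARTBEAT_GRACE:
        return False  # live duplicate is still heartbeating
    return True
```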

On Thu, Nov 30, 2017 at 11:13 AM, Grant Nicholas <
grantnicholas2...@u.northwestern.edu> wrote:

> @Alex
> I agree setting the RUNNING state immediately when `airflow run` starts up
> would be useful on its own, but it doesn't solve all the problems. What
> happens if you have a task in the QUEUED state (that may or may not have
> been launched) and your scheduler crashes. What does the scheduler do on
> startup, does it launch all QUEUED tasks again (knowing that there may be
> duplicate tasks) or does it not launch the QUEUED tasks again (knowing that
> the task may be stuck in the QUEUED state forever). Right now, airflow does
> the latter which I think is not correct, as you can potentially have tasks
> stuck in the QUEUED state forever.
>
> Using a LAUNCHED state would explicitly keep track of whether tasks were
> submitted for execution or not. At that point it's up to your messaging
> system/queueing system/etc to be crash safe, and that is something you get
> for free with kubernetes and it's something you can tune with celery
> persistent queues.
>
> Note: Another option is to NOT introduce a new state but have airflow
> launch QUEUED tasks again on startup of the executor. This would mean that
> we may launch duplicate tasks, but this should not be an issue since we
> have built in protections on worker startup to avoid having two RUNNING
> task instances at once.
>
>
>
> On Thu, Nov 30, 2017 at 12:40 PM, Alex Guziel <
> alex.guz...@airbnb.com.invalid> wrote:
>
> > I think the more sensible thing here is to just set the state to
> RUNNING
> > immediately in the airflow run process. I don't think the distinction
> > between launched and running adds much value.
> >
> > On Thu, Nov 30, 2017 at 10:36 AM, Daniel Imberman <
> > daniel.imber...@gmail.com
> > > wrote:
> >
> > > @Alex
> > >
> > > That could potentially work since if you have the same task launched
> > twice
> > > then the second time would die due to the "already running dependency".
> > > Still less ideal than not launching that task at all since it still
> > allows
> > > for race conditions. @grant thoughts on this?
> > >
> > > On Wed, Nov 29, 2017 at 11:00 AM Alex Guziel <alex.guz...@airbnb.com.invalid>
> > > wrote:
> > >
> > > > It might be good enough to have RUNNING set immediately on the
> process
> > > run
> > > > and not being dependent on the dag file being parsed. It is annoying
> > here
> > > > too when dags parse on the scheduler but not the worker, since queued
> > > tasks
> > > > that don't heartbeat will not get retried, while running tasks will.
> > > >
> > > > On Wed, Nov 29, 2017 at 10:04 AM, Grant Nicholas <
> > > > grantnicholas2...@u.northwestern.edu> wrote:
> > > >
> > > > > ---Opening up this conversation to the whole mailing list, as
> > suggested
> > > > by
> > > > > Bolke---
> > > > >
> > > > >
> > > > > A "launched" state has been suggested in the past (see here
> > > > > <https://github.com/apache/incubator-airflow/blob/master/
> > > > > airflow/utils/state.py#L31>)
> > > > > but never implemented for reasons unknown to us. Does anyone have
> > more
> > > > > details about why?
> > > > >
> > > > > There are two big reasons why adding a new "launched" state to
> > airflow
> > > > > would be useful:
> > > > >
> > > > > 1. A "launched" state would be useful for crash safety of the
> > > scheduler.
> > > > If
> > > > > the scheduler crashes in between the scheduler launching the task
> and
> > > the
> > > > > task process starting up then we lose information about whether
> that
> > > task
> > > > > was launched or not. By moving the state of the task to "launched"
> > when
> > > > it
> > > > > is sent off to celery/dask/kubernetes/etc, when crashes happen you
> > know
> > > > > whether you have to relaunch the task or not.
> > > > >
> > > > > To work around this issue, on startup of the kubernetes executor we
> > > query
> > > > > all "queued" tasks and if there is not a matching kubernetes pod
> for
> > > t

Re: Introducing a "LAUNCHED" state into airflow

2017-11-30 Thread Alex Guziel
I think the more sensible thing here is to just set the state to RUNNING
immediately in the airflow run process. I don't think the distinction
between launched and running adds much value.

On Thu, Nov 30, 2017 at 10:36 AM, Daniel Imberman  wrote:

> @Alex
>
> That could potentially work since if you have the same task launched twice
> then the second time would die due to the "already running dependency".
> Still less ideal than not launching that task at all since it still allows
> for race conditions. @grant thoughts on this?
>
> On Wed, Nov 29, 2017 at 11:00 AM Alex Guziel <alex.guz...@airbnb.com.invalid>
> wrote:
>
> > It might be good enough to have RUNNING set immediately on the process
> run
> > and not being dependent on the dag file being parsed. It is annoying here
> > too when dags parse on the scheduler but not the worker, since queued
> tasks
> > that don't heartbeat will not get retried, while running tasks will.
> >
> > On Wed, Nov 29, 2017 at 10:04 AM, Grant Nicholas <
> > grantnicholas2...@u.northwestern.edu> wrote:
> >
> > > ---Opening up this conversation to the whole mailing list, as suggested
> > by
> > > Bolke---
> > >
> > >
> > > A "launched" state has been suggested in the past (see here
> > > <https://github.com/apache/incubator-airflow/blob/master/
> > > airflow/utils/state.py#L31>)
> > > but never implemented for reasons unknown to us. Does anyone have more
> > > details about why?
> > >
> > > There are two big reasons why adding a new "launched" state to airflow
> > > would be useful:
> > >
> > > 1. A "launched" state would be useful for crash safety of the
> scheduler.
> > If
> > > the scheduler crashes in between the scheduler launching the task and
> the
> > > task process starting up then we lose information about whether that
> task
> > > was launched or not. By moving the state of the task to "launched" when
> > it
> > > is sent off to celery/dask/kubernetes/etc, when crashes happen you know
> > > whether you have to relaunch the task or not.
> > >
> > > To work around this issue, on startup of the kubernetes executor we
> query
> > > all "queued" tasks and if there is not a matching kubernetes pod for
> that
> > > task then we set the task state to "None" so it is rescheduled. See
> here
> > > <https://github.com/bloomberg/airflow/blob/airflow-
> > >
> > kubernetes-executor/airflow/contrib/executors/kubernetes_
> executor.py#L400>
> > > for
> > > details if you are curious. While this works for the kubernetes
> executor,
> > > other executors can't easily introspect launched tasks and this means
> the
> > > celery executor (afaik) is not crash safe.
> > >
> > > 2. A "launched" state would allow for dynamic backpressure of tasks,
> not
> > > just static backpressure. Right now, airflow only allows static
> > > backpressure (`parallelism` config). This means you must statically say
> I
> > > only want to allow N running tasks at once. Imagine you have lots of
> > tasks
> > > being scheduled on your celery cluster/kubernetes cluster and since the
> > > resource usage of each task is heterogeneous you don't know exactly how
> > many
> > > running tasks you can tolerate at once. If instead you can say "I only
> > want
> > > tasks to be launched while I have less than N tasks in the launched
> > state"
> > > you get some adaptive backpressure.
> > >
> > > While we have workarounds described above for the kubernetes executor,
> > how
> > > do people feel about introducing a launched state into airflow so we
> > don't
> > > need the workarounds? I think there are benefits to be gained for all
> the
> > > executors.
> > >
> > > On Sun, Nov 26, 2017 at 1:46 AM, Bolke de Bruin 
> > wrote:
> > >
> > > >
> > > > Hi Daniel,
> > > >
> > > > (BTW: I do think this discussion is better to have at the
> mailing list,
> > > > more people might want to chime in and offer valuable opinions)
> > > >
> > > > Jumping right in: I am wondering if you are not duplicating the
> > “queued”
> > > > logic for (a.o) pools. Introducing LAUNCHED with the meaning attached
> > to
> > > > it that you describe, would mean that we have a second place where we
> > > >

Re: Introducing a "LAUNCHED" state into airflow

2017-11-29 Thread Alex Guziel
It might be good enough to have RUNNING set immediately on the process run
and not being dependent on the dag file being parsed. It is annoying here
too when dags parse on the scheduler but not the worker, since queued tasks
that don't heartbeat will not get retried, while running tasks will.

On Wed, Nov 29, 2017 at 10:04 AM, Grant Nicholas <
grantnicholas2...@u.northwestern.edu> wrote:

> ---Opening up this conversation to the whole mailing list, as suggested by
> Bolke---
>
>
> A "launched" state has been suggested in the past (see here
> <https://github.com/apache/incubator-airflow/blob/master/airflow/utils/state.py#L31>)
> but never implemented for reasons unknown to us. Does anyone have more
> details about why?
>
> There are two big reasons why adding a new "launched" state to airflow
> would be useful:
>
> 1. A "launched" state would be useful for crash safety of the scheduler. If
> the scheduler crashes in between the scheduler launching the task and the
> task process starting up then we lose information about whether that task
> was launched or not. By moving the state of the task to "launched" when it
> is sent off to celery/dask/kubernetes/etc, when crashes happen you know
> whether you have to relaunch the task or not.
>
> To work around this issue, on startup of the kubernetes executor we query
> all "queued" tasks and if there is not a matching kubernetes pod for that
> task then we set the task state to "None" so it is rescheduled. See here
> <https://github.com/bloomberg/airflow/blob/airflow-kubernetes-executor/airflow/contrib/executors/kubernetes_executor.py#L400>
> for
> details if you are curious. While this works for the kubernetes executor,
> other executors can't easily introspect launched tasks and this means the
> celery executor (afaik) is not crash safe.
>
> 2. A "launched" state would allow for dynamic backpressure of tasks, not
> just static backpressure. Right now, airflow only allows static
> backpressure (`parallelism` config). This means you must statically say I
> only want to allow N running tasks at once. Imagine you have lots of tasks
> being scheduled on your celery cluster/kubernetes cluster and since the
> resource usage of each task is heterogeneous you don't know exactly how many
> running tasks you can tolerate at once. If instead you can say "I only want
> tasks to be launched while I have less than N tasks in the launched state"
> you get some adaptive backpressure.
>
> While we have workarounds described above for the kubernetes executor, how
> do people feel about introducing a launched state into airflow so we don't
> need the workarounds? I think there are benefits to be gained for all the
> executors.
>
> On Sun, Nov 26, 2017 at 1:46 AM, Bolke de Bruin  wrote:
>
> >
> > Hi Daniel,
> >
> > (BTW: I do think this discussion is better to have at the mailing list,
> > more people might want to chime in and offer valuable opinions)
> >
> > Jumping right in: I am wondering if you are not duplicating the “queued”
> > logic for (a.o) pools. Introducing LAUNCHED with the meaning attached to
> > it that you describe, would mean that we have a second place where we
> > handle back pressure.
> >
> > Isn’t there a way to ask the k8s cluster how many tasks it has pending
> and
> just execute any queued tasks when it crosses a certain threshold?
> Have
> a look at base_executor where it is handling slots and queued tasks.
> >
> > Cheers
> > Bolke
> >
> >
> > Verstuurd vanaf mijn iPad
> >
> > Op 15 nov. 2017 om 01:39 heeft Daniel Imberman <
> daniel.imber...@gmail.com>
> > het volgende geschreven:
> >
> > Hi Bolke and Dan!
> >
> > I had a quick question WRT the launched state (
> > https://github.com/apache/incubator-airflow/blob/master/air
> > flow/utils/state.py#L32).
> >
> > We are handling the issue of throttling the executor when the k8s cluster
> > has more than 5 pending tasks (which usually means that the cluster is
> > under a lot of strain), and one thought we had WRT crash safety was to
> use
> > a "LAUNCHED" state for pods that have been submitted but are not running
> > yet.
> >
> > With the launched state currently being TBD, I was wondering if there was
> > any reason you guys would not want this state? There are other
> workarounds
> > we can do, but we wanted to check in with you guys first.
> >
> > Thanks!
> >
> > Daniel
> >
> >
> >
>
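[Archive note: the static vs. dynamic back-pressure distinction quoted above can be sketched as a single admission check. `max_launched` and the state names are illustrative; this is not Airflow's scheduler code.]

```python
def slots_available(counts, parallelism, max_launched):
    """Return True when the scheduler may hand one more task to the executor.

    counts maps a state name to the number of task instances in it.
    """
    launched = counts.get("launched", 0)
    running = counts.get("running", 0)
    # Static back-pressure: the `parallelism` config caps total in-flight
    # tasks regardless of what each task costs to run.
    if launched + running >= parallelism:
        return False
    # Dynamic back-pressure: additionally cap tasks that were submitted
    # but have not yet started, which tracks how saturated the cluster is.
    return launched < max_launched
```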


Re: Ignore Processing DAG Definition Python Files for Paused DAGs

2017-11-27 Thread Alex Guziel
Hmm, this may not apply to your implementation, but it sounds like for this
it would not handle cases like:

1) a.py has dag A1 and A2, A1 is paused, A2 is not
2) b.py has dag B1, which is paused. Later B2 is added to b.py but does not
get picked up since B1 is paused.

On Mon, Nov 27, 2017 at 3:29 PM, Andy Huynh  wrote:

> When we updated to Airflow 1.9, we noticed that there was a pretty big
> delay between tasks (somewhere between 2-4 minutes, even after playing
> around with the min_file_process_interval and max_threads configs). Our
> thought was that if we reduce the number of files that the scheduler has to
> process, then the scheduler would process files for unpaused DAGs more
> frequently, reducing the delay between tasks.
>
> On 2017-11-27 11:23, Alek Storm  wrote:
> > What's the advantage of this change? Performance?
> >
> > Alek
> >
> > On Mon, Nov 27, 2017 at 1:11 PM, ahu...@symphonyrm.com <
> > ahu...@symphonyrm.com> wrote:
> >
> > > Hi all,
> > >
> > > I wanted to gauge community interest in this idea we have. We are
> > > currently running a modified version of Airflow 1.9 RC3 where we ignore
> > > processing DAG definition Python files for paused DAGs. By default,
> > > list_py_file_paths traverses the dags subdirectory to look for Python
> > > files, and the scheduler processes all these files, regardless of
> whether
> > > the DAGs defined in these files are paused or not. Our proposed
> > > modification was to query the fileloc column in the dag table,
> filtering
> > > on is_paused=1 and is_active=1 to get a list of file paths for paused
> DAGs.
> > > Then, we can exclude these files from the known_file_paths, so that the
> > > scheduler does not process these files. This feature can be set on and
> off
> > > via a scheduler config variable.
> > >
> > > If anyone is interested, we already have the code written, so we'd be
> > > happy to package up our changes and create a PR.
> > >
> > > Thanks!
> > > -Andy
> > >
> >
>
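[Archive note: the proposed filter above can be sketched as follows, skipping a file only when every DAG known to come from it is paused. Even then, Alex's b.py case still applies: a new DAG added later to an all-paused file would not be picked up. Function and variable names are hypothetical, not the actual patch.]

```python
from collections import defaultdict


def filter_paused_files(known_file_paths, dag_rows):
    """known_file_paths: every .py path found under the dags folder.
    dag_rows: (fileloc, is_paused) pairs standing in for the dag table.

    A file is skipped only when every DAG known to come from it is
    paused; files with no DB record at all are kept, since they may
    define DAGs the scheduler has never seen.
    """
    flags_by_file = defaultdict(list)
    for fileloc, is_paused in dag_rows:
        flags_by_file[fileloc].append(is_paused)
    skip = {path for path, flags in flags_by_file.items() if all(flags)}
    return [p for p in known_file_paths if p not in skip]
```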


Re: 1.9.0alpha1 published

2017-10-13 Thread Alex Guziel
AIRFLOW-976 should be marked resolved. It is fixed by
https://github.com/apache/incubator-airflow/commit/b2e1753f5b74ad1b6e0889f7b784ce69623c95ce
(pardon my commit message), which is in v1.9.

On Fri, Oct 13, 2017 at 11:52 AM, Chris Riccomini 
wrote:

> Hey all,
>
> I have cut a 1.9.0alpha1 release of Airflow. You can download it here:
>
>   https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0alpha1/
>
> The bin tarball can be installed with pip:
>
>   pip install apache-airflow-1.9.0alpha1+incubating-bin.tar.gz
>
> The goal is to have the community install and run this to expose any bugs
> before we move on to official release candidates.
>
> Here are the outstanding blocker bugs for 1.9.0:
>
>   AIRFLOW-1525 |Improvement |Fix minor LICENSE & NOTICE issue
>   AIRFLOW-1258 |Bug |TaskInstances within SubDagOperator are marked
> as
>   AIRFLOW-1055 |Bug |airflow/jobs.py:create_dag_run() exception
> for
> @on
>   AIRFLOW-1018 |Bug |Scheduler DAG processes can not log to stdout
>   AIRFLOW-1013 |Bug |airflow/jobs.py:manage_slas() exception for
> @once
>   AIRFLOW-976  |Bug |Mark success running task causes it to fail
>
> Note: it appears that none of the blocker bugs have been closed since
> alpha1. Are these truly blocker bugs, or should they be bumped to 1.10.0?
>
> Cheers,
> Chris
>


Re: Runbook to upgrade Airflow

2017-10-04 Thread Alex Guziel
We pretty much just shut everything off (fence the db by changing the
password), upgrade the db, then turn everything back on and change the db
password back. We accept that there will be downtime for maintenance and do
it at a low-impact time (like 20:00 UTC). Also, the thing with sqlalchemy
is a lot more things become breaking changes because of the way it writes
queries.

It might be good to have downtime even without schema changes to prevent
weird behavior if the changeset is large.

But for small stuff, depending on the scope of the change, workers don't
really need to be restarted, scheduler can be handled manually, and
webserver we do a rolling restart.

On Wed, Oct 4, 2017 at 7:54 AM Thoralf Gutierrez 
wrote:

> Thanks for your answer Alex.
>
> I guess you mean it won't work if there is a _breaking_ schema change
> right? But for new patch and minor versions (i.e. without breaking
> changes), I'm guessing that upgrading the db first should do the trick?
>
> @Airbnb team, how do you all do it? If I remember correctly, you often
> update your version of airflow to test out new releases. What does your
> runbook for Airflow upgrades look like?
>
> Cheers,
> Thoralf
>
> On Tue, 3 Oct 2017 at 13:04 Alex Guziel 
> wrote:
>
> > You won't be able to if there's a schema change.
> >
> > On Tue, Oct 3, 2017 at 12:33 PM, Thoralf Gutierrez <
> > thoralfgutier...@gmail.com> wrote:
> >
> > > Hey everybody!
> > >
> > > Does anybody have some kind of runbook to upgrade airflow (with a
> Celery
> > > backend) without having any downtime (i.e. tasks keep on running as you
> > > upgrade)?
> > >
> > > I have this in mind, but not sure if I am missing something or if I
> > should
> > > be careful with the order of steps (especially for upgrading the db
> > > schema):
> > >
> > > 1. run airflow upgradedb from anywhere
> > >
> > > 2. one worker at a time
> > >   2a. make sure it doesn't start any new task.
> > >   2b. wait for all tasks to be finished
> > >   2c. run pip install airflow --upgrade
> > >   2d. re-enable worker
> > >
> > > 3. one webserver at a time
> > >   3a. kill webserver
> > >   3b. run pip install airflow --upgrade
> > >   3c. start webserver
> > >
> > > 4. scheduler
> > >   4a. kill scheduler
> > >   4b. run pip install airflow --upgrade
> > >   4c. start scheduler
> > >
> > > Thanks,
> > > Thoralf
> > >
> >
>


Re: Runbook to upgrade Airflow

2017-10-03 Thread Alex Guziel
You won't be able to if there's a schema change.

On Tue, Oct 3, 2017 at 12:33 PM, Thoralf Gutierrez <
thoralfgutier...@gmail.com> wrote:

> Hey everybody!
>
> Does anybody have some kind of runbook to upgrade airflow (with a Celery
> backend) without having any downtime (i.e. tasks keep on running as you
> upgrade)?
>
> I have this in mind, but not sure if I am missing something or if I should
> be careful with the order of steps (especially for upgrading the db
> schema):
>
> 1. run airflow upgradedb from anywhere
>
> 2. one worker at a time
>   2a. make sure it doesn't start any new task.
>   2b. wait for all tasks to be finished
>   2c. run pip install airflow --upgrade
>   2d. re-enable worker
>
> 3. one webserver at a time
>   3a. kill webserver
>   3b. run pip install airflow --upgrade
>   3c. start webserver
>
> 4. scheduler
>   4a. kill scheduler
>   4b. run pip install airflow --upgrade
>   4c. start scheduler
>
> Thanks,
> Thoralf
>


Re: Airflow 1.9.0 status

2017-09-20 Thread Alex Guziel
Can we get this in?

https://issues.apache.org/jira/browse/AIRFLOW-1519
https://issues.apache.org/jira/browse/AIRFLOW-1621

https://github.com/apache/incubator-airflow/commit/b6d2e0a46978e93e16576604624f57d1388814f2
https://github.com/apache/incubator-airflow/commit/656d045e90bf67ca484a3778b2a07a419bfb324a

It speeds up loading times a lot, so it's a good thing to have in 1.9.

On Wed, Sep 20, 2017 at 11:14 AM, Chris Riccomini 
wrote:

> Sounds good. I'll plan on stable+beta next week, then. Initial warning
> stands, that I will start locking down what can get into 1.9.0 at that
> point.
>
> On Wed, Sep 20, 2017 at 11:10 AM, Bolke de Bruin 
> wrote:
>
> > No vote indeed, just to gather feedback on a particular fixed point in
> > time. It also gives a bit more trust to a tarball than to a git pull.
> >
> > Bolke
> >
> > > On 20 Sep 2017, at 20:09, Chris Riccomini 
> wrote:
> > >
> > > I can do a beta. Is the process significantly different? IIRC, it's
> > > basically the same, just no vote, right?
> > >
> > > On Wed, Sep 20, 2017 at 10:56 AM, Bolke de Bruin 
> > wrote:
> > >
> > >> Are you sure you want to go ahead and do RCs right away? Isn’t a beta
> a
> > >> bit smarter?
> > >>
> > >> - Bolke
> > >>
> > >>> On 20 Sep 2017, at 19:41, Chris Riccomini 
> > wrote:
> > >>>
> > >>> Hey all,
> > >>>
> > >>> I want to send out a warning that I'm planning to cut the stable
> branch
> > >>> next week, and begin the RC1 release vote. Once the stable branch is
> > >> cut, I
> > >>> will be locking down what commits get cherry picked into the branch,
> > and
> > >>> will only be doing PRs that are required to get the release out.
> > >>>
> > >>> Cheers,
> > >>> Chris
> > >>>
> > >>> On Mon, Sep 18, 2017 at 11:19 AM, Chris Riccomini <
> > criccom...@apache.org
> > >>>
> > >>> wrote:
> > >>>
> >  Hey all,
> > 
> >  An update on the 1.9.0 release. Here are the outstanding PRs that
> are
> >  slated to be included into 1.9.0:
> > 
> >  ISSUE ID |STATUS|DESCRIPTION
> >  AIRFLOW-1617 |Open  |XSS Vulnerability in Variable endpoint
> >  AIRFLOW-1611 |Open  |Customize logging in Airflow
> >  AIRFLOW-1605 |Reopened  |Fix log source of local loggers
> >  AIRFLOW-1604 |Open  |Rename the logger to log
> >  AIRFLOW-1525 |Open  |Fix minor LICENSE & NOTICE issue
> >  AIRFLOW-1499 |In Progres|Eliminate duplicate and unneeded code
> >  AIRFLOW-1198 |Open  |HDFSOperator to operate HDFS
> >  AIRFLOW-1055 |Open  |airflow/jobs.py:create_dag_run() exception
> > for
> >  @on
> >  AIRFLOW-1019 |Open  |active_dagruns shouldn't include paused
> DAGs
> >  AIRFLOW-1018 |Open  |Scheduler DAG processes can not log to
> stdout
> >  AIRFLOW-1015 |Open  |TreeView displayed over task instances
> >  AIRFLOW-1013 |Open  |airflow/jobs.py:manage_slas() exception for
> >  @once
> >  AIRFLOW-976  |Open  |Mark success running task causes it to fail
> >  AIRFLOW-914  |Open  |Refactor BackfillJobTest.test_backfill_
> > >> examples
> >  to
> >  AIRFLOW-913  |Open  |Refactor tests.CoreTest.test_scheduler_job
> > to
> >  real
> >  AIRFLOW-912  |Open  |Refactor tests and build matrix
> >  AIRFLOW-888  |Open  |Operators should not push XComs by default
> >  AIRFLOW-828  |Open  |Add maximum size for XComs
> >  AIRFLOW-825  |Open  |Add Dataflow semantics
> >  AIRFLOW-788  |Open  |Context unexpectedly added to hive conf
> > 
> >  I will be locking down what can get cherry-picked into the 1.9.0
> > branch
> >  shortly, so if you have something you want in, please set the fix
> > >> version
> >  to 1.9.0.
> > 
> >  We (at WePay) have deployed 1.9.0 into our dev cluster, and it has
> > been
> >  running smoothly for several days.
> > 
> >  ** I could really use help verifying stability. If you run Airflow,
> > it's
> >  in your best interest to deploy the 1.9.0 test branch somewhere, and
> > >> verify
> >  it's working for your workload. **
> > 
> >  Cheers,
> >  Chris
> > 
> > >>
> > >>
> >
> >
>


Re: Proposal: Set Celery 4.0 as a minimum as Celery 4 is unsupported

2017-09-19 Thread Alex Guziel
That's probably fine but I'd like to note two things.

1) The celery 3 config options are forwards compatible as far as I know
2) Still doesn't fix the bug where tasks get reserved even though they
shouldn't be.


But I think it makes sense to upgrade the version in setup.py regardless.
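[Archive note: the forward compatibility Alex mentions refers to Celery 4 still accepting the old uppercase setting names, which were renamed to lowercase equivalents in 4.0. A sketch of normalizing a config dict; the mapping below is a small illustrative subset, not the full rename table.]

```python
# A few of the Celery 3 -> 4 setting renames (old uppercase names still
# work in Celery 4, just deprecated). Illustrative subset only.
CELERY3_TO_CELERY4 = {
    "BROKER_URL": "broker_url",
    "CELERY_RESULT_BACKEND": "result_backend",
    "CELERYD_CONCURRENCY": "worker_concurrency",
}


def normalize_celery_settings(settings):
    """Map old-style names to their 4.x equivalents, passing through
    keys that are already in the new lowercase style."""
    return {CELERY3_TO_CELERY4.get(key, key): value
            for key, value in settings.items()}
```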

On Tue, Sep 19, 2017 at 2:39 AM Bolke de Bruin  wrote:

> ping.
>
> I'd like some more feedback.
>
> Cheers
> Bolke
>
> Verstuurd vanaf mijn iPad
>
> > Op 16 sep. 2017 om 17:45 heeft Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com> het volgende geschreven:
> >
> > +1 from us, we're running on Celery 4.0.2 in production on Airflow 1.8.2
> (4.1 wasn't out when we started and haven't upgraded in prod yet)
> >
> >
> >> On 16 Sep 2017, at 16:35, Bolke de Bruin  wrote:
> >>
> >> Hi,
> >>
> >> Some refactoring of the Celery config is underway and as some of the
> options have changed between Celery 3 and 4 I have asked the question
> whether Celery 3 is still supported. Apparently it is not (
> https://github.com/celery/celery/issues/4258 <
> https://github.com/celery/celery/issues/4258>).
> >>
> >> As Celery 3 is also 2 major releases behind I propose to set Celery 4.0
> as a minimum supported version with Celery 4.1 being the recommended version.
> I think this is also important as we do see some intermittent issues with
> Celery that are reported to us but are, most likely, issues in Celery. I
> don’t want to take care of those as they are difficult to debug.
> >>
> >> I would like to do this per 1.9.0. The 1.8.X branch can still support
> Celery 3.
> >>
> >> Cheers
> >> Bolke
> >
>


Re: Terminate task process through UI

2017-09-13 Thread Alex Guziel
Right now, there are a few layers of processes. Here's an example in the
celery worker case.

CeleryMainProcess -> CeleryPoolWorker -> Airflow run --local -> Airflow run
--raw -> Bash command

In the past, airflow run --raw would handle almost all logic, and --local
would just handle heartbeating, which would include checking the task
state, and sending a SIGTERM down if necessary. (--local is LocalTaskJob).
I recently moved the logic in airflow run --raw that runs before the bash
command to run in airflow run --local, but I believe the logic in Airflow
run --raw after the bash command (or whatever operator) should also be
moved up. It would be more logical in cases like these.


Re: 1.9.0 test branch has been cut

2017-09-13 Thread Alex Guziel
Nevermind, I misunderstood what you meant. (I thought you meant you were
only including things with a fix version of 1.9.0, when you meant master
cut + 1.9.0 fix versions)

On Wed, Sep 13, 2017 at 1:19 PM, Alex Guziel  wrote:

> Shouldn't we include everything on master?
>
> On Wed, Sep 13, 2017 at 12:45 PM, Chris Riccomini 
> wrote:
>
>> Hey all,
>>
>> I've cut a 1.9.0 test branch.
>>
>> https://github.com/apache/incubator-airflow/tree/v1-9-test
>>
>> Here are the tickets that are being tracked on 1.9.0.
>>
>> ISSUE ID |DESCRIPTION   |MERGED
>> AIRFLOW-1608 |GCP Dataflow hook missing pending job state   |1
>> AIRFLOW-1606 |DAG.sync_to_db is static, but takes a DAG as first|1
>> AIRFLOW-1605 |Fix log source of local loggers   |0
>> AIRFLOW-1602 |Use LoggingMixin for the DAG class|1
>> AIRFLOW-1597 |Add GameWisp as Airflow user  |1
>> AIRFLOW-1594 |Installing via pip copies test files into python l|1
>> AIRFLOW-1593 |Expose load_string in WasbHook|1
>> AIRFLOW-1586 |MySQL to GCS to BigQuery fails for tables with dat|1
>> AIRFLOW-1584 |Remove the insecure /headers endpoints|1
>> AIRFLOW-1582 |Improve logging structure of Airflow  |1
>> AIRFLOW-1580 |Error in string formatter when throwing an excepti|1
>> AIRFLOW-1579 |Allow jagged rows in BQ Hook. |1
>> AIRFLOW-1577 |Add token support to DatabricksHook   |1
>> AIRFLOW-1573 |Remove `thrift < 0.10.0` requirement  |1
>> AIRFLOW-1568 |Add datastore import/export operator  |1
>> AIRFLOW-1567 |Clean up ML Engine operators  |1
>> AIRFLOW-1564 |Default logging filename contains a colon |1
>> AIRFLOW-1556 |BigQueryBaseCursor should support SQL parameters  |1
>> AIRFLOW-1546 | add Zymergen to org list in README   |1
>> AIRFLOW-1535 |Add support for Dataproc serviceAccountScopes in D|1
>> AIRFLOW-1529 |Support quoted newlines in Google BigQuery load jo|1
>> AIRFLOW-1522 |Increase size of val column for variable table in |1
>> AIRFLOW-1521 |Template fields definition for bigquery_table_dele|1
>> AIRFLOW-1507 |Make src, dst and bucket parameters as templated i|1
>> AIRFLOW-1505 |Document when Jinja substitution occurs   |1
>> AIRFLOW-1504 |Log Cluster Name on Dataproc Operator When Execute|1
>> AIRFLOW-1499 |Eliminate duplicate and unneeded code |0
>> AIRFLOW-1493 |Fix race condition with airflow run   |1
>> AIRFLOW-1492 |Add metric for task success/failure   |1
>> AIRFLOW-1489 |Docs: Typo in BigQueryCheckOperator   |1
>> AIRFLOW-1478 |Chart -> Owner column should be sortable  |1
>> AIRFLOW-1476 |Add INSTALL file for source releases  |1
>> AIRFLOW-1474 |Add dag_id regex for 'airflow clear' CLI command  |1
>> AIRFLOW-1459 |integration rst doc is broken in github view  |1
>> AIRFLOW-1438 |Scheduler batch queries should have a limit   |1
>> AIRFLOW-1437 |BigQueryTableDeleteOperator should define deletion|1
>> AIRFLOW-1402 |Cleanup SafeConfigParser DeprecationWarning   |1
>> AIRFLOW-1401 |Standardize GCP project, region, and zone argument|1
>> AIRFLOW-1394 |Add quote_character parameter to GoogleCloudStorag|1
>> AIRFLOW-1389 |BigQueryOperator should support `createDisposition|1
>> AIRFLOW-1384 |Add ARGO/CaDC |1
>> AIRFLOW-1359 |Provide GoogleCloudML operator for model evaluatio|1
>> AIRFLOW-1352 |Revert bad logging Handler|0
>> AIRFLOW-1350 |Add "query_uri" parameter for Google DataProc oper|1
>> AIRFLOW-1345 |Don't commit on each loop |1
>> AIRFLOW-1344 |Builds failing on Python 3.5 with AttributeError  |1
>> AIRFLOW-1343 |Add airflow default label to the dataproc operator|1
>> AIRFLOW-1338 |gcp_dataflow_hook is incompatible with the recent |1
>> AIRFLOW-1337 |Customize log format via config file  |1
>> AIRFLOW-1335 |Use buffered logger   |1
>>
>> If you have stuff you want to get in, please set it with a fix version of
>> 1.9.0.
>>
>> Please begin testing, stabilizing, and reporting bugs now! :)
>>
>> Cheers,
>> Chris
>>
>
>


Re: 1.9.0 test branch has been cut

2017-09-13 Thread Alex Guziel
Shouldn't we include everything on master?

On Wed, Sep 13, 2017 at 12:45 PM, Chris Riccomini 
wrote:

> Hey all,
>
> I've cut a 1.9.0 test branch.
>
> https://github.com/apache/incubator-airflow/tree/v1-9-test
>
> Here are the tickets that are being tracked on 1.9.0.
>
> ISSUE ID |DESCRIPTION   |MERGED
> AIRFLOW-1608 |GCP Dataflow hook missing pending job state   |1
> AIRFLOW-1606 |DAG.sync_to_db is static, but takes a DAG as first|1
> AIRFLOW-1605 |Fix log source of local loggers   |0
> AIRFLOW-1602 |Use LoggingMixin for the DAG class|1
> AIRFLOW-1597 |Add GameWisp as Airflow user  |1
> AIRFLOW-1594 |Installing via pip copies test files into python l|1
> AIRFLOW-1593 |Expose load_string in WasbHook|1
> AIRFLOW-1586 |MySQL to GCS to BigQuery fails for tables with dat|1
> AIRFLOW-1584 |Remove the insecure /headers endpoints|1
> AIRFLOW-1582 |Improve logging structure of Airflow  |1
> AIRFLOW-1580 |Error in string formatter when throwing an excepti|1
> AIRFLOW-1579 |Allow jagged rows in BQ Hook. |1
> AIRFLOW-1577 |Add token support to DatabricksHook   |1
> AIRFLOW-1573 |Remove `thrift < 0.10.0` requirement  |1
> AIRFLOW-1568 |Add datastore import/export operator  |1
> AIRFLOW-1567 |Clean up ML Engine operators  |1
> AIRFLOW-1564 |Default logging filename contains a colon |1
> AIRFLOW-1556 |BigQueryBaseCursor should support SQL parameters  |1
> AIRFLOW-1546 | add Zymergen to org list in README   |1
> AIRFLOW-1535 |Add support for Dataproc serviceAccountScopes in D|1
> AIRFLOW-1529 |Support quoted newlines in Google BigQuery load jo|1
> AIRFLOW-1522 |Increase size of val column for variable table in |1
> AIRFLOW-1521 |Template fields definition for bigquery_table_dele|1
> AIRFLOW-1507 |Make src, dst and bucket parameters as templated i|1
> AIRFLOW-1505 |Document when Jinja substitution occurs   |1
> AIRFLOW-1504 |Log Cluster Name on Dataproc Operator When Execute|1
> AIRFLOW-1499ss|Eliminate duplicate and unneeded code |0
> AIRFLOW-1493 |Fix race condition with airflow run   |1
> AIRFLOW-1492 |Add metric for task success/failure   |1
> AIRFLOW-1489 |Docs: Typo in BigQueryCheckOperator   |1
> AIRFLOW-1478 |Chart -> Owner column should be sortable  |1
> AIRFLOW-1476 |Add INSTALL file for source releases  |1
> AIRFLOW-1474 |Add dag_id regex for 'airflow clear' CLI command  |1
> AIRFLOW-1459 |integration rst doc is broken in github view  |1
> AIRFLOW-1438 |Scheduler batch queries should have a limit   |1
> AIRFLOW-1437 |BigQueryTableDeleteOperator should define deletion|1
> AIRFLOW-1402 |Cleanup SafeConfigParser DeprecationWarning   |1
> AIRFLOW-1401 |Standardize GCP project, region, and zone argument|1
> AIRFLOW-1394 |Add quote_character parameter to GoogleCloudStorag|1
> AIRFLOW-1389 |BigQueryOperator should support `createDisposition|1
> AIRFLOW-1384 |Add ARGO/CaDC |1
> AIRFLOW-1359 |Provide GoogleCloudML operator for model evaluatio|1
> AIRFLOW-1352 |Revert bad logging Handler|0
> AIRFLOW-1350 |Add "query_uri" parameter for Google DataProc oper|1
> AIRFLOW-1345 |Don't commit on each loop |1
> AIRFLOW-1344 |Builds failing on Python 3.5 with AttributeError  |1
> AIRFLOW-1343 |Add airflow default label to the dataproc operator|1
> AIRFLOW-1338 |gcp_dataflow_hook is incompatible with the recent |1
> AIRFLOW-1337 |Customize log format via config file  |1
> AIRFLOW-1335 |Use buffered logger   |1
>
> If you have stuff you want to get in, please set it with a fix version of
> 1.9.0.
>
> Please begin testing, stabilizing, and reporting bugs now! :)
>
> Cheers,
> Chris
>


Re: As history grows UI gets slower

2017-08-28 Thread Alex Guziel
Here at Airbnb we delete old "completed" task instances.

On Mon, Aug 28, 2017 at 3:01 PM, David Capwell  wrote:

> We are on 1.8.0 and have a monitor DAG that monitors the health of Airflow
> and Celery every minute.  This has been running for a while now and is at 26k
> dag runs. We see that the UI for this DAG is multiple seconds slower (6-7
> seconds) than any other DAG.
>
> My question is, what do people do about managing history as it grows over
> time? Do people delete history after N or so days?
>
> Thanks for your time reading this email
>
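[Editor's note] The cleanup Alex describes can be sketched as a small pruning job run from a maintenance DAG or cron against the metadata database. This is a minimal illustration, not Airflow's own code: the metadata DB is modelled here with sqlite (the real task_instance table has more columns), and the retention window and set of terminal states are assumptions to tune.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # assumption: tune to your audit/debugging needs


def prune_task_instances(conn, now=None):
    """Delete finished task_instance rows older than the retention window.

    Only terminal states are touched, so running/queued tasks are safe.
    Returns the number of rows deleted.
    """
    now = now or datetime.utcnow()
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat(sep=' ')
    cur = conn.execute(
        "DELETE FROM task_instance "
        "WHERE state IN ('success', 'failed', 'skipped') "
        "AND execution_date < ?",
        (cutoff,),
    )
    return cur.rowcount
```

The same DELETE can be issued through SQLAlchemy against the real backend; deleting in batches keeps the transaction small on large tables.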


Re: Bad Request CSRF

2017-08-17 Thread Alex Guziel
Curious, how did you fix this? We see this from time-to-time and we also
have a single sign-on system.

On Wed, Aug 16, 2017 at 10:29 AM, George Leslie-Waksman <
geo...@cloverhealth.com.invalid> wrote:

> I have further tracked the issue to our new single-sign-on system. Airflow
> is fine. Please disregard.
>
> On Wed, Aug 16, 2017 at 9:15 AM Chris Riccomini 
> wrote:
>
> > Try
> >
> > pip install --upgrade flask-wtf
> >
> > On Tue, Aug 15, 2017 at 4:10 PM, George Leslie-Waksman <
> > geo...@cloverhealth.com.invalid> wrote:
> >
> > > I'm seeing "Bad Request \n CSRF token missing or incorrect." when
> > > attempting to clear things from the Task Instance interface in Airflow
> > > 1.8.1.
> > >
> > > Is anyone else seeing this or is this more likely something on our end?
> > >
> > > --George
> > >
> >
>


Landscape broken - anyone knows what's up?

2017-08-10 Thread Alex Guziel
It seems Landscape is throwing 500s. When I look into it,
https://landscape.io/github/apache/incubator-airflow/
"The repository *apache/incubator-airflow* is not being checked by
Landscape."

Does anyone have the perms to fix this?


Re: Stuck Tasks that don't report status

2017-08-09 Thread Alex Guziel
I know that with a scheduler restart, tasks may still report as running
even though they are not.

On Wed, Aug 9, 2017 at 6:07 PM, David Klosowski 
wrote:

> Hi Gerard,
>
> The interesting thing is that we didn't see this issue in 1.7.1.3 but we
> did when upgrading to 1.8.0.
>
> We aren't seeing any timeout on the task in question to be quite honest.
> The state of the task never changes and we have reasonable timeouts on our
> tasks that would notify us.  The task is in fact "stuck" w/o reporting any
> status.  There are other cases where tasks do in fact fail and then go into
> retry state, which we see normally (this happens quite a bit for us on
> deploys).  There is clearly some edge case here where the failure -> retry
> does not happen and the dagrun never updates.
>
> What we do see is timeouts on Sensors that depend on those tasks, and we've
> added SLAs to some of our important tasks to see issues earlier.
>
> Does anyone know where this code lives?  Is that a function of the
> dagrun_timeout?
>
> Thanks.
>
> Regards,
> David
>
>
>
>
>
>
> On Mon, Aug 7, 2017 at 1:30 PM, Gerard Toonstra 
> wrote:
>
> > Hi David,
> >
> > When tasks are put on the MQ, they are out of the control of the
> scheduler.
> > The scheduler puts the state of that task instance in "queued".
> >
> > What happens next:
> >
> > 1. A worker picks up the task to run and tries to run it.
> > 2. It first executes a couple of checks against the DB prior to executing
> > this. These are final instance checks to see
> > if it should still run when the worker is about to pick up the task
> > (another could have processed, started processing, etc).
> > 3. The worker puts the state of the TI in "running".
> > 4. The worker does the work as described in the operator
> > 5. The worker then updates the database with fail or success.
> >
> > If you kill the docker container doing the execution prior to it having
> > updated the state to success or fail,
> > it will get into a situation where a timeout must occur to get airflow to
> > see if the task failed or not. This is because
> > the worker is claiming to be processing the message, but this worker/task
> > got killed.
> >
> > It is actually the task instance updating the database, so if you leave
> > that container running, it will possibly finish
> > and update the db.
> >
> >
> > The task results are also communicated back to the executors and there's
> a
> > check to see if the results agree.
> >
> > You can find this code in models.py / Taskinstance / run()   and any
> > Executor you are using under (airflow/executors).
> >
> >
> > The reason why this happens I think is because docker doesn't really care
> > what's running at the moment, it's assuming 'services',
> > where you may have interruption of services because they are retried all
> > the time anyway. In an environment like airflow,
> > there's a persistent backend database that doesn't automatically retry
> > because it's driven through the scheduler, which only sees
> > a "RUNNING" record in the database.
> >
> > How to deal with this depends on your situation. If you run only short
> > running tasks (up to 5 mins), you could drain the task queue
> > by stopping the scheduler first. This means no new messages are sent to
> the
> > queue, so after 10 mins you should have no tasks running on any workers.
> >
> > Another way is to update the database inbetween, but I'd personally avoid
> > that as much as you can.
> >
> >
> > Not sure if anyone wants to chime in here on how to best deal with this
> in
> > docker?
> >
> > Rgds,
> >
> > Gerard
> >
> >
> > On Mon, Aug 7, 2017 at 8:21 PM, David Klosowski 
> > wrote:
> >
> > > Hi Airflow Dev List:
> > >
> > > Has anyone had cases where tasks get "stuck"?  What I mean by "stuck"
> is
> > > that tasks show as running through the Airflow UI but never actually
> run
> > > (and dependent tasks will eventually timeout).
> > >
> > > This only happens during our deployments and we replace all the hosts
> in
> > > our stack (3 workers and 1 host with the scheduler + webserver +
> flower)
> > > with a dockerized deployment.  We've been deploying to the worker hosts
> > > after the scheduler + webserver + flower host.
> > >
> > > It also doesn't occur all the time, which is a bit frustrating to try
> to
> > > debug.
> > >
> > > We have the following settings:
> > >
> > > > celery_result_backend = Postgres
> > > > sql_alchemy_conn = Postgres
> > > > broker_url = Redis
> > > > executor = CeleryExecutor
> > >
> > > Any thoughts from anyone regarding known issues or observed problems?
> I
> > > haven't seen a jira on this after looking through the Airflow jira.
> > >
> > > Thanks.
> > >
> > > Regards,
> > > David
> > >
> >
>
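[Editor's note] The failure mode Gerard describes — a worker container killed before it can mark its task failed or succeeded, leaving the row in "running" forever — can be surfaced by periodically checking for running task instances whose job has stopped heartbeating. A hedged sketch follows; the sqlite schema is a simplified stand-in for the metadata DB (the real one joins task_instance.job_id to job.latest_heartbeat), and the grace period is an assumption to tune against your job_heartbeat_sec setting.

```python
import sqlite3
from datetime import datetime, timedelta

HEARTBEAT_GRACE = timedelta(minutes=10)  # assumption: well above job_heartbeat_sec


def find_stuck_running(conn, now=None):
    """Return task_ids stuck in 'running' whose worker heartbeat went stale."""
    now = now or datetime.utcnow()
    cutoff = (now - HEARTBEAT_GRACE).isoformat(sep=' ')
    rows = conn.execute(
        "SELECT ti.task_id FROM task_instance ti "
        "JOIN job ON job.id = ti.job_id "
        "WHERE ti.state = 'running' AND job.latest_heartbeat < ?",
        (cutoff,),
    )
    return [r[0] for r in rows]
```

Flagged tasks can then be alerted on or cleared manually, rather than waiting for downstream sensor timeouts to reveal them.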


Re: Email on last failed try

2017-07-28 Thread Alex Guziel
Sounds like unintended behavior. That should be what email_on_retry does.
If you can repro, file a ticket.

On Fri, Jul 28, 2017 at 11:44 AM, Andrew Maguire 
wrote:

> Yeah - i have:
>
> 'email_on_failure': True
> 'retries': 4
>
> So i get emails on every try: e.g. Try 1 out of 5
>
> Really what i'm most worried about is the final failures then i have a
> problem, whereas if it fails 3 times and then succeeds i'm ok to be unaware
> of that.
>
> Maybe email routing via gmail might be a good idea - so i can send the
> emails with "Try 5 out of 5" to a priority folder and just let the other
> ones come somewhere that i can check on every now and then.
>
> Cheers,
> Andy
>
> On Fri, Jul 28, 2017 at 5:42 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Wouldn't `email_on_failure=True` work for you?
> >
> > https://airflow.incubator.apache.org/code.html?
> highlight=email_on_failure#baseoperator
> >
> > On Fri, Jul 28, 2017 at 9:32 AM, Andrew Maguire 
> > wrote:
> >
> > > Hey,
> > >
> > > Just wondering if anyone knows if there might be a way to only send
> email
> > > on the last failed try of a task?
> > >
> > > Could I use a callable on failure only send the mail on the last failed
> > > try.
> > >
> > > We are using big query and getting lots of transient errors around
> limit
> > of
> > > concurrent queries that usually work on 2nd or 3rd try.
> > >
> > > Else I might just use Gmail rules to filter emails accordingly.
> > >
> > > Just wondering if anyone has done anything like this before or if there
> > > would be an easy enough way to do this.
> > >
> > > Cheers,
> > > Andy
> > >
> >
>


Re: Sensor slots utilization

2017-07-28 Thread Alex Guziel
I'm concerned that we would be making the logic more complex, unless the
new sensor 'poke once' case is just a high number of retries. And there is
the other overhead, of course.
Running the poke method inline wouldn't be great for perf either, since it's
blocking I/O and would need to be handled async in order to not slow down
scheduling.

FWIW, our current setup at Airbnb has a separate queue for sensors with a
high number of slots per worker.

On Fri, Jul 28, 2017 at 11:14 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Thought this was interesting to bubble up to the mailing list. From:
> https://github.com/apache/incubator-airflow/pull/2423#
> issuecomment-318723842
>
> This is about the issue around sensors utilizing a lot of worker slots. The
> context is a PR from @shaform introducing sensors that check once, give
> up their slot, and get rescheduled for each sensing operation (as opposed to
> the current behavior of sleeping and poking while constantly using the slot
> until the criteria is met or timeout is reached)
>
> ---
>
> *So this is legitimate, but shifts some of the burden of slot utilization
> towards other costs like task startups costs and more communication
> overhead. These costs may be preferable depending on the
> scenario/environment. Starting a task can have significant overhead
> depending on the size of the DAG and other factors that depend on the
> executor. Say for the upcoming Kubernetes executor, startup may include
> booting up a docker instance and doing a shallow clone of the repo.*
>
> *Since this is a major change, I would argue that we shouldn't change the
> current default since organizations have provisioned and stabilized their
> environments based on the current behavior. Default behavior could be
> changed when moving to 2.0, which isn't really planned or scheduled at the
> moment.*
>
> *Another idea around reducing the overall sensor slot utilization would be
> to move that burden towards the scheduler (let's call it the supervisor now
> since it does more than just scheduling at this point). My idea there was
> to add a flag to BaseSensorOperator that would tell the scheduler to run
> the poke method in line with the scheduling instead of using the executor.
> In that scenario, there's no startup cost and no communication overhead.
> The downside is that it can slow down the scheduler. This would be a great
> option where sensing is cheap and fast*
>
> *That gives us potentially 3 sensor_modes, which I would argue should be
> implemented as a BaseOperator argument. Derivative classes can decide to
> expose the argument or force it. Administrator could also use
> the policy function to force certain sensing mode in certain or all
> contexts in their environment.*
>
> Max
>
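[Editor's note] The two worker-side behaviours being compared can be sketched as follows: the current mode holds its slot and sleeps between pokes, while the proposed 'poke once' mode returns after a single check so the slot can be given back and the task rescheduled. This is an illustration of the trade-off only, not Airflow's actual sensor implementation.

```python
import time


def run_sensor_holding_slot(poke, poke_interval, timeout):
    """Current behaviour: the worker slot stays occupied for the whole wait."""
    started = time.time()
    while time.time() - started < timeout:
        if poke():
            return True
        time.sleep(poke_interval)  # slot is held even while sleeping
    raise RuntimeError("sensor timed out")


def run_sensor_poke_once(poke):
    """Proposed behaviour: check once and return. A False result means the
    scheduler should re-queue the task later, freeing the slot in between."""
    return poke()
```

The trade-off in the thread falls out directly: poke-once frees slots but pays task-startup and communication cost on every reschedule, while the holding variant pays nothing extra per poke but occupies a slot for the full wait.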


[AIRFLOW-xxx] in commit messages

2017-07-25 Thread Alex Guziel
What is our actual ruling on this? I see quite a few commits without this
tag. Also, what is our policy on changes to README (like the company list)
and JIRA tickets? It seems like we apply this inconsistently, but we should
probably make a firm standard and abide by it.

Best,
Alex


Re: Role Based Access Control for Airflow UI

2017-07-25 Thread Alex Guziel
Yeah, I could call in but I probably won't be able to come down that day.

On Tue, Jul 25, 2017 at 1:36 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Works for me! Dan said he might confcall in. Alex?
>
> Max
>
> On Mon, Jul 24, 2017 at 11:25 AM, Chris Riccomini 
> wrote:
>
> > Wednesday 8/2 is perfect. Want to do it like 3-5? I booked a room for 12
> > people (and video conferencing) at WePay in this time slot. Should allow
> > you to head home easily afterwards. :) That work for you guys?
> >
> > On Mon, Jul 24, 2017 at 11:01 AM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > The week of the 31st sound good. Wednesday?
> > >
> > > About React we may not need a frontend lib like it (or at least not
> just
> > > yet). We can talk about it at the meeting.
> > >
> > > Max
> > >
> > > On Fri, Jul 21, 2017 at 12:47 AM, Bolke de Bruin 
> > > wrote:
> > >
> > > > We avoid React for the same reasons as the ASF and use Polymer 2
> > instead.
> > > > Would that work?
> > > >
> > > > Bolke.
> > > >
> > > > > On 20 Jul 2017, at 19:35, Chris Riccomini 
> > > wrote:
> > > > >
> > > > > Hey Max,
> > > > >
> > > > > Want to come down to WePay? We can set up a zoom for those that
> want
> > to
> > > > > join online, and record it as well to post for the community.
> > > > >
> > > > > Since Joy is just getting started, and it looks like there's going
> to
> > > be
> > > > a
> > > > > K8s discussion next week, maybe we can shoot for the week after
> (the
> > > week
> > > > > of the 31st of July)? Care to float a few times that week?
> > > > >
> > > > > Cheers,
> > > > > Chris
> > > > >
> > > > > On Thu, Jul 20, 2017 at 9:31 AM, Maxime Beauchemin <
> > > > > maximebeauche...@gmail.com> wrote:
> > > > >
> > > > >> Sounds awesome, count me in!
> > > > >>
> > > > >> * check out the prototype in my fork, I went far enough to hit
> some
> > > > >> hurdles, try different workarounds. I hooked up the Airflow
> > Bootstrap
> > > > >> template too so that we feel at home in this new UI
> > > > >> * using a single `id` field is a requirement for FAB that airflow
> > > > doesn't
> > > > >> respect (composite pks), either we add the feature to support that
> > in
> > > > FAB,
> > > > >> or we align on the Airflow side and modify the models and add a
> > > > migration
> > > > >> script. This upgrade would require downtime and might be annoying
> to
> > > the
> > > > >> Airflow community, but could help with db performance a bit
> (smaller
> > > > >> index)... I probably could be convinced either way but I'm leaning
> > on
> > > > >> improving FAB
> > > > >> * I'm a maintainer for FAB so I can help get stuff through there
> > > > >> * React is in limbo at the ASF for licensing reasons, so no React
> at
> > > > least
> > > > >> for now
> > > > >> * npm/webpack/ES6, javascript only in `.js` files
> > > > >> * I vote for eslint + eslint-config-airbnb as a set of linting
> rules
> > > > for JS
> > > > >> * Keep out of apache (for now), this new app ships as its own pypi
> > > > package
> > > > >> `airflow-webserver`, have a period of overlap (maintaining 2 web
> > apps)
> > > > >> before ripping out `airflow/www` from the core package
> > > > >> * You need to get in touch with Marty Kausas, an intern at Airbnb
> > > who's
> > > > >> been working on a Flask blueprint for improved, more personalized
> > > views
> > > > on
> > > > >> DAGs that we were planning on merging into the main branch
> > eventually.
> > > > Some
> > > > >> of Marty's idea and code could be merged into this effort.
> > > > >>
> > > > >> These are ideas on how I would proceed personally on this but
> > > definitely
> > > > >> everything here is up for discussion.
> > > > >>
> > > > >> Let's meet physically at either WePay or Airbnb. Folks from the
> > > > community,
> > > > >> let us know on this thread if you want to be part of this effort,
> > > we'll
> > > > be
> > > > >> happy to include you.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Max
> > > > >>
> > > > >> On Wed, Jul 19, 2017 at 7:33 PM, Joy Gao  wrote:
> > > > >>
> > > > >>> Hey everyone,
> > > > >>>
> > > > >>> I recently transferred to Data Infra team here at WePay to focus
> on
> > > > >>> Airflow-related initiatives.
> > > > >>>
> > > > >>> Given the RBAC design is mostly hashed out, I'm happy to get this
> > > > feature
> > > > >>> off the ground for Q3, starting with converting Airflow to Fab,
> if
> > > > there
> > > > >>> are no objections.
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Joy
> > > > >>>
> > > > >>> On Thu, Jun 29, 2017 at 7:32 AM, Gurer Kiratli <
> > > > >>> gurer.kira...@airbnb.com.invalid> wrote:
> > > > >>>
> > > >  Hey all,
> > > > 
> > > >  We talked about this internally. We would like to work on this
> > > feature
> > > > >>> but
> > > >  given the immediate priorities we are not going to be working on
> > it
> > > in
> > > > >>> Q3.
> > > >  Comes end of Q3 we will reevaluate. Likely scenario is we can
> work
> > > on

Re: AIRFLOW-1258

2017-07-16 Thread Alex Guziel
I think this may be related to a celery bug. I'll follow up with more
details later.

On Sun, Jul 16, 2017 at 12:56 AM Jawahar Panchal 
wrote:

> Hi!
>
> I am currently running a couple of long-running tasks on a
> database/dataset at school for a project that results in behavior/log
> output similar to what was flagged in this bug:
> https://issues.apache.org/jira/browse/AIRFLOW-1258 <
> https://issues.apache.org/jira/browse/AIRFLOW-1258>
>
> Wasn’t sure if anyone on the list had seen anything similar, or would know
> what I can do to possibly debug further/patch. As it takes 1hr to test a
> change, needless to say any pointers from the dev team on the right
> direction to look within the codebase would be much appreciated! :)
>
> Thanks in advance for everyone’s/anyone's time and help - am not an
> Airflow expert, but am hopefully learning quickly enough to help resolve
> this issue (if I am ‘barking up the right tree’ with this bug number…)
>
> Cheers,
> J
>
>


Re: celeryd processes in the worker nodes

2017-07-12 Thread Alex Guziel
The celeryd processes exist even if they are idling.

On Wed, Jul 12, 2017 at 11:40 PM Niranda Perera 
wrote:

> Hi,
>
> I am using the celery executor and in my worker node, I only have a single
> task running currently (there were number of tasks completed already). But
> when I check the processes, I get a long list of celeryd: processes
>
> Please find the terminal output here https://pastebin.com/uE5eZYvT
>
> any idea why this is and whether these are processes which have not been
> closed?
>
> Best regards
>
> Niranda Perera
> Research Assistant
> Dept of CSE, University of Moratuwa
> niranda...@cse.mrt.ac.lk
> +94 71 554 8430
> https://lk.linkedin.com/in/niranda
>


Re: Airflow profiling

2017-06-27 Thread Alex Guziel
Yeah, actually we have set up New Relic for Airflow too at Airbnb, which
gives decent insights into webserver perf. In terms of SQL queries, adding
`echo=True` to the SQLAlchemy engine creation is pretty good for seeing
which SQL queries get created. I tried some Python profilers before but
they weren't super helpful.

On Tue, Jun 27, 2017 at 1:27 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Nice. It would be great if DAG parsing was faster, and some of the
> endpoints on the website have grown really slow as we've grown the
> number of DAGs, and on the DAGs with large number of tasks.
>
> I had the intuition that DAG parsing could be faster if operators
> late-imported hooks (who themselves import external libs) but I have no
> evidence or test to support it.
>
> I'm sure there's tons of low hanging fruit and this type of tool should
> make it really clear.
>
> We've set up New Relic (which seems similar to this tooling at first sight)
> for Superset at Airbnb and it gave us great insight.
>
> Max
>
> On Tue, Jun 27, 2017 at 1:01 PM, Bolke de Bruin  wrote:
>
> > Free version also there, maybe more integration testing and benchmarking.
> >
> > https://stackimpact.com/pricing/ 
> >
> > B.
> >
> > > On 27 Jun 2017, at 22:00, Chris Riccomini 
> wrote:
> > >
> > > Seems you have to pay?
> > >
> > > On Tue, Jun 27, 2017 at 12:56 PM, Bolke de Bruin 
> > wrote:
> > >
> > >> Just saw this tool on hacker news:
> > >>
> > >> https://github.com/stackimpact/stackimpact-python
> > >>
> > >> Might be interesting for some profiling.
> > >>
> > >> Bolke
> >
> >
>
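[Editor's note] The `echo=True` trick Alex mentions is a one-line change where the engine is created: every SQL statement the engine issues is then logged, which makes chatty endpoints easy to spot. A minimal sketch with an illustrative in-memory sqlite URL — point it at your actual metadata DB connection string instead.

```python
from sqlalchemy import create_engine, text

# echo=True makes the engine log every SQL statement it issues (via the
# 'sqlalchemy.engine' logger), which is handy for spotting chatty code
# paths. The sqlite URL below is illustrative only.
engine = create_engine('sqlite://', echo=True)

with engine.connect() as conn:
    result = conn.execute(text('SELECT 1 + 1'))  # statement gets echoed
    print(result.scalar())
```

For finer control, the same output can be enabled without touching code by raising the 'sqlalchemy.engine' logger to INFO in logging config.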


Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-26 Thread Alex Guziel
Yeah, that makes sense. It pages by default at 500, which explains why we
saw it.

On Mon, Jun 26, 2017 at 2:53 PM, Chris Riccomini 
wrote:

> In 1.8.1, the "DAGs" page has "Show  entries". In 1.8.2, it has
> "Show <25> entries". So it looks like prior to 1.8.2, the pagination was
> broken in the sense that it defaulted to the whole list. We have 479 DAGs
> in one env, and it shows them all. It looks like someone fixed the entry to
> default to 25 now, which exposed the problem for our environments.
>
> On Mon, Jun 26, 2017 at 2:47 PM, Alex Guziel  invalid
> > wrote:
>
> > We're running 1.8.0 + some extras, and none of us added pagination
> > recently, and our homepage is paginated. Are you sure it's not the number
> > of dags crossing the threshold? Maybe it's some Flask version thing?
> >
> > On Mon, Jun 26, 2017 at 2:45 PM, Chris Riccomini 
> > wrote:
> >
> > > Yes, I did the 1.8.1 release.
> > >
> > > On Mon, Jun 26, 2017 at 2:44 PM, Alex Guziel  > > invalid
> > > > wrote:
> > >
> > > > There's no pagination in 1.8.1? Are you sure?
> > > >
> > > > On Mon, Jun 26, 2017 at 2:37 PM, Chris Riccomini <
> > criccom...@apache.org>
> > > > wrote:
> > > >
> > > > > It's not happening on 1.8.1 (since there's no pagination in that
> > > > version),
> > > > > so I'd count this as a regression. I wouldn't say it's blocking,
> but
> > > it's
> > > > > pretty ugly.
> > > > >
> > > > > On Mon, Jun 26, 2017 at 2:34 PM, Alex Guziel <
> alex.guz...@airbnb.com
> > .
> > > > > invalid
> > > > > > wrote:
> > > > >
> > > > > > I'm not so sure this is a new issue. I think we've seen it on our
> > > > > > production for quite a while.
> > > > > >
> > > > > > On Mon, Jun 26, 2017 at 2:31 PM, Chris Riccomini <
> > > > criccom...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I am seeing a strange UI behavior on 1.8.2.RC2. I've opened a
> > JIRA
> > > > > here:
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/AIRFLOW-1348
> > > > > > >
> > > > > > > Has anyone else seen this?
> > > > > > >
> > > > > > > On Mon, Jun 26, 2017 at 3:27 AM, Sumit Maheshwari <
> > > > > > sumeet.ma...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1, binding.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jun 26, 2017 at 3:49 PM, Bolke de Bruin <
> > > bdbr...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > We have been running it for the last couple of days. No
> > issues
> > > > and
> > > > > > > seems
> > > > > > > > > more responsive.
> > > > > > > > >
> > > > > > > > > +1, binding
> > > > > > > > >
> > > > > > > > > Bolke
> > > > > > > > >
> > > > > > > > > > On 25 Jun 2017, at 01:10, Maxime Beauchemin <
> > > > > > > > maximebeauche...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Dear all,
> > > > > > > > > >
> > > > > > > > > > 1.8.2 RC2 is baked and available at:
> > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow
> ,
> > > > public
> > > > > > > keys
> > > > > > > > > > are available
> > > > > > > > > > at https://dist.apache.org/repos/
> > > > dist/release/incubator/airflow.
> > > > > > > > > >
> > > > > > > > > > Note that RC1 was the first RC (skipped RC0) and was
> never
> > > > > > announced
> > > > > > > > > since
> > > > > > > > 

Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-26 Thread Alex Guziel
We're running 1.8.0 + some extras, and none of us added pagination
recently, and our homepage is paginated. Are you sure it's not the number
of dags crossing the threshold? Maybe it's some Flask version thing?

On Mon, Jun 26, 2017 at 2:45 PM, Chris Riccomini 
wrote:

> Yes, I did the 1.8.1 release.
>
> On Mon, Jun 26, 2017 at 2:44 PM, Alex Guziel  invalid
> > wrote:
>
> > There's no pagination in 1.8.1? Are you sure?
> >
> > On Mon, Jun 26, 2017 at 2:37 PM, Chris Riccomini 
> > wrote:
> >
> > > It's not happening on 1.8.1 (since there's no pagination in that
> > version),
> > > so I'd count this as a regression. I wouldn't say it's blocking, but
> it's
> > > pretty ugly.
> > >
> > > On Mon, Jun 26, 2017 at 2:34 PM, Alex Guziel  > > invalid
> > > > wrote:
> > >
> > > > I'm not so sure this is a new issue. I think we've seen it on our
> > > > production for quite a while.
> > > >
> > > > On Mon, Jun 26, 2017 at 2:31 PM, Chris Riccomini <
> > criccom...@apache.org>
> > > > wrote:
> > > >
> > > > > I am seeing a strange UI behavior on 1.8.2.RC2. I've opened a JIRA
> > > here:
> > > > >
> > > > > https://issues.apache.org/jira/browse/AIRFLOW-1348
> > > > >
> > > > > Has anyone else seen this?
> > > > >
> > > > > On Mon, Jun 26, 2017 at 3:27 AM, Sumit Maheshwari <
> > > > sumeet.ma...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1, binding.
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 26, 2017 at 3:49 PM, Bolke de Bruin <
> bdbr...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > We have been running it for the last couple of days. No issues
> > and
> > > > > seems
> > > > > > > more responsive.
> > > > > > >
> > > > > > > +1, binding
> > > > > > >
> > > > > > > Bolke
> > > > > > >
> > > > > > > > On 25 Jun 2017, at 01:10, Maxime Beauchemin <
> > > > > > maximebeauche...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Dear all,
> > > > > > > >
> > > > > > > > 1.8.2 RC2 is baked and available at:
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow,
> > public
> > > > > keys
> > > > > > > > are available
> > > > > > > > at https://dist.apache.org/repos/
> > dist/release/incubator/airflow.
> > > > > > > >
> > > > > > > > Note that RC1 was the first RC (skipped RC0) and was never
> > > > announced
> > > > > > > since
> > > > > > > > it had issues coming out of the oven, so RC2 is the first
> > public
> > > > RC.
> > > > > > > >
> > > > > > > > 1.8.2 RC2 is build on to of 1.8.1 with these listed
> "cherries"
> > on
> > > > > top.
> > > > > > I
> > > > > > > > added the JIRAs that were identified blockers and targeted
> > > 1.8.2. I
> > > > > > > > attempted to bring in all of the JIRAs that targeted 1.8.2
> but
> > > > bailed
> > > > > > on
> > > > > > > > the ones that were generating merge conflicts. I also added
> all
> > > of
> > > > > the
> > > > > > > > JIRAs that we've been running in production at Airbnb.
> > > > > > > >
> > > > > > > > Issues fixed:
> > > > > > > > 9a53e66 [AIRFLOW-809][AIRFLOW-1] Use __eq__ ColumnOperator
> When
> > > > > Testing
> > > > > > > > Booleans
> > > > > > > > 333e0b3 [AIRFLOW-1296] Propagate SKIPPED to all downstream
> > tasks
> > > > > > > > 93825d5 [AIRFLOW-XXX] Re-enable caching for hadoop components
> > > > > > > > 33a9dcb [AIRFLOW-XXX] Pin Hive and Hadoop to a specific
> version
> > > and
> > > > > > > create
> > > 

Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-26 Thread Alex Guziel
Also, still see `airflow.exceptions.AirflowConfigException: section/key
[celery/celery_ssl_active] not found in config` when running with celery
executor. This PR likely fixes it:
https://github.com/apache/incubator-airflow/pull/2341, but it needs a rebase
to pass tests.
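Until that rebase lands, one workaround (an assumption on my part, not a verified fix) is to declare the key the executor looks up explicitly in airflow.cfg, so the config lookup no longer raises:

```ini
[celery]
; Hypothetical workaround: define the key so the section/key lookup succeeds.
; Set to True only if you actually run Celery broker connections over SSL.
celery_ssl_active = False
```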

On Mon, Jun 26, 2017 at 2:44 PM, Alex Guziel  wrote:

> There's no pagination in 1.8.1? Are you sure?
>
> On Mon, Jun 26, 2017 at 2:37 PM, Chris Riccomini 
> wrote:
>
>> It's not happening on 1.8.1 (since there's no pagination in that version),
>> so I'd count this as a regression. I wouldn't say it's blocking, but it's
>> pretty ugly.
>>
>> On Mon, Jun 26, 2017 at 2:34 PM, Alex Guziel > .invalid
>> > wrote:
>>
>> > I'm not so sure this is a new issue. I think we've seen it on our
>> > production for quite a while.
>> >
>> > On Mon, Jun 26, 2017 at 2:31 PM, Chris Riccomini > >
>> > wrote:
>> >
>> > > I am seeing a strange UI behavior on 1.8.2.RC2. I've opened a JIRA
>> here:
>> > >
>> > > https://issues.apache.org/jira/browse/AIRFLOW-1348
>> > >
>> > > Has anyone else seen this?
>> > >
>> > > On Mon, Jun 26, 2017 at 3:27 AM, Sumit Maheshwari <
>> > sumeet.ma...@gmail.com>
>> > > wrote:
>> > >
>> > > > +1, binding.
>> > > >
>> > > >
>> > > > On Mon, Jun 26, 2017 at 3:49 PM, Bolke de Bruin 
>> > > wrote:
>> > > >
>> > > > > We have been running it for the last couple of days. No issues and
>> > > seems
>> > > > > more responsive.
>> > > > >
>> > > > > +1, binding
>> > > > >
>> > > > > Bolke
>> > > > >
>> > > > > > On 25 Jun 2017, at 01:10, Maxime Beauchemin <
>> > > > maximebeauche...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > Dear all,
>> > > > > >
>> > > > > > 1.8.2 RC2 is baked and available at:
>> > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow,
>> public
>> > > keys
>> > > > > > are available
>> > > > > > at https://dist.apache.org/repos/dist/release/incubator/airflow
>> .
>> > > > > >
>> > > > > > Note that RC1 was the first RC (skipped RC0) and was never
>> > announced
>> > > > > since
>> > > > > > it had issues coming out of the oven, so RC2 is the first public
>> > RC.
>> > > > > >
>> > > > > > 1.8.2 RC2 is built on top of 1.8.1 with these listed "cherries"
>> on
>> > > top.
>> > > > I
>> > > > > > added the JIRAs that were identified blockers and targeted
>> 1.8.2. I
>> > > > > > attempted to bring in all of the JIRAs that targeted 1.8.2 but
>> > bailed
>> > > > on
>> > > > > > the ones that were generating merge conflicts. I also added all
>> of
>> > > the
>> > > > > > JIRAs that we've been running in production at Airbnb.
>> > > > > >
>> > > > > > Issues fixed:
>> > > > > > 9a53e66 [AIRFLOW-809][AIRFLOW-1] Use __eq__ ColumnOperator When
>> > > Testing
>> > > > > > Booleans
>> > > > > > 333e0b3 [AIRFLOW-1296] Propagate SKIPPED to all downstream tasks
>> > > > > > 93825d5 [AIRFLOW-XXX] Re-enable caching for hadoop components
>> > > > > > 33a9dcb [AIRFLOW-XXX] Pin Hive and Hadoop to a specific version
>> and
>> > > > > create
>> > > > > > writable warehouse dir
>> > > > > > 7cff6cd [AIRFLOW-1308] Disable nanny usage for Dask
>> > > > > > 570b2ed [AIRFLOW-1294] Backfills can loose tasks to execute
>> > > > > > 3f48d48 [AIRFLOW-1291] Update NOTICE and LICENSE files to match
>> ASF
>> > > > > > requirements
>> > > > > > 69bd269 [AIRFLOW-1160] Update Spark parameters for Mesos
>> > > > > > 9692510 [AIRFLOW 1149][AIRFLOW-1149] Allow for custom filters in
>> > > Jinja2
>> > > > > > templates
>> > > > > > 6de5330 [AIRFLOW-1119] Fix unload query so headers are on f

Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-26 Thread Alex Guziel
There's no pagination in 1.8.1? Are you sure?

On Mon, Jun 26, 2017 at 2:37 PM, Chris Riccomini 
wrote:

> It's not happening on 1.8.1 (since there's no pagination in that version),
> so I'd count this as a regression. I wouldn't say it's blocking, but it's
> pretty ugly.
>
> On Mon, Jun 26, 2017 at 2:34 PM, Alex Guziel  invalid
> > wrote:
>
> > I'm not so sure this is a new issue. I think we've seen it on our
> > production for quite a while.
> >
> > On Mon, Jun 26, 2017 at 2:31 PM, Chris Riccomini 
> > wrote:
> >
> > > I am seeing a strange UI behavior on 1.8.2.RC2. I've opened a JIRA
> here:
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-1348
> > >
> > > Has anyone else seen this?
> > >
> > > On Mon, Jun 26, 2017 at 3:27 AM, Sumit Maheshwari <
> > sumeet.ma...@gmail.com>
> > > wrote:
> > >
> > > > +1, binding.
> > > >
> > > >
> > > > On Mon, Jun 26, 2017 at 3:49 PM, Bolke de Bruin 
> > > wrote:
> > > >
> > > > > We have been running it for the last couple of days. No issues and
> > > seems
> > > > > more responsive.
> > > > >
> > > > > +1, binding
> > > > >
> > > > > Bolke
> > > > >
> > > > > > On 25 Jun 2017, at 01:10, Maxime Beauchemin <
> > > > maximebeauche...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Dear all,
> > > > > >
> > > > > > 1.8.2 RC2 is baked and available at:
> > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow, public
> > > keys
> > > > > > are available
> > > > > > at https://dist.apache.org/repos/dist/release/incubator/airflow.
> > > > > >
> > > > > > Note that RC1 was the first RC (skipped RC0) and was never
> > announced
> > > > > since
> > > > > > it had issues coming out of the oven, so RC2 is the first public
> > RC.
> > > > > >
> > > > > > 1.8.2 RC2 is built on top of 1.8.1 with these listed "cherries" on
> > > top.
> > > > I
> > > > > > added the JIRAs that were identified blockers and targeted
> 1.8.2. I
> > > > > > attempted to bring in all of the JIRAs that targeted 1.8.2 but
> > bailed
> > > > on
> > > > > > the ones that were generating merge conflicts. I also added all
> of
> > > the
> > > > > > JIRAs that we've been running in production at Airbnb.
> > > > > >
> > > > > > Issues fixed:
> > > > > > 9a53e66 [AIRFLOW-809][AIRFLOW-1] Use __eq__ ColumnOperator When
> > > Testing
> > > > > > Booleans
> > > > > > 333e0b3 [AIRFLOW-1296] Propagate SKIPPED to all downstream tasks
> > > > > > 93825d5 [AIRFLOW-XXX] Re-enable caching for hadoop components
> > > > > > 33a9dcb [AIRFLOW-XXX] Pin Hive and Hadoop to a specific version
> and
> > > > > create
> > > > > > writable warehouse dir
> > > > > > 7cff6cd [AIRFLOW-1308] Disable nanny usage for Dask
> > > > > > 570b2ed [AIRFLOW-1294] Backfills can loose tasks to execute
> > > > > > 3f48d48 [AIRFLOW-1291] Update NOTICE and LICENSE files to match
> ASF
> > > > > > requirements
> > > > > > 69bd269 [AIRFLOW-1160] Update Spark parameters for Mesos
> > > > > > 9692510 [AIRFLOW 1149][AIRFLOW-1149] Allow for custom filters in
> > > Jinja2
> > > > > > templates
> > > > > > 6de5330 [AIRFLOW-1119] Fix unload query so headers are on first
> > row[]
> > > > > > b4e9eb8 [AIRFLOW-1089] Add Spark application arguments
> > > > > > a4083f3 [AIRFLOW-1078] Fix latest_runs endpoint for old flask
> > > versions
> > > > > > 7a02841 [AIRFLOW-1074] Don't count queued tasks for concurrency
> > > limits
> > > > > > a2c18a5 [AIRFLOW-1064] Change default sort to job_id for
> > > > > > TaskInstanceModelView
> > > > > > d1c64ab [AIRFLOW-1038] Specify celery serialization options
> > > explicitly
> > > > > > b4ee88a [AIRFLOW-1036] Randomize exponential backoff
> > > > > > 9fca409 [AIRFLOW-993] Update date infer

Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-06-26 Thread Alex Guziel
I'm not so sure this is a new issue. I think we've seen it on our
production for quite a while.

On Mon, Jun 26, 2017 at 2:31 PM, Chris Riccomini 
wrote:

> I am seeing a strange UI behavior on 1.8.2.RC2. I've opened a JIRA here:
>
> https://issues.apache.org/jira/browse/AIRFLOW-1348
>
> Has anyone else seen this?
>
> On Mon, Jun 26, 2017 at 3:27 AM, Sumit Maheshwari 
> wrote:
>
> > +1, binding.
> >
> >
> > On Mon, Jun 26, 2017 at 3:49 PM, Bolke de Bruin 
> wrote:
> >
> > > We have been running it for the last couple of days. No issues and
> seems
> > > more responsive.
> > >
> > > +1, binding
> > >
> > > Bolke
> > >
> > > > On 25 Jun 2017, at 01:10, Maxime Beauchemin <
> > maximebeauche...@gmail.com>
> > > wrote:
> > > >
> > > > Dear all,
> > > >
> > > > 1.8.2 RC2 is baked and available at:
> > > > https://dist.apache.org/repos/dist/dev/incubator/airflow, public
> keys
> > > > are available
> > > > at https://dist.apache.org/repos/dist/release/incubator/airflow.
> > > >
> > > > Note that RC1 was the first RC (skipped RC0) and was never announced
> > > since
> > > > it had issues coming out of the oven, so RC2 is the first public RC.
> > > >
> > > > 1.8.2 RC2 is built on top of 1.8.1 with these listed "cherries" on
> top.
> > I
> > > > added the JIRAs that were identified blockers and targeted 1.8.2. I
> > > > attempted to bring in all of the JIRAs that targeted 1.8.2 but bailed
> > on
> > > > the ones that were generating merge conflicts. I also added all of
> the
> > > > JIRAs that we've been running in production at Airbnb.
> > > >
> > > > Issues fixed:
> > > > 9a53e66 [AIRFLOW-809][AIRFLOW-1] Use __eq__ ColumnOperator When
> Testing
> > > > Booleans
> > > > 333e0b3 [AIRFLOW-1296] Propagate SKIPPED to all downstream tasks
> > > > 93825d5 [AIRFLOW-XXX] Re-enable caching for hadoop components
> > > > 33a9dcb [AIRFLOW-XXX] Pin Hive and Hadoop to a specific version and
> > > create
> > > > writable warehouse dir
> > > > 7cff6cd [AIRFLOW-1308] Disable nanny usage for Dask
> > > > 570b2ed [AIRFLOW-1294] Backfills can loose tasks to execute
> > > > 3f48d48 [AIRFLOW-1291] Update NOTICE and LICENSE files to match ASF
> > > > requirements
> > > > 69bd269 [AIRFLOW-1160] Update Spark parameters for Mesos
> > > > 9692510 [AIRFLOW 1149][AIRFLOW-1149] Allow for custom filters in
> Jinja2
> > > > templates
> > > > 6de5330 [AIRFLOW-1119] Fix unload query so headers are on first row[]
> > > > b4e9eb8 [AIRFLOW-1089] Add Spark application arguments
> > > > a4083f3 [AIRFLOW-1078] Fix latest_runs endpoint for old flask
> versions
> > > > 7a02841 [AIRFLOW-1074] Don't count queued tasks for concurrency
> limits
> > > > a2c18a5 [AIRFLOW-1064] Change default sort to job_id for
> > > > TaskInstanceModelView
> > > > d1c64ab [AIRFLOW-1038] Specify celery serialization options
> explicitly
> > > > b4ee88a [AIRFLOW-1036] Randomize exponential backoff
> > > > 9fca409 [AIRFLOW-993] Update date inference logic
> > > > 272c2f5 [AIRFLOW-1167] Support microseconds in FTPHook modification
> > time
> > > > c7c0b72 [AIRFLOW-1179] Fix Pandas 0.2x breaking Google BigQuery
> change
> > > > acd0166 [AIRFLOW-1263] Dynamic height for charts
> > > > 7f33f6e [AIRFLOW-1266] Increase width of gantt y axis
> > > > fc33c04 [AIRFLOW-1290] set docs author to 'Apache Airflow'
> > > > 2e9eee3 [AIRFLOW-1282] Fix known event column sorting
> > > > 2389a8a [AIRFLOW-1166] Speed up _change_state_for_tis_without_dagrun
> > > > bf966e6 [AIRFLOW-1192] Some enhancements to qubole_operator
> > > > 57d5bcd [AIRFLOW-1281] Sort variables by key field by default
> > > > 802fc15 [AIRFLOW-1244] Forbid creation of a pool with empty name
> > > > 1232b6a [AIRFLOW-1243] DAGs table has no default entries to show
> > > > b0ba3c9 [AIRFLOW-1227] Remove empty column on the Logs view
> > > > c406652 [AIRFLOW-1226] Remove empty column on the Jobs view
> > > > 51a83cc [AIRFLOW-1199] Fix create modal
> > > > cac7d4c [AIRFLOW-1200] Forbid creation of a variable with an empty
> key
> > > > 5f3ee52 [AIRFLOW-1186] Sort dag.get_task_instances by execution_date
> > > > f446c08 [AIRFLOW-1145] Fix closest_date_partition function with
> before
> > > set
> > > > to True If we're looking for the closest date before, we should take
> > the
> > > > latest date in the list of date before.
> > > > 93b8e96 [AIRFLOW-1180] Fix flask-wtf version for test_csrf_rejection
> > > > bb56805 [AIRFLOW-1170] DbApiHook insert_rows inserts parameters
> > > separately
> > > > 093b2f0 [AIRFLOW-1150] Fix scripts execution in sparksql hook[]
> > > > 777f181 [AIRFLOW-1168] Add closing() to all connections and cursors
> > > >
> > > > Max
> > >
> > >
> >
>


Re: Tasks Queued but never run

2017-06-01 Thread Alex Guziel
We've noticed this with celery, relating to this
https://github.com/celery/celery/issues/3765

We also use the `-n 5` option on the scheduler so it restarts every 5 runs,
which will reset all queued tasks.
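That restart loop can be wired up with any process supervisor; a minimal supervisord sketch (program name is illustrative, assuming `airflow` is on the PATH):

```ini
[program:airflow-scheduler]
; Sketch of the restart-every-N-runs setup described above. `-n 5` makes
; the scheduler exit after 5 scheduler loops; autorestart brings it back
; up, which re-examines tasks stuck in the QUEUED state.
command=airflow scheduler -n 5
autorestart=true
```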

Best,
Alex

On Thu, Jun 1, 2017 at 2:18 PM, Josef Samanek 
wrote:

> Hi!
>
> We have a problem with our Airflow setup. Sometimes several tasks get
> queued but never run, remaining in the Queued state forever. Other tasks
> from the same schedule interval run, and the next schedule interval runs
> normally too. But these few tasks remain queued.
>
> We are using Airflow 1.8.1. Currently with CeleryExecutor and redis, but
> we had the same problem with LocalExecutor as well (actually switching to
> Celery helped quite a bit, the problem now happens way less often, but
> still it happens). We have 18 DAGs total, 13 active. Some have just 1-2
> tasks, but some are more complex, like 8 tasks or so and with upstreams.
> There are also ExternalTaskSensor tasks used.
>
> I tried playing around with DAG configurations (limiting concurrency,
> max_active_runs, ...), tried switching off some DAGs completely (not all
> but most) etc., so far nothing helped. Right now, I am not really sure,
> what else to try to identify a solve the issue.
>
> I am getting a bit desperate, so I would really appreciate any help with
> this. Thank you all in advance!
>
> Joe
>


Re: Discussion on Airflow 1.8.1 RC2

2017-05-04 Thread Alex Guziel
I don't think any of the fixes I did were regressions.

On Thu, May 4, 2017 at 8:11 AM, Bolke de Bruin  wrote:

> I know of one that Alex wanted to get in, but wasn’t targeted for 1.8.1 in
> Jira and thus didn’t make the cut at RC time. There is is another one out
> that seems to have stalled a bit (https://github.com/apache/
> incubator-airflow/pull/2205).
>
> Reading the changelog of 1.8.1 I see bug fixes, apache requirements and
> one “new” feature (UI lightning bolt). Regressions could have happened but
> we have been quite vigilant on the fact that these bug fixes needed proper
> tests, so I am very interested in 1.8.0 -> 1.8.1 regressions. If it is a
> pre-backfill-change 1.8.0 to 1.8.1 regression then I would also like to
> know, cause I made that change and feel responsible for it.
>
> Cheers
> Bolke
>
>
> On 3 May 2017, at 22:13, Dan Davydov  wrote:
>
> cc Alex and Rui who were working on fixes, I'm not sure if their commits
> got in before 1.8.1.
>
> On Wed, May 3, 2017 at 1:09 PM, Bolke de Bruin  wrote:
>
>> Hi Dan,
>>
>> (Thread renamed to make sure it does not clash, dev@ now added)
>>
>> It surprises me that you found regression from 1.8.0 to 1.8.1 as 1.8.1 is
>> very much focused on bug fixes. Were the regressions shared yet?
>>
>> The whole 1.8.X release will be bug fix focused (per release management)
>> and minor feature updates. The 1.9.0 release will be the first release with
>> major feature updates. So what you want, more robustness and focus on
>> stability, is now underway. I agree with beefing up tests and including the
>> major operators in this. Executors should also be on this list btw. Turning
>> on coverage reporting might be a first step in helping this (it isn’t the
>> solution obviously).
>>
>> Cheers
>> Bolke
>>
>>
>> On 3 May 2017, at 20:28, Dan Davydov  wrote:
>>
>> We saw several regressions moving from 1.8.0 to 1.8.1 the first time we
>> tried, and while I think we merged all our fixes to master (not sure if
>> they all made it into 1.8.1 however), we have put releasing on hold due to
>> stability issues from the last couple of releases. It's either the case
>> that:
>> A) Airbnb requires more robustness from new releases.
>> or
>> B) Most companies using Airflow require more robustness and we should
>> halt on feature work until we are more confident in our testing
>>
>> I think the biggest problem currently is the lack of unit testing
>> coverage, e.g. when the backfill framework was refactored (which was the
>> right long-term fix), it caused a lot of breakages that weren't caught by
>> tests. I think we need to audit the major operators/classes and beef up the
>> unit testing coverage. The coverage metric does not necessarily cover these
>> cases (e.g. cyclomatic complexity). Writing regression tests is good but we
>> shouldn't have so many new blocker issues in our releases.
>>
>> We are fighting some fires internally at the moment (not Airflow
>> related), but Alex and I have been working on some stuff that we will push
>> to the community once we are done. Alex is working on a good solution for
>> python package isolation, and I'm working on integration with Kubernetes at
>> the executor level.
>>
>> Feel free to forward any of my messages to the dev mailing list.
>>
>> On Wed, May 3, 2017 at 11:18 AM, Bolke de Bruin 
>> wrote:
>>
>>> Grrr, I seriously dislike the send button on the touch bar… here goes
>>> again.
>>>
>>> Hi Dan,
>>>
>>> (Please note I would like to forward the next message to dev@, but let
>>> me know if you don’t find it comfortable)
>>>
>>> I understand your point. The gap between 1.7.1 was large in terms of
>>> functionality changes etc. It was going to be a (bit?) rough and as you
>>> guys are using many of the edge cases you probably found more issues than
>>> any of us. Still, between 1.8.0 and 1.8.1 we have added many tests
>>> (coverage increased from 67% to close to 69%, which is a lot as you know).
>>> It would be nice if you can share where your areas of concern are so we can
>>> address those and a suggestion on how to proceed with integration tests is
>>> also welcome.
>>>
>>> You guys (=Airbnb) have been a bit quiet over the past couple of days,
>>> so I am getting a bit worried in terms of engagement. Is that warranted?
>>>
>>> Cheers
>>> Bolke
>>>
>>>
>>> On 3 May 2017, at 20:13, Bolke de Bruin  wrote:
>>>
>>> Hi Dan,
>>>
>>> (Please note I would like to forward the next message to dev@, but let
>>> me know if you don’t find it comfortable)
>>>
>>> I understand your point. The gap between 1.7.1 was large in terms of
>>> functionality changes etc. It was going to be a (bit?) rough and as you
>>> guys are using many of the edge cases you probably found more issues than
>>> any of us. Still, between 1.8.0 and 1.8.1 we have added many tests
>>> (coverage increased from 67
>>>
>>> On 3 May 2017, at 19:41, Arthur Wiedmer 
>>> wrote:
>>>
>>> As a counterpoint,
>>>
>>> I am comfortable voting +1 on this release in the sense that it fixes

Re: dag file processing times

2017-04-24 Thread Alex Guziel
It wouldn't really be serialization. You would still need to watch all the
dependent code unless you wanted a continuous parse going on.

On Mon, Apr 24, 2017 at 3:19 PM, Bolke de Bruin  wrote:

> That would be close to serialization which you could do with marshmallow
> (which works better than pickle).
>
> B.
>
> Sent from my iPhone
>
> > On 25 Apr 2017, at 00:07, Alex Guziel 
> wrote:
> >
> > You can also use reflection in Python to read the modules all the way
> down.
> >
> > On Mon, Apr 24, 2017 at 3:05 PM, Dan Davydov  invalid
> >> wrote:
> >
> >> Was talking with Alex about the DB case offline, for those we could
> support
> >> a force refresh arg with an interval param.
> >>
> >> Manifests would need to be hierarchical but I feel like it would spin out
> >> into a full blown build system inevitably.
> >>
> >> On Mon, Apr 24, 2017 at 3:02 PM, Arthur Wiedmer <
> arthur.wied...@gmail.com>
> >> wrote:
> >>
> >>> What if the DAG actually depends on configuration that only exists in a
> >>> database and is retrieved by the Python code generating the DAG?
> >>>
> >>> Just asking because we have this case in production here. It is slowly
> >>> changing, so still fits within the Airflow framework, but you cannot
> just
> >>> watch a file...
> >>>
> >>> Best,
> >>> Arthur
> >>>
> >>> On Mon, Apr 24, 2017 at 2:55 PM, Bolke de Bruin 
> >> wrote:
> >>>
> >>>> Inotify can work without a daemon. Just fire a call to the API when a
> >>> file
> >>>> changes. Just a few lines in bash.
> >>>>
> >>>> If you bundle your dependencies in a zip you should be fine with the
> >>> above.
> >>>> Or if we start using manifests that list the files that are needed in
> a
> >>>> dag...
> >>>>
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 24 Apr 2017, at 22:46, Dan Davydov  >> INVALID>
> >>>> wrote:
> >>>>>
> >>>>> One idea to solve this is to use a daemon that uses inotify to watch
> >>> for
> >>>>> changes in files and then reprocesses just those files. The hard part
> >>> is
> >>>>> without any kind of dependency/build system for DAGs it can be hard
> >> to
> >>>> tell
> >>>>> which DAGs depend on which files.
> >>>>>
> >>>>> On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra <
> >> gtoons...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> I've seen some people complain about DAG file processing times. An
> >>> issue
> >>>>>> was raised about this today:
> >>>>>>
> >>>>>> https://issues.apache.org/jira/browse/AIRFLOW-1139
> >>>>>>
> >>>>>> I attempted to provide a good explanation what's going on. Feel free
> >>> to
> >>>>>> validate and comment.
> >>>>>>
> >>>>>>
> >>>>>> I'm noticing that the file processor is a bit naive in the way it
> >>>>>> reprocesses DAGs. It doesn't look at the DAG interval for example,
> >> so
> >>> it
> >>>>>> looks like it reprocesses all files continuously in one big batch,
> >>> even
> >>>> if
> >>>>>> we can determine that the next "schedule"  for all its dags are in
> >> the
> >>>>>> future?
> >>>>>>
> >>>>>>
> >>>>>> Wondering if a change in the DagFileProcessingManager could optimize
> >>>> things
> >>>>>> a bit here.
> >>>>>>
> >>>>>> In the part where it gets the simple_dags from a file it's currently
> >>>>>> processing:
> >>>>>>
> >>>>>>   for simple_dag in processor.result:
> >>>>>>   simple_dags.append(simple_dag)
> >>>>>>
> >>>>>> the file_path is in the context and the simple_dags should be able
> >> to
> >>>>>> provide the next interval date for each dag in the file.
> >>>>>>
> >>>>>> The idea is to add files to a sorted deque by
> >> "next_schedule_datetime"
> >>>> (the
> >>>>>> minimum next interval date), so that when we build the list
> >>>>>> "files_paths_to_queue", it can remove files that have dags that we
> >>> know
> >>>>>> won't have a new dagrun for a while.
> >>>>>>
> >>>>>> One gotcha to resolve after that is to deal with files getting
> >> updated
> >>>> with
> >>>>>> new dags or changed dag definitions and renames and different
> >> interval
> >>>>>> schedules.
> >>>>>>
> >>>>>> Worth a PR to glance over?
> >>>>>>
> >>>>>> Rgds,
> >>>>>>
> >>>>>> Gerard
> >>>>>>
> >>>>
> >>>
> >>
>
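The interval-aware re-parse queue sketched in the quoted proposal could be prototyped with a min-heap keyed on each file's earliest upcoming schedule (class and method names are illustrative, not Airflow's actual DagFileProcessingManager API):

```python
import heapq
from datetime import datetime


class DagFileQueue:
    """Order DAG files by the minimum next-schedule date of their DAGs,
    so files with no upcoming runs are skipped instead of being
    re-parsed continuously."""

    def __init__(self):
        self._heap = []  # entries: (next_schedule_datetime, file_path)

    def push(self, file_path, next_schedule):
        heapq.heappush(self._heap, (next_schedule, file_path))

    def files_to_process(self, now):
        """Pop and return every file whose next schedule is due by `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

Each file would be re-pushed after a parse with its new minimum next-interval date; renamed files and changed schedules would still need the invalidation handling called out as a gotcha above.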


Re: dag file processing times

2017-04-24 Thread Alex Guziel
You can also use reflection in Python to read the modules all the way down.

On Mon, Apr 24, 2017 at 3:05 PM, Dan Davydov  wrote:

> Was talking with Alex about the DB case offline, for those we could support
> a force refresh arg with an interval param.
>
> Manifests would need to be hierarchical but I feel like it would spin out
> into a full blown build system inevitably.
>
> On Mon, Apr 24, 2017 at 3:02 PM, Arthur Wiedmer 
> wrote:
>
> > What if the DAG actually depends on configuration that only exists in a
> > database and is retrieved by the Python code generating the DAG?
> >
> > Just asking because we have this case in production here. It is slowly
> > changing, so still fits within the Airflow framework, but you cannot just
> > watch a file...
> >
> > Best,
> > Arthur
> >
> > On Mon, Apr 24, 2017 at 2:55 PM, Bolke de Bruin 
> wrote:
> >
> > > Inotify can work without a daemon. Just fire a call to the API when a
> > file
> > > changes. Just a few lines in bash.
> > >
> > > If you bundle your dependencies in a zip you should be fine with the
> > above.
> > > Or if we start using manifests that list the files that are needed in a
> > > dag...
> > >
> > >
> > > Sent from my iPhone
> > >
> > > > On 24 Apr 2017, at 22:46, Dan Davydov  INVALID>
> > > wrote:
> > > >
> > > > One idea to solve this is to use a daemon that uses inotify to watch
> > for
> > > > changes in files and then reprocesses just those files. The hard part
> > is
> > > > without any kind of dependency/build system for DAGs it can be hard
> to
> > > tell
> > > > which DAGs depend on which files.
> > > >
> > > > On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra <
> gtoons...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hey,
> > > >>
> > > >> I've seen some people complain about DAG file processing times. An
> > issue
> > > >> was raised about this today:
> > > >>
> > > >> https://issues.apache.org/jira/browse/AIRFLOW-1139
> > > >>
> > > >> I attempted to provide a good explanation what's going on. Feel free
> > to
> > > >> validate and comment.
> > > >>
> > > >>
> > > >> I'm noticing that the file processor is a bit naive in the way it
> > > >> reprocesses DAGs. It doesn't look at the DAG interval for example,
> so
> > it
> > > >> looks like it reprocesses all files continuously in one big batch,
> > even
> > > if
> > > >> we can determine that the next "schedule"  for all its dags are in
> the
> > > >> future?
> > > >>
> > > >>
> > > >> Wondering if a change in the DagFileProcessingManager could optimize
> > > things
> > > >> a bit here.
> > > >>
> > > >> In the part where it gets the simple_dags from a file it's currently
> > > >> processing:
> > > >>
> > > >>for simple_dag in processor.result:
> > > >>simple_dags.append(simple_dag)
> > > >>
> > > >> the file_path is in the context and the simple_dags should be able
> to
> > > >> provide the next interval date for each dag in the file.
> > > >>
> > > >> The idea is to add files to a sorted deque by
> "next_schedule_datetime"
> > > (the
> > > >> minimum next interval date), so that when we build the list
> > > >> "files_paths_to_queue", it can remove files that have dags that we
> > know
> > > >> won't have a new dagrun for a while.
> > > >>
> > > >> One gotcha to resolve after that is to deal with files getting
> updated
> > > with
> > > >> new dags or changed dag definitions and renames and different
> interval
> > > >> schedules.
> > > >>
> > > >> Worth a PR to glance over?
> > > >>
> > > >> Rgds,
> > > >>
> > > >> Gerard
> > > >>
> > >
> >
>


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-17 Thread Alex Guziel
Sorry about that. FWIW, these were recent and I don't think they were
blockers, but they are nice to fix. In particular, the tree one was forgotten
about. I remember seeing it at the Airflow hackathon but I guess I forgot
to correct it.

On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini 
wrote:

> :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing the list
> all last week
>
> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> alex.guz...@airbnb.com.invalid> wrote:
>
> > I would say to include [1074] (
> > https://github.com/apache/incubator-airflow/pull/2221) so we don't have
> a
> > regression in the release after. I would also say
> > https://github.com/apache/incubator-airflow/pull/2241 is semi important
> > but
> > less so.
> >
> > On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini  >
> > wrote:
> >
> > > Dear All,
> > >
> > > I have been able to make the Airflow 1.8.1 RC0 available at:
> > > https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys
> > are
> > > available at https://dist.apache.org/repos/
> > dist/release/incubator/airflow.
> > >
> > > Issues fixed:
> > >
> > > [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > > [AIRFLOW-1054] Fix broken import on test_dag
> > > [AIRFLOW-1050] Retries ignored - regression
> > > [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > > [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > > [AIRFLOW-1017] get_task_instance should return None instead of th
> > > [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > > [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > > [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > > [AIRFLOW-989] Clear Task Regression
> > > [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > > [AIRFLOW-906] Update Code icon from lightning bolt to file
> > > [AIRFLOW-858] Configurable database name for DB operators
> > > [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > > [AIRFLOW-832] Fix debug server
> > > [AIRFLOW-817] Trigger dag fails when using CLI + API
> > > [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > > [AIRFLOW-815] Add previous/next execution dates to available def
> > > [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > > [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > > [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > > [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > > [AIRFLOW-785] ImportError if cgroupspy is not installed
> > > [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > > [AIRFLOW-780] The UI no longer shows broken DAGs
> > > [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > > [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > > [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > > [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > > [AIRFLOW-111] DAG concurrency is not honored
> > > [AIRFLOW-88] Improve clarity Travis CI reports
> > >
> > > I would like to raise a VOTE for releasing 1.8.1 based on release
> > candidate
> > > 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> > >
> > > Please respond to this email by:
> > >
> > > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you
> > are
> > > not.
> > >
> > > Vote will run for 72 hours (ends this Thursday).
> > >
> > > Thanks!
> > > Chris
> > >
> > > My VOTE: +1 (binding)
> > >
> >
>


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-17 Thread Alex Guziel
I would say to include [1074] (
https://github.com/apache/incubator-airflow/pull/2221) so we don't have a
regression in the release after. I would also say
https://github.com/apache/incubator-airflow/pull/2241 is semi important but
less so.

On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini 
wrote:

> Dear All,
>
> I have been able to make the Airflow 1.8.1 RC0 available at:
> https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys are
> available at https://dist.apache.org/repos/dist/release/incubator/airflow.
>
> Issues fixed:
>
> [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> [AIRFLOW-1054] Fix broken import on test_dag
> [AIRFLOW-1050] Retries ignored - regression
> [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> [AIRFLOW-1030] HttpHook error when creating HttpSensor
> [AIRFLOW-1017] get_task_instance should return None instead of th
> [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> [AIRFLOW-989] Clear Task Regression
> [AIRFLOW-974] airflow.util.file mkdir has a race condition
> [AIRFLOW-906] Update Code icon from lightning bolt to file
> [AIRFLOW-858] Configurable database name for DB operators
> [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> [AIRFLOW-832] Fix debug server
> [AIRFLOW-817] Trigger dag fails when using CLI + API
> [AIRFLOW-816] Make sure to pull nvd3 from local resources
> [AIRFLOW-815] Add previous/next execution dates to available def
> [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> [AIRFLOW-812] Scheduler job terminates when there is no dag file
> [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> [AIRFLOW-785] ImportError if cgroupspy is not installed
> [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> [AIRFLOW-780] The UI no longer shows broken DAGs
> [AIRFLOW-777] dag_is_running is initlialized to True instead of
> [AIRFLOW-719] Skipped operations make DAG finish prematurely
> [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> [AIRFLOW-139] Executing VACUUM with PostgresOperator
> [AIRFLOW-111] DAG concurrency is not honored
> [AIRFLOW-88] Improve clarity Travis CI reports
>
> I would like to raise a VOTE for releasing 1.8.1 based on release candidate
> 0, i.e. just renaming release candidate 0 to 1.8.1 release.
>
> Please respond to this email by:
>
> +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you are
> not.
>
> Vote will run for 72 hours (ends this Thursday).
>
> Thanks!
> Chris
>
> My VOTE: +1 (binding)
>


Re: Memory Issues with Airflow Subdag

2017-04-05 Thread Alex Guziel
Which container is using the memory?

On Wed, Apr 5, 2017 at 2:23 PM, Alex Keating  wrote:

> Hey Everyone,
>
> We recently added a subdag with over 200 tasks on it and any time airflow
> is running the cpu and memory usage spikes taking down the server. I am
> putting the subdag in a factory outside of the dags folder and importing it
> into a dag. We are using the Celery Executor, and running the worker,
> webserver, scheduler, and redis on a t2.medium instance with everything in
> a separate docker container. Is this normal? Why might this subdag be
> spiking our memory and cpu?
>
> --
> Alexander Keating
>
> Software Engineer | WayUp 
>
> 646.535.8724 | a...@wayup.com
>
> We're hiring at WayUp!
>
> We help students get hired. 
>


Re: Podling Report Reminder - April 2017

2017-04-03 Thread Alex Guziel
Did we do this?

On Mon, Apr 3, 2017 at 5:53 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 19 April 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, April 05).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/April2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: SQLOperator?

2017-03-17 Thread Alex Guziel
I'm not sure if that one went away but there are different SQL operators,
like MySqlOperator, MsSqlOperator, etc. that I see.

Best,
Alex

On Fri, Mar 17, 2017 at 7:56 PM, Ruslan Dautkhanov 
wrote:

> I can't find any references to SQLOperator either in the source code or in
> the API Reference.
>
> Although it is mentioned in Concepts page :
>
> https://github.com/apache/incubator-airflow/blob/master/
> docs/concepts.rst#operators
>
>
>
>- SqlOperator - executes a SQL command
>
> Sorry for basic questions - just started using Airflow this week.
> Did it get replaced with something else? If so, what is it?
>
>
>
> Thanks,
> Ruslan Dautkhanov
>
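[Editor's note: the per-database operators Alex mentions (MySqlOperator, MsSqlOperator, etc.) all follow the same hook-plus-operator split: a hook owns the connection, the operator wraps a hook and a SQL string. A toy, Airflow-free sketch of that pattern — all class names here are illustrative stand-ins, not Airflow's actual API:]

```python
import sqlite3

class DbApiHook:
    """Toy stand-in for a database hook: owns connection/cursor logic."""
    def __init__(self, conn):
        self.conn = conn

    def run(self, sql):
        cur = self.conn.cursor()
        cur.execute(sql)
        self.conn.commit()

class SqlOperator:
    """Toy stand-in for MySqlOperator/MsSqlOperator etc.: wraps a hook."""
    hook_cls = DbApiHook

    def __init__(self, sql, conn):
        self.sql = sql
        self.conn = conn

    def execute(self):
        # Each concrete operator delegates the actual I/O to its hook.
        self.hook_cls(self.conn).run(self.sql)

conn = sqlite3.connect(":memory:")
SqlOperator("CREATE TABLE t (x INTEGER)", conn).execute()
SqlOperator("INSERT INTO t VALUES (1), (2)", conn).execute()
```

Swapping the hook class is all that distinguishes one database-specific operator from another in this scheme.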


Re: Airflow Committers: Landscape checks doing more harm than good?

2017-03-16 Thread Alex Guziel
+1 also

We have code review already and the amount of false positives makes this
useless.

On Thu, Mar 16, 2017 at 5:02 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> +1 as well
>
> I'm disappointed because the service is inches away from getting everything
> right. As Bolke said, behind the cover it's little more than pylint, git
> hooks, and a somewhat-fancy ui.
>
> Operationally it's been getting in the way.
>
> There's a way to pipe the output of git diff into pylint and check whether
> the touched lines need linting, in which case we should break the build.
> This could run in it's own slot in the Travis build matrix.
>
> Max
>
> On Thu, Mar 16, 2017 at 4:51 PM, Bolke de Bruin  wrote:
>
> > We can do it in Travis’ afaik. We should replace it.
> >
> > So +1.
> >
> > B.
> >
> > > On 16 Mar 2017, at 16:48, Jeremiah Lowin  wrote:
> > >
> > > This may be an unpopular opinion, but most Airflow PRs have a little
> red
> > > "x" next to them not because they have failing unit tests, but because
> > the
> > > Landscape check has decided they introduce bad code.
> > >
> > > Unfortunately Landscape is often wrong -- here it is telling me my
> latest
> > > PR introduced no less than 30 errors... in files I didn't touch!
> > > https://github.com/apache/incubator-airflow/pull/2157 (however, it
> > gives me
> > > credit for fixing 23 errors in those same files, so I've got that going
> > for
> > > me... which is nice.)
> > >
> > > The upshot is that Github's "health" indicator can be swayed by minor
> or
> > > erroneous issues, and therefore it serves little purpose other than
> > making
> > > it look like every PR is bad. This creates committer fatigue, since
> every
> > > PR needs to be parsed to see if it actually is OK or not.
> > >
> > > Don't get me wrong, I'm all for proper style and on occasion Landscape
> > has
> > > pointed out problems that I've gone and fixed. But on the whole, I
> > believe
> > > that having it as part of our red / green PR evaluation -- equal to and
> > > often superseding unit tests -- is harmful. I'd much rather be able to
> > scan
> > > the PR list and know unequivocally that "green" indicates ready to
> merge.
> > >
> > > J
> >
> >
>


Re: Continuous Dag

2017-03-13 Thread Alex Guziel
FWIW, for our streaming jobs, we run a 5 minute schedule interval with
max_active_runs=1
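
[Editor's note: the behaviour Alex describes — a frequent fixed interval where max_active_runs=1 ensures a new run only starts once the previous one finishes — can be modelled without Airflow. This is a toy simulation, not Airflow's scheduler; all names are illustrative:]

```python
from datetime import datetime, timedelta

def simulate_starts(first, interval, durations):
    """Toy model of a fixed schedule_interval with max_active_runs=1:
    each run starts at its scheduled time, or as soon as the previous
    run finishes, whichever is later."""
    starts, prev_end, scheduled = [], first, first
    for dur in durations:
        start = max(scheduled, prev_end)  # wait for the previous run to end
        starts.append(start)
        prev_end = start + dur
        scheduled += interval
    return starts

# Three runs on a 5-minute interval; the second run overruns (8 min),
# so the third run's start slips from 0:10 to 0:13.
starts = simulate_starts(
    datetime(2017, 3, 13, 0, 0),
    timedelta(minutes=5),
    [timedelta(minutes=2), timedelta(minutes=8), timedelta(minutes=1)],
)
```

This illustrates why the pattern is "streaming-ish" rather than exact: latency accumulates whenever a run exceeds the interval.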

On Mon, Mar 13, 2017 at 2:00 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Airflow isn't designed to work well with short schedule intervals. The
> guarantees that we give in terms of scheduling latency are limited as the
> platform isn't optimized for that specifically.
>
> What is the type of operation that you are performing every 2 minutes?
>
> If you're doing data processing in microbatches you should look into data
> streaming solutions like Spark Streaming, Flink, Samza, or Storm.
>
> Max
>
> On Mon, Mar 13, 2017 at 10:17 AM, Edgardo Vega 
> wrote:
>
> > Max,
> >
> > Sorry for so many ambiguous antecedents.
> >
> > I want to create a dag that does an operation, waits 2 minutes, and then
> > runs again over and over for all time. I don't know if that is possible to do
> > with airflow or somehow trick airflow into doing this.
> >
> > I hope that clears things up.
> >
> > Cheers,
> >
> > Edgardo
> >
> >
> > On Mon, Mar 13, 2017 at 12:39 PM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > "Would this be possible in airflow?"
> > >
> > > What do you mean by "this"?
> > >
> > > Max
> > >
> > > On Mon, Mar 13, 2017 at 9:09 AM, Edgardo Vega 
> > > wrote:
> > >
> > > > We are currently trying to port our current solution into airflow.
> What
> > > is
> > > > currently stumping me is we have a few tasks we have running pretty
> > much
> > > > all the time. Once it is done we wait a few minutes and kick off
> > another.
> > > >
> > > > Would this be possible in airflow?
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > Edgardo
> > > >
> > >
> >
> >
> >
> > --
> > Cheers,
> >
> > Edgardo
> >
>


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-24 Thread Alex Guziel
Right now we see some jobs getting run twice, with the second run happening
exactly one hour later; we can't isolate why. The current double-run
prevention happens in the heartbeat, and because both runs execute
concurrently it is overeager and kills both jobs. This PR changes it so that
a job is killed in a more sensible way and does not trip over the current
situation.

https://github.com/apache/incubator-airflow/pull/2102  





On Thu, Feb 23, 2017 5:59 PM, siddharth anand san...@apache.org  wrote:
IMHO, a DAG run without a start date is non-sensical but is not enforced.
That said, our UI allows for the manual creation of DAG Runs without a
start date, as shown in the images below:

  - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
  - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0

On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Our database may have edge cases that could be associated with running any
> previous version that may or may not have been part of an official release.
>
> Let's see if anyone else reports the issue. If no one does, one option is
> to release 1.8.0 as is with a comment in the release notes, and have a
> future official minor apache release 1.8.1 that would fix these minor
> issues that are not deal breaker.
>
> @bolke, I'm curious, how long does it take you to go through one release
> cycle? Oh, and do you have a documented step by step process for releasing?
> I'd like to add the Pypi part to this doc and add committers that are
> interested to have rights on the project on Pypi.
>
> Max
>
> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin  wrote:
>
> > So it is a database integrity issue? Afaik a start_date should always be
> > set for a DagRun (create_dagrun) does so I didn't check the code though.
> >
> > Sent from my iPhone
> >
> > > On 22 Feb 2017, at 22:19, Dan Davydov  wrote:
> > >
> > > Should clarify this occurs when a dagrun does not have a start date,
> > > not a dag (which makes it even less likely to happen). I don't think
> > > this is a blocker for releasing.
> > >
> > >> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov  wrote:
> > >>
> > >> I rolled this out in our prod and the webservers failed to load due to
> > >> this commit:
> > >>
> > >> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > >> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >>

> > >> This fixed it:
> > >> -   <span class="glyphicon glyphicon-info-sign" aria-hidden="true"
> > >>     title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> > >> +   <span class="glyphicon glyphicon-info-sign" aria-hidden="true">
> > >>

> > >> This is caused by assuming that all DAGs have start dates set, so a
> > >> broken DAG will take down the whole UI. Not sure if we want to make
> > >> this a blocker for the release or not, I'm guessing for most
> > >> deployments this would occur pretty rarely. I'll submit a PR to fix
> > >> it soon.
> > >>
> > >> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> > >> criccom...@apache.org> wrote:
> > >>
> > >>> Ack that the vote has already passed, but belated +1 (binding)
> > >>>
> > >>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin 
> > >>> wrote:
> > >>>

> >  IPMC Voting can be found here:
> > 
> >  http://mail-archives.apache.org/mod_mbox/incubator-general/
> >  201702.mbox/%
> >  3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e <
> >  http://mail-archives.apache.org/mod_mbox/incubator-general/
> >  201702.mbox/%
> >  3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
> > 
> >  Kind regards,
> >  Bolke
> > 
> > > On 21 Feb 2017, at 08:20, Bolke de Bruin  wrote:
> > >
> > > Hello,
> > >
> > > Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > >
> > > 9 “+1” votes received:
> > >
> > > - Maxime Beauchemin (binding)
> > > - Arthur Wiedmer (binding)
> > > - Dan Davydov (binding)
> > > - Jeremiah Lowin (binding)
> > > - Siddharth Anand (binding)
> > > - Alex van Boxel (binding)
> > > - Bolke de Bruin (binding)
> > >
> > > - Jayesh Senjaliya (non-binding)
> > > - Yi (non-binding)
> > >

> > > Vote thread (start):

> > > http://mail-archives.apache.org/mod_mbox/incubator-

> >  airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-

> >  6c92c31a2...@gmail.com%3e  >  org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-

> > >>> 092E-48D2-AA0F-

> >  15f44376a...@gmail
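
[Editor's note: the class of bug discussed in this thread — a DagRun missing its start_date crashing the whole UI — can be guarded against with a None-safe helper. This is a hypothetical sketch for illustration, not Dan's actual fix, which simply dropped the title attribute:]

```python
from datetime import datetime
from types import SimpleNamespace

def last_run_title(last_run):
    """None-safe version of the tooltip text: returns '' instead of
    raising AttributeError when the run or its start_date is missing."""
    if last_run is None or last_run.start_date is None:
        return ""
    return "Start Date: " + last_run.start_date.strftime("%Y-%m-%d %H:%M")

ok = SimpleNamespace(start_date=datetime(2017, 2, 22, 16, 0))
broken = SimpleNamespace(start_date=None)  # the edge case from the thread
```

The point is that one malformed row should degrade a single tooltip, not take down the webserver.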

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-22 Thread Alex Guziel
I have some concern that this change
https://github.com/apache/incubator-airflow/pull/1939
[AIRFLOW-679] may be having issues, because we are seeing lots of double
triggers of tasks and tasks being killed as a result.

On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davy...@airbnb.com.INVALID  wrote:
Bumping the thread so another user can comment.




On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> What I meant to ask is "how much engineering effort it takes to bake a
> single RC?", I guess it depends on how much git-fu is necessary plus some
> overhead cost of doing the series of actions/commands/emails/jira.
>
> I can volunteer for 1.8.1 (hopefully I can get do it along another Airbnb
> engineer/volunteer to tag along) and will try to document/automate
> everything I can as I go through the process. The goal of 1.8.1 could be to
> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with
> the process.
>
> It'd be great if you can dump your whole process on the wiki, and we'll
> improve it on this next pass.
>
> Thanks again for the mountain of work that went into packaging this
> release.
>
> Max
>
> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin  wrote:
>
> > I thought you volunteered to baby sit 1.8.1 Chris ;-)?
> >
> > Sent from my iPhone
> >
> > > On 22 Feb 2017, at 23:31, Chris Riccomini  wrote:
> > >
> > > I'm +1 for doing a 1.8.1 fast follow-on
> > >
> > > On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >

> > >> Our database may have edge cases that could be associated with running
> > >> any previous version that may or may not have been part of an official
> > >> release.
> > >>
> > >> Let's see if anyone else reports the issue. If no one does, one option
> > >> is to release 1.8.0 as is with a comment in the release notes, and have
> > >> a future official minor apache release 1.8.1 that would fix these minor
> > >> issues that are not deal breaker.
> > >>
> > >> @bolke, I'm curious, how long does it take you to go through one
> > >> release cycle? Oh, and do you have a documented step by step process
> > >> for releasing? I'd like to add the Pypi part to this doc and add
> > >> committers that are interested to have rights on the project on Pypi.
> > >>
> > >> Max
> > >>
> > >>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin  wrote:
> > >>>
> > >>> So it is a database integrity issue? Afaik a start_date should always
> > >>> be set for a DagRun (create_dagrun) does so I didn't check the code
> > >>> though.
> > >>>
> > >>> Sent from my iPhone
> > >>>
> >  On 22 Feb 2017, at 22:19, Dan Davydov  wrote:
> > 
> >  Should clarify this occurs when a dagrun does not have a start date,
> >  not a dag (which makes it even less likely to happen). I don't think
> >  this is a blocker for releasing.
> > 
> > > On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
> > > dan.davy...@airbnb.com> wrote:
> > >
> > > I rolled this out in our prod and the webservers failed to load due
> > > to this commit:
> > >
> > > [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > > 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >

> > > This fixed it:
> > > -   <span class="glyphicon glyphicon-info-sign" aria-hidden="true"
> > >     title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> > > +   <span class="glyphicon glyphicon-info-sign" aria-hidden="true">
> > >

> > > This is caused by assuming that all DAGs have start dates set, so a
> > > broken DAG will take down the whole UI. Not sure if we want to make
> > > this a blocker for the release or not, I'm guessing for most
> > > deployments this would occur pretty rarely. I'll submit a PR to fix
> > > it soon.
> > >
> > > On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> > > criccom...@apache.org> wrote:
> > >

> > >> Ack that the vote has already passed, but belated +1 (binding)
> > >>
> > >> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <
> > >> bdbr...@gmail.com> wrote:
> > >>
> > >>> IPMC Voting can be found here:
> > >>>
> > >>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> > >>> 201702.mbox/%
> > >>> 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e <
> > >>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> > >>> 201702.mbox/%
> > >>> 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
> > >>>
> > >>> Kind regards,
> > >>> Bolke
> > >>>
> >  On 21 Feb 2017, at 08:20, Bolke de Bruin  wrote:
> > 
> >  Hello,
> > 
> >