Re: UI not working on Master

2018-12-19 Thread Ash Berlin-Taylor
NPM is needed when running off a git checkout to build the asset 
pipeline. This has caused a fair amount of confusion, so I think it is 
worth adding an explicit check and bailing out if the compiled files 
aren't found.


(Right now you'll see a small warning in the startup of `airflow 
webserver`, but this is quite easy to miss if you aren't looking for it.)
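
For illustration, a minimal sketch of the check I have in mind (the compiled 
asset directory here is an assumption, not the real path):

    import os
    import sys

    def check_compiled_www_assets(www_dir):
        # Bail out early if `npm install && npm run build` was never run.
        dist_dir = os.path.join(www_dir, "static", "dist")
        if not os.path.isdir(dist_dir) or not os.listdir(dist_dir):
            sys.exit("Compiled web assets not found in %s - run "
                     "`npm install && npm run build` first." % dist_dir)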


-ash


Daniel Imberman wrote on 18/12/2018 23:23:

Well locally I do this (which I'm hoping to PR soon once tests are
passing): https://github.com/apache/incubator-airflow/pull/4326. This
allows me to connect to any k8s cluster.

But for simplicity/parity with CI, you can follow these steps:

   export DOCKER_COMPOSE_VERSION=1.20.0
   export SLUGIFY_USES_TEXT_UNIDECODE=yes
   export TOX_ENV=py35-backend_postgres-env_kubernetes
   KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3 ./scripts/ci/kubernetes/setup_kubernetes.sh
   ./scripts/ci/kubernetes/kube/deploy.sh

This will give you airflow running on minikube. You can then do `kubectl
port-forward <airflow-pod-name> 8080 8080` to port-forward into the
instance.

I can attempt to check webserver logs again when I have some time
(hopefully this weekend/next week) as well.

Thanks Tao!



On Tue, Dec 18, 2018 at 3:15 PM Tao Feng  wrote:


I assume so. How do you run your test? I could try it on my side and see if
I could reproduce the problem.

On Tue, Dec 18, 2018 at 2:17 PM Daniel Imberman 
wrote:


Do we need to install NPM? Should I add npm to the docker image/bootstrap
script?

On Tue, Dec 18, 2018 at 1:20 PM Tao Feng  wrote:


I just tried with the latest master, which was able to show the DAG after login as
Admin. One last check: have you done the `npm install` and `npm run
build`?

On Tue, Dec 18, 2018 at 12:50 PM Daniel Imberman <
daniel.imber...@gmail.com>
wrote:


I believe I am using the new UI since I have to login as admin to get in. I
can still see certain admin panels, but the primary DAG one doesn't show
any DAGs even when I explicitly put one in the folder.

On Tue, Dec 18, 2018, 12:26 PM Tao Feng wrote:

Are you using the Old UI or the new UI? Do you login as Admin user for the
new UI?

On Tue, Dec 18, 2018 at 11:56 AM Daniel Imberman <
daniel.imber...@gmail.com>
wrote:


Hello all,

I've been testing the K8sExecutor and I notice that when I open the UI, I'm
unable to see any of the DAGs even though they are in the airflow/dags
folder. I'm also still able to launch DAGs via the API. I have not tested
this on any of the other executors yet. Is this a known issue? Also raises
the question of whether we can set up some form of UI testing, since I don't
think the current unit/integration tests would catch this kind of error.

Thank you





Re: Please add me to the mailing list for airflow

2018-12-19 Thread Ash Berlin-Taylor

Please send an email to dev-subscr...@airflow.apache.org

Ryan Riopelle wrote on 18/12/2018 20:12:

Hello,

Please add me to the mailing list for airflow.

Thanks,
Ryan



Ryan Riopelle
Data Engineer
San Francisco Office
figure.com 








Re: Configuring/sharing Airflow github repo security alerts

2018-12-18 Thread Ash Berlin-Taylor
We're not admins of the repo - only the ASF Infra team are, so we'll 
have to open a ticket in the INFRA queue in JIRA asking for this.


(I haven't done this yet - not on a large device right now.)

-a

Feng Lu wrote on 18/12/2018 08:01:

Hi all,

Looks like GitHub now adds a new "Security Alert" feature for tracking
dependency CVEs; unfortunately I couldn't find it in the Airflow repo. So if
it makes sense to the community, could an Airflow repo admin (I assume that
means PMC members ;p) help to enable the alert feature and make it publicly
available?

Happy to take a stab myself if I have the access permission.
Thanks.

Feng





Re: Call for fixes for Airflow 1.10.2

2018-12-15 Thread Ash Berlin-Taylor

Looks good, thanks for the reminder.

Merged to master and cherry-picked to the release branch.

Kaxil Naik wrote on 15/12/2018 11:21:

Sure, I am waiting for final comments from Ash on it, if he has any. If not,
I will pick it up and merge to master + cherry-pick to the test branch.

On Thu, Dec 13, 2018 at 2:51 PM Thomas Brockmeier <
thomas.brockme...@randstadgroep.nl> wrote:


Hi Kaxil,

If possible, I would like to get AIRFLOW-1552 in:
https://issues.apache.org/jira/browse/AIRFLOW-1552
https://github.com/apache/incubator-airflow/pull/4276

Thanks,
Thomas

On 2018/11/28 22:40:55, Kaxil Naik  wrote:

Hi everyone,

I'm starting the process of gathering fixes for a 1.10.2. So far the list
of issues I have that we should pull in are
*https://issues.apache.org/jira/browse/AIRFLOW-3384?jql=project%20%3D%20AIRFLOW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%201.10.2
*

I will start pushing these as cherry-picked commits to the v1-10-test
branch today.

*Kaxil Naik*
*Big Data Consultant *@ *Data Reply UK*
*Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
Developer
*Phone: *+44 (0) 74820 88992
*LinkedIn*: https://www.linkedin.com/in/kaxil









Re: Recommended backend metastore for Airflow

2018-12-10 Thread Ash Berlin-Taylor
Postgres.

Friends don't let friends use MySQL is my personal rule.

(I can get into the reasons if you'd like, but the short version is I find 
Postgres has more compliant behaviour with the SQL standard, and a much better 
query planner.)

-ash

> On 10 Dec 2018, at 15:10, ramandu...@gmail.com wrote:
> 
> Hi All,
> 
> It seems that Airflow supports mysql, postgresql and mssql as backend stores. 
> Any recommendation on using one over the other? We are expecting to run 1000(s) 
> of concurrent DAGs, which would generate heavy load on the backend store.
> Any pointers on this would be useful.
> 
> Thanks,
> Raman Gupta



Re: Refactor models.py

2018-12-06 Thread Ash Berlin-Taylor
Hi Fokko,

I commented on some of the issues - we could possibly just delete User and 
KnownEvent*

> My suggestion is to create a new package, called models, which will contain 
> all the orm classes

And do what with the current airflow.models?

Do you have an idea of module names to move things to? Are you thinking we'd have 
an airflow.models.connection module containing just a Connection class, for example?
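
For concreteness, a sketch of the sort of layout I mean (the shared
declarative base is my assumption; nothing is settled):

    # airflow/models/connection.py (sketch)
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()  # in practice this would live in a shared airflow.models.base

    class Connection(Base):
        __tablename__ = "connection"

        id = Column(Integer, primary_key=True)
        conn_id = Column(String(250))

    # airflow/models/__init__.py would then re-export it so existing
    # `from airflow.models import Connection` imports keep working:
    #     from airflow.models.connection import Connection  # noqa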

-ash

> On 6 Dec 2018, at 11:35, Driesprong, Fokko  wrote:
> 
> Hi All,
> 
> I think it is time to refactor the infamous models.py. This file is far too
> big, and it only keeps growing. My suggestion is to create a new package,
> called models, which will contain all the orm classes (the ones
> with __tablename__ in the class). And for example the BaseOperator to the
> operator packages. I've created a lot of tickets to move the classes one by
> one out of models.py. The reason to do this one by one is to relieve the
> pain of fixing the circular dependencies.
> 
> Refactor: Move DagBag out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3456
> 
> Refactor: Move User out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3457
> 
> Refactor: Move Connection out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3458
> 
> Refactor: Move DagPickle out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3459
> 
> Refactor: Move TaskInstance out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3460
> 
> Refactor: Move TaskFail out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3461
> 
> Refactor: Move TaskReschedule out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3462
> 
> Refactor: Move Log out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3463
> 
> Refactor: Move SkipMixin out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3464
> 
> Refactor: Move BaseOperator out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3465
> 
> Refactor: Move DAG out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3466
> 
> Refactor: Move Chart out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3467
> 
> Refactor: Move KnownEventType out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3468
> 
> Refactor: Move KnownEvent out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3469
> 
> Refactor: Move Variable out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3470
> 
> Refactor: Move XCom out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3471
> 
> Refactor: Move DagStat out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3472
> 
> Refactor: Move DagRun out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3473
> 
> Refactor: Move SlaMiss out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3474
> 
> Refactor: Move ImportError out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3475
> 
> Refactor: Move KubeResourceVersion out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3476
> 
> Refactor: Move KubeWorkerIdentifier out of models.py
> https://issues.apache.org/jira/browse/AIRFLOW-3477
> 
> Some classes are really simple, and would also be a nice opportunity for
> newcomers to start contributing to Airflow :-)
> 
> Cheers, Fokko



Call for Graduation press releases and testimonials

2018-12-06 Thread Ash Berlin-Taylor
Hi everyone,

As you've probably seen we're now progressing toward becoming our own top-level 
project and we can remove that pesky "(Incubating)" tag!

Assuming the vote passes at the Apache Software Foundation Board meeting on 19th 
December, we'd like to get some press releases out announcing and celebrating 
this.

If your company uses Airflow and would like to create a press release 
celebrating the graduation, or if you'd like to say some nice words about how 
Airflow has helped your company to include in the Apache press release, please 
get in touch.

Some example press releases from previous project graduations:

https://finance.yahoo.com/news/cask-congratulates-apache-twill-top-145050868.html
https://www.salesforce.com/blog/2017/10/apache-predictionio-graduates-top-level-project

We would like all the publicity to go out at the same time, so please 
co-ordinate with me and Sally Khudairi  off list.

Time is a bit short on this, so if you are interested please get in touch!

Thanks,
Ash

Re: [RESULT] Graduate Apache Airflow as a TLP

2018-12-04 Thread Ash Berlin-Taylor
Missed my vote off that list :)

> On 4 Dec 2018, at 18:17, Jakob Homan  wrote:
> 
> I neglected to add my binding +1, so I'll do so now.
> 
> With three days having elapsed, the VOTE is concluded successfully.
> 
> Overall: 20 x +1 votes, 0 x -1 votes
> 
> Binding +1 x 10: Kaxil, Tao, Bolke, Fokko, Maxime, Arthur, Hitesh,
> Chris, Sid, Jakob.
> Non-binding +1 x 10: Daniel, Shah, Stefan, Kevin, Marc, Sunil,
> Adityan, Deng, Neelesh, Sai
> 
> I'll use this result to start the corresponding VOTE on the IPMC.  I'm
> at an offsite today, so I have limited email time.  Likely will open
> the VOTE this evening.
> 
> Thanks everyone.
> -Jakob
> 
> 
> On Tue, Dec 4, 2018 at 6:03 AM Bolke de Bruin  wrote:
>> 
>> Shall we close the vote? @jakob?
>> 
>>> On 2 Dec 2018, at 13:08, Sid Anand  wrote:
>>> 
>>> +1 binding
>>> 
>>> Woot! Thanks to all for this happy day!
>>> -s
>>> 
>>> On Sun, Dec 2, 2018 at 1:25 AM Sai Phanindhra  wrote:
>>> 
>>>> +1 (non binding)
>>>> 
>>>> Excited to see this happening.
>>>> 
>>>> On Sat 1 Dec, 2018, 20:35 >>> 
>>>>> +1 (binding)!
>>>>> 
>>>>> On 30 November 2018 21:33:14 GMT, Jakob Homan  wrote:
>>>>>> Hey all!
>>>>>> 
>>>>>> Following a very successful DISCUSS[1] regarding graduating Airflow to
>>>>>> Top Level Project (TLP) status, I'm starting the official VOTE.
>>>>>> 
>>>>>> Since entering the Incubator in 2016, the community has:
>>>>>> * successfully produced 7 releases
>>>>>> * added 9 new committers/PPMC members
>>>>>> * built a diverse group of committers from multiple different employers
>>>>>> * had more than 3,300 JIRA tickets opened
>>>>>> * completed the project maturity model with positive responses[2]
>>>>>> 
>>>>>> Accordingly, I believe we're ready to graduate and am calling a VOTE
>>>>>> on the following graduation resolution.  This VOTE will remain open
>>>>>> for at least 72 hours.  If successful, the resolution will be
>>>>>> forwarded to the IPMC for its consideration.  If that VOTE is
>>>>>> successful, the resolution will be voted upon by the Board at its next
>>>>>> monthly meeting.
>>>>>> 
>>>>>> Everyone is encouraged to vote, even if their vote is not binding.
>>>>>> We've built a nice community here, let's make sure everyone has their
>>>>>> voice heard.
>>>>>> 
>>>>>> Thanks,
>>>>>> Jakob
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>>> 
>>>> https://lists.apache.org/thread.html/%3c0a763b0b-7d0d-4353-979a-ac6769eb0...@gmail.com%3E
>>>>>> [2]
>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Establish the Apache Airflow Project
>>>>>> 
>>>>>> WHEREAS, the Board of Directors deems it to be in the best
>>>>>> interests of the Foundation and consistent with the
>>>>>> Foundation's purpose to establish a Project Management
>>>>>> Committee charged with the creation and maintenance of
>>>>>> open-source software, for distribution at no charge to
>>>>>> the public, related to workflow automation and scheduling
>>>>>> that can be used to author and manage data pipelines.
>>>>>> 
>>>>>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>>>>> Committee (PMC), to be known as the "Apache Airflow Project",
>>>>>> be and hereby is established pursuant to Bylaws of the
>>>>>> Foundation; and be it further
>>>>>> 
>>>>>> RESOLVED, that the Apache Airflow Project be and hereby is
>>>>>> responsible for the creation and maintenance of software
>>>>>> related to workflow automation and scheduling that can be
>>>>>> used to author and manage data pipelines; and be it further
>>>>>> 
>>>>>> RESOLVED, that the office of "Vice President, Apache Airflow" be
>>>>>> and hereb

Re: Call for fixes for Airflow 1.10.2

2018-12-01 Thread Ash Berlin-Taylor
I'd like to get https://issues.apache.org/jira/browse/AIRFLOW-3422 in if we can 
- but we need a fix first (Bolke: it's our favourite! DST time zones in 
next_schedule!) I'll take a look at this ... soon.

-ash

> On 28 Nov 2018, at 22:40, Kaxil Naik  wrote:
> 
> Hi everyone,
> 
> I'm starting the process of gathering fixes for a 1.10.2. So far the list
> of issues I have that we should pull in are
> *https://issues.apache.org/jira/browse/AIRFLOW-3384?jql=project%20%3D%20AIRFLOW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%201.10.2
> *
> 
> I will start pushing these as cherry-picked commits to the v1-10-test
> branch today.
> 
> *Kaxil Naik*
> *Big Data Consultant *@ *Data Reply UK*
> *Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
> Developer
> *Phone: *+44 (0) 74820 88992
> *LinkedIn*: https://www.linkedin.com/in/kaxil



Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-12-01 Thread Ash Berlin-Taylor
I've created two tickets to add QU30 and CO50 to our docs.

(I think even if we use sec@a.o we should still add something to our docs 
saying how to do it)

https://issues.apache.org/jira/browse/AIRFLOW-3430 -- Fokko: did you get 
anywhere on this?
https://issues.apache.org/jira/browse/AIRFLOW-3431 -- I'll make a start on this

-ash

> On 30 Nov 2018, at 22:06, Bolke de Bruin  wrote:
> 
> Thanks Jakob!
> 
> Sent from my iPad
> 
>> On 30 Nov 2018 at 22:49, Jakob Homan wrote the following:
>> 
>> I've finished the paperwork.  I don't seem to have karma to trigger
>> the build on Jenkins, so we'll just wait for the daily rebuild.  With
>> that, I've opened the VOTE thread as well.  Thanks everybody.
>>> On Wed, Nov 28, 2018 at 5:08 PM Jakob Homan  wrote:
>>> 
>>> I'll finish up the template at
>>> http://incubator.apache.org/projects/airflow.html tomorrow or Friday
>>> (I *think* you have to be an IPMC member to update it since it lives
>>> in the Incubator SVN).  Looks like there's no actual work to do, just
>>> marking stuff that has been done but not yet recorded, and verifying
>>> some licenses.
>>> 
>>> -Jakob
>>> 
>>> 
>>> 
>>>> On Wed, Nov 28, 2018 at 2:48 PM Tao Feng  wrote:
>>>> 
>>>> Sorry, just saw Kaxil's latest email. Kaxil, is there anything else I could
>>>> help with?
>>>> 
>>>> Thanks,
>>>> -Tao
>>>> 
>>>>> On Wed, Nov 28, 2018 at 2:40 PM Tao Feng  wrote:
>>>>> 
>>>>> I would like to help on the documentation. Let me take a look at it. I
>>>>> will work with Kaxil on that.
>>>>> 
>>>>>> On Tue, Nov 27, 2018 at 12:39 PM Bolke de Bruin  
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Folks,
>>>>>> 
>>>>>> Thanks all for your responses, and particularly Stefan for his suggestion
>>>>>> to use the generic Apache way to handle security issues. This seems to be
>>>>>> an accepted way for other projects, so I have added this to the maturity
>>>>>> evaluation[1] and marked it as resolved. While handling of the GPL library
>>>>>> could be nicer, we are already in compliance with CD30, so @Fokko and @Ash,
>>>>>> if you want to help out towards graduation please spend your time elsewhere,
>>>>>> like fixing CO50. This means adding a page to confluence that describes how to
>>>>>> become a committer on the project. As we are following Apache, many examples
>>>>>> from other projects are around[2]
>>>>>> 
>>>>>> Then there is the paperwork[3] as referred to by Jakob. This mainly
>>>>>> concerns filling in some items, maybe here and there creating some
>>>>>> documentation, but I don't think much. @Kaxil, @Tao: are you willing to
>>>>>> pick this up? @Sid, can you share how to edit that page?
>>>>>> 
>>>>>> If we have resolved these items, in my opinion we can start the voting
>>>>>> here and at the IPMC thereafter, targeting the board meeting of January
>>>>>> for graduation. How’s that for a New Year’s resolution?
>>>>>> 
>>>>>> Cheers!
>>>>>> Bolke
>>>>>> 
>>>>>> P.S. Wouldn't it be nice to have an updated graduation web page? Maybe one of
>>>>>> the contributors/community members would like to take a stab at this[4]
>>>>>> 
>>>>>> [1]
>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation <
>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation>
>>>>>> [2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
>>>>>> <https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer>
>>>>>> [3] http://incubator.apache.org/projects/airflow.html <
>>>>>> http://incubator.apache.org/projects/airflow.html>
>>>>>> [4] https://airflow.apache.org/ <https://airflow.apache.org/>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 27 Nov 2018, at 16:32, Driesprong, Fokko 
>>>>>> wrote

Re: Will Airflow 2.0.0 support Python 2.7?

2018-11-29 Thread Ash Berlin-Taylor
This came up previously, and no firm conclusion was reached, but given Python 
2.7 is still maintained for another year, yes, probably.

> On 29 Nov 2018, at 08:48, airflowuser  
> wrote:
> 
> Are there plans to drop support for Python 2.7 - if so, when?



Re: programmatically creating and airflow quirks

2018-11-28 Thread Ash Berlin-Taylor
I have similar feelings around the "core" of Airflow and would _love_ to 
somehow find time to spend a month really getting to grips with the scheduler 
and the dagbag and see what comes to light with fresh eyes and the benefits of 
hindsight.

Finding that time is going to be A Challenge though.

(Oh, except no to microservices. Airflow is hard enough to operate right now 
without splitting things into even more daemons)

-ash
> On 26 Nov 2018, at 03:06, soma dhavala  wrote:
> 
> 
> 
>> On Nov 26, 2018, at 7:50 AM, Maxime Beauchemin  
>> wrote:
>> 
>> The historical reason is that people would check in scripts in the repo
>> that had actual compute or other forms of undesired effects in module scope
>> (scripts with no "if __name__ == '__main__':") and Airflow would just run
>> this script while searching for DAGs. So we added this mitigation patch that
>> would confirm that there's something Airflow-related in the .py file. Not
>> elegant, and confusing at times, but it also probably prevented some issues
>> over the years.
>> 
>> The solution here is to have a more explicit way of adding DAGs to the
>> DagBag (instead of the folder-crawling approach). The DagFetcher proposal
>> offers solutions around that, having a central "manifest" file that
>> provides explicit pointers to all DAGs in the environment.
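>> 
>> For illustration, one shape such a manifest could take (purely a sketch; the
>> proposal doesn't fix a format, and these module paths are made up):
>> 
>>     # dag_manifest.py - explicit pointers instead of folder crawling
>>     DAG_MODULES = [
>>         "company.pipelines.daily_etl",
>>         "company.pipelines.ml_training",
>>     ]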
> 
> Some rebasing needs to happen. When I looked at the 1.8 code base almost a year 
> ago, it felt more complex than necessary. What airflow is trying to 
> promise from an architectural standpoint — that was not clear to me. It is 
> trying to do too many things, scattered in too many places, is the feeling I 
> got. As a result, I stopped peeping, and just trust that it works — which it 
> does, btw. I tend to think that airflow outgrew its original intents. A sort 
> of micro-services architecture has to be brought in. I may sound critical, 
> but no offense. I truly appreciate the contributions.
> 
>> 
>> Max
>> 
>> On Sat, Nov 24, 2018 at 5:04 PM Beau Barker 
>> wrote:
>> 
>>> In my opinion this searching for dags is not ideal.
>>> 
>>> We should be explicitly specifying the dags to load somewhere.
>>> 
>>> 
 On 25 Nov 2018, at 10:41 am, Kevin Yang  wrote:
 
 I believe that is mostly because we want to skip parsing/loading .py files
 that don't contain DAG defs to save time, as the scheduler is going to
 parse/load the .py files over and over again and some files can take quite
 long to load.
 
 Cheers,
 Kevin Y
 
 On Fri, Nov 23, 2018 at 12:44 AM soma dhavala 
 wrote:
 
> happy to report that the “fix” worked. thanks Alex.
> 
> btw, wondering why it was there in the first place? how does it help —
> saves time, early termination — what?
> 
> 
>> On Nov 23, 2018, at 8:18 AM, Alex Guziel 
>>> wrote:
>> 
>> Yup.
>> 
>> On Thu, Nov 22, 2018 at 3:16 PM soma dhavala  > wrote:
>> 
>> 
>>> On Nov 23, 2018, at 3:28 AM, Alex Guziel  > wrote:
>>> 
>>> It’s because of this
>>> 
>>> “When searching for DAGs, Airflow will only consider files where the
> string “airflow” and “DAG” both appear in the contents of the .py file.”
>>> 
>> 
>> Have not noticed it. From airflow/models.py, in process_file — (both in
>> 1.9 and 1.10)
>> ..
>> if not all([s in content for s in (b'DAG', b'airflow')]):
>> ..
>> is looking for those strings and if they are not found, it is returning
>> without loading the DAGs.
>> 
>> 
>> So having “airflow” and “DAG” dummy strings placed somewhere will make
>> it work?
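>> 
>> For illustration, a loader like this satisfies that check (a sketch only -
>> create_dag, my_module and the YAML location are placeholders):
>> 
>>     # dags/yaml_dag_loader.py
>>     # Mentions "airflow" and "DAG", so process_file will load this file.
>>     import glob
>> 
>>     from my_module import create_dag  # assumed helper returning an airflow DAG
>> 
>>     for path in glob.glob("/path/to/workflows/*.yaml"):
>>         dag = create_dag(path)
>>         globals()[dag.dag_id] = dag  # module-level name so the DagBag finds it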
>> 
>> 
>>> On Thu, Nov 22, 2018 at 2:27 AM soma dhavala  > wrote:
>>> 
>>> 
 On Nov 22, 2018, at 3:37 PM, Alex Guziel  > wrote:
 
 I think this is what is going on. The dags are picked by local
> variables. I.E. if you do
 dag = Dag(...)
 dag = Dag(…)
>>> 
>>> from my_module import create_dag
>>> 
>>> for file in yaml_files:
>>>   dag = create_dag(file)
>>>   globals()[dag.dag_id] = dag
>>> 
>>> You notice that create_dag is in a different module. If it is in the
> same scope (file), it will be fine.
>>> 
 
>>> 
 Only the second dag will be picked up.
 
 On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala <
>>> soma.dhav...@gmail.com
> > wrote:
 Hey AirFlow Devs:
 In our organization, we build a Machine Learning WorkBench with AirFlow as
 an orchestrator of the ML Work Flows, and have wrapped AirFlow python
 operators to customize the behaviour. These work flows are specified in
 YAML.
 
 We drop a DAG loader (written in python) in the default location airflow

Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-27 Thread Ash Berlin-Taylor
Awesome Bolke, thanks for starting this.

It looks like we are closer than I thought!

We can use those security lists (though having our own would be nice) - either 
way we will need to make this prominent in the docs.

Couple of points

CS10: that github link is only visible to members of the team

CD30: probably good as it is, we may want to do 
https://issues.apache.org/jira/browse/AIRFLOW-3400 to remove the last niggle 
of the GPL env var at install time (but not a hard requirement, just nice)

-ash

> On 26 Nov 2018, at 21:10, Stefan Seelmann  wrote:
> 
> I agree that Apache Airflow should graduate.
> 
> I'm only involved since the beginning of this year, but the project did two
> releases during that time; once TLP, releasing becomes easier :)
> 
> Regarding QU30 you may consider to use the ASF wide security mailing
> list [3] and process [4].
> 
> Kind Regards,
> Stefan
> 
> [3] https://www.apache.org/security/
> [4] https://www.apache.org/security/committers.html
> 
> 
> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
>> Ping!
>> 
>> Sent from my iPhone
>> 
>>> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
>>> 
>>> Hi All,
>>> 
>>> With the Apache Airflow community healthy and growing, I think now would be 
>>> a good time to discuss where we stand regarding graduation from the 
>>> Incubator, and what requirements remain. 
>>> 
>>> Apache Airflow entered incubation around 2 years ago; since then, the 
>>> Airflow community has learned a lot about how to do things in Apache ways. 
>>> Now we are a very helpful and engaged community, ready to help on all 
>>> questions from the Airflow community. We have delivered multiple releases 
>>> that have been increasing in quality ever since, and now we can do 
>>> self-driving releases in good cadence. 
>>> 
>>> The community is growing, and new committers and PPMC members keep joining. 
>>> We addressed almost all the maturity issues stipulated by the Apache Project 
>>> Maturity Model [1]. Some final requirements remain, but those just need a 
>>> final nudge. Committers and contributors are invited to verify the list and 
>>> pick up the last bits (QU30, CO50). Finally (yahoo!) all the License and IP 
>>> issues we can see got resolved. 
>>> 
>>> Based on those, I believe it's time for us to graduate to TLP. [2] Any 
>>> thoughts? And we'd welcome advice from Airflow mentors.
>>> 
>>> Thanks, 
>>> 
>>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>>> [2] 
>>> https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
>>>  Regards,
> 



Airflow 1.10.1 is released

2018-11-21 Thread Ash Berlin-Taylor
Dear Airflow community,
 
I'm happy to announce that Airflow 1.10.1 was just released.
 
The source release as well as the binary "sdist" release are available here:
 
https://dist.apache.org/repos/dist/release/incubator/airflow/1.10.1-incubating/
 
We also made this version available on PyPi for convenience (`pip install 
apache-airflow`):
 
https://pypi.python.org/pypi/apache-airflow
 
Find the CHANGELOG here for more details:
 
https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt

[RESULT][VOTE] Release Airflow 1.10.1 (incubating)

2018-11-21 Thread Ash Berlin-Taylor
The vote to release Airflow 1.10.1-incubating, having been open for 3
days, is now closed.
 
There were three binding +1s and no -1 votes.

+1 (binding):
Hitesh Shah
Jakob Homan
Justin Mclean
 
The release is approved.
 
Thanks to all those who voted.
 
Cheers,
Ash

> On 21 Nov 2018, at 04:44, Justin Mclean  wrote:
> 
> Hi,
> 
> +1 (binding)
> 
> I checked:
> - incubating in name
> - signatures and hashes good
> - DISCLAIMER exists
> - LICENSE and NOTICE correct
> - No unexpected binary files
> - All ASF source code has ASF headers
> 
> I don’t have the setup to test if it compiles.
> 
> One minor thing: I’d remove the "Apache Airflow contains subcomponents …” and 
> "See the LICENSE file …” from the NOTICE file as I don’t think they are 
> needed.
> 
> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
> 



[RESULT][VOTE] Airflow 1.10.1 RC2

2018-11-18 Thread Ash Berlin-Taylor
Hello,
 
Apache Airflow (incubating) 1.10.1 (based on RC2) has been accepted.
 
3 “+1” binding votes received:
 
- Ash Berlin-Taylor (binding)
- Kaxil Naik  (binding)
- Fokko Driesprong (binding)

2 "+1" non-binding votes received:

- Deng Xiaodong (non-binding)
- Ikar Pohorsky (non-binding)
 
My next step is to open a thread with the IPMC.
 
Cheers,
Ash

> On 18 Nov 2018, at 16:47, Driesprong, Fokko  wrote:
> 
> A +1 from my side as well.
> 
> Thanks for picking this up Ash. Just checked the new release using Docker
> <https://github.com/Fokko/docker-airflow/commit/eb904450ffbc38cee61421ad8c6ff7cfd28c42eb>,
> everything seems to work.
> 
> Cheers, Fokko
> 
> On Sat 17 Nov 2018 at 16:43, Deng Xiaodong wrote:
> 
>> Even though my vote is non-binding, I would like to change my vote to +1 as
>> well.
>> Reason being that both points I suggested earlier were not regressions from
>> 1.10.0, and they should not be blocking the release.
>> 
>> Cheers.
>> 
>> XD
>> 
>> On Sat, Nov 17, 2018 at 8:11 PM Naik Kaxil  wrote:
>> 
>>> +1 (binding). I am convinced; we should follow up soon with a 1.10.2 with a
>>> small number of commits, avoiding another huge gap between minor
>>> releases.
>>> 
>>> Regards,
>>> Kaxil
>>> 
>>> On 17/11/2018, 11:53, "Ash Berlin-Taylor"  wrote:
>>> 
>>>The RBAC UI is still marked as experimental and this isn't a
>>> regression from 1.10.0, so could you be convinced to change this to a +1?
>>> 
>>>There are other more critical changes I would like to get out, and I
>>> will follow up straight away with a 1.10.2 that addresses this and XD's
>>> points.
>>> 
>>>(I feel Bolke's pain :) I'm now moderately annoyed at the Apache
>>> release process and how long it takes, it means each release ends up
>>> getting big)
>>> 
>>>-ash
>>> 
>>>> 
>>> 
>>> Kaxil Naik
>>> 
>>> Data Reply
>>> Nova South
>>> 160 Victoria Street, Westminster
>>> London SW1E 5LB - UK
>>> phone: +44 (0)20 7730 6000
>>> k.n...@reply.com
>>> www.reply.com
>>> On 17 Nov 2018, at 01:01, Naik Kaxil  wrote:
>>>> 
>>>> -1 (binding). Tested it on Python 2.7.14, got the expected result but
>>>> had 1 security concern whose fix I want to get in the release.
>>>> 
>>>> Even when 'expose_config'=False, the RBAC UI still shows the configs,
>>>> which can contain sensitive information like airflow metadb passwords,
>>>> etc.
>>>> 
>>>> If we can get that in, +1 from me. The PR with this fix has been
>>>> merged in the master, commit:
>>> 
>> https://github.com/apache/incubator-airflow/commit/85abd44e241e17338a800e37a3c2e85ef346898d
>>> <
>>> 
>> https://github.com/apache/incubator-airflow/commit/85abd44e241e17338a800e37a3c2e85ef346898d
>>>> 
>>>> 
>>>> PR: https://github.com/apache/incubator-airflow/pull/4194 <
>>> https://github.com/apache/incubator-airflow/pull/4194>
>>>> 
>>>> Regards,
>>>> Kaxil
>>>> 
>>>> On 16/11/2018, 13:41, "Deng Xiaodong" <xd.den...@gmail.com> wrote:
>>>> 
>>>>   Hi Ash,
>>>> 
>>>>   I would like to give -1 (non-binding), due to two reasons we
>>> discussed
>>>>   earlier on Slack:
>>>> 
>>>>   - there is an issue with the new “delete DAG” button in UI. It’s
>>> a great
>>>>   feature, so let’s try to release it “bug-less”. The fix is in PR
>>>>   https://github.com/apache/incubator-airflow/pull/4069 (But
>>> understand your
>>>>   concern is that this PR comes with no test yet).
>>>> 
>>>>   - it may be good to pin all dependencies to a specific version
>> to
>>> avoid the
>>>>   incident caused by dependency breaking change (like what happens
>>> to Redis
>>>>   yesterday)
>>>> 
>>>> 
>>>>   Last but not least: nice job! Thanks for your works!
>>>> 
>>>> 
>>>>   XD
>>>> 
>>>> 
>>>>   On Fri, Nov 16, 2018 at 21:13 Ash Berlin-Taylor wrote:
>>>> 
>>>>> Friendly reminder for people (and especially committers) to test this out
>>>>> and vote on it please!
>>>

Re: [VOTE] Airflow 1.10.1 RC2

2018-11-17 Thread Ash Berlin-Taylor
The RBAC UI is still marked as experimental and this isn't a regression from 
1.10.0, so could you be convinced to change this to a +1?

There are other more critical changes I would like to get out, and I will 
follow up straight away with a 1.10.2 that addresses this and XD's points.

(I feel Bolke's pain :) I'm now moderately annoyed at the Apache release 
process and how long it takes, it means each release ends up getting big)

-ash

> On 17 Nov 2018, at 01:01, Naik Kaxil  wrote:
> 
> -1 (binding). Tested it on Python 2.7.14, got the expected result but had 1 
> security concern whose fix I want to get in the release.
> 
> Even when 'expose_config'=False, the RBAC UI still shows the configs, which can 
> contain sensitive information like airflow metadb passwords, etc.
> 
> If we can get that in, +1 from me. The PR with this fix has been merged in 
> the master, commit: 
> https://github.com/apache/incubator-airflow/commit/85abd44e241e17338a800e37a3c2e85ef346898d
>  
> <https://github.com/apache/incubator-airflow/commit/85abd44e241e17338a800e37a3c2e85ef346898d>
> 
> PR: https://github.com/apache/incubator-airflow/pull/4194 
> <https://github.com/apache/incubator-airflow/pull/4194>
> 
> Regards,
> Kaxil
> 
> On 16/11/2018, 13:41, "Deng Xiaodong" <mailto:xd.den...@gmail.com> wrote:
> 
>Hi Ash,
> 
>I would like to give -1 (non-binding), due to two reasons we discussed
>earlier on Slack:
> 
>- there is an issue with the new “delete DAG” button in UI. It’s a great
>feature, so let’s try to release it “bug-less”. The fix is in PR
>https://github.com/apache/incubator-airflow/pull/4069 (But understand your
>concern is that this PR comes with no test yet).
> 
>- it may be good to pin all dependencies to a specific version to avoid the
>incident caused by dependency breaking change (like what happens to Redis
>yesterday)
> 
> 
>Last but not least: nice job! Thanks for your works!
> 
> 
>XD
> 
> 
> On Fri, Nov 16, 2018 at 21:13 Ash Berlin-Taylor wrote:
> 
>> Friendly reminder for people (and especially committers) to test this out
>> and vote on it please!
>> 
>> -ash
>> 
>>> 
> 
> Kaxil Naik 
> 
> Data Reply
> Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK 
> phone: +44 (0)20 7730 6000
> k.n...@reply.com <mailto:k.n...@reply.com>
> www.reply.com <http://www.reply.com/>
On 14 Nov 2018, at 22:31, Ash Berlin-Taylor <mailto:a...@apache.org> wrote:
>>> 
>>> Hey all,
>>> 
>>> I have cut Airflow 1.10.1 RC2. This email is calling a vote on the
>> release, which will last for 72 hours. Consider this my (binding) +1.
>>> 
>>> Airflow 1.10.1 RC2 is available at:
>>> 
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc2/
>>> 
>>> apache-airflow-1.10.1rc2+incubating-source.tar.gz is a source release
>> that comes with INSTALL instructions.
>>> apache-airflow-1.10.1rc2+incubating-bin.tar.gz is the binary Python
>> "sdist" release.
>>> 
>>> Public keys are available at:
>>> 
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
>>> 
>>> This release candidate has been published to PyPi as a convenience for
>>> testing, but the vote is against the published artefacts at the above URL,
>>> and not this. To install from PyPI run `pip install --pre apache-airflow`
>>> 
>>> Only votes from PMC members are binding, but members of the community
>> are encouraged to test the release and vote with "(non-binding)".
>>> 
>>> Changes since 1.10.1rc1:
>>> 
>>> [AIRFLOW-3343] Update DockerOperator for Docker-py 3.0.0 API changes
>> (#4187)
>>> [AIRFLOW-XXX] Include 3193 in the changelog
>>> [AIRFLOW-XXX] Remove duplicated line in Changelog (#4181)
>>> [AIRFLOW-3339] Correctly get DAG timezone when start_date in
>> default_args (#4186)
>>> 
>>> Changes since 1.10.1b1:
>>> 
>>> [AIRFLOW-XXX] Correct date and version in Changelog
>>> [AIRFLOW-2779] Add license headers to doc files (#4178)
>>> [AIRFLOW-XXX] Changelog and version for 1.10.1
>>> [AIRFLOW-2779] Add license headers to doc files (#4178)
>>> [AIRFLOW-2779] Add project version to license (#4177)
>>> [AIRFLOW-XXX] Sync changelog between release and master branch
>>> [AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
>>> [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role
>> (#4

Re: [VOTE] Airflow 1.10.1 RC2

2018-11-16 Thread Ash Berlin-Taylor
Friendly reminder for people (and especially committers) to test this out and 
vote on it please!

-ash

> On 14 Nov 2018, at 22:31, Ash Berlin-Taylor  wrote:
> 
> Hey all,
> 
> I have cut Airflow 1.10.1 RC2. This email is calling a vote on the release, 
> which will last for 72 hours. Consider this my (binding) +1.
> 
> Airflow 1.10.1 RC2 is available at:
> 
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc2/
> 
> apache-airflow-1.10.1rc2+incubating-source.tar.gz is a source release that 
> comes with INSTALL instructions.
> apache-airflow-1.10.1rc2+incubating-bin.tar.gz is the binary Python "sdist" 
> release.
> 
> Public keys are available at:
> 
> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
> 
> This release candidate has been published to PyPi as a convenience for testing, 
> but the vote is against the published artefacts at the above URL, and not 
> this. To install from PyPI run `pip install --pre apache-airflow`
> 
> Only votes from PMC members are binding, but members of the community are 
> encouraged to test the release and vote with "(non-binding)".
> 
> Changes since 1.10.1rc1:
> 
> [AIRFLOW-3343] Update DockerOperator for Docker-py 3.0.0 API changes (#4187)
> [AIRFLOW-XXX] Include 3193 in the changelog
> [AIRFLOW-XXX] Remove duplicated line in Changelog (#4181)
> [AIRFLOW-3339] Correctly get DAG timezone when start_date in default_args 
> (#4186)
> 
> Changes since 1.10.1b1:
> 
> [AIRFLOW-XXX] Correct date and version in Changelog
> [AIRFLOW-2779] Add license headers to doc files (#4178)
> [AIRFLOW-XXX] Changelog and version for 1.10.1
> [AIRFLOW-2779] Add license headers to doc files (#4178)
> [AIRFLOW-2779] Add project version to license (#4177)
> [AIRFLOW-XXX] Sync changelog between release and master branch
> [AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
> [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role (#4175)
> [AIRFLOW-2723] Update lxml dependancy to >= 4.0.0
> [AIRFLOW-3325] Fix UI Page DAGs-column 'Recent Tasks' display issue (#4173)
> [AIRFLOW-XXX] Update Updating instructions for changes in 1.10.1
> [AIRFLOW-XXX] Fix a few typos in CHANGELOG (#4169)
> 
> 
> Full changelog is below:
> 
> New features:
> 
> [AIRFLOW-2524] Airflow integration with AWS Sagemaker
> [AIRFLOW-2657] Add ability to delete DAG from web ui
> [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
> [AIRFLOW-2794] Add delete support for Azure blob
> [AIRFLOW-2912] Add operators for Google Cloud Functions
> [AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
> [AIRFLOW-2989] No Parameter to change bootDiskType for 
> DataprocClusterCreateOperator
> [AIRFLOW-3078] Basic operators for Google Compute Engine
> [AIRFLOW-3147] Update Flask-AppBuilder version
> [AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / delete)
> [AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators
> 
> Improvements:
> 
> [AIRFLOW-393] Add progress callbacks for FTP downloads
> [AIRFLOW-520] Show Airflow version on web page
> [AIRFLOW-843] Exceptions now available in context during on_failure_callback
> [AIRFLOW-2476] Update tabulate dependency to v0.8.2
> [AIRFLOW-2592] Bump Bleach dependency
> [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
> [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
> executor/operator
> [AIRFLOW-2709] Improve error handling in Databricks hook
> [AIRFLOW-2723] Update lxml dependancy to >= 4.0.
> [AIRFLOW-2763] No precheck mechanism in place during worker initialisation 
> for the connection to metadata database
> [AIRFLOW-2789] Add ability to create single node cluster to 
> DataprocClusterCreateOperator
> [AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
> [AIRFLOW-2854] kubernetes_pod_operator add more configuration items
> [AIRFLOW-2855] Need to Check Validity of Cron Expression When Process DAG 
> File/Zip File
> [AIRFLOW-2904] Clean an unnecessary line in 
> airflow/executors/celery_executor.py
> [AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()
> [AIRFLOW-2922] Potential deal-lock bug in CeleryExecutor()
> [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
> [AIRFLOW-2949] Syntax Highlight for Single Quote
> [AIRFLOW-2951] dag_run end_date Null after a dag is finished
> [AIRFLOW-2956] Kubernetes tolerations for pod operator
> [AIRFLOW-2997] Support for clustered tables in Bigquery hooks/operators
> [AIRFLOW-3006] Fix error when schedule_interval="None"
> [AIRFLOW-3008] Move Kubernetes related example DAGs to contrib/example_dags
> [AIRFLOW-302

Re: reg airflow on kubernetes

2018-11-16 Thread Ash Berlin-Taylor
Pod has unbound PersistentVolumeClaims sounds like an error from Kubernetes.

Have you specified any specific persistent volumes in your kube config for 
Airflow? Are you running in AWS with EBS volume provisioning - if so 
https://github.com/kubernetes/kubernetes/issues/34583 could be your issue (Note 
that it is "closed" but the problem still persists).

-ash


> On 15 Nov 2018, at 14:09, manojbabu...@gmail.com wrote:
> 
> Hi,
> I was following the below steps to run airflow on kubernetes and getting the 
> below error.
> Can anyone share thoughts or point me to detailed steps to install airflow on 
> kubernetes?
> 
> Error:
> The POD status is "Pending" and shows error "Pod has unbound 
> PersistentVolumeClaims (repeated 2 times)"
> 
> Steps:
> sed -ie "s/KubernetesExecutor/LocalExecutor/g" 
> scripts/ci/kubernetes/kube/configmaps.yaml
> ./scripts/ci/kubernetes/docker/build.sh
> ./scripts/ci/kubernetes/kube/deploy.sh
> 
> thanks.
> 



Re: Moving Airflow Config to Database.

2018-11-15 Thread Ash Berlin-Taylor
> Problem with this approach is these env variables won't behave correctly
> when we use subshells


Can you explain what you mean by this?

-ash


> On 15 Nov 2018, at 12:03, Sai Phanindhra  wrote:
> 
> Hi deng,
> I am currently using env variables for a few airflow config variables which
> may differ across machines (airflow folder, log folder, etc.). The problem with
> this approach is these env variables won't behave correctly when we use
> subshells (I faced issues when I added airflow jobs in supervisord).
> Moving config to the db not only addresses this issue, it will give provision
> to change config from the UI (most of the time, we can't give box access to all
> users). Config in the db makes it easy to create/update config without
> touching code.
> 
> On Thu 15 Nov, 2018, 16:04 Deng Xiaodong  
>> A few solutions that may address your problem:
>> 
>> - Specify your configurations in environment variables, so it becomes much
>> easier to manage across machines (see the sketch after this list)
>> - use network attached storage to save your configuration file and mount it
>> to all your machines (this can address DAG file sync as well)
>> - ...
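>> 
>> For example (the path is a placeholder): Airflow reads any option from an
>> AIRFLOW__{SECTION}__{KEY} environment variable before falling back to
>> airflow.cfg, so the same setting can be applied identically on every machine:
>> 
>>     import os
>>     os.environ["AIRFLOW__CORE__DAGS_FOLDER"] = "/srv/airflow/dags"
>> 
>>     from airflow.configuration import conf
>>     print(conf.get("core", "dags_folder"))  # -> /srv/airflow/dags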
>> 
>> Personally I don’t see the point of moving configuration to the DB.
>> 
>> 
>> XD
>> 
>> On Thu, Nov 15, 2018 at 18:29 Sai Phanindhra  wrote:
>> 
>>> Hello Airflow users,
>>>   I recently encountered an issue with airflow. I am maintaining an
>>> airflow cluster; whenever I make a change to the airflow configuration on one
>>> of the machines, I have to consciously copy these changes to the other machines
>>> in the airflow cluster. The problem with this is it's a manual process and
>>> users sometimes forget to sync changes. I want to move the airflow config
>>> to a database. I will be happy if you can share your valuable
>>> inputs/thoughts on this.
>>> 
>>> --
>>> Sai Phanindhra,
>>> Ph: +91 9043258999
>>> 
>> 



[VOTE] Airflow 1.10.1 RC2

2018-11-14 Thread Ash Berlin-Taylor
Hey all,

I have cut Airflow 1.10.1 RC2. This email is calling a vote on the release, 
which will last for 72 hours. Consider this my (binding) +1.

Airflow 1.10.1 RC2 is available at:

https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc2/

apache-airflow-1.10.1rc2+incubating-source.tar.gz is a source release that 
comes with INSTALL instructions.
apache-airflow-1.10.1rc2+incubating-bin.tar.gz is the binary Python "sdist" 
release.

Public keys are available at:

https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS

This release candidate has been published to PyPi as a convenience for testing, 
but the vote is against the published artefacts at the above URL, and not this. 
To install from PyPI run `pip install --pre apache-airflow`

Only votes from PMC members are binding, but members of the community are 
encouraged to test the release and vote with "(non-binding)".

Changes since 1.10.1rc1:

[AIRFLOW-3343] Update DockerOperator for Docker-py 3.0.0 API changes (#4187)
[AIRFLOW-XXX] Include 3193 in the changelog
[AIRFLOW-XXX] Remove duplicated line in Changelog (#4181)
[AIRFLOW-3339] Correctly get DAG timezone when start_date in default_args 
(#4186)

Changes since 1.10.1b1:

[AIRFLOW-XXX] Correct date and version in Changelog
[AIRFLOW-2779] Add license headers to doc files (#4178)
[AIRFLOW-XXX] Changelog and version for 1.10.1
[AIRFLOW-2779] Add license headers to doc files (#4178)
[AIRFLOW-2779] Add project version to license (#4177)
[AIRFLOW-XXX] Sync changelog between release and master branch
[AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
[AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role (#4175)
[AIRFLOW-2723] Update lxml dependancy to >= 4.0.0
[AIRFLOW-3325] Fix UI Page DAGs-column 'Recent Tasks' display issue (#4173)
[AIRFLOW-XXX] Update Updating instructions for changes in 1.10.1
[AIRFLOW-XXX] Fix a few typos in CHANGELOG (#4169)


Full changelog is below:

New features:

[AIRFLOW-2524] Airflow integration with AWS Sagemaker
[AIRFLOW-2657] Add ability to delete DAG from web ui
[AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
[AIRFLOW-2794] Add delete support for Azure blob
[AIRFLOW-2912] Add operators for Google Cloud Functions
[AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
[AIRFLOW-2989] No Parameter to change bootDiskType for 
DataprocClusterCreateOperator
[AIRFLOW-3078] Basic operators for Google Compute Engine
[AIRFLOW-3147] Update Flask-AppBuilder version
[AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / delete)
[AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators

Improvements:

[AIRFLOW-393] Add progress callbacks for FTP downloads
[AIRFLOW-520] Show Airflow version on web page
[AIRFLOW-843] Exceptions now available in context during on_failure_callback
[AIRFLOW-2476] Update tabulate dependency to v0.8.2
[AIRFLOW-2592] Bump Bleach dependency
[AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
[AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
executor/operator
[AIRFLOW-2709] Improve error handling in Databricks hook
[AIRFLOW-2723] Update lxml dependancy to >= 4.0.
[AIRFLOW-2763] No precheck mechanism in place during worker initialisation for 
the connection to metadata database
[AIRFLOW-2789] Add ability to create single node cluster to 
DataprocClusterCreateOperator
[AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
[AIRFLOW-2854] kubernetes_pod_operator add more configuration items
[AIRFLOW-2855] Need to Check Validity of Cron Expression When Process DAG 
File/Zip File
[AIRFLOW-2904] Clean an unnecessary line in airflow/executors/celery_executor.py
[AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()
[AIRFLOW-2922] Potential deal-lock bug in CeleryExecutor()
[AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
[AIRFLOW-2949] Syntax Highlight for Single Quote
[AIRFLOW-2951] dag_run end_date Null after a dag is finished
[AIRFLOW-2956] Kubernetes tolerations for pod operator
[AIRFLOW-2997] Support for clustered tables in Bigquery hooks/operators
[AIRFLOW-3006] Fix error when schedule_interval="None"
[AIRFLOW-3008] Move Kubernetes related example DAGs to contrib/example_dags
[AIRFLOW-3025] Allow to specify dns and dns-search parameters for DockerOperator
[AIRFLOW-3067] (www_rbac) Flask flash messages are not displayed properly (no 
background color)
[AIRFLOW-3069] Decode output of S3 file transform operator
[AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role
[AIRFLOW-3090] INFO logs are too verbose
[AIRFLOW-3103] Update Flask-Login
[AIRFLOW-3112] Align SFTP hook with SSH hook
[AIRFLOW-3119] Enable loglevel on celery worker and inherit from airflow.cfg
[AIRFLOW-3137] Make ProxyFix middleware optional
[AIRFLOW-3173] Add _cmd options for more password config options
[AIRFLOW-3177] Change scheduler_heartbeat metric from gauge to counter
[AIRFLOW-3195] Druid Hook: Log ingestion sp

Re: [VOTE CANCELED] Airflow 1.10.1rc1

2018-11-14 Thread Ash Berlin-Taylor
We've had two regressions against this release reported in Slack so I'm 
cancelling this vote, to re-open a new one once these two PRs are merged:

https://github.com/apache/incubator-airflow/pull/4186
https://github.com/apache/incubator-airflow/pull/4187

Committers: if you could look at the PRs so we can get a new vote started?

Some other questions: 

- Do our votes need to last 72 hours each, or can we have it be "72 hours or 
until 3 (or 5) +1 binding votes?"
- What do people think about making the RCs available on pip? It is how most 
people install Airflow and publishing it there makes it easier for people to 
test. (Pip won't install beta or rc versions when doing `pip install 
apache-airflow`, you have to add `==1.10.1b1`, so it's "safe" in that regard.)

-ash

> On 13 Nov 2018, at 15:59, Ash Berlin-Taylor  wrote:
> 
> CORRECTION: Correct URLs are
> 
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc1 
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc1>
> 
> Copy-and-paste fail
> 
> 
>> On 13 Nov 2018, at 15:29, Ash Berlin-Taylor  wrote:
>> 
>> Hey all,
>> 
>> I have cut Airflow 1.10.1 RC1. This email is calling a vote on the release, 
>> which will last for 72 hours. Consider this my (binding) +1.
>> 
>> Airflow 1.10.1 RC1 is available at:
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc1/
>> 
>> apache-airflow-1.10.1rc1+incubating-source.tar.gz is a source release that 
>> comes with INSTALL instructions.
>> apache-airflow-1.10.1rc1+incubating-bin.tar.gz is the binary Python "sdist" 
>> release.
>> 
>> Public keys are available at:
>> 
>> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
>> 
>> Only votes from PMC members are binding, but members of the community are 
>> encouraged to test the release and vote with "(non-binding)".
>> 
>> Changes since 1.10.1b1:
>> 
>> [AIRFLOW-XXX] Correct date and version in Changelog
>> [AIRFLOW-2779] Add license headers to doc files (#4178)
>> [AIRFLOW-XXX] Changelog and version for 1.10.1
>> [AIRFLOW-2779] Add license headers to doc files (#4178)
>> [AIRFLOW-2779] Add project version to license (#4177)
>> [AIRFLOW-XXX] Sync changelog between release and master branch
>> [AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
>> [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role 
>> (#4175)
>> [AIRFLOW-2723] Update lxml dependancy to >= 4.0.0
>> [AIRFLOW-3325] Fix UI Page DAGs-column 'Recent Tasks' display issue (#4173)
>> [AIRFLOW-XXX] Update Updating instructions for changes in 1.10.1
>> [AIRFLOW-XXX] Fix a few typos in CHANGELOG (#4169)
>> 
>> 
>> Full changelog is below:
>> 
>> New features:
>> 
>> [AIRFLOW-2524] Airflow integration with AWS Sagemaker
>> [AIRFLOW-2657] Add ability to delete DAG from web ui
>> [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
>> [AIRFLOW-2794] Add delete support for Azure blob
>> [AIRFLOW-2912] Add operators for Google Cloud Functions
>> [AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
>> [AIRFLOW-2989] No Parameter to change bootDiskType for 
>> DataprocClusterCreateOperator
>> [AIRFLOW-3078] Basic operators for Google Compute Engine
>> [AIRFLOW-3147] Update Flask-AppBuilder version
>> [AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / delete)
>> [AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators
>> 
>> Improvements:
>> 
>> [AIRFLOW-393] Add progress callbacks for FTP downloads
>> [AIRFLOW-520] Show Airflow version on web page
>> [AIRFLOW-843] Exceptions now available in context during on_failure_callback
>> [AIRFLOW-2476] Update tabulate dependency to v0.8.2
>> [AIRFLOW-2592] Bump Bleach dependency
>> [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
>> [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
>> executor/operator
>> [AIRFLOW-2709] Improve error handling in Databricks hook
>> [AIRFLOW-2723] Update lxml dependancy to >= 4.0.
>> [AIRFLOW-2763] No precheck mechanism in place during worker initialisation 
>> for the connection to metadata database
>> [AIRFLOW-2789] Add ability to create single node cluster to 
>> DataprocClusterCreateOperator
>> [AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom 
>> image
>> [AIRFLOW-2854] kubernetes_pod_operator add more configuration items
>> [AIRFLOW-2855] Need to Check Vali

Re: [VOTE] Airflow 1.10.1rc1

2018-11-13 Thread Ash Berlin-Taylor
CORRECTION: Correct URLs are

https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc1 
<https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc1>

Copy-and-paste fail


> On 13 Nov 2018, at 15:29, Ash Berlin-Taylor  wrote:
> 
> Hey all,
> 
> I have cut Airflow 1.10.1 RC1. This email is calling a vote on the release, 
> which will last for 72 hours. Consider this my (binding) +1.
> 
> Airflow 1.10.1 RC1 is available at:
> 
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc1/
> 
> apache-airflow-1.10.1rc1+incubating-source.tar.gz is a source release that 
> comes with INSTALL instructions.
> apache-airflow-1.10.1rc1+incubating-bin.tar.gz is the binary Python "sdist" 
> release.
> 
> Public keys are available at:
> 
> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
> 
> Only votes from PMC members are binding, but members of the community are 
> encouraged to test the release and vote with "(non-binding)".
> 
> Changes since 1.10.1b1:
> 
> [AIRFLOW-XXX] Correct date and version in Changelog
> [AIRFLOW-2779] Add license headers to doc files (#4178)
> [AIRFLOW-XXX] Changelog and version for 1.10.1
> [AIRFLOW-2779] Add license headers to doc files (#4178)
> [AIRFLOW-2779] Add project version to license (#4177)
> [AIRFLOW-XXX] Sync changelog between release and master branch
> [AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
> [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role (#4175)
> [AIRFLOW-2723] Update lxml dependancy to >= 4.0.0
> [AIRFLOW-3325] Fix UI Page DAGs-column 'Recent Tasks' display issue (#4173)
> [AIRFLOW-XXX] Update Updating instructions for changes in 1.10.1
> [AIRFLOW-XXX] Fix a few typos in CHANGELOG (#4169)
> 
> 
> Full changelog is below:
> 
> New features:
> 
> [AIRFLOW-2524] Airflow integration with AWS Sagemaker
> [AIRFLOW-2657] Add ability to delete DAG from web ui
> [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
> [AIRFLOW-2794] Add delete support for Azure blob
> [AIRFLOW-2912] Add operators for Google Cloud Functions
> [AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
> [AIRFLOW-2989] No Parameter to change bootDiskType for 
> DataprocClusterCreateOperator
> [AIRFLOW-3078] Basic operators for Google Compute Engine
> [AIRFLOW-3147] Update Flask-AppBuilder version
> [AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / delete)
> [AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators
> 
> Improvements:
> 
> [AIRFLOW-393] Add progress callbacks for FTP downloads
> [AIRFLOW-520] Show Airflow version on web page
> [AIRFLOW-843] Exceptions now available in context during on_failure_callback
> [AIRFLOW-2476] Update tabulate dependency to v0.8.2
> [AIRFLOW-2592] Bump Bleach dependency
> [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
> [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
> executor/operator
> [AIRFLOW-2709] Improve error handling in Databricks hook
> [AIRFLOW-2723] Update lxml dependency to >= 4.0.
> [AIRFLOW-2763] No precheck mechanism in place during worker initialisation 
> for the connection to metadata database
> [AIRFLOW-2789] Add ability to create single node cluster to 
> DataprocClusterCreateOperator
> [AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
> [AIRFLOW-2854] kubernetes_pod_operator add more configuration items
> [AIRFLOW-2855] Need to Check Validity of Cron Expression When Process DAG 
> File/Zip File
> [AIRFLOW-2904] Clean an unnecessary line in 
> airflow/executors/celery_executor.py
> [AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()
> [AIRFLOW-2922] Potential deadlock bug in CeleryExecutor()
> [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
> [AIRFLOW-2949] Syntax Highlight for Single Quote
> [AIRFLOW-2951] dag_run end_date Null after a dag is finished
> [AIRFLOW-2956] Kubernetes tolerations for pod operator
> [AIRFLOW-2997] Support for clustered tables in Bigquery hooks/operators
> [AIRFLOW-3006] Fix error when schedule_interval="None"
> [AIRFLOW-3008] Move Kubernetes related example DAGs to contrib/example_dags
> [AIRFLOW-3025] Allow to specify dns and dns-search parameters for 
> DockerOperator
> [AIRFLOW-3067] (www_rbac) Flask flash messages are not displayed properly (no 
> background color)
> [AIRFLOW-3069] Decode output of S3 file transform operator
> [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role
> [AIRFLOW-3090] INFO logs are too verbose
> [AIRFLOW-3103] Update Flask-Login
> [AIRFLOW-3112] Align SFTP hook with SSH hook
> [AIRFLOW-3119] Enable loglevel on celery worker and inherit from airflow.cfg

[VOTE] Airflow 1.10.1rc1

2018-11-13 Thread Ash Berlin-Taylor
Hey all,
 
I have cut Airflow 1.10.1 RC1. This email is calling a vote on the release, 
which will last for 72 hours. Consider this my (binding) +1.
 
Airflow 1.10.1 RC1 is available at:
 
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc1/
 
apache-airflow-1.10.1rc1+incubating-source.tar.gz is a source release that 
comes with INSTALL instructions.
apache-airflow-1.10.1rc1+incubating-bin.tar.gz is the binary Python "sdist" 
release.
 
Public keys are available at:
 
https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS

Only votes from PMC members are binding, but members of the community are 
encouraged to test the release and vote with "(non-binding)".

Changes since 1.10.1b1:

[AIRFLOW-XXX] Correct date and version in Changelog
[AIRFLOW-2779] Add license headers to doc files (#4178)
[AIRFLOW-XXX] Changelog and version for 1.10.1
[AIRFLOW-2779] Add license headers to doc files (#4178)
[AIRFLOW-2779] Add project version to license (#4177)
[AIRFLOW-XXX] Sync changelog between release and master branch
[AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
[AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role (#4175)
[AIRFLOW-2723] Update lxml dependency to >= 4.0.0
[AIRFLOW-3325] Fix UI Page DAGs-column 'Recent Tasks' display issue (#4173)
[AIRFLOW-XXX] Update Updating instructions for changes in 1.10.1
[AIRFLOW-XXX] Fix a few typos in CHANGELOG (#4169)


Full changelog is below:

New features:

[AIRFLOW-2524] Airflow integration with AWS Sagemaker
[AIRFLOW-2657] Add ability to delete DAG from web ui
[AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
[AIRFLOW-2794] Add delete support for Azure blob
[AIRFLOW-2912] Add operators for Google Cloud Functions
[AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
[AIRFLOW-2989] No Parameter to change bootDiskType for 
DataprocClusterCreateOperator
[AIRFLOW-3078] Basic operators for Google Compute Engine
[AIRFLOW-3147] Update Flask-AppBuilder version
[AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / delete)
[AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators

Improvements:

[AIRFLOW-393] Add progress callbacks for FTP downloads
[AIRFLOW-520] Show Airflow version on web page
[AIRFLOW-843] Exceptions now available in context during on_failure_callback
[AIRFLOW-2476] Update tabulate dependency to v0.8.2
[AIRFLOW-2592] Bump Bleach dependency
[AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
[AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
executor/operator
[AIRFLOW-2709] Improve error handling in Databricks hook
[AIRFLOW-2723] Update lxml dependency to >= 4.0.
[AIRFLOW-2763] No precheck mechanism in place during worker initialisation for 
the connection to metadata database
[AIRFLOW-2789] Add ability to create single node cluster to 
DataprocClusterCreateOperator
[AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
[AIRFLOW-2854] kubernetes_pod_operator add more configuration items
[AIRFLOW-2855] Need to Check Validity of Cron Expression When Process DAG 
File/Zip File
[AIRFLOW-2904] Clean an unnecessary line in airflow/executors/celery_executor.py
[AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()
[AIRFLOW-2922] Potential deadlock bug in CeleryExecutor()
[AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
[AIRFLOW-2949] Syntax Highlight for Single Quote
[AIRFLOW-2951] dag_run end_date Null after a dag is finished
[AIRFLOW-2956] Kubernetes tolerations for pod operator
[AIRFLOW-2997] Support for clustered tables in Bigquery hooks/operators
[AIRFLOW-3006] Fix error when schedule_interval="None"
[AIRFLOW-3008] Move Kubernetes related example DAGs to contrib/example_dags
[AIRFLOW-3025] Allow to specify dns and dns-search parameters for DockerOperator
[AIRFLOW-3067] (www_rbac) Flask flash messages are not displayed properly (no 
background color)
[AIRFLOW-3069] Decode output of S3 file transform operator
[AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role
[AIRFLOW-3090] INFO logs are too verbose
[AIRFLOW-3103] Update Flask-Login
[AIRFLOW-3112] Align SFTP hook with SSH hook
[AIRFLOW-3119] Enable loglevel on celery worker and inherit from airflow.cfg
[AIRFLOW-3137] Make ProxyFix middleware optional
[AIRFLOW-3173] Add _cmd options for more password config options
[AIRFLOW-3177] Change scheduler_heartbeat metric from gauge to counter
[AIRFLOW-3195] Druid Hook: Log ingestion spec and task id
[AIRFLOW-3197] EMR Hook is missing some parameters to valid on the AWS API
[AIRFLOW-3232] Make documentation for GCF Functions operator more readable
[AIRFLOW-3262] Can't get log containing Response when using SimpleHttpOperator
[AIRFLOW-3265] Add support for "unix_socket" in connection extra for Mysql Hook

Doc-only changes:

[AIRFLOW-1441] Tutorial Inconsistencies Between Example Pipeline Definition and 
Recap
[AIRFLOW-2682] Add how-to guide(s) for how to use basic operators like 
BashOperator and PythonOperator

Re: Airflow 1.10.1b1 release available - PLEASE TEST

2018-11-12 Thread Ash Berlin-Taylor
Hi Hitesh,

My understanding was that the only official place an Apache release can happen 
from is https://www.apache.org/dist/incubator/airflow/ - so anything else is by 
definition not an official Apache release.

So i guess it depends on what we mean by "Apache" release - could it be 
confused that it is in some way official, yes, possibly. Although the end-user 
would have to go out of their way to find it and install it so the risk was 
low, and I felt that the benefit to the community of being able to test and 
install this easily was worth the small risk of confusion.

Possibly something I should have asked (voted on?) first, or before we do this 
in the future.

-ash

> On 12 Nov 2018, at 20:32, Hitesh Shah  wrote:
> 
> Hello Ash
> 
> For someone who is not familiar with the beta notation or folks who do not
> check whether it is signed or not, could this be confused as an apache
> release? Also, is there a plan to clean up/delete this version within a
> finite time window once the official release voting is kicked off?
> 
> thanks
> Hitesh
> 
> 
> On Fri, Nov 9, 2018 at 5:52 AM Naik Kaxil  wrote:
> 
>> Good work Ash. Appreciate the clean and categorised Change Log.
>> 
>> Regards,
>> Kaxil
>> 
>> On 09/11/2018, 13:45, "Ash Berlin-Taylor"  wrote:
>> 
>>Hi Everyone,
>> 
>>I've just released a beta version of 1.10.1! Please could you test
>> this and report back any problems you notice, and also report back if you
>> tried it and it works fine. As this is the first time I've released Airflow
>> it is possible that there are packaging mistakes too. I'm not calling
>> for a vote just yet, but I will give this a few days until I start making
>> release candidates and calling for a formal vote, probably on Monday or
>> Tuesday.
>> 
>>In order to distinguish it from an actual (apache) release it is:
>> 
>>1. Marked as beta (python package managers do not install beta
>> versions by default - PEP 440)
>>2. It is not signed
>>3. It is not at an official apache distribution location
>> 
>>It can be installed with SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>> 'apache-airflow==1.10.1b1'
>> 
>>(Don't worry, without asking for `--pre` or specifying the version
>> `pip install apache-airflow` will still get 1.10.0)
>> 
>>Thanks,
>>Ash
>> 
>> 
>> 
>> 
>> 
>>Included below is the changelog of this release:
>> 
>>New features:
>> 
>>[AIRFLOW-2524] Airflow integration with AWS Sagemaker
>>[AIRFLOW-2657] Add ability to delete DAG from web ui
>>[AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
>>[AIRFLOW-2794] Add delete support for Azure blob
>>[AIRFLOW-2912] Add operators for Google Cloud Functions
>>[AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
>>[AIRFLOW-2989] No Parameter to change bootDiskType for
>> DataprocClusterCreateOperator
>>[AIRFLOW-3078] Basic operators for Google Compute Engine
>>[AIRFLOW-3147] Update Flask-AppBuilder version
>>[AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch /
>> delete)
>>[AIRFLOW-3276] Google Cloud SQL database create / patch / delete
>> operators
>> 
>>Improvements:
>> 
>>[AIRFLOW-393] Add progress callbacks for FTP downloads
>>[AIRFLOW-520] Show Airflow version on web page
>>    [AIRFLOW-843] Exceptions now available in context during
>> on_failure_callback
>>[AIRFLOW-2476] Update tabulate dependency to v0.8.2
>>[AIRFLOW-2592] Bump Bleach dependency
>>[AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
>>[AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes
>> executor/operator
>>[AIRFLOW-2709] Improve error handling in Databricks hook
>>[AIRFLOW-2763] No precheck mechanism in place during worker
>> initialisation for the connection to metadata database
>>[AIRFLOW-2789] Add ability to create single node cluster to
>> DataprocClusterCreateOperator
>>[AIRFLOW-2797] Add ability to create Google Dataproc cluster with
>> custom image
>>[AIRFLOW-2854] kubernetes_pod_operator add more configuration items
>>[AIRFLOW-2855] Need to Check Validity of Cron Expression When Process
>> DAG File/Zip File
>>[AIRFLOW-2904] Clean an unnecessary line in
>> airflow/executors/celery_executor.py
>>    [AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()

Re: CSS issue on Airflow 1.10 Tree view UI

2018-11-12 Thread Ash Berlin-Taylor
Screenshots (attachments) don't come through on the list - could you post a 
link to it somewhere?

Does this only apply when the list of tasks is particularly tall or wide? In a 
simple case I can't reproduce this on 1.10.1b1 (but I don't think anything has 
changed there recently).

-ash

> On 9 Nov 2018, at 20:25, Adityan MS  wrote:
> 
> Hi,
> 
> After upgrading to airflow 1.10, hovering the mouse over the rectangles in 
> tree view, causes the tooltip to popup much higher than it used to be. Any 
> quick fix for this behavior? In the screenshot, the right lowermost rectangle 
> is the one that is being hovered over.
> 
> Once you scroll down on the tree view, the tooltip starts floating up. 
> 
> Things I have tried to fix this behavior:
> 1. Change the css themes in airflow.cfg and restart the web server
> 2. Inspected the tooltip in Chrome, it seems to be a dynamically generated 
> CSS class. The CSS class controlling this behavior seem to be the same in 
> Airflow 1.9. 
> 
> Has anyone else run into this issue?
> 
> Thanks!
> 
> 



Airflow 1.10.1b1 release available - PLEASE TEST

2018-11-09 Thread Ash Berlin-Taylor
Hi Everyone,

I've just released a beta version of 1.10.1! Please could you test this and 
report back any problems you notice, and also report back if you tried it and 
it works fine. As this is the first time I've released Airflow it is 
possible that there are packaging mistakes too. I'm not calling for a vote just 
yet, but I will give this a few days until I start making release candidates 
and calling for a formal vote, probably on Monday or Tuesday.

In order to distinguish it from an actual (apache) release it is:

1. Marked as beta (python package managers do not install beta versions by 
default - PEP 440)
2. It is not signed
3. It is not at an official apache distribution location

It can be installed with SLUGIFY_USES_TEXT_UNIDECODE=yes pip install 
'apache-airflow==1.10.1b1'

(Don't worry, without asking for `--pre` or specifying the version `pip install 
apache-airflow` will still get 1.10.0)

Thanks,
Ash





Included below is the changelog of this release:

New features:

[AIRFLOW-2524] Airflow integration with AWS Sagemaker
[AIRFLOW-2657] Add ability to delete DAG from web ui
[AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
[AIRFLOW-2794] Add delete support for Azure blob
[AIRFLOW-2912] Add operators for Google Cloud Functions
[AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
[AIRFLOW-2989] No Parameter to change bootDiskType for 
DataprocClusterCreateOperator
[AIRFLOW-3078] Basic operators for Google Compute Engine
[AIRFLOW-3147] Update Flask-AppBuilder version
[AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / delete)
[AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators

Improvements:

[AIRFLOW-393] Add progress callbacks for FTP downloads
[AIRFLOW-520] Show Airflow version on web page
[AIRFLOW-843] Exceptions now available in context during on_failure_callback
[AIRFLOW-2476] Update tabulate dependency to v0.8.2
[AIRFLOW-2592] Bump Bleach dependency
[AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
[AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
executor/operator
[AIRFLOW-2709] Improve error handling in Databricks hook
[AIRFLOW-2763] No precheck mechanism in place during worker initialisation for 
the connection to metadata database
[AIRFLOW-2789] Add ability to create single node cluster to 
DataprocClusterCreateOperator
[AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
[AIRFLOW-2854] kubernetes_pod_operator add more configuration items
[AIRFLOW-2855] Need to Check Validity of Cron Expression When Process DAG 
File/Zip File
[AIRFLOW-2904] Clean an unnecessary line in airflow/executors/celery_executor.py
[AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()
[AIRFLOW-2922] Potential deadlock bug in CeleryExecutor()
[AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
[AIRFLOW-2949] Syntax Highlight for Single Quote
[AIRFLOW-2951] dag_run end_date Null after a dag is finished
[AIRFLOW-2956] Kubernetes tolerations for pod operator
[AIRFLOW-2997] Support for clustered tables in Bigquery hooks/operators
[AIRFLOW-3006] Fix error when schedule_interval="None"
[AIRFLOW-3008] Move Kubernetes related example DAGs to contrib/example_dags
[AIRFLOW-3025] Allow to specify dns and dns-search parameters for DockerOperator
[AIRFLOW-3067] (www_rbac) Flask flash messages are not displayed properly (no 
background color)
[AIRFLOW-3069] Decode output of S3 file transform operator
[AIRFLOW-3090] INFO logs are too verbose
[AIRFLOW-3103] Update Flask-Login
[AIRFLOW-3112] Align SFTP hook with SSH hook
[AIRFLOW-3119] Enable loglevel on celery worker and inherit from airflow.cfg
[AIRFLOW-3137] Make ProxyFix middleware optional
[AIRFLOW-3173] Add _cmd options for more password config options
[AIRFLOW-3177] Change scheduler_heartbeat metric from gauge to counter
[AIRFLOW-3195] Druid Hook: Log ingestion spec and task id
[AIRFLOW-3197] EMR Hook is missing some parameters to valid on the AWS API
[AIRFLOW-3232] Make documentation for GCF Functions operator more readable
[AIRFLOW-3262] Can't get log containing Response when using SimpleHttpOperator
[AIRFLOW-3265] Add support for "unix_socket" in connection extra for Mysql Hook

Doc-only changes:

[AIRFLOW-1441] Tutorial Inconsistencies Between Example Pipeline Definition and 
Recap
[AIRFLOW-2682] Add how-to guide(s) for how to use basic operators like 
BashOperator and PythonOperator
[AIRFLOW-3104] .airflowignore feature is not mentioned at all in documentation
[AIRFLOW-3237] Refactor example DAGs
[AIRFLOW-3187] Update airflow.gif file with a slower version
[AIRFLOW-3159] Update Airflow documentation on GCP Logging
[AIRFLOW-3030] Command Line docs incorrect subdir
[AIRFLOW-2990] Docstrings for Hooks/Operators are in incorrect format
[AIRFLOW-3127] Celery SSL Documentation is out-dated

Bug fixes:

[AIRFLOW-839] docker_operator.py attempts to log status key without first 
checking existence
[AIRFLOW-1104] Concurrency che

Re: what is error[111] and how to deal with it on sending the email notification?

2018-11-01 Thread Ash Berlin-Taylor
Errno 111 is a connection refused socket-level error, and it's saying that the 
server where your airflow scheduler is running cannot reach smtp.live.com 
on port 587.

First thing to look at would be your firewall and networking settings.
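
A quick way to confirm that (a minimal sketch, plain Python, nothing 
Airflow-specific; the host and port are taken from the config you posted) is to 
open the socket yourself from the machine running the scheduler:

    import socket

    try:
        # The same host/port your smtp_host / smtp_port settings point at
        s = socket.create_connection(("smtp.live.com", 587), timeout=10)
        print("Connected to %s:%s" % s.getpeername()[:2])
        s.close()
    except socket.error as exc:
        # [Errno 111] here confirms the problem is at the network level,
        # not in Airflow or your DAG
        print("Connection failed: %s" % exc)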

-ash


> On 1 Nov 2018, at 00:30, rajasimmangan...@gmail.com wrote:
> 
> #
> 
> 
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.email_operator import EmailOperator
> from airflow.utils.email import send_email_smtp
> import datetime as dt
> 
> 
> 
> 
> default_args = {
>'owner': 'arflow',
>'depends_on_past': False,
>'start_date': dt.datetime(2018, 10, 30),
>'email':['@gmail.com'],
>'email_on_failure': True,
>'email_on_retry': False,
>#'retries': 1,
>#'retry_delay': timedelta(minutes=5),
> }
> 
> dag = DAG("Raja's Airflow",
> default_args=default_args,
> schedule_interval='0 3 * * *') 
> 
> notify_email = 
> EmailOperator(task_id='email',to=['@gmail.com'],subject="HI",html_content="raw
>  content #2",dag=dag)
> 
> t1 = BashOperator(task_id='load_data',
>bash_command='python3 
> /usr/share/airflow/documents/scripts/Airflow-Testing/arfotest.py',email_on_failure
>  = notify_email,dag=dag)
> 
> t1
> 
> 
> ###
> Content of .cfg file
> smtp_host = Smtp.live.com
> smtp_starttls = True
> smtp_ssl = False
> smtp_user = ***
> smtp_password= 
> smtp_port = 587
> smtp_mail_from=***(same as stmp_user)
> 
> ##
> 
> This is the error, I'm getting it
> 
> [2018-11-01 11:18:56,405] {models.py:1769} ERROR - [Errno 111] Connection 
> refused
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1766, 
> in handle_failure
>self.email_alert(error, is_retry=False)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1945, 
> in email_alert
>send_email(task.email, title, body)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/email.py", line 
> 53, in send_email
>mime_subtype=mime_subtype, mime_charset=mime_charset, **kwargs)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/email.py", line 
> 99, in send_email_smtp
>send_MIME_email(SMTP_MAIL_FROM, recipients, msg, dryrun)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/email.py", line 
> 119, in send_MIME_email
>s = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) if SMTP_SSL else 
> smtplib.SMTP(SMTP_HOST, SMTP_PORT)
>  File "/usr/lib/python2.7/smtplib.py", line 256, in __init__
>(code, msg) = self.connect(host, port)
>  File "/usr/lib/python2.7/smtplib.py", line 317, in connect
>self.sock = self._get_socket(host, port, self.timeout)
>  File "/usr/lib/python2.7/smtplib.py", line 292, in _get_socket
>return socket.create_connection((host, port), timeout)
>  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
>raise err
> error: [Errno 111] Connection refused
> 
> 



Re: 1.10.1 Release?

2018-10-30 Thread Ash Berlin-Taylor
Fair :) Timezones are _hard_

Giving it a look now.

-ash

> On 30 Oct 2018, at 20:50, Bolke de Bruin  wrote:
> 
> The reason for not passing a TZ aware object is that many libraries make 
> mistakes (pytz, arrow etc.) when doing transitions, hence the use of pendulum, 
> which seems most complete. I don’t know what croniter is relying on and I 
> don’t want to find out ;-).
> 
> B.
> 
>> On 30 Oct 2018, at 21:13, Ash Berlin-Taylor  wrote:
>> 
>> I think if we give croniter a tz-aware DT in the local tz it will deal with 
>> DST (i.e. will give 2:55 CEST followed by 2:00 CET) and then we convert it 
>> to UTC for return - but right now we are giving it a TZ-unaware local time.
>> 
>> I think.
>> 
>> Ash
>> 
>> On 30 October 2018 19:40:27 GMT, Bolke de Bruin  wrote:
>> I think we should use the UTC date for cron instead of the naive local date 
>> time. I will check if croniter implements this so we can rely on that.
>> 
>> B.
>> 
>> On 28 Oct 2018, at 02:09, Bolke de Bruin  wrote:
>> 
>> I wonder how to treat this:
>> 
>> This is what I think happens (need to verify more, but I am pretty sure) the 
>> specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
>> basically hit a schedule that we have already seen. 2AM -> 3AM has already 
>> happened. Obviously the intention is to run every 5 minutes. But what do we 
>> do with the execution_date? Is this still idempotent? Should we indeed 
>> reschedule? 
>> 
>> B.
>> 
>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
>> 
>> I've done a bit more digging - the issue is in our tz-aware handling inside 
>> following_schedule (and previous_schedule), causing it to loop.
>> 
>> This section of the croniter docs seems relevant: 
>> https://github.com/kiorky/croniter#about-dst
>> 
>>  Be sure to init your croniter instance with a TZ aware datetime for this to 
>> work !:
>> local_date = tz.localize(datetime(2017, 3, 26))
>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>> 
>> I think the problem is that we are _not_ passing a TZ aware dag in and we 
>> should be.
>> 
>> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
>> 
>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>> 
>> B.
>> 
>> Sent from my iPad
>> 
>> On 30 Oct 2018 at 18:25, Ash Berlin-Taylor wrote:
>> 
>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>> 
>>last_run = dag.get_last_dagrun(session=session)
>>if last_run and next_run_date:
>>while next_run_date <= last_run.execution_date:
>>next_run_date = dag.following_schedule(next_run_date)
>> 
>> 
>> 
>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>> 
>> Hi, kaczors on gitter has produced a minimal reproduction case: 
>> https://github.com/kaczors/airflow_1_10_tz_bug
>> 
>> Rough repro steps: In a VM, with time syncing disabled, and configured with 
>> system timezone of Europe/Zurich (or any other CEST one) run 
>> 
>> - `date 10280250.00`
>> - initdb, start scheduler, webserver, enable dag etc.
>> - `date 10280259.00`
>> - wait 5-10 mins for scheduler to catch up
>> - After the on-the-hour task run the scheduler will spin up another process 
>> to parse the dag... and it never returns.
>> 
>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>> quick hacky debug print shows something is stuck in an infinite loop.
>> 
>> -ash
>> 
>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>> 
>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>> definition code.
>> 
>> On the licensing requirements:
>> 
>> 1. Indeed licensing header for markdown documents. It was suggested to use 
>> html comments. I’m not sure how that renders with others like PDF though.
>> 2. The

Re: 1.10.1 Release?

2018-10-30 Thread Ash Berlin-Taylor
I think if we give croniter a tz-aware DT in the local tz it will deal with DST 
(i.e. will give 2:55 CEST followed by 2:00 CET) and then we convert it to UTC 
for return - but right now we are giving it a TZ-unaware local time.

I think.

Ash

On 30 October 2018 19:40:27 GMT, Bolke de Bruin  wrote:
>I think we should use the UTC date for cron instead of the naive local
>date time. I will check if croniter implements this so we can rely on
>that.
>
>B.
>
>> On 28 Oct 2018, at 02:09, Bolke de Bruin  wrote:
>> 
>> I wonder how to treat this:
>> 
>> This is what I think happens (need to verify more, but I am pretty
>sure) the specified DAG should run every 5 minutes. At DST change (3AM
>-> 2AM) we basically hit a schedule that we have already seen. 2AM ->
>3AM has already happened. Obviously the intention is to run every 5
>minutes. But what do we do with the execution_date? Is this still
>idempotent? Should we indeed reschedule? 
>> 
>> B.
>> 
>>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
>>> 
>>> I've done a bit more digging - the issue is in our tz-aware handling
>>> inside following_schedule (and previous_schedule), causing it to loop.
>>> 
>>> This section of the croniter docs seems relevant
>https://github.com/kiorky/croniter#about-dst
>>> 
>>>   Be sure to init your croniter instance with a TZ aware datetime
>for this to work !:
>>>>>> local_date = tz.localize(datetime(2017, 3, 26))
>>>>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>>> 
>>> I think the problem is that we are _not_ passing a TZ aware dag in
>and we should be.
>>> 
>>>> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
>>>> 
>>>> Oh that’s a great environment to start digging. Thanks. I’ll have a
>look.
>>>> 
>>>> B.
>>>> 
>>>> Sent from my iPad
>>>> 
>>>>> On 30 Oct 2018 at 18:25, Ash Berlin-Taylor wrote:
>>>>> 
>>>>> This line in airflow.jobs (line 874 in my checkout) is causing the
>loop:
>>>>> 
>>>>> last_run = dag.get_last_dagrun(session=session)
>>>>> if last_run and next_run_date:
>>>>> while next_run_date <= last_run.execution_date:
>>>>> next_run_date =
>dag.following_schedule(next_run_date)
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor 
>wrote:
>>>>>> 
>>>>>> Hi, kaczors on gitter has produced a minimal reproduction case:
>https://github.com/kaczors/airflow_1_10_tz_bug
>>>>>> 
>>>>>> Rough repro steps: In a VM, with time syncing disabled, and
>configured with system timezone of Europe/Zurich (or any other CEST
>one) run 
>>>>>> 
>>>>>> - `date 10280250.00`
>>>>>> - initdb, start scheduler, webserver, enable dag etc.
>>>>>> - `date 10280259.00`
>>>>>> - wait 5-10 mins for scheduler to catch up
>>>>>> - After the on-the-hour task run the scheduler will spin up
>another process to parse the dag... and it never returns.
>>>>>> 
>>>>>> I've only just managed to reproduce it, so haven't dug in to why
>yet. A quick hacky debug print shows something is stuck in an infinite
>loop.
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin 
>wrote:
>>>>>>> 
>>>>>>> Can this be confirmed? Then I can have a look at it. Preferably
>with dag definition code.
>>>>>>> 
>>>>>>> On the licensing requirements:
>>>>>>> 
>>>>>>> 1. Indeed licensing header for markdown documents. It was
>suggested to use html comments. I’m not sure how that renders with
>others like PDF though.
>>>>>>> 2. The licensing notifications need to be tied to a specific
>version as licenses might change with versions.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> Bolke
>>>>>>> 
>>>>>>> Sent from my iPad
>>>>>>> 
>>>>>>>> On 29 Oct 2018 at 12:39, Ash Berlin-Taylor wrote:
>>>>>>>> 
>>>>>>>> I was going to make a start

Re: 1.10.1 Release?

2018-10-30 Thread Ash Berlin-Taylor
I've done a bit more digging - the issue is in our tz-aware handling inside 
following_schedule (and previous_schedule), causing it to loop.

This section of the croniter docs seems relevant 
https://github.com/kiorky/croniter#about-dst

Be sure to init your croniter instance with a TZ aware datetime for this to 
work !:
>>> local_date = tz.localize(datetime(2017, 3, 26))
>>> val = croniter('0 0 * * *', local_date).get_next(datetime)

I think the problem is that we are _not_ passing a TZ aware dag in and we 
should be.
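
As a minimal sketch of the tz-aware usage the croniter docs describe (pytz is 
used here purely for illustration, matching their example; the exact output 
around the fold depends on the croniter version):

    from datetime import datetime

    import pytz
    from croniter import croniter

    tz = pytz.timezone("Europe/Zurich")
    # 01:45 local on the night the clocks fall back (2018-10-28)
    local_date = tz.localize(datetime(2018, 10, 28, 1, 45))
    it = croniter("*/5 * * * *", local_date)
    for _ in range(6):
        # Given a TZ-aware start, croniter should keep the schedule moving
        # forward in UTC terms across the transition (e.g. 2:55 CEST
        # followed by 2:00 CET) - which is what following_schedule needs.
        print(it.get_next(datetime))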

> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
> 
> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
> 
> B.
> 
> Sent from my iPad
> 
>> On 30 Oct 2018 at 18:25, Ash Berlin-Taylor wrote:
>> 
>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>> 
>>   last_run = dag.get_last_dagrun(session=session)
>>   if last_run and next_run_date:
>>   while next_run_date <= last_run.execution_date:
>>           next_run_date = dag.following_schedule(next_run_date)
>> 
>> 
>> 
>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>>> 
>>> Hi, kaczors on gitter has produced a minimal reproduction case: 
>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>> 
>>> Rough repro steps: In a VM, with time syncing disabled, and configured with 
>>> system timezone of Europe/Zurich (or any other CEST one) run 
>>> 
>>> - `date 10280250.00`
>>> - initdb, start scheduler, webserver, enable dag etc.
>>> - `date 10280259.00`
>>> - wait 5-10 mins for scheduler to catch up
>>> - After the on-the-hour task run the scheduler will spin up another process 
>>> to parse the dag... and it never returns.
>>> 
>>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>>> quick hacky debug print shows something is stuck in an infinite loop.
>>> 
>>> -ash
>>> 
>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>>>> 
>>>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>>>> definition code.
>>>> 
>>>> On the licensing requirements:
>>>> 
>>>> 1. Indeed licensing header for markdown documents. It was suggested to use 
>>>> html comments. I’m not sure how that renders with others like PDF though.
>>>> 2. The licensing notifications need to be tied to a specific version as 
>>>> licenses might change with versions.
>>>> 
>>>> Cheers
>>>> Bolke
>>>> 
>>>> Sent from my iPad
>>>> 
>>>>> On 29 Oct 2018 at 12:39, Ash Berlin-Taylor wrote:
>>>>> 
>>>>> I was going to make a start on the release, but two people have reported 
>>>>> that there might be an issue around non-UTC dags and the scheduler 
>>>>> changing over from Summer time.
>>>>> 
>>>>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>>>>> issue : we have hourly DAGs with a start_date in a local timezone (not 
>>>>>> UTC) and since (Sunday) the last winter time change they don’t run 
>>>>>> anymore. Any idea ?
>>>>>> 09:41  it impacted all our DAG that had a run at 3am 
>>>>>> (Europe/Paris), the exact time of winter time change :(
>>>>> 
>>>>> I am going to take a look at this today and see if I can get to the 
>>>>> bottom of it.
>>>>> 
>>>>> Bolke: are there any outstanding tasks/issues that you know of that might 
>>>>> slow down the vote for a 1.10.1? (i.e. did we sort out all the 
>>>>> licensing issues that were asked of us? I thought I read something about 
>>>>> license declarations in markdown files?)
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 28 Oct 2018, at 14:46, Bolke de Bruin  wrote:
>>>>>> 
>>>>>> I agree with that, but I would favor time based releases instead. We are 
>>>>>> again at the point that a release takes so much time that the gap is 
>>>>>> getting really big again. @ash why not start releasing now and move the 
>>>>>> remainder to 1.10.2? I don't think there are real blockers (although we 
>>>>>> might find them).
>>>>>> 
>>>>>> 
>>>>>>> On 28 Oct 2018, at 15:35, airflowuser 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> I was really hoping that 
>>>>>>> https://github.com/apache/incubator-airflow/pull/4069 will be merged 
>>>>>>> into 1.10.1
>>>>>>> Deleting dags was a highly requested feature for 1.10 - this can fix 
>>>>>>> the problem with it.
>>>>>>> 
>>>>>>> 
>>>>>>> ‐‐‐ Original Message ‐‐‐
>>>>>>>>> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin 
>>>>>>>>>  wrote:
>>>>>>>> 
>>>>>>>> Hey Ash,
>>>>>>>> 
>>>>>>>> I was wondering if you are picking up the 1.10.1 release? Master is 
>>>>>>>> speeding ahead and you were tracking fixes for 1.10.1 right?
>>>>>>>> 
>>>>>>>> B.
>>>>> 
>>> 
>> 



Re: 1.10.1 Release?

2018-10-30 Thread Ash Berlin-Taylor
This line in airflow.jobs (line 874 in my checkout) is causing the loop:

last_run = dag.get_last_dagrun(session=session)
if last_run and next_run_date:
while next_run_date <= last_run.execution_date:
next_run_date = dag.following_schedule(next_run_date)
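
A hypothetical illustration of the failure mode (values assumed, and pytz used 
only to show the ambiguity - Airflow itself uses pendulum): at the CEST -> CET 
fall-back, the naive local "next schedule" maps to two different UTC instants, 
and landing on the wrong side of the fold produces a next schedule that is not 
actually later:

    from datetime import datetime

    import pytz

    tz = pytz.timezone("Europe/Zurich")

    # 02:05 local on 2018-10-28 happens twice: once in CEST, once in CET
    naive = datetime(2018, 10, 28, 2, 5)
    print(tz.localize(naive, is_dst=True).astimezone(pytz.utc))   # 00:05 UTC
    print(tz.localize(naive, is_dst=False).astimezone(pytz.utc))  # 01:05 UTC

If the localisation lands on the earlier instant, next_run_date can come back 
<= last_run.execution_date and the while loop above never advances.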



> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
> 
> Hi, kaczors on gitter has produced a minimal reproduction case: 
> https://github.com/kaczors/airflow_1_10_tz_bug
> 
> Rough repro steps: In a VM, with time syncing disabled, and configured with 
> system timezone of Europe/Zurich (or any other CEST one) run 
> 
> - `date 10280250.00`
> - initdb, start scheduler, webserver, enable dag etc.
> - `date 10280259.00`
> - wait 5-10 mins for scheduler to catch up
> - After the on-the-hour task run the scheduler will spin up another process 
> to parse the dag... and it never returns.
> 
> I've only just managed to reproduce it, so haven't dug in to why yet. A quick 
> hacky debug print shows something is stuck in an infinite loop.
> 
> -ash
> 
>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>> 
>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>> definition code.
>> 
>> On the licensing requirements:
>> 
>> 1. Indeed licensing header for markdown documents. It was suggested to use 
>> html comments. I’m not sure how that renders with others like PDF though.
>> 2. The licensing notifications need to be tied to a specific version as 
>> licenses might change with versions.
>> 
>> Cheers
>> Bolke
>> 
>> Sent from my iPad
>> 
>>> On 29 Oct 2018 at 12:39, Ash Berlin-Taylor wrote:
>>> 
>>> I was going to make a start on the release, but two people have reported 
>>> that there might be an issue around non-UTC dags and the scheduler changing 
>>> over from Summer time.
>>> 
>>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>>> issue : we have hourly DAGs with a start_date in a local timezone (not 
>>>> UTC) and since (Sunday) the last winter time change they don’t run 
>>>> anymore. Any idea ?
>>>> 09:41  it impacted all our DAG that had a run at 3am 
>>>> (Europe/Paris), the exact time of winter time change :(
>>> 
>>> I am going to take a look at this today and see if I can get to the bottom 
>>> of it.
>>> 
>>> Bolke: are there any outstanding tasks/issues that you know of that might 
>>> slow down the vote for a 1.10.1? (i.e. did we sort out all the licensing 
>>> issues that were asked of us? I thought I read something about license 
>>> declarations in markdown files?)
>>> 
>>> -ash
>>> 
>>>> On 28 Oct 2018, at 14:46, Bolke de Bruin  wrote:
>>>> 
>>>> I agree with that, but I would favor time based releases instead. We are 
>>>> again at the point that a release takes so much time that the gap is 
>>>> getting really big again. @ash why not start releasing now and move the 
>>>> remainder to 1.10.2? I don't think there are real blockers (although we 
>>>> might find them).
>>>> 
>>>> 
>>>>> On 28 Oct 2018, at 15:35, airflowuser 
>>>>>  wrote:
>>>>> 
>>>>> I was really hoping that 
>>>>> https://github.com/apache/incubator-airflow/pull/4069 will be merged into 
>>>>> 1.10.1
>>>>> Deleting dags was a highly requested feature for 1.10 - this can fix the 
>>>>> problem with it.
>>>>> 
>>>>> 
>>>>> ‐‐‐ Original Message ‐‐‐
>>>>>>> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin  
>>>>>>> wrote:
>>>>>> 
>>>>>> Hey Ash,
>>>>>> 
>>>>>> I was wondering if you are picking up the 1.10.1 release? Master is 
>>>>>> speeding ahead and you were tracking fixes for 1.10.1 right?
>>>>>> 
>>>>>> B.
>>> 
> 



Re: 1.10.1 Release?

2018-10-30 Thread Ash Berlin-Taylor
Hi, kaczors on gitter has produced a minimal reproduction case: 
https://github.com/kaczors/airflow_1_10_tz_bug

Rough repro steps: In a VM, with time syncing disabled, and configured with 
system timezone of Europe/Zurich (or any other CEST one) run 

- `date 10280250.00`
- initdb, start scheduler, webserver, enable dag etc.
- `date 10280259.00`
- wait 5-10 mins for scheduler to catch up
- After the on-the-hour task run the scheduler will spin up another process to 
parse the dag... and it never returns.

I've only just managed to reproduce it, so haven't dug in to why yet. A quick 
hacky debug print shows something is stuck in an infinite loop.

-ash

> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
> 
> Can this be confirmed? Then I can have a look at it. Preferably with dag 
> definition code.
> 
> On the licensing requirements:
> 
> 1. Indeed licensing header for markdown documents. It was suggested to use 
> html comments. I’m not sure how that renders with others like PDF though.
> 2. The licensing notifications need to be tied to a specific version as 
> licenses might change with versions.
> 
> Cheers
> Bolke
> 
> Sent from my iPad
> 
>> On 29 Oct 2018 at 12:39, Ash Berlin-Taylor wrote:
>> 
>> I was going to make a start on the release, but two people have reported 
>> that there might be an issue around non-UTC dags and the scheduler changing 
>> over from Summer time.
>> 
>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>> issue : we have hourly DAGs with a start_date in a local timezone (not UTC) 
>>> and since (Sunday) the last winter time change they don’t run anymore. Any 
>>> idea ?
>>> 09:41  it impacted all our DAG that had a run at 3am 
>>> (Europe/Paris), the exact time of winter time change :(
>> 
>> I am going to take a look at this today and see if I can get to the bottom 
>> of it.
>> 
>> Bolke: are there any outstanding tasks/issues that you know of that might 
>> slow down the vote for a 1.10.1? (i.e. did we sort out all the licensing 
>> issues that were asked of us? I thought I read something about license 
>> declarations in markdown files?)
>> 
>> -ash
>> 
>>> On 28 Oct 2018, at 14:46, Bolke de Bruin  wrote:
>>> 
>>> I agree with that, but I would favor time based releases instead. We are 
>>> again at the point that a release takes so much time that the gap is 
>>> getting really big again. @ash why not start releasing now and move the 
>>> remainder to 1.10.2? I don't think there are real blockers (although we 
>>> might find them).
>>> 
>>> 
>>>> On 28 Oct 2018, at 15:35, airflowuser  
>>>> wrote:
>>>> 
>>>> I was really hoping that 
>>>> https://github.com/apache/incubator-airflow/pull/4069 will be merged into 
>>>> 1.10.1
>>>> Deleting dags was a highly requested feature for 1.10 - this can fix the 
>>>> problem with it.
>>>> 
>>>> 
>>>> ‐‐‐ Original Message ‐‐‐
>>>>>> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin  
>>>>>> wrote:
>>>>> 
>>>>> Hey Ash,
>>>>> 
>>>>> I was wondering if you are picking up the 1.10.1 release? Master is 
>>>>> speeding ahead and you were tracking fixes for 1.10.1 right?
>>>>> 
>>>>> B.
>> 



Re: 1.10.1 Release?

2018-10-29 Thread Ash Berlin-Taylor
I was going to make a start on the release, but two people have reported that 
there might be an issue around non-UTC dags and the scheduler changing over 
from Summer time.

> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange issue 
> : we have hourly DAGs with a start_date in a local timezone (not UTC) and 
> since (Sunday) the last winter time change they don’t run anymore. Any idea ?
> 09:41  it impacted all our DAG that had a run at 3am 
> (Europe/Paris), the exact time of winter time change :(

I am going to take a look at this today and see if I can get to the bottom of 
it.

Bolke: are there any outstanding tasks/issues that you know of that might slow 
down the vote for a 1.10.1? (i.e. did we sort out all the licensing issues 
that were asked of us? I thought I read something about license declarations in 
markdown files?)

-ash

> On 28 Oct 2018, at 14:46, Bolke de Bruin  wrote:
> 
> I agree with that, but I would favor time based releases instead. We are 
> again at the point that a release takes so much time that the gap is getting 
> really big again. @ash why not start releasing now and move the remainder to 
> 1.10.2? I don't think there are real blockers (although we might find them).
> 
> 
>> On 28 Oct 2018, at 15:35, airflowuser  
>> wrote:
>> 
>> I was really hoping that 
>> https://github.com/apache/incubator-airflow/pull/4069 will be merged into 
>> 1.10.1
>> Deleting dags was a highly requested feature for 1.10 - this can fix the 
>> problem with it.
>> 
>> 
>> ‐‐‐ Original Message ‐‐‐
>> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin  
>> wrote:
>> 
>>> Hey Ash,
>>> 
>>> I was wondering if you are picking up the 1.10.1 release? Master is 
>>> speeding ahead and you were tracking fixes for 1.10.1 right?
>>> 
>>> B.
>> 
>> 
> 



Re: 1.10.1 Release?

2018-10-26 Thread Ash Berlin-Taylor
Hey, yeah I've been working (a bit slowly) on it - we had a few test failures 
on the v1-10-test branch (conflicts/bad cherry-picks) to unravel to get green 
builds, but we're there now.

A rough summary of where we are (wide columns, won't look good if it has to 
line-wrap, https://gist.github.com/ashb/4285c5a6c6b3be616495b1181d3fd63e to see 
it in another format):

(airflow) (themisto python/incuba…irflow v1-10-test:+)% ./dev/airflow-jira compare 1.10.1
ISSUE ID     | TYPE        | PRIORITY | STATUS   | DESCRIPTION                                        | MERGED | PR  | COMMIT
AIRFLOW-3238 | Bug         | Major    | Resolved | Dags, removed from the filesystem, are not deactiv | 1 | #na | 1eeb0a4a24fa8541763a67f84ec9f4b034f66475
AIRFLOW-3237 | Improvement | Major    | Resolved | Refactor example DAGs                              | 1 | #na | fdfb359e4b95dfadfa3973d43025f61f4aa3b96a
AIRFLOW-3232 | Improvement | Trivial  | Resolved | Make documentation for GCF Functions operator more | 1 | #na | d4dff076a6eaf169424822c0010c802f4af80c6a
AIRFLOW-3203 | Bug         | Critical | Closed   | Bugs in DockerOperator & Some operator test script | 1 | #na | 3dfc9562d3df127ca8337edb600ab6a0259521ac
AIRFLOW-3197 | Improvement | Minor    | Resolved | EMR Hook is missing some parameters to valid on th | 1 | #na | 079b0ee95e4a1d37bdb31c477b531db264242bf7
AIRFLOW-3195 | Improvement | Trivial  | Resolved | Druid Hook: Log ingestion spec and task id         | 1 | #na | 8e55b499b2dcc6e4dc4d86f8225d1424f6886a0c
AIRFLOW-3187 | Wish        | Major    | Resolved | Update airflow.gif file with a slower version      | 0 | -   | -
AIRFLOW-3183 | Bug         | Minor    | Resolved | Potential Bug in utils/dag_processing/DagFileProce | 1 | #na | 0e98c60268c805639be8cd8c1cdf9a8909f966bb
AIRFLOW-3178 | Bug         | Major    | Resolved | `airflow run` config doens't cope with % in config | 1 | #na | 5bc4cfa4a7908818877828ed3db1090b71e65b93
AIRFLOW-3177 | Improvement | Minor    | Resolved | Change scheduler_heartbeat metric from gauge to co | 1 | #na | cab121bc036be24bcb1e397f48ca1672e1f73692
AIRFLOW-3173 | Improvement | Major    | Resolved | Add _cmd options for password config options       | 1 | #na | e2a238c3a55be5fcb17b5ec64e7165dd954a4bc0
AIRFLOW-3172 | Bug         | Major    | Open     | AttributeError: 'DagModel' object has no attribute | 0 | -   | -
AIRFLOW-3162 | Bug         | Minor    | Resolved | HttpHook fails to parse URL when port is specified | 1 | #na | 040707b5d830acb96fc6c9a367039ed96aba7231
AIRFLOW-3147 | New Feature | Major    | Resolved | Update Flask-AppBuilder version                    | 1 | #na | 35f996b65a2e68f4e168e3f466f3a939e4fb904a
AIRFLOW-3138 | Bug         | Major    | Resolved | Migration cc1e65623dc7 creates issues with postgre | 1 | #na | 1234af2995780327517c770bd09daf684563527d
AIRFLOW-3137 | Improvement | Trivial  | Resolved | Make ProxyFix middleware optional                  | 0 | -   | -
AIRFLOW-3127 | Improvement | Minor    | Resolved | Celery SSL Documentation is out-dated              | 0 | -   | -
AIRFLOW-3124 | Bug         | Minor    | Resolved | Broken webserver debug mode (RBAC)                 | 1 | #na | f9988147999b73f10978f74d37f704dae8e4f012
AIRFLOW-3119 | Improvement | Minor    | Resolved | Enable loglevel on celery worker and inherit from  | 1 | #na | 23be2a30cd28f236722433a07c536cb17c8515ee
AIRFLOW-3114 | Improvement | Minor    | Open     | Add feature to create External BigQuery Table for  | 0 | -   | -
AIRFLOW-3112 | Improvement | Minor    | Resolved | Align SFTP hook with SSH hook                      | 1 | #na | 35969a426822e5fc9bc2b941b4fe5c5098a5ee52
AIRFLOW-3111 | Bug         | Minor    | Resolved | Confusing comments and instructions for log templa | 0 | -   | -
AIRFLOW-3109 | Bug         | Major    | Resolved | Default user permission should contain 'can_clear' | 0 | -   | -
AIRFLOW-3104 | Improvement | Minor    | Resolved | .airflowignore feature is not mentioned at all in  | 0 | -   | -
AIRFLOW-3099 | Bug         | Minor    | Resolved | Errors raised when some blocs are missing in airfl | 1 | #na | 25bb0dff64687948f73b3fb86fae4e8476f4f9ce
AIRFLOW-3090 | Wish        | Minor    | Closed   | INFO logs are too verbose                          | 0 | -   | -
AIRFLOW-3089 | Bug         | Minor    | Resolved | Google auth doesn't work under http                | 1 | #na | bd7510094de25f2c2bce2a49a3febae28899f3b6
AIRFLOW-3079 | Bug         | Major    | Resolved | Improve initdb to support MSSQL Server             | 0 | -   | -
AIRFLOW-3078 | New Feature | Trivial  | Resolved | Basic operators for Google Compute Engine          | 1 | #na | 6aeda0fcb2ecbba647a6cede992bf23556c1b05c
AIRFLOW-3072 | Bug         | Major    | Resolved

Re: [IE] Re: [External] RE: [IE] airflow ui not showing logs

2018-10-26 Thread Ash Berlin-Taylor
I added a check for this that will be in 1.10.1 so Airflow will 1) warn you 
about this, and 2) try to correct it automatically 

https://github.com/apache/incubator-airflow/blob/a1e922fe6d7ac3aa19848a1ad34836b61fccf24d/airflow/logging_config.py#L79-L106
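
For anyone curious what the check does without clicking through, here is a 
rough paraphrase (not the actual code - see the link above for that):

    import logging

    def check_task_log_reader(task_log_reader, logging_config):
        # The reader named in [core] task_log_reader must match a handler
        # defined in the logging config, otherwise the UI shows no logs
        handlers = logging_config.get("handlers", {})
        if task_log_reader in handlers:
            return task_log_reader
        if task_log_reader == "file.task" and "task" in handlers:
            # Older configs used "file.task"; fall back to "task" and warn
            logging.warning("task_log_reader 'file.task' is deprecated, "
                            "using 'task' instead")
            return "task"
        raise ValueError("task_log_reader %r matches no configured handler"
                         % task_log_reader)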
 



> On 25 Oct 2018, at 22:04, Frank Maritato  
> wrote:
> 
> Ok, I finally had time to try and track down the issue. Looks like the cfg
> option 'task_log_reader' was changed at some point prior to 1.10.0.
> Previously, my airflow.cfg had:
> 
> task_log_reader = file.task
> 
> But now it needs to be set to
> 
> task_log_reader = task
> 
> 
> On Mon, Oct 15, 2018 at 1:57 PM Frank Maritato 
> wrote:
> 
>> James - No, my configuration doesn't have a setting for rbac, so I'm
>> assuming the default is False.
>> 
>> Sunil - We are currently running the LocalExecutor. Is the worker process
>> just for Celery? According to the quickstart, I thought the only processes
>> we need to start are the webserver and scheduler.
>> 
>> 
>> On Mon, Oct 15, 2018 at 12:32 PM Sunil Varma Chiluvuri <
>> sunilvarma.chiluv...@equifax.com> wrote:
>> 
>>> Frank,
>>> 
>>> The serve_logs process usually starts automatically with the 'airflow
>>> worker' command. Try restarting your worker process and see if it also
>>> starts the serve_logs process.
>>> 
>>> I've had a situation where the serve_logs process went down for some
>>> reason and restarting the worker fixed it.  It doesn't explain the cause
>>> but could possibly resolve your issue.
>>> 
>>> Sunil
>>> 
>>> -Original Message-
>>> From: Frank Maritato [mailto:fmarit...@opentable.com.INVALID]
>>> Sent: Monday, October 15, 2018 2:14 PM
>>> To: dev@airflow.incubator.apache.org
>>> Subject: [IE] Re: [External] RE: [IE] airflow ui not showing logs
>>> 
>>> Hi Sunil,
>>> 
>>> I don't see this process running. I have never had to run this command
>>> previously. Should it have started as part of the 'airflow webserver'
>>> command?
>>> I ran it manually from the command line as 'airflow serve_logs &' (does
>>> not
>>> have a daemon option I guess) but I am still not seeing logs in the ui. I
>>> did verify with netstat that it is running on port 8793.
>>> 
>>> 
>>> On Mon, Oct 15, 2018 at 12:02 PM Sunil Varma Chiluvuri <
>>> sunilvarma.chiluv...@equifax.com> wrote:
>>> 
 Check if the serve_logs process is running alongside your worker
 process(es). This is the process that takes the log files written to
>>> disk
 and serves them to the web UI. It should be running on port 8793 by
>>> default.
 
 Sunil
 
 -Original Message-
 From: Frank Maritato [mailto:fmarit...@opentable.com.INVALID]
 Sent: Monday, October 15, 2018 1:42 PM
 To: dev@airflow.incubator.apache.org
 Subject: [IE] airflow ui not showing logs
 
 Hi All,
 
 I'm running airflow 1.10.0 and the ui isn't showing the task logs
>>> anymore.
 This has worked in the past so I'm not sure what changed. I was able to
 verify that the logs are definitely being written to the same local
 directory as what is specified in the airflow.cfg 'base_log_folder'. I
 don't see any errors or debug in the airflow-webserver.{out|err} logs. I
 tried restarting the process and it still doesn't work.
 
 Anyone know how I can track this down?
 --
 Frank Maritato
 This message contains proprietary information from Equifax which may be
 confidential. If you are not an intended recipient, please refrain from
>>> any
 disclosure, copying, distribution or use of this information and note
>>> that
 such actions are prohibited. If you have received this transmission in
 error, please notify by e-mail postmas...@equifax.com. Equifax® is a
 registered trademark of Equifax Inc. All rights reserved.
 
>>> 
>>> 
>>> --
>>> 
>>> Frank Maritato
>>> This message contains proprietary information from Equifax which may be
>>> confidential. If you are not an intended recipient, please refrain from any
>>> disclosure, copying, distribution or use of this information and note that
>>> such actions are prohibited. If you have received this transmission in
>>> error, please notify by e-mail postmas...@equifax.com. Equifax® is a
>>> registered trademark of Equifax Inc. All rights reserved.
>>> 
>> 
>> 
>> --
>> 
>> Frank Maritato
>> 
> 
> 
> -- 
> Frank Maritato



Re: taking an assigned Jira ticket

2018-10-23 Thread Ash Berlin-Taylor
Usually commenting on the Jira ticket itself seems to be the best way. If it's 
been a while then it's also probably safe to assume they aren't actively 
working on it.

-ash

> On 23 Oct 2018, at 16:10, matthew  wrote:
> 
> Hey all,
> 
> There are a collection of docker-related Jira tickets that revolve around
> the changes in the Docker python API.  I'd like to grab some of those and
> take care of them but at least one is currently assigned to someone else
> and I can't find a way to contact the person to see if they're actually
> working on it or not.  What's the protocol here?
> 
> Thanks
> -Matthew



Re: Using Too Many Aiflow Variables in Dag is Good thing ?

2018-10-22 Thread Ash Berlin-Taylor
Redis is not a requirement of Airflow currently, nor should it become a hard 
requirement either.

Benchmarks are definitely needed before we bring in anything as complex as a 
cache.

Queries to the variables table _should_ be fast too - even if it's got 1000 
rows in it that is tiny by RDBMS standards. If the problem is connection set up 
and tear down times then we should find that out.
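
In the meantime, one way to cut the per-Variable connection cost is to fetch 
everything a DAG file needs in a single session. A rough sketch, assuming the 
Variable model and provide_session helper as they exist in 1.9/1.10 (the keys 
are the ones from Pramiti's example below):

    from airflow.models import Variable
    from airflow.utils.db import provide_session

    @provide_session
    def get_variables(keys, session=None):
        # One connection and one query for all keys, instead of one
        # session per Variable.get() call during DAG parsing
        rows = session.query(Variable).filter(Variable.key.in_(keys)).all()
        return {row.key: row.val for row in rows}

    my_vars = get_variables(["test_owner_de", "de_infra_email"])
    default_args = {
        "owner": my_vars["test_owner_de"],
        "email": [my_vars["de_infra_email"]],
    }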

> On 22 Oct 2018, at 11:59, Sai Phanindhra  wrote:
> 
> On top of that we can expire the cache on the order of a few scheduler
> runs (5 or 10 times one scheduler run time)
> 
> On Mon 22 Oct, 2018, 16:27 Sai Phanindhra,  wrote:
> 
>> That's true. But variables won't change very frequently. We can cache these
>> variables somewhere outside the airflow ecosystem, something like redis or
>> memcache. As queries to these dbs are fast, we can reduce the latency and
>> decrease the number of connections to the main database. This whole assumption
>> needs to be benchmarked to prove the point. I feel like it's worth a try.
>> 
>> On Mon 22 Oct, 2018, 15:47 Ash Berlin-Taylor,  wrote:
>> 
>>> Cache them where? When would it get invalidated? Given the DAG parsing
>>> happens in a sub-process how would the cache live longer than that process?
>>> 
>>> I think the change might be to use a per-process/per-thread SQLA
>>> connection when parsing dags, so that if a DAG needs access to the metadata
>>> DB it does it with just one connection rather than N.
>>> 
>>> -ash
>>> 
>>>> On 22 Oct 2018, at 11:11, Sai Phanindhra  wrote:
>>>> 
>>>> Why don't we cache variables? We can fairly assume that variables won't get
>>>> changed very frequently (not as frequently as the scheduler DAG run time). We
>>>> can keep the default timeout to a few times the scheduler run time. This will
>>>> help control the number of connections to the database and reduce load both
>>>> on the scheduler and the database.
>>>> 
>>>> On Mon 22 Oct, 2018, 13:34 Marcin Szymański,  wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>> You are right, it's a sure way to saturate db connections, as a connection
>>>>> is established every few seconds when the DAGs are parsed. The same happens
>>>>> when you use variables in __init__ of an operator. An OS environment variable
>>>>> would be safer for your need.
>>>>> 
>>>>> Marcin
>>>>> 
>>>>> 
>>>>> On Mon, 22 Oct 2018, 08:34 Pramiti Goel, 
>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> We want to make owner and email Id general, so we don't want to put them in
>>>>>> the airflow dag. Using variables will help us in changing the email/owner
>>>>>> later, if there are a lot of dags with the same owner.
>>>>>> 
>>>>>> For example:
>>>>>> 
>>>>>> 
>>>>>> default_args = {
>>>>>>   'owner': Variable.get('test_owner_de'),
>>>>>>   'depends_on_past': False,
>>>>>>   'start_date': datetime(2018, 10, 17),
>>>>>>   'email': Variable.get('de_infra_email'),
>>>>>>   'email_on_failure': True,
>>>>>>   'email_on_retry': True,
>>>>>>   'retries': 2,
>>>>>>   'retry_delay': timedelta(minutes=1)}
>>>>>> 
>>>>>> 
>>>>>> Looking into the code of Airflow, it is making a connection session every
>>>>>> time the variable is created, and then closing it. (Let me know if I
>>>>>> understand wrong.) If there are many dags with variables in default args
>>>>>> running in parallel, querying the variable table in MySQL, will it have any
>>>>>> sort of limitation on the number of SQLAlchemy sessions? Will that make dags
>>>>>> slow, as there will be many queries to MySQL for each dag? Is the above
>>>>>> approach good?
>>>>>> 
>>>>>>> using Airflow 1.9
>>>>>> 
>>>>>> Thanks,
>>>>>> Pramiti.
>>>>>> 
>>>>> 
>>> 
>>> 


Re: Using Too Many Aiflow Variables in Dag is Good thing ?

2018-10-22 Thread Ash Berlin-Taylor
Cache them where? When would it get invalidated? Given the DAG parsing happens 
in a sub-process how would the cache live longer than that process?

I think the change might be to use a per-process/per-thread SQLA connection 
when parsing dags, so that if a DAG needs access to the metadata DB it does it 
with just one connection rather than N.

-ash

> On 22 Oct 2018, at 11:11, Sai Phanindhra  wrote:
> 
> Why don't we cache variables? We can fairly assume that variables won't get
> changed very frequently (not as frequently as the scheduler DAG run time). We can
> keep the default timeout to a few times the scheduler run time. This will help
> control the number of connections to the database and reduce load both on the
> scheduler and the database.
> 
> On Mon 22 Oct, 2018, 13:34 Marcin Szymański,  wrote:
> 
>> Hi
>> 
>> You are right, it's a sure way to saturate db connections, as a connection
>> is established every few seconds when the DAGs are parsed. The same happens
>> when you use variables in __init__ of an operator. An OS environment variable
>> would be safer for your need.
>> 
>> Marcin
>> 
>> 
>> On Mon, 22 Oct 2018, 08:34 Pramiti Goel,  wrote:
>> 
>>> Hi,
>>> 
>>> We want to make owner and email Id general, so we don't want to put them in
>>> the airflow dag. Using variables will help us in changing the email/owner
>>> later, if there are a lot of dags with the same owner.
>>> 
>>> For example:
>>> 
>>> 
>>> default_args = {
>>>'owner': Variable.get('test_owner_de'),
>>>'depends_on_past': False,
>>>'start_date': datetime(2018, 10, 17),
>>>'email': Variable.get('de_infra_email'),
>>>'email_on_failure': True,
>>>'email_on_retry': True,
>>>'retries': 2,
>>>'retry_delay': timedelta(minutes=1)}
>>> 
>>> 
>>> Looking into the code of Airflow, it is making connection session
>> every time
>>> the variable is created, and then close it. (Let me know if I understand
>>> wrong). If there are many dags with variables in default args running
>>> parallel, querying variable table in MySQL, will it have any sort of
>>> limitation on number of sessions of SQLAlchemy ? Will that make dag slow
>> as
>>> there will be many queries to mysql for each dag? is the above approach
>>> good ?
>>> 
>>>> using Airflow 1.9
>>> 
>>> Thanks,
>>> Pramiti.
>>> 
>> 



Re: explicit_defaults_for_timestamp for mysql

2018-10-19 Thread Ash Berlin-Taylor
This sounds sensible and would mean we could also run on GCP's MySQL offering 
too.

This would need someone to try out and check that timezones behave sensibly 
with this change made.

Any volunteers?
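
(If it helps whoever picks this up, the server-side setting is easy to eyeball with something like this - the DSN is obviously hypothetical, adjust to your setup:)

    from sqlalchemy import create_engine

    engine = create_engine("mysql://airflow:airflow@localhost/airflow")
    with engine.connect() as conn:
        row = conn.execute(
            "SHOW VARIABLES LIKE 'explicit_defaults_for_timestamp'").fetchone()
        print(row)  # e.g. ('explicit_defaults_for_timestamp', 'OFF')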

-ash

> On 19 Oct 2018, at 17:32, Deng Xiaodong  wrote:
> 
> Wondering if there is any further thoughts about this proposal kindly raised 
> by Feng Lu earlier?
> 
> If we can skip this check & allow explicit_defaults_for_timestamp to be 0, it 
> would be helpful, especially for enterprise users in whose environments it’s 
> really hard to ask for a database global variable change (like myself…).
> 
> 
> XD
> 
> On 2018/08/28 15:23:10, Feng Lu  wrote: 
>> Bolke, a gentle ping..
>> Thank you.
>> 
>> On Thu, Aug 23, 2018, 23:01 Feng Lu  wrote:
>> 
>>> Hi all,
>>> 
>>> After reading the MySQL documentation on the
>>> explicit_defaults_for_timestamp, it appears that we can skip the check on
>>> explicit_defaults_for_timestamp = 1 by
>>> setting the column to accept NULL explicitly. For example:
>>> 
>>> op.alter_column(table_name='chart', column_name='last_modified',
>>> type_=mysql.TIMESTAMP(fsp=6)) -->
>>> op.alter_column(table_name='chart', column_name='last_modified',
>>> type_=mysql.TIMESTAMP(fsp=6), nullable=True)
>>> 
>>> Here's why:
>>> From MySQL doc (when explicit_defaults_for_timestamp is set to True):
>>> "TIMESTAMP columns not explicitly declared with the NOT NULL attribute are
>>> automatically declared with the NULL attribute and permit NULL values.
>>> Assigning such a column a value of NULL sets it to NULL, not the current
>>> timestamp."
>>> 
>>> Thanks and happy to shoot a PR if it makes sense.
 
>>> Feng> 
 
 
 



Re: Pinning dependencies for Apache Airflow

2018-10-19 Thread Ash Berlin-Taylor
echo 'pandas==2.1.3' > constraints.txt

pip install -c constraints.txt apache-airflow[pandas]

That will ignore whatever we specify in setup.py and use 2.1.3. 
https://pip.pypa.io/en/latest/user_guide/#constraints-files

(sorry for the brief message) 

> On 19 Oct 2018, at 17:02, Maxime Beauchemin  
> wrote:
> 
>> releases in pip should have stable (pinned deps)
> I think that's an issue. When setup.py (the only reqs that setuptools/pip
> knows about) is restrictive, there's no way to change that in your
> environment, install will just fail if you deviate (are there any
> hacks/solutions around that that I don't know about???). For example if you
> want a specific version of pandas in your env, and Airflow's setup.py has
> another version of pandas pinned, you're out of luck. I think the only way
> is to fork and make you own build at that point as you cannot alter
> setup.py once it's installed. On the other hand, when a version range is
> specified in setup.py, you're free to pin using your own reqs.txt within
> the specified version range.
> 
> I think pinning in setup.py is just not viable. setup.py should have
> version ranges based semantic versioning expectations. (lib>=1.1.2,
> <2.0.0). Personally I think we should always have 2 bounds based on either
> 1-semantic versioning major release, or 2- a lower version than prescribed
> by semver that we know breaks backwards compatibility features we require.
> 
> I think we have consensus around something like pip-tools to generate a
> "deterministic" `requirements.txt`. A caveat is we may need 2:
> requirements.txt and requirements3.txt for Python 3 as some package
> versions can be flagged as only py2 or only py3.
> 
> Max
> 
> 
> On Fri, Oct 19, 2018 at 1:47 AM Jarek Potiuk 
> wrote:
> 
>> I think i might have a proposal that could be acceptable by everyone in the
>> discussion (hopefully :) ).  Let me summarise what I am leaning towards
>> now:
>> 
>> I think we can have a solution where it will be relatively easy to keep
>> both "open" and "fixed" requirements (open in setup.py, fixed in
>> requirements.txt). Possibly we can use pip-tools or poetry (including using
>> of the poetry-setup  which seem
>> to be able to generate setup.py/constraints.txt/requirements.txt from
>> poetry setup). Poetry is still "new" so it might not work, then we can try
>> to get similar approach with pip-tools or our own custom solution. Here are
>> the basic assumptions:
>> 
>>   - we can leave master with "open" requirements which makes it
>>   potentially unstable with potential conflicting dependencies. We will
>> also
>>   document how to generate stable set of requirements (hopefully
>>   automatically) and a way how to install from master using those. *This
>>   addresses needs of people using master for active development with
>> latest
>>   libraries.*
>>   - releases in pip should have stable (pinned deps). Upgrading pinned
>>   releases to latest "working" stable set should be part of the release
>>   process (possibly automated with poetry). We can try it out and decide
>> if
>>   we want to pin only direct dependencies or also the transitive ones (I
>>   think including transitive dependencies is a bit more stable). *This way
>>   we keep long-term "install-ability" of releases and make job of release
>>   maintainer easier*.
>>   - CI builds will use the stable dependencies from requirements.txt.
>> *This
>>   way we keep CI from dependency-triggered failures.*
>>   - we add documentation on how to use pip --constraints mechanism by
>>   anyone who would like to use airflow from PIP rather than sources, but
>>   would like also to use other (up- or down- graded) versions of specific
>>   dependencies. *This way we let active developers to work with airflow
>>   and more recent/or older releases.*
>> 
>> If we can have general consensus that we should try it, I might try to find
>> some time next week to do some "real work". Rather than implement it and
>> make a pull request immediately, I think of a Proof Of Concept branch
>> showing how it would work (with some artificial going back to older
>> versions of requirements). I thought about pre-flaskappbuilder upgrade in
>> one commit and update to post-flaskappbuilder upgrade in second, explaining
>> the steps I've done to get to it. That would be much better for the
>> community to discuss if that's the right approach.
>> 
>> Does it sound good ?
>> 
>> J.
>> 
>> On Wed, Oct 17, 2018 at 2:21 AM Daniel (Daniel Lamblin) [BDP - Seoul] <
>> lamb...@coupang.com> wrote:
>> 
>>> On 10/17/18, 12:24 AM, "William Pursell" 
>>> wrote:
>>> 
>>>I'm jumping in a bit late here, and perhaps have missed some of the
>>>discussion, but I haven't seen any mention of the fact that pinning
>>>versions in setup.py isn't going to solve the problem.  Perhaps it's
>>>my lack of experience with pip, but currently pip doesn't provide any
>>>guarantee that the version of a

Re: Ingest daily data, but delivery is always delayed by two days

2018-10-12 Thread Ash Berlin-Taylor
That would work for some of our other uses cases (and has been an idea in our 
backlog for months) but not this case as we're reading from someone else's 
bucket so can't set up notifications etc. :(

-ash

> On 12 Oct 2018, at 11:57, Bolke de Bruin  wrote:
> 
> S3 Bucket notification that triggers a dag?
> 
> Verstuurd vanaf mijn iPad
> 
>> Op 12 okt. 2018 om 12:42 heeft Ash Berlin-Taylor  het 
>> volgende geschreven:
>> 
>> A lot of our dags are ingesting data (usually daily or weekly) from 
>> suppliers, and they are universally late.
>> 
>> In the case I'm setting up now the delivery lag is about 30 hours - data for 
>> 2018-10-10 turned up at 2018-10-12 05:43.
>> 
>> I was going to just set this up with an S3KeySensor and a daily schedule, 
>> but I'm wondering if anyone has any other bright ideas for a better way of 
>> handling this sort of case:
>> 
>>   dag = DAG(
>>       DAG_ID,
>>       default_args=args,
>>       start_date=args['start_date'],
>>       concurrency=1,
>>       schedule_interval='@daily',
>>       params={'country': cc}
>>   )
>> 
>>   with dag:
>>       task = S3KeySensor(
>>           task_id="await_files",
>>           bucket_key="s3://bucket/raw/table1-{{ params.country }}/{{ execution_date.strftime('%Y/%m/%d') }}/SUCCESS",
>>           poke_interval=60 * 60 * 2,
>>           timeout=60 * 60 * 72,
>>       )
>> 
>> That S3 key sensor is _going_ to fail the first 18 times or so it runs which 
>> just seems silly.
>> 
>> One option could be to use `ds_add` or similar on the execution date, but I 
>> don't like breaking the (obvious) link between execution date and which 
>> files it picks up, so I've ruled out this option
>> 
>> I could use a Time(Delta)Sensor to just delay the start of the checking. I 
>> guess with the new change in master to make sensors yield their execution 
>> slots that's not a terrible plan.
>> 
>> Does anyone else have any other idea, including possible things we could add 
>> to Airflow itself.
>> 
>> -ash
>> 



Ingest daily data, but delivery is always delayed by two days

2018-10-12 Thread Ash Berlin-Taylor
A lot of our dags are ingesting data (usually daily or weekly) from suppliers, 
and they are universally late.

In the case I'm setting up now the delivery lag is about 30 hours - data for 
2018-10-10 turned up at 2018-10-12 05:43.

I was going to just set this up with an S3KeySensor and a daily schedule, but 
I'm wondering if anyone has any other bright ideas for a better way of handling 
this sort of case:

dag = DAG(
    DAG_ID,
    default_args=args,
    start_date=args['start_date'],
    concurrency=1,
    schedule_interval='@daily',
    params={'country': cc}
)

with dag:
    task = S3KeySensor(
        task_id="await_files",
        bucket_key="s3://bucket/raw/table1-{{ params.country }}/{{ execution_date.strftime('%Y/%m/%d') }}/SUCCESS",
        poke_interval=60 * 60 * 2,
        timeout=60 * 60 * 72,
    )

That S3 key sensor is _going_ to fail the first 18 times or so it runs which 
just seems silly.

One option could be to use `ds_add` or similar on the execution date, but I 
don't like breaking the (obvious) link between execution date and which files 
it picks up, so I've ruled out this option

I could use a Time(Delta)Sensor to just delay the start of the checking. I 
guess with the new change in master to make sensors yield their execution slots 
that's not a terrible plan.
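
Something like this, roughly (untested sketch reusing the `dag` object above; the import paths are the 1.10 ones and the 30h delta is just the observed lag):

    from datetime import timedelta
    from airflow.sensors.time_delta_sensor import TimeDeltaSensor
    from airflow.sensors.s3_key_sensor import S3KeySensor

    with dag:
        wait = TimeDeltaSensor(
            task_id="wait_for_delivery_window",
            delta=timedelta(hours=30),  # don't even start poking until then
        )
        await_files = S3KeySensor(
            task_id="await_files",
            bucket_key="s3://bucket/raw/table1-{{ params.country }}/{{ execution_date.strftime('%Y/%m/%d') }}/SUCCESS",
            poke_interval=60 * 60 * 2,
            timeout=60 * 60 * 72,
        )
        wait >> await_files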

Does anyone else have any other idea, including possible things we could add to 
Airflow itself.

-ash



Re: Pinning dependencies for Apache Airflow

2018-10-08 Thread Ash Berlin-Taylor
>>>>> needs
>>>>> for requirements, you might actually - in the very same way with
>>>>> pip-tools/poetry - upgrade all your dependencies in your local fork of
>>>>> Airflow before someone else does it in master/release. Those tools kind
>>>> of
>>>>> democratise dependency management. It should be as easy as `pip-compile
>>>>> --upgrade` or `poetry update` and you will get all the
>>> "non-conflicting"
>>>>> latest dependencies in your local fork (and poetry especially seems to
>>> do
>>>>> all the heavy lifting of figuring out which versions will work). You
>>>> should
>>>>> be able to test and publish it locally as your private package for
>>> local
>>>>> installations. You can even mark the specific dependency you want to
>>> use
>>>>> specific version and let pip-tools/poetry figure out exact versions of
>>>>> other requirements. You can even make a PR with such upgrade eventually
>>>> to
>>>>> get it faster in master. You can even downgrade in case newer
>>> dependency
>>>>> causes problems for you in similar way. Guided by the tools, it's much
>>>>> faster than figuring the versions out by yourself.
>>>>> 
>>>>> As long as we have simple way of managing it and document how to
>>>>> upgrade/downgrade dependencies in your own fork, and mention how to
>>>> locally
>>>>> release Airflow as a package, I think your case could be covered even
>>>>> better than now. What do you think ?
>>>>> 
>>>>> J.
>>>>> 
>>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
>>>>>  wrote:
>>>>> 
>>>>>> For us, exact pinning of versions would be problematic. We have DAG
>>>> code
>>>>>> that shares direct and indirect dependencies with Airflow, e.g. lxml,
>>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our
>>>> DAG
>>>>>> code for some reason needs a newer point release due to a bug that's
>>>>> fixed,
>>>>>> then we can't cleanly build a virtual environment containing the
>>> fixed
>>>>>> version. For us, it's already a problem that Airflow has quite strict
>>>>> (and
>>>>>> sometimes old) requirements in setup.py.
>>>>>> 
>>>>>> Erik
>>>>>> 
>>>>>> From: Jarek Potiuk 
>>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
>>>>>> To: dev@airflow.incubator.apache.org
>>>>>> Subject: Re: Pinning dependencies for Apache Airflow
>>>>>> 
>>>>>> I think one solution to release approach is to check as part of
>>>> automated
>>>>>> Travis build if all requirements are pinned with == (even the deep
>>>> ones)
>>>>>> and fail the build in case they are not for ALL versions (including
>>>>>> dev). And of course we should document the approach of
>>>> releases/upgrades
>>>>>> etc. If we do it all the time for development versions (which seems
>>>> quite
>>>>>> doable), then transitively all the releases will also have pinned
>>>>> versions
>>>>>> and they will never try to upgrade any of the dependencies. In poetry
>>>>>> (similarly in pip-tools with .in file) it is done by having a .lock
>>>> file
>>>>>> that specifies exact versions of each package so it can be rather
>>> easy
>>>> to
>>>>>> manage (so it's worth trying it out I think  :D  - seems a bit more
>>>>>> friendly than pip-tools).
>>>>>> 
>>>>>> There is a drawback - of course - with manually updating the module
>>>> that
>>>>>> you want, but I really see that as an advantage rather than drawback
>>>>>> especially for users. This way you maintain the property that it will
>>>>>> always install and work the same way no matter if you installed it
>>>> today
>>>>> or
>>>>>> two months ago. I think the biggest drawback for maintainers is that
>>>> you
>>>>>> need some kind of monitoring of security vulnerabilities and cannot
>>

Re: Pinning dependencies for Apache Airflow

2018-10-05 Thread Ash Berlin-Taylor
One thing to point out here.

Right now if you `pip install apache-airflow==1.10.0` in a clean environment it 
will fail.

This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= 1.11.1, 
so that pulls in 1.12.0 which requires flask-login >= 0.3.

So I do think there is maybe something to be said about pinning for releases. 
The downside to that is that if there are updates to a module that we want, 
then we have to make a point release to let people get it.

Both methods have draw-backs

-ash

> On 4 Oct 2018, at 17:13, Arthur Wiedmer  wrote:
> 
> Hi Jarek,
> 
> I will +1 the discussion Dan is referring to and George's advice.
> 
> I just want to double check we are talking about pinning in
> requirements.txt only.
> 
> This offers the ability to
> pip install -r requirements.txt
> pip install --no-deps airflow
> For a guaranteed install which works.
> 
> Several different requirement files can be provided for specific use cases,
> like a stable dev one for instance for people wanting to work on operators
> and non-core functions.
> 
> However, I think we should proactively test in CI against unpinned
> dependencies (though it might be a separate case in the matrix) , so that
> we get advance warning if possible that things will break.
> CI downtime is not a bad thing here, it actually caught a problem :)
> 
> We should unpin as possible in setup.py to only maintain minimum required
> compatibility. The process of pinning in setup.py is extremely detrimental
> when you have a large number of python libraries installed with different
> pinned versions.
> 
> Best,
> Arthur
> 
> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov 
> wrote:
> 
>> Relevant discussion about this:
>> 
>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>> 
>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk 
>> wrote:
>> 
>>> TL;DR; A change is coming in the way how dependencies/requirements are
>>> specified for Apache Airflow - they will be fixed rather than flexible
>> (==
>>> rather than >=).
>>> 
>>> This is follow up after Slack discussion we had with Ash and Kaxil -
>>> summarising what we propose we'll do.
>>> 
>>> *Problem:*
>>> During last few weeks we experienced quite a few downtimes of TravisCI
>>> builds (for all PRs/branches including master) as some of the transitive
>>> dependencies were automatically upgraded. This because in a number of
>>> dependencies we have  >= rather than == dependencies.
>>> 
>>> Whenever there is a new release of such dependency, it might cause chain
>>> reaction with upgrade of transitive dependencies which might get into
>>> conflict.
>>> 
>>> An example was Flask-AppBuilder vs flask-login transitive dependency with
>>> click. They started to conflict once AppBuilder has released version
>>> 1.12.0.
>>> 
>>> *Diagnosis:*
>>> Transitive dependencies with "flexible" versions (where >= is used
>> instead
>>> of ==) is a reason for "dependency hell". We will sooner or later hit
>> other
>>> cases where not fixed dependencies cause similar problems with other
>>> transitive dependencies. We need to fix-pin them. This causes problems
>> for
>>> both - released versions (cause they stop to work!) and for development
>>> (cause they break master builds in TravisCI and prevent people from
>>> installing development environment from the scratch.
>>> 
>>> *Solution:*
>>> 
>>>   - Following the old-but-good post
>>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the
>>> pinned
>>>   dependencies to specific versions (so basically all dependencies are
>>>   "fixed").
>>>   - We will introduce mechanism to be able to upgrade dependencies with
>>>   pip-tools (https://github.com/jazzband/pip-tools). We might also
>> take a
>>>   look at pipenv: https://pipenv.readthedocs.io/en/latest/
>>>   - People who would like to upgrade some dependencies for their PRs
>> will
>>>   still be able to do it - but such upgrades will be in their PR thus
>> they
>>>   will go through TravisCI tests and they will also have to be specified
>>> with
>>>   pinned fixed versions (==). This should be part of review process to
>>> make
>>>   sure new/changed requirements are pinned.
>>>   - In release process there will be a point where an upgrade will be
>>>   attempted for all requirements (using pip-tools) so that we are not
>>> stuck
>>>   with older releases. This will be in controlled PR environment where
>>> there
>>>   will be time to fix all dependencies without impacting others and
>> likely
>>>   enough time to "vet" such changes (this can be done for alpha/beta
>>> releases
>>>   for example).
>>>   - As a side effect dependencies specification will become far simpler
>>>   and straightforward.
>>> 
>>> Happy to hear community comments to the proposal. I am happy to take a
>> lead
>>> on that, open JIRA issue and implement if this is something community is
>>> happy with.
>>> 
>>> J.
>>> 
>>> --
>>> 
>>> *Jarek Potiuk, Principal Software Engineer*
>>> Mobile: +48 660 

Re: Flask-AppBuilder has pinned versions of Click & Flask-Login in 1.10.0

2018-10-05 Thread Ash Berlin-Taylor
>=0.10, installed: 0.12.2]
>- setuptools [required: Any, installed: 40.4.3]
>  - python-dateutil [required: >=2.3,<3, installed: 2.7.3]
>- six [required: >=1.5, installed: 1.11.0]
>  - python-nvd3 [required: ==0.15.0, installed: 0.15.0]
>- Jinja2 [required: >=2.8, installed: 2.8.1]
>  - MarkupSafe [required: Any, installed: 1.0]
>- python-slugify [required: >=1.2.5, installed: 1.2.6]
>  - Unidecode [required: >=0.04.16, installed: 1.0.22]
>  - requests [required: >=2.5.1,<3, installed: 2.19.1]
>- certifi [required: >=2017.4.17, installed: 2018.8.24]
>- chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4]
>- idna [required: >=2.5,<2.8, installed: 2.7]
>- urllib3 [required: >=1.21.1,<1.24, installed: 1.23]
>  - setproctitle [required: >=1.1.8,<2, installed: 1.1.10]
>  - sqlalchemy [required: >=1.1.15,<1.2.0, installed: 1.1.18]
>  - tabulate [required: >=0.7.5,<=0.8.2, installed: 0.8.2]
>  - tenacity [required: ==4.8.0, installed: 4.8.0]
>- monotonic [required: >=0.6, installed: 1.5]
>- six [required: >=1.9.0, installed: 1.11.0]
>  - thrift [required: >=0.9.2, installed: 0.11.0]
>- six [required: >=1.7.2, installed: 1.11.0]
>  - tzlocal [required: >=1.4, installed: 1.5.1]
>- pytz [required: Any, installed: 2018.5]
>  - unicodecsv [required: >=0.14.1, installed: 0.14.1]
>  - werkzeug [required: >=0.14.1,<0.15.0, installed: 0.14.1]
>  - zope.deprecation [required: >=4.0,<5.0, installed: 4.3.0]
>- setuptools [required: Any, installed: 40.4.3]
> 
> On Thu, Oct 4, 2018 at 11:29 AM Kyle Hamlin  wrote:
> 
>> whoops remove the [[source]] at the end of the url = "
>> https://pypi.python.org/simple"; that is a typo.
>> 
>> On Thu, Oct 4, 2018 at 11:26 AM Kyle Hamlin  wrote:
>> 
>>> Thank you for the response Ash.
>>> 
>>> Even with your suggestion, there appear to be version conflicts all over
>>> the place. Can you get this Pipfile to install because I cannot?
>>> 
>>> *Pipfile:*
>>> 
>>> [[source]]
>>> url = "https://pypi.python.org/simple"; [[source]]
>>> verify_ssl = true
>>> name = "pypi"
>>> 
>>> [packages]
>>> apache-airflow = {editable = true, ref =
>>> "fb5ffd146a5a33820cfa7541e5ce09098f3d541a", git = "
>>> https://github.com/apache/incubator-airflow.git";, extras = ["s3",
>>> "slack", "kubernetes", "celery", "postgres", "mongo", "crypto"]}
>>> Flask-AppBuilder="==1.11.0"
>>> 
>>> [requires]
>>> python_version = "3.6"
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Oct 4, 2018 at 10:50 AM Ash Berlin-Taylor  wrote:
>>> 
>>>> We've committed a fix for this to master and will include it in a 1.10.1
>>>> https://github.com/apache/incubator-airflow/commit/fb5ffd146a5a33820cfa7541e5ce09098f3d541a
>>>> 
>>>> 
>>>> For installing in the meantime pin `Flask-AppBuilder==1.11.0`
>>>> 
>>>>> On 4 Oct 2018, at 00:41, Kyle Hamlin  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Today I was trying to upgrade Airflow to 1.10.0 and it appears that
>>>> there
>>>>> are some version conflicts with click and flask-login. I uncovered
>>>> these
>>>>> because I use Pipenv to manage our project's dependencies. You can see
>>>> here
>>>>> that Flask-AppBuilder pins click==6.7 and flask-login>=0.3,<0.5
>>>>> 
>>>>> 
>>>> https://github.com/dpgaspar/Flask-AppBuilder/blob/master/setup.py#L37-L47
>>>>> 
>>>>> I'm able to force pipenv to install click==6.7 because that is not
>>>> pinned
>>>>> in Airflow's setup.py, but I can do nothing about flask-login because
>>>>> Airflow pins the flask-login version:
>>>>> https://github.com/apache/incubator-airflow/blob/master/setup.py#L304
>>>>> 
>>>>> This prevents me from being able to upgrade to 1.10.0.
>>>>> 
>>>>> *Pipenv's Graphed project dependencies (conflicts highlighted):*
>>>>> 
>>>>> apache-airflow==1.10.0
>>>>> - alembic [required: >=0.8.3,<0.9, installed: 0.8.10]
>>>>>   - Mako [required: Any, installed: 1.0.7]
>>>>

Re: Flask-AppBuilder has pinned versions of Click & Flask-Login in 1.10.0

2018-10-04 Thread Ash Berlin-Taylor
We've committed a fix for this to master and will include it in a 1.10.1 
https://github.com/apache/incubator-airflow/commit/fb5ffd146a5a33820cfa7541e5ce09098f3d541a


For installing in the meantime pin `Flask-AppBuilder==1.11.0`

> On 4 Oct 2018, at 00:41, Kyle Hamlin  wrote:
> 
> Hi,
> 
> Today I was trying to upgrade Airflow to 1.10.0 and it appears that there
> are some version conflicts with click and flask-login. I uncovered these
> because I use Pipenv to manage our project's dependencies. You can see here
> that Flask-AppBuilder pins click==6.7 and flask-login>=0.3,<0.5
> 
> https://github.com/dpgaspar/Flask-AppBuilder/blob/master/setup.py#L37-L47
> 
> I'm able to force pipenv to install click==6.7 because that is not pinned
> in Airflow's setup.py, but I can do nothing about flask-login because
> Airflow pins the flask-login version:
> https://github.com/apache/incubator-airflow/blob/master/setup.py#L304
> 
> This prevents me from being able to upgrade to 1.10.0.
> 
> *Pipenv's Graphed project dependencies (conflicts highlighted):*
> 
> apache-airflow==1.10.0
>  - alembic [required: >=0.8.3,<0.9, installed: 0.8.10]
>- Mako [required: Any, installed: 1.0.7]
>  - MarkupSafe [required: >=0.9.2, installed: 1.0]
>- python-editor [required: >=0.3, installed: 1.0.3]
>- SQLAlchemy [required: >=0.7.6, installed: 1.2.12]
>  - bleach [required: ==2.1.2, installed: 2.1.2]
>- html5lib [required:
>> =0.pre,!=1.0b8,!=1.0b7,!=1.0b6,!=1.0b5,!=1.0b4,!=1.0b3,!=1.0b2,!=1.0b1,
> installed: 1.0.1]
>  - six [required: >=1.9, installed: 1.11.0]
>  - webencodings [required: Any, installed: 0.5.1]
>- six [required: Any, installed: 1.11.0]
>  - configparser [required: >=3.5.0,<3.6.0, installed: 3.5.0]
>  - croniter [required: >=0.3.17,<0.4, installed: 0.3.25]
>- python-dateutil [required: Any, installed: 2.7.3]
>  - six [required: >=1.5, installed: 1.11.0]
>  - dill [required: >=0.2.2,<0.3, installed: 0.2.8.2]
>  - flask [required: >=0.12.4,<0.13, installed: 0.12.4]
>- click [required: >=2.0, installed: 7.0]
>- itsdangerous [required: >=0.21, installed: 0.24]
>- Jinja2 [required: >=2.4, installed: 2.8.1]
>  - MarkupSafe [required: Any, installed: 1.0]
>- Werkzeug [required: >=0.7, installed: 0.14.1]
>  - flask-admin [required: ==1.4.1, installed: 1.4.1]
>- Flask [required: >=0.7, installed: 0.12.4]
>  - click [required: >=2.0, installed: 7.0]
>  - itsdangerous [required: >=0.21, installed: 0.24]
>  - Jinja2 [required: >=2.4, installed: 2.8.1]
>- MarkupSafe [required: Any, installed: 1.0]
>  - Werkzeug [required: >=0.7, installed: 0.14.1]
>- wtforms [required: Any, installed: 2.2.1]
>  - flask-appbuilder [required: >=1.11.1,<2.0.0, installed: 1.12.0]
>- click [required: ==6.7, installed: 7.0]
>- colorama [required: ==0.3.9, installed: 0.3.9]
>- Flask [required: >=0.10.0,<0.12.99, installed: 0.12.4]
>  - click [required: >=2.0, installed: 7.0]
>  - itsdangerous [required: >=0.21, installed: 0.24]
>  - Jinja2 [required: >=2.4, installed: 2.8.1]
>- MarkupSafe [required: Any, installed: 1.0]
>  - Werkzeug [required: >=0.7, installed: 0.14.1]
>- Flask-Babel [required: ==0.11.1, installed: 0.11.1]
>  - Babel [required: >=2.3, installed: 2.6.0]
>- pytz [required: >=0a, installed: 2018.5]
>  - Flask [required: Any, installed: 0.12.4]
>- click [required: >=2.0, installed: 7.0]
>- itsdangerous [required: >=0.21, installed: 0.24]
>- Jinja2 [required: >=2.4, installed: 2.8.1]
>  - MarkupSafe [required: Any, installed: 1.0]
>- Werkzeug [required: >=0.7, installed: 0.14.1]
>  - Jinja2 [required: >=2.5, installed: 2.8.1]
>- MarkupSafe [required: Any, installed: 1.0]
>- Flask-Login [required: >=0.3,<0.5, installed: 0.2.11]
>  - Flask [required: Any, installed: 0.12.4]
>- click [required: >=2.0, installed: 7.0]
>- itsdangerous [required: >=0.21, installed: 0.24]
>- Jinja2 [required: >=2.4, installed: 2.8.1]
>  - MarkupSafe [required: Any, installed: 1.0]
>- Werkzeug [required: >=0.7, installed: 0.14.1]
>- Flask-OpenID [required: ==1.2.5, installed: 1.2.5]
>  - Flask [required: >=0.10.1, installed: 0.12.4]
>- click [required: >=2.0, installed: 7.0]
>- itsdangerous [required: >=0.21, installed: 0.24]
>- Jinja2 [required: >=2.4, installed: 2.8.1]
>  - MarkupSafe [required: Any, installed: 1.0]
>- Werkzeug [required: >=0.7, installed: 0.14.1]
>  - python3-openid [required: >=2.0, installed: 3.1.0]
>- defusedxml [required: Any, installed: 0.5.0]
>- Flask-SQLAlchemy [required: ==2.1, installed: 2.1]
>  - Flask [required: >=0.10, installed: 0.12.4]
>- click [required: >=2.0, installed: 7.0]
>- itsdangerous [required: >=0.21, installed: 0.24]
>- Jinja2 [required: >=2.4, installed: 2.8.1]
>  - MarkupSa

Re: Slides from London Airflow Meetup #1

2018-10-02 Thread Ash Berlin-Taylor
Is there demand for me to give a re-run of this talk (recorded this time), 
probably over a Google Hangout or a YouTube stream?

-ash

> On 1 Oct 2018, at 14:14, Jarek Potiuk  wrote:
> 
> Very good slides indeed. It's fast-forward over concept of Airflow that
> everyone new should follow.
> 
> Maybe that could be a base for a short (interactive ?) tutorial for someone
> who would like to learn more about Airflow. I think there are few concepts
> which are not obvious (recently discussed execution_date) which get
> crystal-clear when you just follow the slides.
> 
> J.
> 
> On Fri, Sep 28, 2018 at 10:52 AM Kevin Yang  wrote:
> 
>> Wow the slides captured much more info that I thought and the talk covered
>> much wider range about Airflow than I thought. Very useful, great job,
>> thank you very much!
>> 
>> Cheers,
>> Kevin Y
>> 
>> On Tue, Sep 25, 2018 at 3:36 AM Sumit Maheshwari 
>> wrote:
>> 
>>> Thanks a lot, Ash.
>>> 
>>> 
>>> 
>>> On Tue, Sep 25, 2018 at 3:47 PM Ash Berlin-Taylor 
>> wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> Slides from the first London Airflow Meetup are available in this
>>>> Google drive folder:
>>>> 
>> https://drive.google.com/drive/folders/1wiSkrg_1rvqGrmbYN7rNFQaqW0Ty10Vk
>>> <
>>>> 
>> https://drive.google.com/drive/folders/1wiSkrg_1rvqGrmbYN7rNFQaqW0Ty10Vk
>>>> 
>>>> 
>>>> Sorry, we didn't get them captured on video :)
>>> 
>> 
> 
> 
> -- 
> 
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129



Re: Solved: suppress PendingDeprecationWarning messages in airflow logs

2018-09-28 Thread Ash Berlin-Taylor
Sounds good for your use, certainly.

I mainly wanted to make sure other people knew before blindly equipping a 
foot-cannon :)
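
(And for anyone adopting the recipe below: step 4 just means merging a 'filters' key into each handler's dict, roughly like this - a sketch only, the handler name and keys here are guesses at the default config layout, adjust to yours:)

    LOGGING_CONFIG['handlers']['console'] = {
        'class': 'logging.StreamHandler',
        'formatter': 'airflow',
        'stream': 'ext://sys.stdout',
        'filters': ['noDepWarn'],  # the filter class defined in step 2
    }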

-ash

> On 29 Sep 2018, at 00:09, Sean Carey  wrote:
> 
> Thanks, Ash.  I understand what you're saying.  The warnings are coming from 
> the Qubole operator.  We get a lot of this:
> 
> PendingDeprecationWarning: Invalid arguments were passed to QuboleOperator. 
> Support for passing such arguments will be dropped in Airflow 2.0. Invalid 
> arguments were:
> *args: ()
> **kwargs: {...}
>  category=PendingDeprecationWarning
> 
> We've spoken to Qubole about this and they plan to address it.  In the 
> meantime, it generates a ton of noise in our logs which we'd like to suppress 
> -- we are aware of the issue, and don't need to be told about it every 5 
> seconds.
> 
> As for it suddenly breaking, being that this is pending Airflow 2.0 I feel 
> the risk is low and when we do upgrade it will be thoroughly tested.
> 
> Thanks!
> 
> Sean
> 
> 
> On 9/28/18, 5:01 PM, "Ash Berlin-Taylor"  wrote:
> 
>What deprecation warnings are you getting? Are they from Airflow itself 
> (i.e. things Airflow is calling like flask_wtf, etc) or of your use of 
> Airflow?
> 
>    If it is the former, could you check and see if someone has already reported 
> a Jira issue so we can fix them? 
> https://issues.apache.org/jira/issues/?jql=project%3DAIRFLOW
> 
>If it is the latter PLEASE DO NOT IGNORE THESE.
> 
>Deprecation warnings are how we, the Airflow community tell users that you 
> need to make a change to your DAG/code/config to upgrade things. If you 
> silence these warnings you will have a much harder time upgrading to new 
> versions of Airflow (read: you might suddenly find that things stop working 
> because you turned off the warnings.)
> 
>-ash
> 
>> On 28 Sep 2018, at 22:52, Sean Carey  wrote:
>> 
>> Hello,
>> 
>> I’ve been looking for a way to suppress the PendingDeprecationWarning 
>> messages cluttering our airflow logs and I have a working solution which I 
>> thought I would share.
>> 
>> In order to do this, you first need to configure airflow for custom logging 
>> using steps 1-4 here:
>> 
>> https://airflow.readthedocs.io/en/stable/howto/write-logs.html#writing-logs-to-azure-blob-storage
>> 
>> (note that although the document is for Azure remote logging you don’t 
>> actually need azure for this)
>> 
>> Next, modify the log_config.py script created in the step above as follows:
>> 
>> 
>> 1.  Import logging
>> 2.  Define the filter class:
>> 
>> 
>> 
>> class DeprecationWarningFilter(logging.Filter):
>>     def filter(self, record):
>>         allow = 'DeprecationWarning' not in record.msg
>>         return allow
>> 
>> 
>> 3.  Add a “filters” section to the LOGGING_CONFIG beneath “formatters”:
>> 
>> 
>> 
>> 'filters': {
>>     'noDepWarn': {
>>         '()': DeprecationWarningFilter,
>>     }
>> },
>> 
>> 
>> 4.  For each of the handlers where you want to suppress the warnings 
>> (console, task, processor, or any of the remote log handlers you may be 
>> using) add the following line to its configuration:
>> 
>> 
>> 
>> 'filters': ['noDepWarn'],
>> 
>> Restart airflow and your logs should be clean.
>> 
>> 
>> Sean Carey
>> 
> 
> 
> 



Re: Solved: suppress PendingDeprecationWarning messages in airflow logs

2018-09-28 Thread Ash Berlin-Taylor
What deprecation warnings are you getting? Are they from Airflow itself (i.e. 
things Airflow is calling like flask_wtf, etc) or of your use of Airflow?

If it is the former, could you check and see if someone has already reported a 
Jira issue so we can fix them? 
https://issues.apache.org/jira/issues/?jql=project%3DAIRFLOW

If it is the latter PLEASE DO NOT IGNORE THESE.

Deprecation warnings are how we, the Airflow community tell users that you need 
to make a change to your DAG/code/config to upgrade things. If you silence 
these warnings you will have a much harder time upgrading to new versions of 
Airflow (read: you might suddenly find that things stop working because you 
turned of the warnings.)

-ash

> On 28 Sep 2018, at 22:52, Sean Carey  wrote:
> 
> Hello,
> 
> I’ve been looking for a way to suppress the PendingDeprecationWarning 
> messages cluttering our airflow logs and I have a working solution which I 
> thought I would share.
> 
> In order to do this, you first need to configure airflow for custom logging 
> using steps 1-4 here:
> 
> https://airflow.readthedocs.io/en/stable/howto/write-logs.html#writing-logs-to-azure-blob-storage
> 
> (note that although the document is for Azure remote logging you don’t 
> actually need azure for this)
> 
> Next, modify the log_config.py script created in the step above as follows:
> 
> 
>  1.  Import logging
>  2.  Define the filter class:
> 
> 
> 
> class DeprecationWarningFilter(logging.Filter):
>     def filter(self, record):
>         allow = 'DeprecationWarning' not in record.msg
>         return allow
> 
> 
>  3.  Add a “filters” section to the LOGGING_CONFIG beneath “formatters”:
> 
> 
> 
> 'filters': {
>     'noDepWarn': {
>         '()': DeprecationWarningFilter,
>     }
> },
> 
> 
>  4.  For each of the handlers where you want to suppress the warnings 
> (console, task, processor, or any of the remote log handlers you may be 
> using) add the following line to its configuration:
> 
> 
> 
> 'filters': ['noDepWarn'],
> 
> Restart airflow and your logs should be clean.
> 
> 
> Sean Carey
> 



Re: Travis CI tests failing in master

2018-09-28 Thread Ash Berlin-Taylor
Looks like someone has beaten you to the punch 
https://github.com/apache/incubator-airflow/pull/3968#pullrequestreview-159772413
 


:)
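
(Context for anyone curious about the protocol issue mentioned below: protocol 2 is the highest pickle protocol Python 2 can read, so data pickled on Python 3 for a Python 2 virtualenv has to pin it, e.g.:)

    import pickle

    payload = pickle.dumps({"answer": 42}, protocol=2)
    # protocol=3 (the Python 3 default) raises "unsupported pickle
    # protocol: 3" when a Python 2 interpreter tries to load it.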

-ash

> On 28 Sep 2018, at 09:59, Kaxil Naik  wrote:
> 
> The problem is 80% solved. The only failing errors are related to
> TestPythonVirtualenvOperator.
> More details are discussed at
> https://github.com/apache/incubator-airflow-ci/pull/3
> 
> It is because protocol=3 is used instead of 2.
> 
> Can someone take a dig to resolve this?
> 
> Regards,
> Kaxil
> 
> 
> On Thu, Sep 27, 2018 at 11:36 AM Jarek Potiuk 
> wrote:
> 
>> I see also that the solution is on its way for the first problem as well
>> :). https://github.com/apache/incubator-airflow/pull/3962
>> 
>> On Thu, Sep 27, 2018 at 10:47 AM Jarek Potiuk 
>> wrote:
>> 
>>> Seem that the second problem is already solved in this PR:
>>> https://github.com/apache/incubator-airflow-ci/pull/3 - hopefully will
>> be
>>> merged soon :).
>>> 
>>> On Thu, Sep 27, 2018 at 9:57 AM airflowuser
>>>  wrote:
>>> 
 Also I see this a lot:
 1) ERROR: Failure: ProgrammingError
>> ((_mysql_exceptions.ProgrammingError)
 (1146, "Table 'airflow.task_instance' doesn't exist") [SQL: u'DELETE
>> FROM
 task_instance WHERE task_instance.dag_id = %s'] [parameters:
 ('unit_tests',)])
 
 
 Sent with ProtonMail Secure Email.
 
 ‐‐‐ Original Message ‐‐‐
 On Thursday, September 27, 2018 9:20 AM, Jarek Potiuk <
 jarek.pot...@polidea.com> wrote:
 
> Hello Everyone,
> 
> Seems that since yesterday the builds started to fail in Travis CI
 badly.
> And we need some urgent actions to fix it otherwise everyone
>> developing
> Airflow is affected now.
> 
> I am a bit fresh in Airflow so I am not sure how to handle those
 problems
> in an "emergency" way (and I am not sure if I am the best to propose
> solutions), so it would be great that more experienced people could
 help to
> solve it.
> 
> It was not caused by any change - builds that worked before, started
>> to
> fail now. The nature of those errors suggests transitive dependencies
> problems (they are all errors during installation of requirements).
> 
> Currently the master of airflow-incubator is broken. Actually the
 problem
> has worsened - now we seem to have two distinct problems depending on
> python requirement.
> 
> Error 1: Flask appbuilder
> 
> Yesterday we had just this one (for both python 2.7 and 3.5):
> "flask-appbuilder 1.11.1 has requirement click==6.7, but you'll have
 click
> 7.0 which is incompatible."
> and resulting error:
> 
> pkg_resources.DistributionNotFound: The 'click==6.7' distribution was
 not
> found and is required by flask-appbuilder
> 
> You can see it failing in the master here:
> https://travis-ci.org/apache/incubator-airflow/builds/433628053
> 
> Likely the solution is to pin flask-appbuilder to one of the earlier
> versions or exclude 1.11.1 from valid versions, or limit
 flask-appbuilder
> to <1.11.1 (but I am not sure which one is the best solution). I will
 try
> some of that soon to see if that helps and let you know.
> 
> *Error 2: pynacl build problem *
> 
> This is only affecting python3.5. And started to appear only today in
 new
> pull requests (of several people including mine, so it's again some
> dependency problem).
> 
> Exception: ERROR: The 'make' utility is missing from PATH
> 
> 
 
>> ---

Re: Fundamental change - Separate DAG name and id.

2018-09-25 Thread Ash Berlin-Taylor

> On 24 Sep 2018, at 23:12, Alex Tronchin-James 949-412-7220 
>  wrote:
> 
> Re: [Brian Greene] "How does filename matter?  Frankly I wish the filename
> was REQUIRED to be the dag name so people would quit confusing themselves
> by mismatching them !"
> 
> FWIW in the Facebook predecessor to airflow, the file path/name WAS the dag
> name. E.g. if your dag resided in best_team/new_project/sweet_dag.py then
> the dag name would be best_team.new_project.sweet_dag
> All tasks were identified by their variable name after that prefix: E.g. if
> best_team.new_project.sweet_dag defines an operator in a variable named
> task1, then the respective task_id is best_team.new_project.sweet_dag.task1.
> 
> Airflow provides additional flexibility to specify DAG and task names to
> avoid the sometimes annoyingly long task names this resulted in and allow
> DAG/task names without forcing a code directory structure and python's
> variable naming restrictions, and I think this is a Good Thing.
> 
> It seems like airflowuser is trying to provide additional metadata beyond
> the DAG/task names (so far, a DAG 'title' distinct from the ID). I've
> provided this through a README.md included in the DAG source directory, but
> maybe it would be a win to instead add a DAG parameter named 'readme' of
> string type which can include a docstring or even markdown to provide any
> desired additional metadata? This could then be displayed by the UI to
> simplify access to any such provided DAG documentation.

You mean like https://airflow.apache.org/concepts.html#documentation-notes ? ✨
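
(Untested sketch of what that link describes - the doc_md attributes are the documented ones, everything else here is made up:)

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('example_docs', start_date=datetime(2018, 1, 1),
              schedule_interval='@daily')
    dag.doc_md = """
    # Sweet DAG
    Everything an operator needs to know, rendered as markdown in the UI.
    """

    t = BashOperator(task_id='hello', bash_command='echo hi', dag=dag)
    t.doc_md = "Shown on the Task Instance Details page."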
> 
> 🍿
> 
> 
> 
> On Thu, Sep 20, 2018 at 10:45 PM Brian Greene <
> br...@heisenbergwoodworking.com> wrote:
> 
>> Prior to using airflow for much, on first inspection, I think I may have
>> agreed with you.
>> 
>> After a bit of use I’d agree with Fokko and others - this isn’t really a
>> problem, and separating them seems to do more harm than good related to
>> deployment.
>> 
>> I was gonna stop there, but why?
>> 
>> You can add a task to a dag that’s deployed and has run and still view
>> history.  The “new” task shows up white Squares in the old dags.  nobody
>> said you’re required to also rename the dag when you do so this.  If your
>> process or desire or design determines you need to rename it, well then by
>> definition... isn’t it a new thing without a history?  Airflow is
>> implementing exactly that.
>> 
>> One could argue that renaming to reflect exact purpose is good practice.
>> Yes, I’d agree, but again following that logic if it’s a small enough
>> change to “slip in” then the name likely shouldn’t change.  If it’s big
>> enough I want to change the name then it’s a big enough change that I’m
>> functionally running something “new”, and I expect to need to account for
>> that.  Airflow is enforcing that logic by coupling the name to the
>> deployment of what you said was a new process.
>> 
>> One might put forth that changing the name to be more descriptive In the
>> ui makes it easier for support staff.  I think perhaps if that’s your
>> challenge it’s not airflow that’s a problem.  Dags are of course documented
>> elsewhere besides their name, right?  Yeah it’s self documenting (and the
>> graphs are cool), but I have to assume there’s something besides the NAME
>> to tell people what it does.  Additionally, far more than the name is
>> required for even an operator or monitor watcher to take action - you don’t
>> expect them to know which tasks to rerun or how to troubleshoot failures
>> just based on your “now most descriptive name in the UI” do you?
>> 
>> I spent time In an informatica shop where all the jobs were numbered.
>> Numbered.  Let’s be more exact... their NAMES were NUMBERS like 56709.
>> Terrible, but 100% worked, because while a descriptive name would have been
>> useful, the name is the thing that’s supposed to NOT CHANGE (see code of
>> Abibarshim), and all the other information can attach to that in places
>> where you write... other information.  People would curse a number “F’ing
>> 6291 failed again” - everyone knew what they were talking about.. I digress.
>> 
>> You might decide to document “dag ID 12” or just “12” on your wiki - I’m
>> going to document “daily_sales_import”.  And when things start failing at
>> 3am it’s not my dag “56” that’s failing, it’s the sales_export dag.  But if
>> you document “12”, that’s still it’s name, and it’d better be 12 in all
>> your environments and documents.  This also means the actual db IDs from
>> your proposal are almost certainly NOT the same across your environments,
>> making the 12 unchangeable name!
>> 
>> There are lots of languages (most of them) where the name of a thing is
>> important and hard to change.  It’s not a bad thing, and I’d assume that
>> deploying a thing by name has some significance in many systems.  Go rename
>> a class in... pick a language... tell me how that should be easier to d

Slides from London Airflow Meetup #1

2018-09-25 Thread Ash Berlin-Taylor
Hi everyone,

Slides from the first London Airflow Meetup are available in this Google 
drive folder: 
https://drive.google.com/drive/folders/1wiSkrg_1rvqGrmbYN7rNFQaqW0Ty10Vk 


Sorry, we didn't get them captured on video :)

Re: Airflow: Apache Graduation

2018-09-21 Thread Ash Berlin-Taylor
Your guess is as good as mine on what is involved in graduation. I think that 
we sorted the Licensing issues in 1.10.0 (even if the way we sorted it was a 
little annoying - having to specify the environment variable at `pip install` 
time is a little bit un-pythonic, but I wasn't thinking of fixing that in 
1.10.1)

Some of the steps and requirements are listed in the Incubator's 
graduating_to_a_top_level_project guide, but in summary from a 
quick read of it I think the process is:

- Collect some information, put it on 
http://incubator.apache.org/projects/airflow.html for the IPMC
- We as a community hold a vote on if we think we're ready to graduate
- IPMC vote on it too
- propose motion for (monthly) Apache board meeting

There might be a few more steps involved, such as drafting a Charter (we would 
probably start with a "stock" Apache one)

-ash

> On 20 Sep 2018, at 18:22, Maxime Beauchemin  
> wrote:
> 
> Yeah let's make it happen! I'm happy to set some time aside to help with
> the final push.
> 
> Max
> 
> On Thu, Sep 20, 2018 at 9:53 AM Sid Anand  wrote:
> 
>> Folks! (specifically Bolke, Fokko, Ash)
>> What's needed to graduate from Apache?
>> 
>> Can we make 1.10.1 be about meeting our licensing needs to allow us to
>> graduate?
>> 
>> -s
>> 



Re: It's very hard to become a committer on the project

2018-09-20 Thread Ash Berlin-Taylor
> Remember my basic question: I want to contribute - how on earth I can find a 
> ticket that is suitable for first time committer? Can you show me?

There aren't that many feature requests in Jira, so looking there for easy 
tickets is, as you have probably found, a fruitless exercise. I'd recommend 
using Airflow and, when you come across something you want fixed or a feature 
you want added, opening a PR for it. 

> Again, If decided to stay with Jira.. I highly recommend that someone from 
> the project will maintain it. Don't allow to regular users to tag and set 
> priorities for the tickets.. someone from the project should do it.

Are you volunteering to sponsor someone's time to be able to do this?

> 
> 
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
> 
> ‐‐‐ Original Message ‐‐‐
> On Tuesday, September 18, 2018 11:57 AM, Sid Anand  wrote:
> 
>> Hi Folks!
>> For some history, Airflow started on GH issues. We also had a very popular 
>> Google group. When we moved to Apache, we were told that Jira was the way we 
>> needed to go for issue tracking because it resided on Apache infrastructure. 
>> When we moved over, we had to drop 100+ GH issues on the floor -- there was 
>> no way to transfer them to Jira and maintain the original submitter/owner 
>> info since there was no mapping of users between the 2 systems.
>> 
>> Here's a pie chart of our existing issues by status:
>> https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12320023&statistictype=statuses&selectedProjectId=12320023&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Apie-report&atl_token=A5KQ-2QAV-T4JA-FDED|a85ff737799378265f90bab4f1456b5e2811a507|lin&Next=Next
>> 
>> I'm attaching a screen shot as well.
>> 
>> I think we all agree that there is better integration between GH PRs and GH 
>> Issues than between GH PRs and Jira issues.
>> 
>> There are some practical matters to consider:
>> 
>> - For the 1100-1200 unclosed/unresolved issues, how will we transfer them to 
>> GH or will we drop those on the floor? How would we map submitters between 
>> the 2 systems, and how would we transfer the content/comments,etc...
>> - For the existing closed PRs (>3k), whose PRs reference JIRA, we'd need to 
>> keep JIRA around in read-only mode so we could reference the bug/feature 
>> details, but somehow disallow new JIRA creations, lest some people continue 
>> to use it to create new issues
>> - I'm assuming the GH issue naming would not conflict with that of JIRA 
>> naming in commit message subjects and PRs. In other words, 
>> incubator-airflow-1 vs AIRFLOW-1 or airflow-1 vs AIRFLOW-1 or possibly 
>> conflict at AIRFLOW-1? Once we graduate, I'm pretty sure the incubator name 
>> will be dropped, so there may be a naming conflict.
>> 
>> In the end, these are 2 different tools. The issues you raise are mainly 
>> around governance.
>> 
>> If you folks would like to propose a new means to manage the JIRAs, can you 
>> outline a solution on Wiki and drop a link into an email on this list? We 
>> can then raise a vote.
>> 
>> IMHO, our community would scale the best if more people picked up 
>> responsibilities such as these. Grooming/Organizing JIRAs doesn't need to be 
>> a responsibility owned by the maintainers. Anyone can take the lead on 
>> discussions, etc...
>> 
>> -s
>> 
>> On Mon, Sep 17, 2018 at 2:09 AM Sumit Maheshwari  
>> wrote:
>> 
>>> Strong +1 for moving to GitHub from Jira.
>>> 
>>> On Mon, Sep 17, 2018 at 12:35 PM George Leslie-Waksman 
>>> wrote:
>>> 
 Are there Apache rules preventing us from switching to GitHub Issues?
 
 That seems like it might better fit much of Airflow's user base.
 
 
 On Sun, Sep 16, 2018, 9:21 AM Jeff Payne  wrote:
 
> I agree that Jira could be better utilized. I read the original
> conversation on the mailing list about how Jira should be used (or if it
> should be used at all) and I'm still unclear about why it was picked over
> just using github issues. It refers to a dashboard, which I've yet to
> investigate, but Jira is much more than just dashboards.
> 
> If this project is going to use Jira, then:
> 
> 1) It would be great to see moderation and labeling of the Jira issues by
> the main contributors to make it easier for people to break into
> contributing.
> 2) It would also be nice if the initial conversation of whether or not an
> issue warrants development at all happened on the Jira issue, or at least
> some acknowledgement by the main contributors.
> 3) Larger enhancements and efforts or vague suggestions still get
> discussed o

Re: Connection Management in Multi-tenancy Scenario

2018-09-19 Thread Ash Berlin-Taylor
You are correct that currently all DAGs can access all connections and 
variables.

The other thing to bear in mind: currently PythonOperators have an active 
connection to the metadata DB where connections are stored, so at best this is 
"co-operative" security, to prevent one team from accessing another team's 
connections, and not a hard barrier against an even mildly determined attacker.

As for the implementation of it: it would be worth looking to see if we can use 
the Permissions model built in to FAB (Flask App Builder) that we are using in 
the RBAC-based UI. This would allow for much more granular permissions, and 
provides a pre-existing management UI for it too.

I don't know if this would make the work dependent on the (in progress?) 
DAG-level access controls.
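
(To make the idea quoted below concrete - a co-operative sketch only; an "owner" field does not exist on Connection today, so this stashes it in the Extra JSON, and nothing stops a determined task from calling BaseHook directly:)

    from airflow.hooks.base_hook import BaseHook

    def get_owned_connection(conn_id, requesting_owner):
        conn = BaseHook.get_connection(conn_id)
        # Hypothetical convention: the proposed "owner" lives in the
        # connection's Extra JSON until there is a real column for it.
        owner = (conn.extra_dejson or {}).get('owner')
        if owner and owner != requesting_owner:
            raise RuntimeError(
                "Connection %s is restricted to owner %s" % (conn_id, owner))
        return conn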

-ash

> On 19 Sep 2018, at 15:00, Deng Xiaodong  wrote:
> 
> Hi folks,
> 
> Thinking of a scenario: I may have multiple users in the same Airflow
> instance. I can use filter_by_owner feature so that each user can only see
> their own DAGs. But what if their DAGs are using different data sources,
> say owner A is using mysql_conn_a, and owner B is using mysql_conn_b, and
> we don't want to allow them to access each other's database?
> 
> Seems like all DAG (no matter who is the owner) can access all defined
> connections? or have I missed something?
> 
> If my suspicion is making sense, I think it would be necessary to have
> values "*if_protect*" and "*owner*" for each connection. When "if_protect"
> == True, only DAGs whose owner == "owner" would be able to use this
> connection. I would like to take this up to prepare a PR.
> 
> Thanks.
> 
> XD



Re: It's very hard to become a committer on the project

2018-09-18 Thread Ash Berlin-Taylor

> On 18 Sep 2018, at 13:07, Ash Berlin-Taylor  wrote:
> 
> Somewhat annoyingly you can't (or we don't have permission to) set the fix 
> version on a closed Jira ticket. We can probably ask for permission to edit 
> closed/resolved Jira tickets in the AIRFLOW project, which would remove some 
> of the pain here. I've asked for that 
> https://issues.apache.org/jira/browse/INFRA-17033 
> <https://issues.apache.org/jira/browse/INFRA-17033>

I was being blind - the Edit button is in a different place on closed issues is 
all >_<

Re: It's very hard to become a committer on the project

2018-09-18 Thread Ash Berlin-Taylor
Before we could merge directly on Github the way we (committers) closed PRs was 
by using a script that would close the Jira ticket and set the fix version at 
the same time.

Somewhat annoyingly you can't (or we don't have permission to) set the fix 
version on a closed Jira ticket. We can probably ask for permission to edit 
closed/resolved Jira tickets in the AIRFLOW project, which would remove some of 
the pain here. I've asked for that 
https://issues.apache.org/jira/browse/INFRA-17033

Thanks for finding the issues to close btw, it's helpful.

-ash

> On 18 Sep 2018, at 12:33, Deng Xiaodong  wrote:
> 
> Hi,
> 
> Regarding your 2nd point, I think one of the reasons for what the
> committers are not using hook to automatically close JIRA ticket after PR
> merged is that they need to set fix version for each ticket.
> 
> XD
> 
> 
> On Tue, Sep 18, 2018 at 17:17 Юли Волкова  > wrote:
> 
>> Hi, Sid, thanks for some clarification about JIRA. Right now, I am walking
>> through Jira's tasks and asking Ash or other guys with admin rights to close
>> issues (for example:
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-307?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>> ),
>> because PRs was successful merged 2 years ago. I would like to help in this
>> scope: 1) If I have same moderators rights I can close tickets by myself
>> 2) JIRA is a powerful tool on about integration - question only in do we
>> have admin rights to add different integrations and triggers-hooks to JIRA.
>> First of all, I propose to create hook what close task after PR was merged
>> in master. Many task are open without any status changing with  already
>> merged PRs.
>> 
>> On Tue, Sep 18, 2018 at 11:58 AM Sid Anand  wrote:
>> 
>>> Hi Folks!
>>> For some history, Airflow started on GH issues. We also had a very
>> popular
>>> Google group. When we moved to Apache, we were told that Jira was the way
>>> we needed to go for issue tracking because it resided on Apache
>>> infrastructure. When we moved over, we had to drop 100+ GH issues on the
>>> floor -- there was no way to transfer them to Jira and maintain the
>>> original submitter/owner info since there was no mapping of users between
>>> the 2 systems.
>>> 
>>> Here's a pie chart of our existing issues by status:
>>> 
>>> https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12320023&statistictype=statuses&selectedProjectId=12320023&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Apie-report&atl_token=A5KQ-2QAV-T4JA-FDED|a85ff737799378265f90bab4f1456b5e2811a507|lin&Next=Next
>>> 
>>> 
>>> I'm attaching a screen shot as well.
>>> 
>>> [image: Screenshot 2018-09-16 11.28.57.png]
>>> 
>>> I think we all agree that there is better integration between GH PRs and
>>> GH Issues than between GH PRs and Jira issues.
>>> 
>>> There are some practical matters to consider:
>>> 
>>>   - For the 1100-1200 unclosed/unresolved issues, how will we transfer
>>>   them to GH or will we drop those on the floor? How would we map
>> submitters
>>>   between the 2 systems, and how would we transfer the
>> content/comments,etc...
>>>   - For the existing closed PRs (>3k), whose PRs reference JIRA, we'd
>>>   need to keep JIRA around in read-only mode so we could reference the
>>>   bug/feature details, but somehow disallow new JIRA creations, lest
>> some
>>>   people continue to use it to create new issues
>>>   - I'm assuming the GH issue naming would not conflict with that of
>>>   JIRA naming in commit message subjects and PRs. In other words,
>>>   incubator-airflow-1 vs AIRFLOW-1 or airflow-1 vs AIRFLOW-1 or possibly
>>>   conflict at AIRFLOW-1? Once we graduate, I'm pretty sure the
>> incubator name
>>>   will be dropped, so there may be a naming conflict.
>>> 
>>> In the end, these are 2 different tools. The issues you raise are mainly
>>> around governance.
>>> 
>>> If you folks would like to propose a new means to manage the JIRAs, can
>>> you outline a solution on Wiki and drop a link into an email on the list?

Re: Guidelines on Contrib vs Non-contrib

2018-09-18 Thread Ash Berlin-Taylor
Operators and hooks don't need any special plugin system - simply having them 
as separate Python modules which are imported using normal Python semantics 
is enough.
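
For illustration, a minimal sketch (module and class names are hypothetical):

    # my_company/operators.py - a plain module on the PYTHONPATH, no plugin needed
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults

    class HelloOperator(BaseOperator):
        @apply_defaults
        def __init__(self, name, *args, **kwargs):
            super(HelloOperator, self).__init__(*args, **kwargs)
            self.name = name

        def execute(self, context):
            self.log.info("Hello %s", self.name)

    # and in a DAG file a normal import is all it takes:
    #     from my_company.operators import HelloOperator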

In fact now that I think about it: I want to deprecate the plugins registering 
hooks/operators etc and limit it to only bits which a simple python import 
can't manage - which I think is only anything that needs to be registered with 
another system, such as custom routes in the web UI.

I'll draft an AIP for this soon.

-ash


> On 18 Sep 2018, at 00:50, George Leslie-Waksman  wrote:
> 
> Given we have a plugin system, could we alternatively move toward
> keeping non-core supported code outside of the core project/repo?
> 
> It would hugely decrease the surface area of the main repository and
> testing infrastructure to get most of the contrib code out to its own place.
> 
> Further, it would decrease the committer burden of having to approve/merge
> code that is not supposed to be their responsibility.
> 
> On Mon, Sep 17, 2018 at 4:37 PM Tim Swast  wrote:
> 
>>> Individual operators and hooks living in separate repositories on github
>> (or possibly other Apache projects), which are then distributed by pip and
>> installed as libraries seems like it would scale better.
>> 
>> Pandas did this about a year ago, and it's seemed to have worked well. For
>> example, pandas.read_gbq is a very thin wrapper around pandas_gbq.read_gbq
>> (distributed as a separate package). It has made it easier for me to track
>> issues corresponding to my area of expertise.
>> 
>> On Sun, Sep 16, 2018 at 1:25 PM Jakob Homan  wrote:
>> 
 My understanding as a contributor is that if a hook/operator is in
>> core,
>>> it
 means that a committer is willing to take personal responsibility to
 maintain it (or at least help maintain it), and everything else goes in
 contrib.
>>> 
>>> That's not correct.  All of the code is owned by the entire
>>> community[1]; no one person is responsible for any of it.  There's no
>>> silos, fiefdoms, walled gardens, etc.  If the community cannot support
>>> a piece of code it should be deprecated and subsequently removed.
>>> 
>>> Contrib sections are almost always problematic for this reason.
>>> Hadoop ended up abandoning its.  Because Airflow acts as a gathering
>>> point for so many disparate technologies (databases, storage systems,
>>> compute engines, etc.), trying to keep all of them corralled and up to
>>> date will be very difficult.  Individual operators and hooks living in
>>> separate repositories on github (or possibly other Apache projects),
>>> which are then distributed by pip and installed as libraries seems
>>> like it would scale better.
>>> 
>>> -Jakob
>>> 
>>> [1] https://blogs.apache.org/foundation/entry/success-at-apache-a-newbie
>>> 
>>> On 15 September 2018 at 13:29, Jeff Payne  wrote:
 How many operators are added to contrib per month? Is it too many to
>>> make the decision case by case? If so, then the above mentioned rule
>> sounds
>>> fairly reasonable. However, if that's the rule, shouldn't a bunch of
>>> existing modules be moved from contrib to core?
 
 Get Outlook for Android
 
 
 From: Taylor Edmiston 
 Sent: Saturday, September 15, 2018 1:13:47 PM
 To: dev@airflow.incubator.apache.org
 Subject: Re: Guidelines on Contrib vs Non-contrib
 
 My understanding as a contributor is that if a hook/operator is in
>> core,
>>> it
 means that a committer is willing to take personal responsibility to
 maintain it (or at least help maintain it), and everything else goes in
 contrib.
 
 *Taylor Edmiston*
 Blog  | LinkedIn
  | Stack Overflow
  | Developer
>>> Story
 
 
 
 
 On Sat, Sep 15, 2018 at 2:02 PM Kaxil Naik 
>> wrote:
 
> Hi, all (mainly contributors),
> 
> Can we decide on a common guideline on when a hook/operator should go
>>> under
> contrib vs core?
> 
> Regards,
> 
> *Kaxil Naik*
> *Big Data Consultant *@ *Data Reply UK*
> *Certified *Google Cloud Data Engineer | *Certified* Apache Spark &
>>> Neo4j
> Developer
> *Phone: *+44 (0) 74820 88992
> *LinkedIn*: https://www.linkedin.com/in/kaxil
> 
>>> 
>> --
>> *  •  **Tim Swast*
>> *  •  *Software Friendliness Engineer
>> *  •  *Google Cloud Developer Relations
>> *  •  *Seattle, WA, USA
>> 



Re: Database referral integrity

2018-09-18 Thread Ash Berlin-Taylor
Ooh good spot.

Yes I would be in favour of adding these, but as you say we need to think about 
how we might migrate old data.

Doing this at 2.0.0 and providing a cleanup script (or doing it as part of the 
migration?) is probably the way to go.
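
A rough sketch of what that could look like (assuming Alembic, which we use 
for migrations; table and constraint names here are illustrative only):

    from alembic import op

    def upgrade():
        # purge orphaned rows first, otherwise the constraint cannot be created
        op.execute(
            "DELETE FROM task_instance "
            "WHERE dag_id NOT IN (SELECT dag_id FROM dag)"
        )
        op.create_foreign_key(
            "task_instance_dag_id_fkey",   # hypothetical constraint name
            "task_instance", "dag",
            ["dag_id"], ["dag_id"],
        )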

-ash-

> On 17 Sep 2018, at 19:56, Stefan Seelmann  wrote:
> 
> Hi,
> 
> looking into the DB schema there is almost no referential integrity
> enforced at the database level. Many foreign key constraints between
> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense
> IMO.
> 
> Is there a particular reason why that's not implemented?
> 
> Introducing it now will be hard, probably any real-world setup has some
> violations. But I'm still in favor of this additional safety net.
> 
> Kind Regards,
> Stefan



Re: It's very hard to become a committer on the project

2018-09-18 Thread Ash Berlin-Taylor
For background I got involved in Airflow by adding features I wanted, and fixing 
bugs/oddities that I came across - I didn't fix other peoples tickets. Most 
other committers are the same. We've all "scratched our own itch" as it were.

You have proposed a technical solution to a non-technical problem.

The problem is not a lack of visibility on issues. If you are subscribed to the 
commits@ list (which most active committers should be), as I am, then you 
receive an email for every new ticket opened and every comment on an issue.

The problem is time.

No one is paid to work on Airflow full time; we all do it as a side-effect of 
using Airflow for our jobs. And it is _very_ hard to justify spending that time 
triaging Jira tickets. I also think you underestimate how much time it takes to 
triage even a single ticket.

I would *love* to be able to spend more time improving airflow, making it 
easier for other people to get involved, improving documentation, and triaging 
issues. But no one is currently paying me or any other committer to do this, so 
any time we spent on Airflow is going to be focused on what we need out of 
airflow. Sorry.

(For the record though, I do dislike Jira, but I don't care enough to think 
about changing.)

Hope this helps explain the situation.

-ash


> On 16 Sep 2018, at 14:29, airflowuser  
> wrote:
> 
> Hello all,
> 
> I'm struggling to find tickets to address, and while discussing it on chat 
> others reported they had the same problem when they began working on the 
> project.
> 
> The problem is due to:
> 1. It's very hard to locate tickets on Jira. The categories are a mess, 
> versions are not enforced, and each user can tag, label and set priority at 
> will. No one monitors or overrides them.
> 2. It's impossible for a new committer to find issues which could be an easy 
> fix or a "good first issue".
> 
> My suggestions:
> 1. Looking at the ticket system there are usually fewer than 10 new tickets a 
> day. It won't take too much time for someone with knowledge of the project to 
> properly tag each ticket.
> 
> 2. I think that most of you don't even check the Jira. Most of you submit PRs 
> and open a ticket 5 seconds before (just because you must). There is no 
> doubt that the Jira is a "side" system which doesn't really perform its job.
> 
> Take a look at this:
> https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2999
> a member of the community asks committers for input but no one replies. I 
> doubt this is because no one has input - I am sure that if a PR had been 
> submitted you would have had comments. It's simply because you don't see it. 
> This is why I think the current Jira doesn't function properly. I think that 
> GitHub can perform a better role. All of you as committers are already there 
> and it's always better to work with one system rather than two. The colors 
> and labels on GitHub are very easy to notice.
> 
> Either way, whatever you decide, something needs to change. Either Jira 
> becomes more informative or we move to GitHub.
> 
> Thank you all for your good work :)



Re: Call for fixes for Airflow 1.10.1

2018-09-18 Thread Ash Berlin-Taylor
Thanks Kaxil!

I'll get on with finishing off the 1.10.1 release next week after my talk at 
the London Meetup.

-ash

> On 15 Sep 2018, at 16:03, Kaxil Naik  wrote:
> 
> I have cherry-picked the fix for this issue on top of 1.10-test branch
> along-with 16 other commits (the ones I listed earlier in the mail + their
> dependent commits)
> 
> Regards,
> Kaxil
> 
> On Fri, Sep 14, 2018 at 6:07 AM Gerardo Curiel  wrote:
> 
>> Hello,
>> 
>> I wonder if there is still time to wait for a fix for "BigQuery hook does
>> not allow specifying both the partition field name and table name at the
>> same time": https://issues.apache.org/jira/browse/AIRFLOW-2772.
>> 
>> The BigQueryHook is taking some liberties and implementing some client-side
>> logic that shouldn't be there.
>> 
>> 
>> On Wed, Sep 12, 2018 at 6:59 PM Driesprong, Fokko 
>> wrote:
>> 
>>> Hi Ash,
>>> 
>>> I've cherry-picked two commits on top of 1.10-test branch:
>>> 
>> 
>> Cheers,
>> 
>> --
>> Gerardo Curiel // https://gerar.do
>> 
> 
> 
> -- 
> *Kaxil Naik*
> *Big Data Consultant *@ *Data Reply UK*
> *Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
> Developer
> *Phone: *+44 (0) 74820 88992
> *LinkedIn*: https://www.linkedin.com/in/kaxil



Re: [External] Dynamic tasks in a dag?

2018-09-18 Thread Ash Berlin-Taylor
This isn't needed as the tasks are added to the dag when specified, so the DAG 
object keeps track of the tasks.
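
A tiny sketch (hypothetical DAG) showing the registration happening at 
construction time:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('example', start_date=datetime(2018, 1, 1), schedule_interval=None)

    for i in range(3):
        # `task` is rebound on each iteration, but every operator was already
        # attached to `dag` when it was constructed with dag=dag
        task = DummyOperator(task_id='task_{}'.format(i), dag=dag)

    print(dag.task_ids)  # ['task_0', 'task_1', 'task_2']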

-ash 

> On 14 Sep 2018, at 18:30, Alex Tronchin-James 949-412-7220 
>  wrote:
> 
> Don't you need to preserve the task objects? Your implementation overwrites
> each by the successor, so only the last task would be kept, despite your
> print statements. Try building a list or dict of tasks like:
> 
> tasks = []  # only at the top
> for file in glob('dags/snowsql/create/udf/*.sql'):
>     print("FILE {}".format(file))
>     tasks.append(
>         create_snowflake_operator(file, dag, 'snowflake_default')
>     )
>     tasks[-1].set_upstream(start)
> 
> On Fri, Sep 14, 2018 at 17:20 Frank Maritato
>  wrote:
> 
>> Ok, my mistake. I thought that command was querying the server for its
>> information and not just looking in a directory relative to where it is
>> being run. I have it working now. Thanks Chris and Sai!
>> 
>> 
>> On 9/14/18, 9:58 AM, "Chris Palmer"  wrote:
>> 
>>   The relative paths might work from wherever you are invoking 'airflow
>>   list_tasks', but that doesn't mean they work from wherever the
>> webserver is
>>   parsing the dags from.
>> 
>>   Does running 'airflow list_tasks' from some other running directory
>> work?
>> 
>>   On Fri, Sep 14, 2018 at 12:35 PM Frank Maritato
>>wrote:
>> 
>>> Do you mean give the full path to the files? The relative path I'm
>> using
>>> definitely works. When I type airflow list_dags, I can see the
>> output from
>>> the print statements that the glob is finding my sql files and
>> creating the
>>> snowflake operators.
>>> 
>>> airflow list_tasks workflow also lists all the operators I'm
>> creating. I'm
>>> just not seeing them in the ui.
>>> 
>>> On 9/14/18, 9:10 AM, "Sai Phanindhra"  wrote:
>>> 
>>>   Hi frank,
>>>   Can you try giving global paths?
>>> 
>>>   On Fri 14 Sep, 2018, 21:35 Frank Maritato, <
>> fmarit...@opentable.com
>>> .invalid>
>>>   wrote:
>>> 
 Hi,
 
 I'm using apache airflow 1.10.0 and I'm trying to dynamically
>>> generate
 some tasks in my dag based on files that are in the dags
>> directory.
>>> The
 problem is, I don't see these tasks in the ui, I just see the
>>> 'start' dummy
 operator. If I type 'airflow list_tasks workflow', they are
>> listed.
 Thoughts?
 
 Here is how I'm generating the tasks:
 
 
 def create_snowflake_operator(file, dag, snowflake_connection):
     file_repl = file.replace('/', '_')
     file_repl = file_repl.replace('.sql', '')
     print("TASK_ID {}".format(file_repl))
     return SnowflakeOperator(
         dag=dag,
         task_id='create_{}'.format(file_repl),
         snowflake_conn_id=snowflake_connection,
         sql=file,
     )
 
 DAG_NAME = 'create_objects'
 dag = DAG(
     DAG_NAME,
     default_args=args,
     dagrun_timeout=timedelta(hours=2),
     schedule_interval=None,
 )
 
 start = DummyOperator(
     dag=dag,
     task_id="start",
 )
 
 print("creating snowflake operators")
 
 for file in glob('dags/snowsql/create/udf/*.sql'):
     print("FILE {}".format(file))
     task = create_snowflake_operator(file, dag, 'snowflake_default')
     task.set_upstream(start)
 
 for file in glob('dags/snowsql/create/table/*.sql'):
     print("FILE {}".format(file))
     task = create_snowflake_operator(file, dag, 'snowflake_default')
     task.set_upstream(start)
 
 for file in glob('dags/snowsql/create/view/*.sql'):
     print("FILE {}".format(file))
     task = create_snowflake_operator(file, dag, 'snowflake_default')
     task.set_upstream(start)
 
 print("done {}".format(start.downstream_task_ids))
 
 Thanks in advance
 --
 Frank Maritato
 
>>> 
>>> 
>>> 
>> 
>> 
>> 



Re: Call for fixes for Airflow 1.10.1

2018-09-09 Thread Ash Berlin-Taylor
I've (re)created the v1-10-test branch with some of the fixes cherry-picked in. 
I can't give much time this week (as what spare time I have is being used up 
working on my talk) but I'll work more on this towards the end of next week.

I'll look at resolved Jira tickets targeted with a fix version of 1.10.1 (i.e. 
if you want it in 1.10.1, merge the PR into master and also mark the Jira as 
fixed in 1.10.1, and I'll work on cherry-picking the fixes. If they can be. If it 
is difficult/has other things to cherry-pick in I might change the fix version 
on you.)

-ash


> On 9 Sep 2018, at 19:22, Ash Berlin-Taylor  wrote:
> 
> On 9 September 2018 18:19:40 BST, Bolke de Bruin  wrote:
> You can already add them to v1-10-test. 
> 
> Normally we are a bit cautious about this if you are not the release manager, to 
> ensure that he/she knows what the state is. 
> 
> B
> 
> Op zo 9 sep. 2018 18:02 schreef Driesprong, Fokko :
> Can we add this one as well?
> 
> https://github.com/apache/incubator-airflow/pull/3862 
> <https://github.com/apache/incubator-airflow/pull/3862>
> https://issues.apache.org/jira/browse/AIRFLOW-1917 
> <https://issues.apache.org/jira/browse/AIRFLOW-1917>
> 
> I'm happy to cherry pick them onto the 1.10.1 by myself as well. Any idea
> when we will start this branch?
> 
> Cheers, Fokko
> 
> Op do 6 sep. 2018 om 08:08 schreef Deng Xiaodong  <mailto:xd.den...@gmail.com>>:
> 
> > Hi Ash,
> >
> >
> > May you consider including JIRA ticket 2848 (PR 3693, Ensure dag_id in
> > metadata "job" for LocalTaskJob) in 1.10.1 as well?
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2848 
> > <https://issues.apache.org/jira/browse/AIRFLOW-2848>
> >
> > https://github.com/apache/incubator-airflow/pull/3693 
> > <https://github.com/apache/incubator-airflow/pull/3693>
> >
> >
> > This is a bug in terms of metadata, which also affects the UI
> > “Browse->Jobs”.
> >
> >
> > Thanks.
> >
> >
> > Regards,
> >
> > XD
> >
> > On Wed, Sep 5, 2018 at 23:55 Bolke de Bruin  > <mailto:bdbr...@gmail.com>> wrote:
> >
> > > You should push these to v1-10-test not to stable. Only once we start
> > > cutting RCs you should push to -stable. See the docs. This ensures a
> > stable
> > > “stable”branch.
> > >
> > > Cheers
> > > Bolke.
> > >
> > > > On 3 Sep 2018, at 14:20, Ash Berlin-Taylor  > > > <mailto:a...@apache.org>> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I'm starting the process of gathering fixes for a 1.10.1. So far the
> > > list of issues I have that we should pull in are
> > >
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
> >  
> > <https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC>
> > > (reproduces below)
> > > >
> > > > I will start pushing these as cherry-picked commits to the v1-10-stable
> > > branch today.
> > > >
> > > > If you have something that is not in the list below let me know. I'd
> > > like to keep this to bug fixes against 1.10.0 only if possible.
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2145 
> > > > <https://issues.apache.org/jira/browse/AIRFLOW-2145> Deadlock after
> > > clearing a running task
> > > > https://github.com/apache/incubator-airflow/pull/3657 
> > > > <https://github.com/apache/incubator-airflow/pull/3657>
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2476 
> > > > <https://issues.apache.org/jira/browse/AIRFLOW-2476> update tabulate dep
> > > to 0.8.2
> > > > https://github.com/apache/incubator-airflow/pull/3835 
> > > > <https://github.com/apache/incubator-airflow/pull/3835>
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2778 
> > > > <https://issues.apache.org/jira/browse/AIRFLOW-2778> Bad Import in
> > > collect_dag in DagBag
> > > > https://github.com/apache/incubator-airflow/pull/3624 
> > > > <https://github.com/apache/incubator-airflow/pull/3624>
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2900 
> > > > <https://issues.apache.org/jira/browse/AIRFLOW-2900> Show code for packaged DAGs

Re: Call for fixes for Airflow 1.10.1

2018-09-05 Thread Ash Berlin-Taylor
That is a good idea! Though it would involve some work right now, as I don't 
think anyone (committers or contributors) has been particularly careful about 
what type of Jira issue they create. Or at least not uniformly.

-ash

> On 5 Sep 2018, at 14:08, airflowuser  
> wrote:
> 
> May I suggest for future releases to show the change log as:
> 
> Bug Fixes:
> 
> New Features:
> 
> etc...
> 
> This makes it easier to look over the list. This shouldn't be manual work - 
> it can be taken from the Jira ticket.
> 
> 
> Sent with ProtonMail Secure Email.
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐
> On September 3, 2018 3:20 PM, Ash Berlin-Taylor  wrote:
> 
>> Hi everyone,
>> 
>> I'm starting the process of gathering fixes for a 1.10.1. So far the list of 
>> issues I have that we should pull in are 
>> https://issues.apache.org/jira/issues/?jql=project %3D AIRFLOW AND 
>> fixVersion %3D 1.10.1 ORDER BY key ASC (reproduces below)
>> 
>> I will start pushing these as cherry-picked commits to the v1-10-stable 
>> branch today.
>> 
>> If you have something that is not in the list below let me know. I'd like to 
>> keep this to bug fixes against 1.10.0 only if possible.
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2145 Deadlock after clearing a 
>> running task
>> https://github.com/apache/incubator-airflow/pull/3657
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2476 update tabulate dep to 
>> 0.8.2
>> https://github.com/apache/incubator-airflow/pull/3835
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2778 Bad Import in collect_dag 
>> in DagBag
>> https://github.com/apache/incubator-airflow/pull/3624
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2900 Show code for packaged 
>> DAGs
>> https://github.com/apache/incubator-airflow/pull/3749
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2949 Add syntax highlight for 
>> single quote strings
>> https://github.com/apache/incubator-airflow/pull/3795
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2984 Cannot convert 
>> naive_datetime when task has a naive start_date/end_date
>> https://github.com/apache/incubator-airflow/pull/3822
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2979 Deprecated Celery Option 
>> not in Options list
>> https://github.com/apache/incubator-airflow/pull/3832
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2989 No Parameter to change 
>> bootDiskType for DataprocClusterCreateOperator
>> https://github.com/apache/incubator-airflow/pull/3825
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2990 Docstrings for 
>> Hooks/Operators are in incorrect format
>> https://github.com/apache/incubator-airflow/pull/3820
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2994 flatten_results in 
>> BigQueryOperator/BigQueryHook should default to None
>> https://github.com/apache/incubator-airflow/pull/3829
>> 
>> In addition to those PRs which are already marked with Fix Version of 1.10.1 
>> I think we should also pull in these:
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2713 Rename async variable for 
>> Python 3.7.0 compatibility
>> https://github.com/apache/incubator-airflow/pull/3561
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2895 Prevent scheduler from 
>> spamming heartbeats/logs
>> https://github.com/apache/incubator-airflow/pull/3747
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2921 A trivial incorrectness 
>> in CeleryExecutor()
>> https://github.com/apache/incubator-airflow/pull/3773
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2866 Missing CSRF Token Error 
>> on Web RBAC UI Create/Update Operations
>> https://github.com/apache/incubator-airflow/pull/3804
>> 
>> https://issues.apache.org/jira/browse/AIRFLOW-2951
>> https://github.com/apache/incubator-airflow/pull/3798 Update dag_run table 
>> end_date when state change
>> (though as written it has a few other deps to cherry pick in, so will see 
>> about this one)
> 
> 



Re: Add git tag for 1.10

2018-09-03 Thread Ash Berlin-Taylor
It is above the heading of Airflow 1.10, i.e. in the Airflow Master section 
already.

-ash

> On 3 Sep 2018, at 13:53, Robin Edwards  wrote:
> 
> I am not sure if anyone's aware of this, but the 1.10.0 tag and the PyPI upload
> don't contain the 'BashTaskRunner' -> 'StandardTaskRunner' change.
> 
> The docs in master do
> https://github.com/apache/incubator-airflow/blob/master/UPDATING.md (They
> don't in the 1.10.0 tag)
> 
> If this is intentional and the change is going to be in 1.10.1 perhaps it
> should be put under a new heading in UPDATING.md?
> 
> It just tripped me up as I thought it was part of 1.10
> 
> 
> On Fri, Aug 31, 2018 at 8:00 AM, Kaxil Naik  wrote:
> 
>> We already do have it:
>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0
>> 
>> On Fri, 31 Aug 2018, 06:23 Beau Barker,  wrote:
>> 
>>> Can we please tag the final v1.10 commit?
>>> 
>> 



Call for fixes for Airflow 1.10.1

2018-09-03 Thread Ash Berlin-Taylor
Hi everyone,

I'm starting the process of gathering fixes for a 1.10.1. So far the list of 
issues I have that we should pull in are 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
 (reproduced below)

I will start pushing these as cherry-picked commits to the v1-10-stable branch 
today.

If you have something that is not in the list below let me know. I'd like to 
keep this to bug fixes against 1.10.0 only if possible.

https://issues.apache.org/jira/browse/AIRFLOW-2145 Deadlock after clearing a 
running task
https://github.com/apache/incubator-airflow/pull/3657

https://issues.apache.org/jira/browse/AIRFLOW-2476 update tabulate dep to 0.8.2
https://github.com/apache/incubator-airflow/pull/3835

https://issues.apache.org/jira/browse/AIRFLOW-2778 Bad Import in collect_dag in 
DagBag
https://github.com/apache/incubator-airflow/pull/3624

https://issues.apache.org/jira/browse/AIRFLOW-2900 Show code for packaged DAGs
https://github.com/apache/incubator-airflow/pull/3749

https://issues.apache.org/jira/browse/AIRFLOW-2949 Add syntax highlight for 
single quote strings
https://github.com/apache/incubator-airflow/pull/3795

https://issues.apache.org/jira/browse/AIRFLOW-2984 Cannot convert 
naive_datetime when task has a naive start_date/end_date
https://github.com/apache/incubator-airflow/pull/3822

https://issues.apache.org/jira/browse/AIRFLOW-2979 Deprecated Celery Option not 
in Options list
https://github.com/apache/incubator-airflow/pull/3832 

https://issues.apache.org/jira/browse/AIRFLOW-2989 No Parameter to change 
bootDiskType for DataprocClusterCreateOperator
https://github.com/apache/incubator-airflow/pull/3825

https://issues.apache.org/jira/browse/AIRFLOW-2990 Docstrings for 
Hooks/Operators are in incorrect format
https://github.com/apache/incubator-airflow/pull/3820

https://issues.apache.org/jira/browse/AIRFLOW-2994 flatten_results in 
BigQueryOperator/BigQueryHook should default to None
https://github.com/apache/incubator-airflow/pull/3829


In addition to those PRs which are already marked with Fix Version of 1.10.1 I 
think we should also pull in these:


https://issues.apache.org/jira/browse/AIRFLOW-2713 Rename async variable for 
Python 3.7.0 compatibility
https://github.com/apache/incubator-airflow/pull/3561

https://issues.apache.org/jira/browse/AIRFLOW-2895 Prevent scheduler from 
spamming heartbeats/logs
https://github.com/apache/incubator-airflow/pull/3747

https://issues.apache.org/jira/browse/AIRFLOW-2921 A trivial incorrectness in 
CeleryExecutor()
https://github.com/apache/incubator-airflow/pull/3773

https://issues.apache.org/jira/browse/AIRFLOW-2866 Missing CSRF Token Error on 
Web RBAC UI Create/Update Operations
https://github.com/apache/incubator-airflow/pull/3804


https://issues.apache.org/jira/browse/AIRFLOW-2951 
https://github.com/apache/incubator-airflow/pull/3798 Update dag_run table 
end_date when state change
(though as written it has a few other deps to cherry pick in, so will see about 
this one)



Re: Apache Spark Interfering with Airflow Jira/PRs ??

2018-09-03 Thread Ash Berlin-Taylor
Irrespective of the mis-configuration (Spark vs Airflow, bug causing a loop, 
etc.) is it possible to stop the action that is triggering the new emails? We 
already have GitHub creating some comments in Jira (which result in an email) 
and would like to keep the email volume down.

Might be worth checking what settings Spark have for their Gitbox integration 
Jira and copying that?

-ash

> On 2 Sep 2018, at 21:57, Holden Karau  wrote:
> 
> Really sorry for the noise on JIRA, I've shut down the app and I'll try and
> figure out how it ended up doing this.
> 
> On Sun, Sep 2, 2018 at 1:17 PM Holden Karau  wrote:
> 
>> huh the one with "Apache Spark" in the name doesn't make any sense, I'll
>> turn off the airflow dashboard but I don't have the credentials to the
>> Apache Spark JIRA account so I'm super confused how that's happening (my
>> guess would be there's an unprotected queue in the Spark version that's
>> somehow gotten entries in it or someone put some credentials in the src
>> repo is forked from).
>> 
>> On Sun, Sep 2, 2018 at 1:12 PM Holden Karau 
>> wrote:
>> 
>>> Yup that's me, I've turned off the linking feature it was stuck failing
>>> in a loop because of permissions (it had half of the permissions required).
>>> 
>>> On Sun, Sep 2, 2018 at 1:07 PM Arthur Wiedmer 
>>> wrote:
>>> 
 I am guessing Holden is testing the JIRA bot/integration, since some of
 the JIRA's are being assigned to her magical unicorn 🦄.
 
 Holden?
 
 Best,
 Arthur
 
 On Sun, Sep 2, 2018, 12:48 Kaxil Naik  wrote:
 
> I am getting that too. Tons of emails as well about the same. Not sure
> the
> reason.
> 
> On Sun, 2 Sep 2018, 20:45 Sid Anand,  wrote:
> 
>> Fellow committers... what happened around 11a PT today? I see a flood
> of
>> updates to our Jiras from "Apache Spark".
>> 
>> Some examples:
>> 
>>   1.
>> 
>> 
> https://issues.apache.org/jira/browse/AIRFLOW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16601616#comment-16601616
>>   2. It also updated
> https://issues.apache.org/jira/browse/AIRFLOW-2553
>> as
>>   resolved and closed
>> https://github.com/apache/incubator-airflow/pull/3451
>>   PR
>> 
>> What's going on?
>> -s
>> 
> 
 
>>> 
>>> --
>>> Cell : 425-233-8271
>>> 
>> 
>> 
>> --
>> Cell : 425-233-8271
>> 
> 
> 
> -- 
> Cell : 425-233-8271



Re: Retiring Airflow Gitter?

2018-09-01 Thread Ash Berlin-Taylor
I'm not a fan of Slack for open source work - it's a walled garden, signing up 
is a hurdle (where you have to use a workaround of a Heroku app), not to 
mention that the client is just so memory-hungry!

I'm just a curmudgeon who still likes IRC mainly. So long as I can install 
https://slack.com/apps/A7DL60U5D-irccloud/ I won't object ;)

(I am not a fan of Gitter either, but I am constantly logged in via their IRC 
gateway. I'm usually the only PPMC member responding in there; Bolke pops up from 
time to time too)

-ash

> On 1 Sep 2018, at 02:40, Sid Anand  wrote:
> 
> Great feedback. There is an overwhelming interest in moving Gitter to
> Slack. Just curious: for the folks who set up Slack for other Apache
> projects, did you go via an Apache Infra ticket?
> 
> -s
> 
> On Fri, Aug 31, 2018 at 4:32 PM James Meickle
>  wrote:
> 
>> I am in the gitter chat most work days and there's always activity.
>> 
>> I would be fine with switching to permanent retention slack for
>> searchability but don't see the point of switching without that feature.
>> 
>> On Fri, Aug 31, 2018, 12:59 Sid Anand  wrote:
>> 
>>> For a while now, we have had an Airflow Gitter account. Though this
>> seemed
>>> like a good idea initially, I'd like to hear from the community if anyone
>>> gets value of out it. I don't believe any of the committers spend any
>> time
>>> on Gitter.
>>> 
>>> Early on, the initial committers tried to be available on it, but soon
>>> found it impossible to be available on all the timezones in which we had
>>> users. Furthermore, Gitter notoriously sucks at making previously
>> answered
>>> questions discoverable. Also, the single-threaded nature of Gitter
>>> essentially makes it confusing to debug/discuss more than one topic at a
>>> time.
>>> 
>>> The community seems to be humming along by relying on the Apache mailing
>>> lists, which don't suffer the downside listed above. Hence, as newbies
>> join
>>> Apache Airflow, they likely hop onto Gitter. Are they getting value from
>>> it? If not, perhaps we are doing them a disservice and should consider
>> just
>>> deleting it.
>>> 
>>> Thoughts welcome.
>>> -s
>>> 
>> 



Re: Missing operators in the docs

2018-08-30 Thread Ash Berlin-Taylor
There's a setting available for Sphinx projects where imports/modules can be 
mocked - it might be worth exploring 
http://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html?highlight=autodoc_mock_imports#confval-autodoc_mock_imports
for some of the harder-to-install modules?
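
i.e. something along these lines in docs/conf.py (the module list below is 
purely illustrative):

    autodoc_mock_imports = [
        'snakebite',   # whatever won't pip-install cleanly on the docs builder
        'MySQLdb',
        'psycopg2',
    ]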

-ash

> On 29 Aug 2018, at 21:53, Kaxil Naik  wrote:
> 
> I have fixed the issue on https://airflow.apache.org/ , added a comment on
> confluence as well. Will try to fix ReadTheDocs environment (Have opened a
> Jira for it) as it can't install all the dependencies that depend on C
> Modules (
> https://read-the-docs.readthedocs.io/en/latest/faq.html#i-get-import-errors-on-libraries-that-depend-on-c-modules
> )
> 
> Regards,
> Kaxil
> 
> On Wed, Aug 29, 2018 at 9:15 PM Kaxil Naik  wrote:
> 
>> Will have a look and resolve it.
>> 
>> On Wed, Aug 29, 2018 at 8:25 PM Maxime Beauchemin <
>> maximebeauche...@gmail.com> wrote:
>> 
>>> Looks like both.
>>> 
>>> On Wed, Aug 29, 2018 at 12:18 PM Kaxil Naik  wrote:
>>> 
 Hi Max,
 
 Did you see that on readthedocs or airflow.apache one?
 
 On Wed, 29 Aug 2018, 20:15 Maxime Beauchemin, <
>>> maximebeauche...@gmail.com>
 wrote:
 
> Hey committers,
> 
> I noticed that some of the operators are missing from the API
>>> reference
> part of the docs (HiveOperator for instance). I'm guessing a committer
> generated / pushed the docs with some libs missing and that the
>>> operators
> depending on those missing libs got skipped.
> 
> We may have to improve the doc-generation wiki page or make a
>>> bulletproof
> shell script that ensures all libs are installed prior to generating
>>> the
> docs.
> 
> Max
> 
 
>>> 
>> 
>> 
>> --
>> *Kaxil Naik*
>> *Big Data Consultant *@ *Data Reply UK*
>> *Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
>> Developer
>> *Phone: *+44 (0) 74820 88992
>> *LinkedIn*: https://www.linkedin.com/in/kaxil
>> 
> 
> 
> -- 
> *Kaxil Naik*
> *Big Data Consultant *@ *Data Reply UK*
> *Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
> Developer
> *Phone: *+44 (0) 74820 88992
> *LinkedIn*: https://www.linkedin.com/in/kaxil



Re: Python 3.6 Support for Airflow 1.10.0

2018-08-29 Thread Ash Berlin-Taylor
Okay, a worrying (to me) number of people are still on Python 2.7 :(

AIP created anyway 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-3+Drop+support+for+Python+2

If you are adding support for Py3 to your platform I'd skip 3.5 and go 
straight to 3.6/3.7.

Let's not forget that supporting Python 2 is not free for us - both in terms of 
test time on Travis, and in terms of code complexity in having to use 
six/backports all over the place.


> On 29 Aug 2018, at 09:46, Sumit Maheshwari  wrote:
> 
> Strong +1 for keeping 2.7 for now. We have 30+ big and small enterprises
> running Airflow as a service on our platform and they are all on py 2.7 for
> now. Soon we are adding support for py 3.5 as well, but still, not everyone
> is going to switch to that immediately.
> 
> So IMO we should keep the support of py 2.7 until mid-2019.
> 
> On Wed, Aug 29, 2018 at 6:01 AM Tao Feng  wrote:
> 
>> +1 for keeping 2.7 as well. Lyft runs airflow on 2.7 internally.
>> 
>> On Tue, Aug 28, 2018 at 1:12 PM, Feng Lu 
>> wrote:
>> 
>>> +1 for keeping 2.7 as long as we can so people have time to plan and
>>> migrate away from it.
>>> 
>>> On Tue, Aug 28, 2018, 10:35 Arthur Wiedmer 
>>> wrote:
>>> 
>>>> Given that Python 2.7 EOL is slated for January 1st 2020, we should
>>>> probably ensure that the early releases of 2019 are still 2.7
>> compatible.
>>>> 
>>>> Beyond this, I think we can also be responsible security wise and help
>>>> nudge people towards 3.
>>>> 
>>>> Best,
>>>> Arthur
>>>> 
>>>> On Tue, Aug 28, 2018 at 10:28 AM Bolke de Bruin 
>>> wrote:
>>>> 
>>>>> Let’s not drop 2.7 too quickly but maybe mark it deprecated. I’m
>> pretty
>>>>> sure Airbnb still runs on 2.7.
>>>>> 
>>>>> Also RedHat does not deliver python 3 in its enterprise edition yet
>> by
>>>>> default so it will put enterprise users in a bit of an awkward spot.
>>>>> 
>>>>> B.
>>>>> 
>>>>> Verstuurd vanaf mijn iPad
>>>>> 
>>>>>> Op 28 aug. 2018 om 19:00 heeft Sid Anand  het
>>>>> volgende geschreven:
>>>>>> 
>>>>>> I'm +1 on going to 3.7 -- I'm running 3.6 myself.
>>>>>> 
>>>>>> Regarding dropping Python2 support, with almost 200 companies using
>>>>>> Airflow, I'd want to be very careful that we don't put any of them
>>> at a
>>>>>> disadvantage. For example, my former employer (a small startup) is
>>>>> running
>>>>>> on Python2 -- after I left, they don't have anyone actively
>>> maintaining
>>>>> it
>>>>>> at the company. Easing upgrades for such cases will keep them using
>>>>> Airflow.
>>>>>> 
>>>>>> It would be good to hold a survey that we promote beyond daily
>>> readers
>>>> of
>>>>>> this mailing list and raise this as an AIP, since it's a major
>>> change.
>>>>>> Let's not rush it.
>>>>>> 
>>>>>> -s
>>>>>> 
>>>>>>> On Tue, Aug 28, 2018 at 9:24 AM Naik Kaxil 
>>> wrote:
>>>>>>> 
>>>>>>> We should definitely support 3.7. I left comments on the PR
>>> @tedmiston
>>>>>>> regarding the same. Python 2.7 will be dropped in 2020, so I guess
>>> we
>>>>>>> should start planning about it. Not really 100% sure though that
>> we
>>>>> should
>>>>>>> drop it in Airflow 2.0
>>>>>>> 
>>>>>>> On 28/08/2018, 17:08, "Taylor Edmiston" 
>>> wrote:
>>>>>>> 
>>>>>>>   I am onboard with dropping Python 2.x support.  Django
>> officially
>>>>>>> dropped
>>>>>>>   Python 2.x support with their 2.0 release since December 2017.
>>>>>>> 
>>>>>>>   *Taylor Edmiston*
>>>>>>>   Blog <https://blog.tedmiston.com/> | CV
>>>>>>>   <https://stackoverflow.com/cv/taylor> | LinkedIn
>>>>>>>   <https://www.linkedin.com/in/tedmiston/> | AngelList
>>>>>>>   <https://angel.co/taylor> | Stack Overflow
>>>>>>>   <https://stackove

Re: Running unit tests against SLUGIFY_USES_TEXT_UNIDECODE and AIRFLOW_GPL_UNIDECODE (also is this broken?)

2018-08-29 Thread Ash Berlin-Taylor
I don't think we strictly care about running the tests in both these 
circumstances - it is a flag that controls which dep is installed two or three 
levels down as you say, and the project has its own tests.

I'd rather we spent time on replacing python-nvd3 with something that means we 
don't have to force our users to install it in a slightly odd way, and instead 
can just `pip install apache-airflow` again.

The reason this setting might not make any difference to what is installed 
comes down to Python Wheels - i.e. binary packages. When pip installs a wheel 
(which it does for preference as it is quicker) it doesn't run that project's 
setup.py anymore, so this block 
https://github.com/un33k/python-slugify/blob/master/setup.py#L18-L21 
isn't run.

The fix if you want to make it run this block still is `pip install 
--no-binary=python-slugify apache-airflow`
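
To illustrate the pattern (a simplified sketch, not the exact upstream code):

    # setup.py - the dependency is chosen when setup.py *runs*, i.e. at build
    # time; installing from a pre-built wheel skips this logic entirely
    import os
    from setuptools import setup

    if os.environ.get('SLUGIFY_USES_TEXT_UNIDECODE'):
        dep = 'text-unidecode'   # permissively licensed
    else:
        dep = 'Unidecode'        # GPL

    setup(
        name='example-package',  # hypothetical package
        version='0.1',
        install_requires=[dep],
    )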

> On 28 Aug 2018, at 21:18, Taylor Edmiston  wrote:
> 
> Since the release of 1.10, we now have the option to install Airflow with
> either:
> 
> 1. python-nvd3  --> python-slugify
>  --> text-unidecode
>  (via env var
> SLUGIFY_USES_TEXT_UNIDECODE=yes), or
> 2. python-nvd3 --> python-slugify --> unidecode
>  (via AIRFLOW_GPL_UNIDECODE=yes)
> 
> Currently on Travis CI we only test the former configuration.  Does anyone
> have a recommendation on how to go about testing the latter?  Running an
> entire second copy of the unit tests for one dependency feels a bit
> redundant... maybe there's a small relevant subset of tests that could only
> run for the alternative dependency config?
> 
> On a related note, I think this part of the install may be broken.
> 
> I've tried running a pip install under each config like so (using pyenv +
> virtualenvwrapper):
> 
> Shell 1:
> 
> pyenv shell 3.6.5
> mktmpenv
> export SLUGIFY_USES_TEXT_UNIDECODE=yes
> pip install apache-airflow
> pip freeze > ~/a.txt
> 
> Shell 2:
> 
> pyenv shell 3.6.5
> mktmpenv
> export AIRFLOW_GPL_UNIDECODE=yes
> pip install apache-airflow
> pip freeze > ~/b.txt
> 
> Shell 3:
> 
> diff ~/a.txt ~/b.txt
> (empty)
> 
> I would expect the former to have text-unidecode and the latter to have
> Unidecode.  *Can someone else attempt to reproduce this behavior?*
> 
> Additionally, I'm also experiencing this same behavior when trying to
> install the underlying python-slugify package similarly as well.  I've
> opened an issue for that here -
> https://github.com/un33k/python-slugify/issues/59.
> 
> Thank you,
> Taylor
> 
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 



Re: Airflow variables and data profiling hidden

2018-08-29 Thread Ash Berlin-Taylor
Your users are not set up as Admin users. The mechanism for fixing this 
depends upon what auth backend you are using.

Look in your airflow.cfg under the [core] section for the auth_backend 
settings, that will (likely) map into one of these classes 
https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/auth/backends
 


For example if you are using the LDAP backend then look at the [ldap] 
superuser_filter setting, which will map LDAP group membership into Airflow 
admins.
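
e.g. something like this in airflow.cfg (the filter values are hypothetical - 
adjust them to your directory layout):

    [ldap]
    uri = ldaps://ldap.example.com
    superuser_filter = memberOf=CN=airflow-admins,OU=groups,DC=example,DC=com
    data_profiler_filter = memberOf=CN=airflow-profilers,OU=groups,DC=example,DC=com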

Hope this helps.

Ash

> On 29 Aug 2018, at 07:15, Shubham Gupta  wrote:
> 
> Hi,
> 
> Airflow variables and data profiling tabs are hidden in the UI. Can someone
> suggest how to unhide them? If I try to access through '/admin/variable/',
> the result is
> 
>> 'You don't have the permission to access the requested resource. It is
>> either read-protected or not readable by the server.'
> 
> 
> The people who set airflow cluster initially are no longer in the same
> company.
> 
> Regards
> Shubham Gupta



Re: Python 3.6 Support for Airflow 1.10.0

2018-08-28 Thread Ash Berlin-Taylor
Supporting 3.7 is absolutely something we should do - it just got released 
while we were already mid-way through the release process of 1.10 and we didn't 
want the scope creep.

I'm happy to release a 1.10.1 that supports Py 3.7. The only issue I've seen so 
far is around the use of `async` as a keyword.
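
To be concrete, `async` becomes a reserved word in 3.7, so previously-valid 
code stops parsing at all (function and argument names below are hypothetical):

    # fine on Python 3.6, a SyntaxError on Python 3.7+
    def run_query(async=True):
        ...

    # the fix is simply a rename:
    def run_query(asynchronous=True):
        ...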

A perhaps bigger question: What are people's thoughts on dropping support for 
Python2? This wouldn't happen before 2.0 at the earliest if we did it. Probably 
something to raise an AIP for.

-ash

> On 28 Aug 2018, at 16:56, Taylor Edmiston  wrote:
> 
> We are also running on 3.6 for some time.
> 
> I put a quick branch together adding / upgrading to 3.6 in all of the
> places.  CI is still running so I may expect some test failures but
> hopefully nothing major.  I would be happy to merge this into Kaxil's
> current #3815 or as a follow-on PR.  I'll paste this back onto his PR as
> well.
> 
> https://github.com/apache/incubator-airflow/pull/3816
> 
> I think it's important for the project to officially support Python 3.6
> latest especially since 3.7 is out now.  While we're on the topic, does
> anyone else have thoughts on supporting 3.7 (perhaps unofficially to
> start)?  I wouldn't mind starting to get that ball rolling.
> 
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 
> 
> 
> 
> On Tue, Aug 28, 2018 at 9:29 AM Adam Boscarino
>  wrote:
> 
>> fwiw, we run Airflow on Python 3.6.
>> 
>> On Tue, Aug 28, 2018 at 8:30 AM Naik Kaxil  wrote:
>> 
>>> To provide more context to the issue:
>>> 
>>> 
>>> 
>>> PyPI shows that Airflow is supported on Py2.7, 3.4 and 3.5 :
>>> https://pypi.org/project/apache-airflow/
>>> 
>>> 
>>> 
>>> This is picked from setup.py:
>>> 
>>> 
>>> 
>> https://github.com/apache/incubator-airflow/blob/26e0d449737e8671000f671d820a9537f23f345a/setup.py#L367
>>> 
>>> 
>>> 
>>> 
>>> 
>>> So, should we update setup.py to include 3.6 as well?
>>> 
>>> 
>>> 
>>> @bolke – Thoughts?
>>> 
>>> 
>>> 
>>> 
>>> Kaxil Naik
>>> 
>>> Data Reply
>>> 2nd Floor, Nova South
>>> 160 Victoria Street, Westminster
>>> London SW1E 5LB - UK
>>> phone: +44 (0)20 7730 6000
>>> k.n...@reply.com
>>> www.reply.com
>>> 
>>> [image: Data Reply]
>>> 
>>> *From: *Naik Kaxil 
>>> *Reply-To: *"dev@airflow.incubator.apache.org" <
>>> dev@airflow.incubator.apache.org>
>>> *Date: *Tuesday, 28 August 2018 at 13:27
>>> *To: *"dev@airflow.incubator.apache.org" <
>> dev@airflow.incubator.apache.org
 
>>> *Subject: *Python 3.6 Support for Airflow 1.10.0
>>> 
>>> 
>>> 
>>> Hi all,
>>> 
>>> 
>>> 
>>> @fokko – I remember that you had test Airflow on 3.6 . Can we include 3.6
>>> in setup.py then ?
>>> 
>>> 
>>> 
>>> Regards,
>>> 
>>> Kaxil
>>> 
>>> 
>>> 
>>> Kaxil Naik
>>> 
>>> *Data Reply*
>>> 2nd Floor, Nova South
>>> 160 Victoria Street, Westminster
>>> London SW1E 5LB - UK
>>> phone: +44 (0)20 7730 6000
>>> k.n...@reply.com
>>> www.reply.com
>>> 
>>> [image: Data Reply]
>>> 
>>> 
>> 
>> --
>> Adam Boscarino
>> Senior Data Engineer
>> 
>> aboscar...@digitalocean.com
>> --
>> We're Hiring!  |
>> @digitalocean 
>> 



Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-28 Thread Ash Berlin-Taylor
And mine is 'ashb', please :)


> On 27 Aug 2018, at 18:19, Sid Anand  wrote:
> 
> @max (mine is r39132)
> 
> On Mon, Aug 27, 2018 at 10:13 AM Driesprong, Fokko 
> wrote:
> 
>> Thanks for picking this up Naik! Did not have the time today to upload the
>> artifacts.
>> 
>> Cheers, Fokko
>> 
>> Op ma 27 aug. 2018 om 18:05 schreef Naik Kaxil :
>> 
>>> I have uploaded it on PyPI as well and will update the documentation now.
>>> 
>>> On 27/08/2018, 00:32, "Arthur Wiedmer" 
>> wrote:
>>> 
>>>Done for Bolke, Fokko and kaxil.
>>> 
>>>Best,
>>>Arthur
>>> 
>>>On Sun, Aug 26, 2018 at 3:08 AM Driesprong, Fokko
>> >>> 
>>>wrote:
>>> 
>>>> Gentle ping! Would be awesome to get 1.10 on Pypi :-)
>>>> 
>>>> Op wo 22 aug. 2018 om 23:43 schreef Naik Kaxil :
>>>> 
>>>>> Mine is "kaxil"
>>>>> 
>>>>> On 22/08/2018, 16:18, "Bolke de Bruin" 
>> wrote:
>>>>> 
>>>>>@max
>>>>> 
>>>>>Mine is "bolke"
>>>>> 
>>>>>Cheers
>>>>> 
>>>>>B.
>>>>> 
>>>>>Sent from my iPhone
>>>>> 
>>>>>> 
>>>>> 
>>>>> Kaxil Naik
>>>>> 
>>>>> Data Reply
>>>>> 2nd Floor, Nova South
>>>>> 160 Victoria Street, Westminster
>>>>> London SW1E 5LB - UK
>>>>> phone: +44 (0)20 7730 6000
>>>>> k.n...@reply.com
>>>>> www.reply.com
>>>>> 
>>> 
>>> Kaxil Naik
>>> 
>>> Data Reply
>>> 2nd Floor, Nova South
>>> 160 Victoria Street, Westminster
>>> London SW1E 5LB - UK
>>> phone: +44 (0)20 7730 6000
>>> k.n...@reply.com
>>> www.reply.com
>>> On 22 Aug 2018, at 16:13, Driesprong, Fokko 
>>>> wrote:
>>>>>> 
>>>>>> Certainly:
>>>>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0
>>>>>> 
>>>>>> Cheers, Fokko
>>>>>> 
>>>>>> Op wo 22 aug. 2018 om 15:18 schreef Ash Berlin-Taylor <
>>>>> a...@apache.org>:
>>>>>> 
>>>>>>> Could you push the git tag too please Fokko/Bolke?
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 22 Aug 2018, at 08:16, Driesprong, Fokko
>>> >>>> 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Thanks Max,
>>>>>>>> 
>>>>>>>> My PyPI ID is Fokko
>>>>>>>> 
>>>>>>>> Cheers, Fokko
>>>>>>>> 
>>>>>>>> 2018-08-21 22:49 GMT+02:00 Maxime Beauchemin <
>>>>> maximebeauche...@gmail.com
>>>>>>>> :
>>>>>>>> 
>>>>>>>>> I can, what's your PyPI ID?
>>>>>>>>> 
>>>>>>>>> Max
>>>>>>>>> 
>>>>>>>>> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko
>>>>> >>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Bolke!
>>>>>>>>>> 
>>>>>>>>>> I've just pushed the artifacts to Apache Dist:
>>>>>>>>>> 
>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/1.10.0-incubating/
>>>>>>>>>> 
>>>>>>>>>> I don't have any access to pypi, this means that I'm not able to
>>>>>>>>>> upload the artifacts over there. Anyone in the position to grant
>>>>>>>>>> me access or upload it to pypi?
>>>>>>>>>> 
>>>>>>>>>> Thanks! Cheers, Fokko
>>>>>>>>>> 
>>>>>>>>

Cloudera Hue in License

2018-08-24 Thread Ash Berlin-Taylor
Hi everyone,

So we include references to Cloudera's Hue in the LICENSE file, and mention it 
again in the NOTICE file saying:

> This product contains a modified portion of 'Hue' developed by Cloudera, Inc.

Does anyone know what this refers to? Is it still the case? Grepping for hue 
doesn't turn up anything likely looking.

-ash

Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-22 Thread Ash Berlin-Taylor
Could you push the git tag too please Fokko/Bolke?

-ash

> On 22 Aug 2018, at 08:16, Driesprong, Fokko  wrote:
> 
> Thanks Max,
> 
> My PyPI ID is Fokko
> 
> Cheers, Fokko
> 
> 2018-08-21 22:49 GMT+02:00 Maxime Beauchemin :
> 
>> I can, what's your PyPI ID?
>> 
>> Max
>> 
>> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko 
>> wrote:
>> 
>>> Thanks Bolke!
>>> 
>>> I've just pushed the artifacts to Apache Dist:
>>> 
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/1.10.0-incubating/
>>> 
>>> I don't have any access to pypi, this means that I'm not able to upload
>>> the artifacts over there. Anyone in the position to grant me access or
>>> upload it to pypi?
>>> 
>>> Thanks! Cheers, Fokko
>>> 
>>> 2018-08-20 20:06 GMT+02:00 Bolke de Bruin :
>>> 
 Hi Guys and Gals,
 
 The vote has passed! Apache Airflow 1.10.0 is official.
 
 As I am AFK for a while can one of the committers please rename
>> according
 to the release docs and push it to the relevant locations (pypi and
>>> Apache
 dist)?
 
 Oh and maybe start a quick 1.10.1?
 
 Cheers
 Bolke
 
 Sent from my iPhone
 
 Begin forwarded message:
 
> From: Bolke de Bruin 
> Date: 20 August 2018 at 20:00:28 CEST
> To: gene...@incubator.apache.org, dev@airflow.incubator.apache.org
> Subject: [RESULT][VOTE] Release Airflow 1.10.0
> 
> The vote to release Airflow 1.10.0-incubating, having been open for 8
> days is now closed.
> 
> There were three binding +1s and no -1 votes.
> 
> +1 (binding):
> Justin Mclean
> Jakob Homan
> Hitesh Shah
> 
> The release is approved.
> 
> Thanks to all those who voted.
> 
> Bolke
> 
> Sent from my iPhone
> 
> Begin forwarded message:
> 
>> From: Bolke de Bruin 
>> Date: 20 August 2018 at 19:56:23 CEST
>> To: gene...@incubator.apache.org
>> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
>> 
>> Appreciated Hitesh. Do you know how to add headers to .MD files?
>> There
 seems to be no technical standard way[1]. Is there a way to solve this
 elegantly?
>> 
>> Cheers
>> Bolke
>> 
>> [1] https://alvinalexander.com/technology/markdown-comments-
 syntax-not-in-generated-output
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
>>> 
>>> +1 (binding)
>>> 
>>> Ran through the basic checks.
>>> 
>>> Minor nit which can be fixed in the next release: there are a bunch
>>> of
>>> documentation files which could have a license header added (e.g.
>>> .md,
>>> .rst, )
>>> 
>>> thanks
>>> Hitesh
>>> 
 On Mon, Aug 20, 2018 at 4:08 AM Bolke de Bruin >> 
 wrote:
 
 Sorry Willem that should be of course. Apologies.
 
 Sent from my iPhone
 
> On 20 Aug 2018, at 13:07, Bolke de Bruin 
>>> wrote:
> 
> Hi William
> 
> You seem to be missing a "4" at the end of the URL? Ah it seems
>>> that
 my
 original email had a quirk. Would you mind using the below?
> 
> https://github.com/apache/incubator-airflow/releases/
>> tag/1.10.0rc4
> 
> Thanks!
> Bolke
> 
> Sent from my iPhone
> 
>> On 20 Aug 2018, at 13:03, Willem Jiang 
 wrote:
>> 
>> Hi,
>> 
>> The Git tag cannot be accessed.  I can only get the 404  error
 there.
>> 
>> https://github.com/apache/incubator-airflow/releases/
>> tag/1.10.0rc
>> 
>> 
>> Willem Jiang
>> 
>> Twitter: willemjiang
>> Weibo: 姜宁willem
>> 
>>> On Sun, Aug 12, 2018 at 8:25 PM, Bolke de Bruin <
>>> bdbr...@gmail.com
> 
 wrote:
>>> 
>>> Hello Incubator PMC’ers,
>>> 
>>> The Apache Airflow community has voted and approved the
>> proposal
>>> to
 release
>>> Apache Airflow 1.10.0 (incubating) based on 1.10.0 Release
 Candidate
 4. We
>>> now kindly request the Incubator PMC members to review and vote
>>> on
 this
>>> incubator release.
>>> 
>>> Airflow is a platform to programmatically author, schedule, and
 monitor
>>> workflows. Use Airflow to author workflows as directed acyclic
 graphs
>>> (DAGs) of tasks. The airflow scheduler executes your tasks on
>> an
 array
 of
>>> workers while following the specified dependencies. Rich
>> command
 line
>>> utilities make performing complex surgeries on DAGs a snap. The
 rich
 user
>>> interface makes it easy to visualize pipelines running in
 production,
>>> monitor progress, and troubleshoot issues when needed. When
 workflows
 are

Re: Plan to change type of dag_id from String to Number?

2018-08-16 Thread Ash Berlin-Taylor
The performance of SQLite doesn't matter as it is restricted to a single worker 
anyway -- it's definitely not recommended for running in production.

-ash

> On 16 Aug 2018, at 08:33, George Leslie-Waksman  wrote:
> 
> These performance characteristics are metadata database backend dependent
> as well. If there are benchmarks, I would hope we look at them across
> sqlite, mysql, postgresql, and any other supported backends before we take
> action.
> 
> On Thu, Aug 9, 2018 at 12:41 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> 
>> The change on perf for the DAG table would be extremely negligible.
>> 
>> Maybe for task_instances (large table with millions of rows, 3 fields
>> composite key) it *could* be a decent idea. Though you'd then need to have
>> two indexes to store and maintain and we may have to change the code to
>> actually use and reference that new more efficient pk in places where it's
>> more efficient to use that index (some of it SQLAlchemy would do right out
>> of the box).
>> 
>> This mostly affects the index size (btree(id) is much smaller than
>> btree(dag_id, task_id, execution_date)), not the key lookup time much as it
>> is log(n). We'd still have to use the composite btree when we want to do
>> range scans, which we use frequently to get sets of tasks for a dag or
>> specific dag task. Since lookups are log(n), and that we need to maintain
>> that composite btree anyways for range scans, I don't see where that would
>> really help. It would be a better index (less pages, less memory usage,
>> ...) if we didn't need that other composite one, which we do.
>> 
>> Max
>> 
>> On Thu, Aug 9, 2018 at 8:05 AM Vardan Gupta 
>> wrote:
>> 
>>> Point well taken on backward compatibility; we will have to handle this
>>> change very diligently, if implemented.
>>> 
>>> On Thu, Aug 9, 2018 at 7:29 PM Юли Волкова  wrote:
>>> 
>>>> Because in the case you described there is nothing about backward
>>>> compatibility. Do you think all users need to change all their DAGs?
>>>> It's very strange, because you propose one of the most critical changes
>>>> and it will affect everyone. If you want an id - call it dag_metadata_id
>>>> and add it. But don't propose a change that has no backward
>>>> compatibility. It's too strange.
>>>> 
>>>> On Thu, Aug 9, 2018 at 7:04 AM vardangupta...@gmail.com <
>>>> vardangupta...@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On 2018/08/09 11:55:11, Ash Berlin-Taylor  wrote:
>>>>>> Absolutely - there will still need to be a human-readable DAG id, even
>>>>>> if we end up with an auto-incrementing integer ID column internally and for
>>>>>> table join performance reasons.
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>>> On 9 Aug 2018, at 12:35, Юли Волкова 
>> wrote:
>>>>>>> 
>>>>>>> How will you understand what your DAG 2 is doing when you open it? For
>>>>>>> each of 100 DAGs, for example?
>>>>>>> Especially if you are not the developer who created it. You are a support
>>>>>>> team and have 120 DAGs.
>>>>>>> 
>>>>>>> This is the first time I have wanted to send an answer to the dev mailing
>>>>>>> list. Please, don't do it.
>>>>>>> 
>>>>>>> I think it will be really bad for all who use dag_id as a meaningful name of a
>>>>>>> dag. If I look at 0329313, it does not say anything useful to
>>>>>>> me, and it will be very complicated to identify which process the dag
>>>>>>> serves.  It could be another id for the indexes in the DB if it's a real problem
>>>>>>> for somebody. But, please, do not change dag_id.
>>>>>>> 
>>>>>>> On Mon, Aug 6, 2018 at 1:32 AM vardangupta...@gmail.com <
>>>>>>> vardangupta...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Everyone,
>>>>>>>> 
>>>>>>>> Do we have any plan to change type of dag_id from String to
>>>>>>>> Number?

Re: apache-airflow v1.10.0 on PyPi?

2018-08-16 Thread Ash Berlin-Taylor
1.10.0 isn't officially released yet, so that's why it's not on PyPi/tagged 
yet. (As we are still in the Incubation phase of the project we need our 
mentors to also sign off on our RC after we, the Airflow community, have voted 
on it)

But yes, we should push the tags to Github. I've done that:

git tag 1.10.0rc3 862ad8b9 # [AIRFLOW-XXX] Update changelog for 1.10
git tag 1.10.0rc4 26e0d449 # [AIRFLOW-XXX] Update changelog for 1.10
git push --tags github
 
Having just updated the instructions 
(https://cwiki.apache.org/confluence/display/AIRFLOW/Releasing+Airflow 
) I 
suspect the tags were in Bolke's local checkout. I hope I got the commits right.

-ash

> On 15 Aug 2018, at 17:34, James Meickle  
> wrote:
> 
> Can we make it a policy going forward to push GH tags for all RCs as part
> of the release announcement? I deploy via the incubator-airflow repo, but
> currently it only has tags for up to RC2, which means I have to look up and
> then specify an ugly-looking commit to deploy an RC :(
> 
> On Wed, Aug 15, 2018 at 10:54 AM Taylor Edmiston 
> wrote:
> 
>> Krish - You can also use the RCs before they're released on PyPI if you'd
>> like to help test.  Instead of:
>> 
>> pip install apache-airflow
>> 
>> You can install the 1.10 stable latest with:
>> 
>> pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable
>> 
>> Or the 1.10 RC tags with eg:
>> 
>> pip install git+git://github.com/apache/incubator-airflow.git@1.10.0rc2
>> 
>> Best,
>> Taylor
>> 
>> *Taylor Edmiston*
>> Blog  | CV
>>  | LinkedIn
>>  | AngelList
>>  | Stack Overflow
>> 
>> 
>> 
>> On Thu, Aug 9, 2018 at 5:43 PM, Krish Sigler  wrote:
>> 
>>> Got it, will use the mailing list in the future.  Thanks for the info
>>> 
>>> On Thu, Aug 9, 2018 at 2:42 PM, Bolke de Bruin 
>> wrote:
>>> 
 Hi Kris,
 
 Please use the mailing list for these kind of questions.
 
 Airflow 1.10.0 hasn’t been released yet. We are going through the
>>> motions,
 but it will take a couple of days before it’s official (if all goes
>>> well).
 
 B.
 
 Verstuurd vanaf mijn iPad
 
 Op 9 aug. 2018 om 23:33 heeft Krish Sigler  het
 volgende geschreven:
 
 Hi,
 
 First, I apologize if this is weird.  I saw on the Airflow github page
 that you most recently updated the v1.10.0 changelog, and I found your
 email using the instructions here (https://www.sourcecon.com/
 how-to-find-almost-any-github-users-email-address/).  If that's too
>>> weird
 feel free to tell me and/or ignore this.
 
 I'm emailing because I'm working with the apache-airflow project,
 specifically for setting up pipelines involving GCP packages.  My
 environment uses Python3, and I've been running into the issue outlined
>>> in
 this PR: https://github.com/apache/incubator-airflow/pull/3273.  I
 noticed that the fix is part of the v1.10.0 changelog.
 However, the latest version available on PyPi is 1.9.0.  On the Airflow
 wiki page I read that the project is intended to be updated every ~6
 months, and v1.9.0 was released in January.
 
 So my question, if you're at liberty to tell me, is can I expect
>> v1.10.0
 to be available on PyPi in the near future?  If so then great!  That
>>> would
 solve my package dependency problem.  If not, then I'll look into some
 workaround for my issue.
 
 Thanks,
 Krish
 
 
>>> 
>> 



Re: [VOTE] Airflow 1.10.0rc4

2018-08-10 Thread Ash Berlin-Taylor
If we can't score fractions then, yes +1 :)

(And this time sent from the correct email address. I'm really bad at driving a 
Mail client it turns out.)

-ash

> On 9 Aug 2018, at 19:22, Bolke de Bruin  wrote:
> 
> 0.5?? Can we score fractions :-) ? Sorry I missed this Ash. I think Fokko 
> really wants a 1.10.1 quickly so better include it then? Can you make your 
> vote +1?
> 
> Thx
> Bolke
> 
>> On 9 Aug 2018, at 14:06, Ash Berlin-Taylor  wrote:
>> 
>> +0.5 (binding) from me.
>> 
>> Tested upgrading from 1.9.0 metadb on Py3.5. Timezones behaving themselves 
>> on Postgres. Have not tested the Rbac-based UI.
>> 
>> https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153
>>  
>> <https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153>
>>  (expanding on UPDATING.md for Logging changes) isn't in the release, but 
>> would only affect people who look at the UPDATING.md in the source tarball, 
>> which isn't going to be very many - most people will check in the repo and 
>> just install via PyPi I'd guess?
>> 
>> -ash
>> 
>>> On 8 Aug 2018, at 19:21, Bolke de Bruin  wrote:
>>> 
>>> Hey all,
>>> 
>>> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
>>> which will last for 72 hours. Consider this my (binding) +1.
>>> 
>>> Airflow 1.10.0 RC 4 is available at:
>>> 
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ 
>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
>>> 
>>> apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
>>> comes with INSTALL instructions.
>>> apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python "sdist"
>>> release.
>>> 
>>> Public keys are available at:
>>> 
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ 
>>> <https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>> 
>>> The amount of JIRAs fixed is over 700. Please have a look at the changelog. 
>>> Since RC3 the following has been fixed:
>>> 
>>> [AIRFLOW-2870] Use abstract TaskInstance for migration
>>> [AIRFLOW-2859] Implement own UtcDateTime
>>> [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
>>> [AIRFLOW-2869] Remove smart quote from default config
>>> [AIRFLOW-2857] Fix Read the Docs env
>>> 
>>> Please note that the version number excludes the `rcX` string as well
>>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>>> to rename the artifact without modifying the artifact checksums when we
>>> actually release.
>>> 
>>> WARNING: Due to licensing requirements you will need to set 
>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>> installing or upgrading. We will try to remove this requirement for the 
>>> next release.
>>> 
>>> Cheers,
>>> Bolke
>> 
> 



Re: [VOTE] Airflow 1.10.0rc4

2018-08-09 Thread Ash Berlin-Taylor
+0.5 (binding) from me.

Tested upgrading from 1.9.0 metadb on Py3.5. Timezones behaving themselves on 
Postgres. Have not tested the Rbac-based UI.

https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153
 

 (expanding on UPDATING.md for Logging changes) isn't in the release, but would 
only affect people who look at the UPDATING.md in the source tarball, which 
isn't going to be very many - most people will check in the repo and just 
install via PyPi I'd guess?

-ash

> On 8 Aug 2018, at 19:21, Bolke de Bruin  wrote:
> 
> Hey all,
> 
> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
> which will last for 72 hours. Consider this my (binding) +1.
> 
> Airflow 1.10.0 RC 4 is available at:
> 
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ 
> 
> 
> apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
> comes with INSTALL instructions.
> apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python "sdist"
> release.
> 
> Public keys are available at:
> 
> https://dist.apache.org/repos/dist/release/incubator/airflow/ 
> 
> 
> The amount of JIRAs fixed is over 700. Please have a look at the changelog. 
> Since RC3 the following has been fixed:
> 
> [AIRFLOW-2870] Use abstract TaskInstance for migration
> [AIRFLOW-2859] Implement own UtcDateTime
> [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
> [AIRFLOW-2869] Remove smart quote from default config
> [AIRFLOW-2857] Fix Read the Docs env
> 
> Please note that the version number excludes the `rcX` string as well
> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
> to rename the artifact without modifying the artifact checksums when we
> actually release.
> 
> WARNING: Due to licensing requirements you will need to set 
> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
> installing or upgrading. We will try to remove this requirement for the 
> next release.
> 
> Cheers,
> Bolke



Re: Plan to change type of dag_id from String to Number?

2018-08-09 Thread Ash Berlin-Taylor
Since this is a big change that would touch much of the code base, before we do 
this we need to see some hard numbers - timing or benchmarks of queries etc.

Also how often do we actually do such a join etc?
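
To be concrete about the kind of numbers I mean, a throwaway micro-benchmark in roughly this shape would be a starting point (a sketch only: sqlite3 is just a stdlib stand-in, and the real measurements would have to run against postgres and mysql):

```python
# Illustrative micro-benchmark only: join through an int surrogate key vs a
# varchar dag_id. Table and index names here are made up for the sketch.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dag (id INTEGER PRIMARY KEY, dag_id TEXT UNIQUE)")
cur.execute("CREATE TABLE ti_int (dag_ref INTEGER, task_id TEXT)")
cur.execute("CREATE TABLE ti_str (dag_id TEXT, task_id TEXT)")
cur.execute("CREATE INDEX ix_int ON ti_int (dag_ref)")
cur.execute("CREATE INDEX ix_str ON ti_str (dag_id)")

for i in range(1000):
    name = "dag_%04d" % i
    cur.execute("INSERT INTO dag VALUES (?, ?)", (i, name))
    tasks = ["task_%d" % j for j in range(100)]
    cur.executemany("INSERT INTO ti_int VALUES (?, ?)", [(i, t) for t in tasks])
    cur.executemany("INSERT INTO ti_str VALUES (?, ?)", [(name, t) for t in tasks])
conn.commit()

for label, sql in [
    ("int join", "SELECT count(*) FROM dag JOIN ti_int ON ti_int.dag_ref = dag.id"),
    ("str join", "SELECT count(*) FROM dag JOIN ti_str ON ti_str.dag_id = dag.dag_id"),
]:
    start = time.perf_counter()
    for _ in range(50):
        cur.execute(sql).fetchall()
    print(label, time.perf_counter() - start)
```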

-ash

> On 9 Aug 2018, at 13:04, vardangupta...@gmail.com 
>  wrote:
> 
> Thanks Ash for your reply, I am aligned with what you're saying. 
> 
> I was not proposing to take away the human-readable dag_id. Instead I was 
> thinking: why can't we create another field like dag_name which will hold 
> this information at all front-facing sites while dag_id is changed to an 
> integer? This will help make joins faster in the metastore. Though 
> dag_id is currently indexed, indexing an int (4 bytes) vs a varchar(250) 
> means the latter takes more index blocks and therefore more lookup time. Also, if 
> dag_id is not trivial to change to int, let it be present and let's introduce 
> another column which is actually integer-typed and let joining happen on this 
> column across all tables.
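
For illustration, the shape of that proposal in SQLAlchemy terms could look like the sketch below (dag_name and dag_ref are made-up names for this example, not anything in Airflow's schema today):

```python
# Hypothetical sketch of the proposal: an integer surrogate key for joins,
# with the human-readable name kept for everything user-facing.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Dag(Base):
    __tablename__ = "dag"
    id = Column(Integer, primary_key=True, autoincrement=True)   # 4-byte join key
    dag_name = Column(String(250), unique=True, nullable=False)  # shown in the UI

class TaskInstance(Base):
    __tablename__ = "task_instance"
    id = Column(Integer, primary_key=True, autoincrement=True)
    dag_ref = Column(Integer, ForeignKey("dag.id"), index=True)  # joins on the int
    task_id = Column(String(250), nullable=False)
```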



Re: Plan to change type of dag_id from String to Number?

2018-08-09 Thread Ash Berlin-Taylor
Absolutely - there will still need to be a human-readable DAG id, even if we end 
up with an auto-incrementing integer ID column internally and for table join 
performance reasons.

-ash

> On 9 Aug 2018, at 12:35, Юли Волкова  wrote:
> 
> How will you understand what your DAG 2 is doing when you open it? For each of
> 100 DAGs, for example?
> Especially if you are not the developer who created it. You are a support
> team and have 120 DAGs.
> 
> This is the first time I have wanted to send an answer to the dev mailing list.
> Please, don't do it.
> 
> I think it will be really bad for all who use dag_id as a meaningful name of a
> dag. If I look at 0329313, it does not say anything useful to
> me, and it will be very complicated to identify which process the dag
> serves.  It could be another id for the indexes in the DB if it's a real problem
> for somebody. But, please, do not change dag_id.
> 
> On Mon, Aug 6, 2018 at 1:32 AM vardangupta...@gmail.com <
> vardangupta...@gmail.com> wrote:
> 
>> Hi Everyone,
>> 
>> Do we have any plan to change type of dag_id from String to Number, this
>> will make queries on metadata more performant, proposal could be generating
>> an auto-incremental value in dag table and this id getting used in rest of
>> the other tables?
>> 
>> 
>> Regards,
>> Vardan Gupta
>> 
> 
> 
> -- 
> _
> 
> Best regards, Юлия Волкова
> Tel.: +7 (911) 116-71-82



Re: [VOTE] Airflow 1.10.0rc3

2018-08-08 Thread Ash Berlin-Taylor
Could you try upgrading to 1.9 first, and see if that helps?

-ash

> On 8 Aug 2018, at 00:07, George Leslie-Waksman  <mailto:waks...@gmail.com>> wrote:
> 
> We just tried to upgrade a 1.8.1 install to 1.10rc3 and ran into a critical
> error on alembic migration execution. I have captured the issue in JIRA:
> https://issues.apache.org/jira/browse/AIRFLOW-2870 
> <https://issues.apache.org/jira/browse/AIRFLOW-2870>
> 
> I would consider this a critical blocker for release because it hard blocks
> upgrading.
> 
> George
> 
> On Tue, Aug 7, 2018 at 7:58 AM Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> 
>> Done. When I roll rc4 it will be part of it.
>> 
>> 
>>> On 7 Aug 2018, at 16:26, Naik Kaxil >> <mailto:k.n...@reply.com>> wrote:
>>> 
>>> @bolke Can we also include the following commit to 1.10 release as we
>> would need this commit to generate docs at ReadTheDocs?
>>> 
>>> -
>> https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
>>  
>> <https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169>
>>> 
>>> Regards,
>>> Kaxil
>>> 
>>> On 06/08/2018, 14:59, "James Meickle" 
>> wrote:
>>> 
>>>   Not a vote, but a comment: it might be worth noting that the new
>>>   environment variable is also required if you have any Airflow plugin
>> test
>>>   suites that install Airflow as part of their dependencies. In my
>> case, I
>>>   had to set the new env var outsidfe of tox and add this:
>>> 
>>>   ```
>>>   [testenv]
>>>   passenv = SLUGIFY_USES_TEXT_UNIDECODE
>>>   ```
>>> 
>>>   (`setenv` did not work as that provides env vars at runtime but not
>>>   installtime, as far as I can tell.)
>>> 
>>> 
>>>   On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin 
>> wrote:
>>> 
>>>> +1 :-)
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <
>>>> ash_airflowl...@firemirror.com> wrote:
>>>>> 
>>>>> Yup, just worked out the same thing.
>>>>> 
>>>>> I think as "punishment" for me finding bugs so late in two RCs (this,
>>>> and 1.9) I should run the release for the next release.
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
>>>>>> 
>>>>>> Yeah I figured it out. Originally I was using a different
>>>> implementation of UTCDateTime, but that was unmaintained. I switched,
>> but
>>>> this version changed or has a different contract. While it transforms to
>>>> UTC on storing, it does not do so when it receives timezone-aware fields
>> from
>>>> the db. Hence the issue.
>>>>>> 
>>>>>> I will prepare a PR that removes the dependency and implements our own
>>>> extension of DateTime. Probably tomorrow.
>>>>>> 
>>>>>> Good catch! Just in time :-(.
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <
>>>> ash_airflowl...@firemirror.com> wrote:
>>>>>>> 
>>>>>>> Entirely possible, though I wasn't even dealing with the scheduler -
>>>> the issue I was addressing was entirely in the webserver for a
>> pre-existing
>>>> Task Instance.
>>>>>>> 
>>>>>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears
>>>> that isn't working right/ as expected. This line:
>>>> 
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>>> doesn't look right for us - as you mentioned the TZ is set to something
>>>> (rather than having no TZ value).
>>>>>>> 
>>>>>>> Some background on how Pq handles TZs. It always returns DTs in the
>> TZ
>>>> of the connection. I'm not sure if this is unique to postgres or if
>> other
>>>> DBs behave the same.
>>>>>>> 
>>>>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time
>>>> zone;
>>>>>>>  timestamptz
>>>>>>> 

Re: Multiple hosts for a single connection

2018-08-07 Thread Ash Berlin-Taylor
Hmm yes, it appears that the `airflow connections` CLI doesn't let you create 
multiple connections of the same conn_id. What the WebUI can do the CLI should 
be able to do also! It should allow that in some way (behind a 
`--allow-multiple` flag perhaps? I can see an argument for not allowing 
multiple by default, as it is often not what people want.)
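
As a rough sketch of the shape this could take (the flag and the duplicate check are hypothetical, not what the CLI does today):

```python
# Hypothetical sketch: let `airflow connections --add` create a duplicate
# conn_id only behind an explicit flag.
from airflow import settings
from airflow.models import Connection

def add_connection(conn_id, host, allow_multiple=False):
    session = settings.Session()
    exists = session.query(Connection).filter(
        Connection.conn_id == conn_id).count()
    if exists and not allow_multiple:
        raise SystemExit(
            "Connection %r already exists; pass --allow-multiple to add "
            "another entry with the same conn_id" % conn_id)
    session.add(Connection(conn_id=conn_id, host=host))
    session.commit()
```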

-ash

> On 7 Aug 2018, at 15:17, Deng Xiaodong  wrote:
> 
> Hi Ben,
> 
> If you would like to set multiple connections with the same *conn_id*, you
> can only do that in Web UI. That is, you need to build multiple entries
> with different host, but with the same *conn_id*.
> 
> The method of setting connection in environment variables can only help set
> one connection for each *conn_id*.
> 
> (Of course Web UI stores your connections in the metadata database. So if
> you really need to avoid manual interventions and need multiple connections
> for one single conn_id, hacking into the metadata database may be an
> option? even thought not recommended)
> 
> Other guys please correct me if I'm wrong in any parts. Thanks.
> 
> 
> Regards,
> XD
> 
> On Tue, Aug 7, 2018 at 9:43 PM Ben Laird  wrote:
> 
>> Hello -
>> 
>> I'd like to define multiple connections for the `webhdfs_default` conn_id,
>> as we have a primary and backup host for the hadoop namenode. This is for
>> use with the WebHDFSHook, which queries for connections and seemingly can
>> accept a list of them from the database.
>> 
>> ```
>>nn_connections = self.get_connections(self.webhdfs_conn_id)
>>for nn in nn_connections:
>> ```
>> 
>> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/webhdfs_hook.py#L54-L55
>> 
>> I can manually add 2 entries for `webhdfs_default` in the Web UI, one for
>> each hostname, however with the CLI and adding the entries to the
>> airflow.cfg I get errors that a connection already exists.
>> 
>> It seems WebHDFSHook doesn't support reading multiple hosts from one entry.
>> Is this a bug? Or is there a different way to accomplish without manually
>> adding via the UI?
>> 
>> Thanks!
>> Ben Laird
>> 



CVE-2017-12614 XSS Vulnerability in Airflow < 1.9

2018-08-06 Thread Ash Berlin-Taylor
CVE-2017-12614: Apache Airflow Reflected XSS Vulnerability

Vendor: The Apache Software Foundation

Versions Affected: < 1.9

Description:
An XSS vulnerability was noticed in certain 404 pages that could be exploited to 
perform an XSS attack. Chrome will detect this as a reflected XSS attempt and prevent 
the page from loading. Firefox and other browsers don't, and are vulnerable to 
this attack.

Mitigation:
The fix for this is to upgrade to Apache Airflow 1.9.0 or above.

Credit:
This issue was discovered by Seth Long at Credit Karma

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Yup, just worked out the same thing.

I think as "punishment" for me finding bugs so late in two RCs (this, and 1.9) 
I should run the release for the next release.

-ash

> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
> 
> Yeah I figured it out. Originally I was using a different implementation of 
> UTCDateTime, but that was unmaintained. I switched, but this version changed 
> or has a different contract. While it transforms to UTC on storing, it does 
> not do so when it receives timezone-aware fields from the db. Hence the issue.
> 
> I will prepare a PR that removes the dependency and implements our own 
> extension of DateTime. Probably tomorrow.
> 
> Good catch! Just in time :-(.
> 
> B.
> 
>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor  
>> wrote:
>> 
>> Entirely possible, though I wasn't even dealing with the scheduler - the 
>> issue I was addressing was entirely in the webserver for a pre-existing Task 
>> Instance.
>> 
>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
>> isn't working right/ as expected. This line: 
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>> doesn't look right for us - as you mentioned the TZ is set to something 
>> (rather than having no TZ value).
>> 
>> Some background on how Pq handles TZs. It always returns DTs in the TZ of 
>> the connection. I'm not sure if this is unique to postgres or if other DBs 
>> behave the same.
>> 
>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 01:00:00+01
>> 
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 01:00:00+01
>> 
>> The server will always return TZs in the connection timezone.
>> 
>> postgres=# set timezone=utc;
>> SET
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 00:00:00+00
>> (1 row)
>> 
>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 00:00:00+00
>> (1 row)
>> 
>> 
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
>>> 
>>> This is the issue:
>>> 
>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00: scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>> 
>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-04 02:00:00+02:00: scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>> 
>>> Notice at line 1+2: that the next run date is correctly in UTC but from the 
>>> DB it gets a +2. At the next bit (3+4) we get a 
>>> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
>>> specs of https://github.com/spoqa/sqlalchemy-utc 
>>> <https://github.com/spoqa/sqlalchemy-utc> , but it isn’t. 
>>> 
>>> So changing your setting of the DB to UTC fixes the symptom but not the 
>>> cause.
>>> 
>>> B.
>>> 
>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor 
>>>>  wrote:
>>>> 
>>>> Sorry for being terse before.
>>>> 
>>>> So the issue is that the ts loaded from the DB is not in UTC, it's in 
>>>> GB/+01 (the default of the DB server)
>>>> 
>>>> For me, on a currently running 1.9 (no TZ) db:
>>>> 
>>>> airflow=# select * from task_instance;
>>>> get_op| example_http_operator | 2018-07-23 00:00:00
>>>> 
>>>> This date time appears in the log url, and the path it looks at on S3 is 
>>>> 
>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>> 
>>>> If my postgres server has a default timezone of GB (which the one running 
>>>> on my laptop does), and I then apply the migration then it is

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Entirely possible, though I wasn't even dealing with the scheduler - the issue 
I was addressing was entirely in the webserver for a pre-existing Task Instance.

Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that isn't 
working right/ as expected. This line: 
https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
 doesn't look right for us - as you mentioned the TZ is set to something 
(rather than having no TZ value).

Some background on how Pq handles TZs. It always returns DTs in the TZ of the 
connection. I'm not sure if this is unique to postgres or if other DBs behave 
the same.

postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
  timestamptz

 2018-08-03 01:00:00+01

postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
  timestamptz

 2018-08-03 01:00:00+01

The server will always return TZs in the connection timezone.

postgres=# set timezone=utc;
SET
postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
  timestamptz

 2018-08-03 00:00:00+00
(1 row)

postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
  timestamptz

 2018-08-03 00:00:00+00
(1 row)
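
The core of that branch is essentially an event listener that pins the session timezone whenever SQLAlchemy hands out a connection; roughly this (a postgres-flavoured sketch, not the exact patch):

```python
# Sketch: force every pooled connection to UTC so datetimes come back in UTC
# no matter how the DB server itself is configured (postgres syntax).
from sqlalchemy import create_engine, event

engine = create_engine("postgresql://localhost/airflow")  # example DSN only

@event.listens_for(engine, "connect")
def set_utc_on_connect(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("SET timezone TO 'UTC'")
    cursor.close()
```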




-ash

> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
> 
> This is the issue:
> 
> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00: scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
> 
> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-04 02:00:00+02:00: scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
> 
> Notice at line 1+2: that the next run date is correctly in UTC but from the 
> DB it gets a +2. At the next bit (3+4) we get a 
> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
> specs of https://github.com/spoqa/sqlalchemy-utc 
> <https://github.com/spoqa/sqlalchemy-utc> , but it isn’t. 
> 
> So changing your setting of the DB to UTC fixes the symptom but not the cause.
> 
> B.
> 
>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor  
>> wrote:
>> 
>> Sorry for being terse before.
>> 
>> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 
>> (the default of the DB server)
>> 
>> For me, on a currently running 1.9 (no TZ) db:
>> 
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-23 00:00:00
>> 
>> This date time appears in the log url, and the path it looks at on S3 is 
>> 
>> .../example_http_operator/2018-07-23T00:00:00/1.log
>> 
>> If my postgres server has a default timezone of GB (which the one running on 
>> my laptop does), and I then apply the migration, the value is converted to that 
>> local time.
>> 
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-23 01:00:00+01
>> 
>> airflow=# set timezone=UTC;
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-23 00:00:00+00
>> 
>> 
>> This is all okay so far. The migration has kept the column at the same 
>> moment in time.
>> 
>> The issue comes when the UI tries to display logs for this old task: because 
>> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
>> Thus after the migration this old task tries to look for a log file of
>> 
>> .../example_http_operator/2018-07-23T01:00:00/1.log
>> 
>> which doesn't exist - it's changed the time it has rendered from midnight 
>> (in v1.9) to 1am (in v1.10).
>> 
>> (This is with my change to log_filename_template from UPDATING.md in my 
>> other branch)
>> 
>> Setting the timezone to UTC per connection means the behaviour of Airflow 
>> doesn't change depending on how the server is configured.
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
>>> 
>>> Digging in a bit further. 
>>> 
>>>  ti.dag_id / ti.task_id / ts / try_number 
>>> .log
>>> 
>>> is the format
>>> 
>>> ts = execution_date.isoformat and should be in UTC

Re: Airflow committers (a list)

2018-08-05 Thread Ash Berlin-Taylor
  - https://github.com/orgs/apache/teams/airflow-committers/members 


This one at least is populated automatically by a cron job every 30 minutes.


> On 5 Aug 2018, at 21:35, Sid Anand  wrote:
> 
> Committers/Mentors,
> We have several places where committers are listed:
> 
>   - https://whimsy.apache.org/roster/ppmc/airflow
>   - http://incubator.apache.org/projects/airflow.html
>   - Mentors, is the committer list here meant to be current?
>  - Kaxil, I added tfeng to this page (done via SVN).. though I'm not
>  sure when the actual publishing happens to this site. I believe it's done
>  via some CI/CD process now.
>   - https://github.com/orgs/apache/teams/airflow-committers/members
>   - This is needed for the Gitbox integration we are using now. Does it
>  get populated automatically from whimsy?
> 
> When we promote a contributor to committer/PPMC, do we need to update all 3
> of these places?
> 
> FYI, there is a PR to also add this list to the README so the community can
> see who the committers are :
> https://github.com/apache/incubator-airflow/pull/3699
> 
> Currently, the committer section of the README points to a wiki page that
> displays all of these links.
> 
> And if you wanted more options, GitHub's CODEOWNERS file provides some
> interesting functionality, though I don't think we need it:
> https://blog.github.com/2017-07-06-introducing-code-owners/
> 
> -s



Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Sorry for being terse before.

So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 
(the default of the DB server)

For me, on a currently running 1.9 (no TZ) db:

airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-23 00:00:00

This date time appears in the log url, and the path it looks at on S3 is 

.../example_http_operator/2018-07-23T00:00:00/1.log

If my postgres server has a default timezone of GB (which the one running on my 
laptop does), and I then apply the migration, the value is converted to that local 
time.

airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-23 01:00:00+01

airflow=# set timezone=UTC;
airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-23 00:00:00+00


This is all okay so far. The migration has kept the column at the same moment 
in time.

The issue comes when the UI tries to display logs for this old task: because the 
timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus 
after the migration this old task tries to look for a log file of

.../example_http_operator/2018-07-23T01:00:00/1.log

which doesn't exist - it's changed the time it has rendered from midnight (in 
v1.9) to 1am (in v1.10).
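
Plain Python shows how the same instant renders two different paths purely from the tzinfo attached to it:

```python
# Same moment in time, two different rendered log paths depending on the
# timezone the DB connection happened to return.
from datetime import datetime, timedelta, timezone

utc = datetime(2018, 7, 23, 0, 0, tzinfo=timezone.utc)
gb = utc.astimezone(timezone(timedelta(hours=1)))  # what a GB connection returns

print(utc.isoformat())  # 2018-07-23T00:00:00+00:00
print(gb.isoformat())   # 2018-07-23T01:00:00+01:00 -> a different S3 key
print(utc == gb)        # True: equal instants, unequal path strings
```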

(This is with my change to log_filename_template from UPDATING.md in my other 
branch)

Setting the timezone to UTC per connection means the behaviour of Airflow 
doesn't change depending on how the server is configured.

-ash

> On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
> 
> Digging in a bit further. 
> 
>  ti.dag_id / ti.task_id / ts / try_number .log
> 
> is the format
> 
> ts = execution_date.isoformat and should be in UTC afaik.
> 
> something is weird tbh.
> 
> B.
> 
> 
>> On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
>> 
>> Ash,
>> 
>> Reading your proposed changes on your “set-timezone-to-utc” branch and below 
>> analysis, I am not sure what you are perceiving as an issue.
>> 
>> For conversion we assume everything is stored in UTC and in a naive format. 
>> Conversion then adds the timezone information. This results in the following
>> 
>> postgres timezone = “Europe/Amsterdam”
>> 
>> 
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-27 02:00:00+02
>> 
>> airflow=# set timezone=UTC;
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-27 00:00:00+00
>> 
>> If we don’t set the timezone in the connection postgres assumes server 
>> timezone (in my case “Europe/Amsterdam”). So every datetime Airflow receives 
>> will be in “Europe/Amsterdam” format. However as we defined the model to use 
>> UTCDateTime it will always convert the returned DateTime to UTC.
>> 
>> If we have configured Airflow to support something else as UTC as the 
>> default timezone or a DAG has a associated timezone we only convert to that 
>> timezone when calculating the next runtime (not for cron btw). Nowhere else 
>> and thus we are UTC everywhere.
>> 
>> What do you think is inconsistent?
>> 
>> Bolke
>> 
>> 
>> 
>> 
>> 
>> 
>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>>> aware columns in the task instance is right, or at least it's not what I 
>>> expected.
>>> 
>>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>>> instance (these outputs from psql directly):
>>> 
>>> before: execution_date=2017-09-04 00:00:00
>>> after: execution_date=2017-09-04 01:00:00+01
>>> 
>>> **Okay the migration is fine**. It appears that the migration has done the 
>>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>>> Postgres converts it to that TZ on returning an object.
>>> 
>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>>> that well.
>>> 
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
>>>>  wrote:
>>>> 
>>>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>>>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>>>> though. This may be particular to my logging config, but gi

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor

> On 5 Aug 2018, at 18:01, Bolke de Bruin  wrote:
> 
> Hi Ash,
> 
> Thanks a lot for the proper review, obviously I would have liked that these 
> issues (I think just one) are popping up at rc3 but I understand why it 
> happened. 

Yeah, sorry I couldn't make time to test the betas :(

> 
> Can you work out a patch for the k8s issue? I’m sure Fokko and others can 
> chime in to make sure it will be the right change. 

Working on it - I'll go for a conditional import, and only raise an exception if the 
"k8s://" scheme is specified, I think.

> 
> On the timezone change. The database will do the right thing and correctly 
> transform a datetime into the timezone the client is using. Even then we 
> enforce UTC internally and only transform it for user interaction or when it 
> is relevant (to make sure we do daylight savings for example). It is 
> therefore not required to force a timezone setting with sql alchemy beyond 
> when we convert to timezone aware (see migration scripts).

I think the database is being "smart" here in converting, but I'm not sure it's 
the Right thing. It wouldn't surprise me if we have other places in the 
codebase that expect datetime columns to come back in UTC, but they might come 
back in DB-server local timezone.

Trying 
https://github.com/apache/incubator-airflow/compare/master...ashb:set-timezone-to-utc-on-connect?expand=1
 
<https://github.com/apache/incubator-airflow/compare/master...ashb:set-timezone-to-utc-on-connect?expand=1>
 - it "fixes" my logging issue, tests are running 
https://travis-ci.org/ashb/incubator-airflow/builds/412360920

> 
> On the logging file format I agree this could be handled better. However I do 
> think we should honor local system time for this as this is the standard for 
> any other logging. Also logging output will be time-stamped in local system 
> time. Maybe we could cut off the timezone identifier as it can be assumed to 
> be in local system time (+01:00). 

The issue with just cutting off the timezone is that old log files are now 
unviewable - they ran at 00:00:00 UTC, but the hour of the record coming back 
is 01.

> 
> If we take on the k8s fix we can also fix the logging format. What do you 
> think?

Also as a quick fix I've changed the UPDATING.md as suggested: 
https://github.com/apache/incubator-airflow/compare/master...ashb:updating-for-logging-changes?expand=1.
 The log format is a bit clunky, but the note about log_task_reader is needed 
either way. (Do we need a Jira ticket for this sort of change, or is 
AIRFLOW-XXX okay for this?)

> 
> Cheers
> Bolke
> 
> Verstuurd vanaf mijn iPad
> 
>> Op 5 aug. 2018 om 18:13 heeft Ash Berlin-Taylor 
>>  het volgende geschreven:
>> 
>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>> aware columns in the task instance is right, or at least it's not what I 
>> expected.
>> 
>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>> instance (these outputs from psql directly):
>> 
>> before: execution_date=2017-09-04 00:00:00
>> after: execution_date=2017-09-04 01:00:00+01
>> 
>> **Okay the migration is fine**. It appears that the migration has done the 
>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>> Postgres converts it to that TZ on returning an object.
>> 
>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>> that well.
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>>> though. This may be particular to my logging config, but given how much of 
>>> a pain it was to set up S3 logging in 1.9 I have shared my config with some 
>>> people in the Gitter chat so It's not just me.
>>> 
>>> 2) The path that log-files are written to in S3 has changed (again - this 
>>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>>> files again to continue viewing them. The change is that the path now (in 
>>> 1.10) has a timezone in it, and the date is in local time, before it was 
>>> UTC:
>>> 
>>> before: 2018-07-23T00:00:00/1.log
>>> after: 2018-07-23T01:00:00+01:00/1.log
>>> 
>>> We can possibly get away with an updating note about this to set a custom 
>>> log

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
aware columns in the task instance is right, or at least it's not what I 
expected.

Before weren't all TZs from schedule dates etc in UTC? For the same task 
instance (these outputs from psql directly):

before: execution_date=2017-09-04 00:00:00
after: execution_date=2017-09-04 01:00:00+01

**Okay the migration is fine**. It appears that the migration has done the 
right thing, but my local DB I'm testing with has a Timezone of GB set, so 
Postgres converts it to that TZ on returning an object.

3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that 
well.


-ash

> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
> wrote:
> 
> 1.) Missing UPDATING note about change of task_log_reader to now always being 
> "task" (was "s3.task" before.). Logging config is much simpler now though. 
> This may be particular to my logging config, but given how much of a pain it 
> was to set up S3 logging in 1.9 I have shared my config with some people in 
> the Gitter chat so It's not just me.
> 
> 2) The path that log-files are written to in S3 has changed (again - this 
> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
> files again to continue viewing them. The change is that the path now (in 
> 1.10) has a timezone in it, and the date is in local time, before it was UTC:
> 
> before: 2018-07-23T00:00:00/1.log
> after: 2018-07-23T01:00:00+01:00/1.log
> 
> We can possibly get away with an updating note about this to set a custom 
> log_filename_template. Testing this now.
> 
>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>> 
>> -1(binding) from me.
>> 
>> Installed with:
>> 
>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
>>  
>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr>,
>>  s3, crypto]>=1.10'
>> 
>> Install went fine.
>> 
>> Our DAGs that use SparkSubmitOperator are now failing as there is now a hard 
>> dependency on the Kubernetes client libs, but the `emr` group doesn't 
>> mention this.
>> 
>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>> <https://github.com/apache/incubator-airflow/pull/3112>
>> 
>> I see two options for this - either conditionally enable k8s:// support if 
>> the import works, or (less preferred) add kube-client to the emr deps (which 
>> I like less)
>> 
>> Sorry - this is the first time I've been able to test it.
>> 
>> I will install this dep manually and continue testing.
>> 
>> -ash
>> 
>> (Normally no time at home due to new baby, but I got a standing desk, and a 
>> carrier meaning she can sleep on me and I can use my laptop. Win!)
>> 
>> 
>> 
>>> On 4 Aug 2018, at 22:32, Bolke de Bruin >> <mailto:bdbr...@gmail.com>> wrote:
>>> 
>>> Bump. 
>>> 
>>> Committers please cast your vote. 
>>> 
>>> B.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko >>> <mailto:fo...@driesprong.frl>> wrote:
>>>> 
>>>> +1 Binding
>>>> 
>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>  
>>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz>
>>>> 
>>>> Cheers, Fokko
>>>> 
>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
>>>> 
>>>>> Hey all,
>>>>> 
>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the 
>>>>> release,
>>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>>> 
>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>> 
>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ <
>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/>
>>>>> 
>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
-1(binding) from me.

Installed with:

AIRFLOW_GPL_UNIDECODE=yes pip install 
'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,
 s3, crypto]>=1.10'

Install went fine.

Our DAGs that use SparkSubmitOperator are now failing as there is now a hard 
dependency on the Kubernetes client libs, but the `emr` group doesn't mention 
this.

Introduced in https://github.com/apache/incubator-airflow/pull/3112 


I see two options for this - either conditionally enable k8s:// support if the 
import works, or (less preferred) add kube-client to the emr deps (which I like 
less)

Sorry - this is the first time I've been able to test it.

I will install this dep manually and continue testing.

-ash

(Normally no time at home due to new baby, but I got a standing desk, and a 
carrier meaning she can sleep on me and I can use my laptop. Win!)



> On 4 Aug 2018, at 22:32, Bolke de Bruin  wrote:
> 
> Bump. 
> 
> Committers please cast your vote. 
> 
> B.
> 
> Sent from my iPhone
> 
>> On 3 Aug 2018, at 13:23, Driesprong, Fokko  wrote:
>> 
>> +1 Binding
>> 
>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>> 
>> Cheers, Fokko
>> 
>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
>> 
>>> Hey all,
>>> 
>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
>>> which will last for 72 hours. Consider this my (binding) +1.
>>> 
>>> Airflow 1.10.0 RC 3 is available at:
>>> 
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ <
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/>
>>> 
>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
>>> comes with INSTALL instructions.
>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>>> "sdist"
>>> release.
>>> 
>>> Public keys are available at:
>>> 
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>> 
>>> The amount of JIRAs fixed is over 700. Please have a look at the
>>> changelog.
>>> Since RC2 the following has been fixed:
>>> 
>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>> 
>>> Please note that the version number excludes the `rcX` string as well
>>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>>> to rename the artifact without modifying the artifact checksums when we
>>> actually release.
>>> 
>>> WARNING: Due to licensing requirements you will need to set
>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>> installing or upgrading. We will try to remove this requirement for the
>>> next release.
>>> 
>>> Cheers,
>>> Bolke



Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
1) Missing UPDATING note about change of task_log_reader to now always being 
"task" (was "s3.task" before.). Logging config is much simpler now though. This 
may be particular to my logging config, but given how much of a pain it was to 
set up S3 logging in 1.9 I have shared my config with some people in the Gitter 
chat so It's not just me.

2) The path that log-files are written to in S3 has changed (again - this 
happened from 1.8 to 1.9). I'd like to avoid having to move all of my log files 
again to continue viewing them. The change is that the path now (in 1.10) has a 
timezone in it, and the date is in local time, before it was UTC:

before: 2018-07-23T00:00:00/1.log
after: 2018-07-23T01:00:00+01:00/1.log

We can possibly get away with an updating note about this to set a custom 
log_filename_template. Testing this now.
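
For illustration, rendering the path with plain jinja2 (a stand-in for Airflow's actual template rendering, not the real code path) shows where the offset sneaks into the key:

```python
# Stand-in for Airflow's log path rendering: a tz-aware execution_date puts
# its offset straight into the rendered path.
from datetime import datetime, timezone
import jinja2

tmpl = jinja2.Template(
    "{{ dag_id }}/{{ task_id }}/{{ execution_date.isoformat() }}/{{ try_number }}.log")
print(tmpl.render(
    dag_id="example_http_operator", task_id="get_op", try_number=1,
    execution_date=datetime(2018, 7, 23, tzinfo=timezone.utc)))
# -> example_http_operator/get_op/2018-07-23T00:00:00+00:00/1.log
```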


> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
> 
> -1(binding) from me.
> 
> Installed with:
> 
> AIRFLOW_GPL_UNIDECODE=yes pip install 
> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
>  
> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr>,
>  s3, crypto]>=1.10'
> 
> Install went fine.
> 
> Our DAGs that use SparkSubmitOperator are now failing as there is now a hard 
> dependency on the Kubernetes client libs, but the `emr` group doesn't mention 
> this.
> 
> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
> <https://github.com/apache/incubator-airflow/pull/3112>
> 
> I see two options for this - either conditionally enable k8s:// support if 
> the import works, or (less preferred) add kube-client to the emr deps (which 
> I like less)
> 
> Sorry - this is the first time I've been able to test it.
> 
> I will install this dep manually and continue testing.
> 
> -ash
> 
> (Normally no time at home due to new baby, but I got a standing desk, and a 
> carrier meaning she can sleep on me and I can use my laptop. Win!)
> 
> 
> 
>> On 4 Aug 2018, at 22:32, Bolke de Bruin > <mailto:bdbr...@gmail.com>> wrote:
>> 
>> Bump. 
>> 
>> Committers please cast your vote. 
>> 
>> B.
>> 
>> Sent from my iPhone
>> 
>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko >> <mailto:fo...@driesprong.frl>> wrote:
>>> 
>>> +1 Binding
>>> 
>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>  
>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz>
>>> 
>>> Cheers, Fokko
>>> 
>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
>>> 
>>>> Hey all,
>>>> 
>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>> 
>>>> Airflow 1.10.0 RC 3 is available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ <
>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/>
>>>> 
>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
>>>> comes with INSTALL instructions.
>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>>>> "sdist"
>>>> release.
>>>> 
>>>> Public keys are available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>>> 
>>>> The amount of JIRAs fixed is over 700. Please have a look at the
>>>> changelog.
>>>> Since RC2 the following has been fixed:
>>>> 
>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>> 
>>>> Please note that the version number excludes the `rcX` string as well
>>>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>>>> to rename the artifact without modifying the artifact checksums when we
>>>> actually release.
>>>> 
>>>> WARNING: Due to licensing requirements you will need to set
>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>> installing or upgrading. We will try to remove this requirement for the
>>>> next release.
>>>> 
>>>> Cheers,
>>>> Bolke
> 



Re: The need for LocalTaskJob

2018-08-04 Thread Ash Berlin-Taylor


> On 4 Aug 2018, at 21:25, Bolke de Bruin  wrote:
> 
> We can just execute “python” just fine. Because it will run in a separate 
> interpreter no issues will come from sys.modules as that is not inherited. 
> Will still parse DAGs in a separate process then. Forking (@ash) probably 
> does not work as that does share sys.modules. 

Some sharing of modules was my idea - if we are careful about what modules we 
load, and we only load the airflow core pre-fork, and don't parse any DAG 
pre-fork, then forking with the already-loaded modules shared is a good thing for 
speed. Think of it like the preload_app option to a gunicorn worker, where the 
master loads the app and then forks.
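
A minimal sketch of that idea, assuming a long-lived supervisor process; the
DagBag/TaskInstance usage below is illustrative of the shape, not a worked-out
design:

    import os

    import airflow  # heavy core imports happen once, pre-fork

    def run_task(dag_folder, dag_id, task_id, execution_date):
        pid = os.fork()
        if pid:
            return pid  # parent keeps supervising/heartbeating
        # The child inherits sys.modules, so the core imports are already
        # paid for; only the DAG file itself is parsed fresh in here.
        from airflow.models import DagBag, TaskInstance
        dag = DagBag(dag_folder=dag_folder).get_dag(dag_id)
        ti = TaskInstance(dag.get_task(task_id), execution_date)
        ti.run()
        os._exit(0)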

> [snip]
> 
> I’m writing AIP-2 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-2+Simplify+process+launching
>  to work this out.

Sounds good. I'm not proposing we try my forking idea yet, and your proposal is 
a definite improvement from where we are now.

> 
> B.
> 
> Sent from my iPad
> 
>> On 4 Aug 2018, at 19:40, Ash Berlin-Taylor wrote:
>> 
>> Comments inline.
>> 
>>> On 4 Aug 2018, at 18:28, Maxime Beauchemin  
>>> wrote:
>>> 
>>> Let me confirm I'm understanding this right, we're talking specifically
>>> about the CeleryExecutor not starting an `airflow run` (not --raw)
>>> command, and firing up a LocalTaskJob instead? Then we'd still have the
>>> worker fire up the `airflow run --raw` command?
>>> 
>>> Seems reasonable. One thing to keep in mind is the fact that shelling out
>>> guarantees no `sys.module` caching, which is a real issue for slowly
>>> changing DAG definitions. That's the reason why we'd have to reboot the
>>> scheduler periodically before it used sub-processes to evaluate DAGs. Any
>>> code that needs to evaluate a DAG should probably be done in a subprocess.
>> 
>>> 
>>> Shelling out also allows for doing things like unix impersonation and
>>> applying CGROUPS. This currently happens between `airflow run` and `airflow
>>> run --raw`. The parent process also does heartbeats and listens for external
>>> kill signals (kill pills).
>>> 
>>> I think what we want is smarter executors and only one level of bash
>>> command: the `airflow run --raw`, and ideally the system that fires this up
>>> is not Airflow itself, and cannot be DAG-aware (or it will need to get
>>> restarted to flush the cache).
>> 
>> Rather than shelling out to `airflow run`, could we instead fork and run the
>> CLI code directly? Shelling out involves parsing the config twice, loading
>> all of the airflow and SQLAlchemy deps twice, etc. I think this would account
>> for a not-insignificant speed difference in the unit tests. In the case of
>> impersonation we'd probably have no option but to exec `airflow`, but
>> most(?) people don't use that?
>> 
>> Avoiding the extra parsing penalty and process when we don't need it might 
>> be worth it for test speed up alone. And we've already got impersonation 
>> covered in the tests so we'll know that it still works.
>> 
>>> 
>>> To me that really brings up the whole question of what should be handled by
>>> the Executor, and what belongs in core Airflow. The Executor needs to do
>>> more, and Airflow core less.
>> 
>> I agree with the sentiment that Core should do less and Executors more -- 
>> many parts of the core are reimplementing what Celery itself could do.
>> 
>> 
>>> 
>>> When you think about how this should all work on Kubernetes, it looks
>>> something like this:
>>> * the scheduler, through KubeExecutor, calls the k8s API, tells it to fire
>>> up an Airflow task
>>> * container boots up and starts an `airflow run --raw` command
>>> * k8s handles heartbeats, monitors tasks, knows how to kill a running task
>>> * the scheduler process (call it supervisor), talks with k8s through
>>> KubeExecutor
>>> and handles zombie cleanup and sending kill pills
>>> 
>>> Now because Celery doesn't offer as many guarantees it gets a bit more
>>> tricky. Is there even a way to send a kill pill through Celery? Are there
>>> other ways than using a parent process to accomplish this?
>> 
>> It does 
>> http://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks
>>  (at least it does now)
>> 
>>> 
>>> At a higher level, it seems like we need to move more logic from core
>>> Airflow into the executors. For instance, the heartbeat construct should
>>> probably be 100% handled by the executor, and not an assumption in the core
>>> code base.
>>> 
>>> I think I drifted a bit, hopefully that's still helpful.
>>> 
>>> Max



Re: The need for LocalTaskJob

2018-08-04 Thread Ash Berlin-Taylor
Comments inline.

> On 4 Aug 2018, at 18:28, Maxime Beauchemin  wrote:
> 
> Let me confirm I'm understanding this right, we're talking specifically
> about the CeleryExecutor not starting an `airflow run` (not --raw)
> command, and firing up a LocalTaskJob instead? Then we'd still have the
> worker fire up the `airflow run --raw` command?
> 
> Seems reasonable. One thing to keep in mind is the fact that shelling out
> guarantees no `sys.module` caching, which is a real issue for slowly
> changing DAG definitions. That's the reason why we'd have to reboot the
> scheduler periodically before it used sub-processes to evaluate DAGs. Any
> code that needs to evaluate a DAG should probably be done in a subprocess.

> 
> Shelling out also allows for doing things like unix impersonation and
> applying CGROUPS. This currently happens between `airflow run` and `airflow
> run --raw`. The parent process also does heartbeats and listens for external
> kill signals (kill pills).
> 
> I think what we want is smarter executors and only one level of bash
> command: the `airflow run --raw`, and ideally the system that fires this up
> is not Airflow itself, and cannot be DAG-aware (or it will need to get
> restarted to flush the cache).

Rather than shelling out to `airflow run`, could we instead fork and run the CLI
code directly? Shelling out involves parsing the config twice, loading all of
the airflow and SQLAlchemy deps twice, etc. I think this would account for a
not-insignificant speed difference in the unit tests. In the case of
impersonation we'd probably have no option but to exec `airflow`, but most(?)
people don't use that?

Avoiding the extra parsing penalty and process when we don't need it might be 
worth it for test speed up alone. And we've already got impersonation covered 
in the tests so we'll know that it still works.
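
Roughly what I have in mind - a sketch only, with the impersonation fallback
and the 1.10-era `airflow.bin.cli` entrypoint assumed rather than checked:

    import os

    def launch(args, run_as_user=None):
        pid = os.fork()
        if pid:
            return pid  # parent: keep supervising/heartbeating as before
        if run_as_user:
            # Impersonation needs a fresh process image running as another
            # user, so fall back to exec'ing the CLI.
            os.execvp("sudo", ["sudo", "-u", run_as_user, "airflow"] + args)
        # No impersonation: reuse the modules this process already imported
        # and call the CLI entrypoint directly - no second config parse.
        from airflow.bin import cli
        parsed = cli.CLIFactory.get_parser().parse_args(args)
        parsed.func(parsed)
        os._exit(0)

A worker would then call e.g. `launch(["run", "some_dag", "some_task",
"2018-08-04", "--raw"])` and keep the returned pid for supervision.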

> 
> To me that really brings up the whole question of what should be handled by
> the Executor, and what belongs in core Airflow. The Executor needs to do
> more, and Airflow core less.

I agree with the sentiment that Core should do less and Executors more -- many 
parts of the core are reimplementing what Celery itself could do.


> 
> When you think about how this should all work on Kubernetes, it looks
> something like this:
> * the scheduler, through KubeExecutor, calls the k8s API, tells it to fire
> up an Airflow task
> * container boots up and starts an `airflow run --raw` command
> * k8s handles heartbeats, monitors tasks, knows how to kill a running task
> * the scheduler process (call it supervisor), talks with k8s through
> KubeExecutor
> and handles zombie cleanup and sending kill pills
> 
> Now because Celery doesn't offer as many guarantees it gets a bit more
> tricky. Is there even a way to send a kill pill through Celery? Are there
> other ways than using a parent process to accomplish this?

It does 
http://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks
 (at least it does now)
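
For reference, the shape of that API (the broker URL and task id below are
placeholders):

    from celery import Celery

    app = Celery(broker="redis://localhost:6379/0")

    # terminate=True delivers the signal to a task that has already
    # started - effectively the "kill pill" discussed here.
    app.control.revoke("task-uuid-goes-here", terminate=True, signal="SIGTERM")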

> 
> At a higher level, it seems like we need to move more logic from core
> Airflow into the executors. For instance, the heartbeat construct should
> probably be 100% handled by the executor, and not an assumption in the core
> code base.
> 
> I think I drifted a bit, hopefully that's still helpful.
> 
> Max


Re: Apache Git Services

2018-08-02 Thread Ash Berlin-Taylor
I think we now only get them once, rather than once from gitbox, and once again 
from gitbox sending them to Jira :/

On https://issues.apache.org/jira/browse/INFRA-16854 it was said we require
github notifications to go to a list (even though we didn't have them before.
Guess policies change, eh?), and using notifications@ is a common pattern that
other projects use.

We need one of our Apache mentors to create
notificati...@airflow.incubator.apache.org for us via http://selfserve.apache.org/;
then we can get the github comments etc. redirected there.

Chris, Hitesh, Jakob: would one of you be so kind as to create this list for 
us? Thanks.

-ash

> On 1 Aug 2018, at 22:46, Sid Anand  wrote:
> 
> So, apparently, we should no longer see comments via Gitbox?
> 
> On Wed, Aug 1, 2018 at 3:40 AM Michał Niemiec 
> wrote:
> 
>> I find the experience similar - it's already classified as spam and makes
>> it a bit difficult to use.
>> 
>> Any chance this could become separate list maybe?
>> 
>> Regards
>> Michał Niemiec
>> 
>> 
>> On 01/08/2018, 12:07, "Victor Noagbodji" <
>> vnoagbo...@amplify-analytics.com> wrote:
>> 
>>Hey people, what is Apache Git Services? And why are we all receiving
>> notifications (even those by bots) sent by that service? Where can I turn
>> those off? They are (for me at least) ruining the mailing list experience...
>> 
>> 


