Continuous Dag

2017-03-13 Thread Edgardo Vega
We are currently trying to port our current solution into airflow. What is
currently stumping me is that we have a few tasks that run pretty much all
the time. Once one is done, we wait a few minutes and kick off another.

Would this be possible in airflow?

-- 
Cheers,

Edgardo


Re: Continuous Dag

2017-03-13 Thread Maxime Beauchemin
"Would this be possible in airflow?"

What do you mean by "this"?

Max

On Mon, Mar 13, 2017 at 9:09 AM, Edgardo Vega 
wrote:

> We are currently trying to port our current solution into airflow. What is
> currently stumping me is that we have a few tasks that run pretty much all
> the time. Once one is done, we wait a few minutes and kick off another.
>
> Would this be possible in airflow?
>
> --
> Cheers,
>
> Edgardo
>


Re: Continuous Dag

2017-03-13 Thread Edgardo Vega
Max,

Sorry for so many ambiguous antecedents.

I want to create a dag that does an operation, waits 2 minutes, and then
runs again, over and over, for all time. I don't know if that is possible
to do with airflow, or whether airflow can somehow be tricked into doing it.

I hope that clears things up.

Cheers,

Edgardo


On Mon, Mar 13, 2017 at 12:39 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> "Would this be possible in airflow?"
>
> What do you mean by "this"?
>
> Max
>



-- 
Cheers,

Edgardo


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-13 Thread Jeremiah Lowin
+1 (binding) extremely impressed by the work and diligence all contributors
have put in to getting these blockers fixed, Bolke in particular.

On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer  wrote:

> +1 (binding)
>
> Thanks again for steering us through Bolke.
>
> Best,
> Arthur
>
> On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin  wrote:
>
> > Dear All,
> >
> > Finally, I have been able to make the FIFTH RELEASE CANDIDATE of Airflow
> > 1.8.0 available at:
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > are available at
> > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > tagged with a local version “apache.incubating” so it allows upgrading
> > from earlier releases.
> >
> > Issues fixed since rc4:
> >
> > [AIRFLOW-900] Double trigger should not kill original task instance
> > [AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection
> > [AIRFLOW-932] Do not mark tasks removed when backfilling
> > [AIRFLOW-961] run onkill when SIGTERMed
> > [AIRFLOW-910] Use parallel task execution for backfills
> > [AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
> > [AIRFLOW-941] Use defined parameters for psycopg2
> > [AIRFLOW-719] Prevent DAGs from ending prematurely
> > [AIRFLOW-938] Use test for True in task_stats queries
> > [AIRFLOW-937] Improve performance of task_stats
> > [AIRFLOW-933] use ast.literal_eval rather eval because ast.literal_eval
> > does not execute input.
> > [AIRFLOW-919] Running tasks with no start date shouldn't break a DAGs UI
> > [AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
> > [AIRFLOW-861] make pickle_info endpoint be login_required
> > [AIRFLOW-853] use utf8 encoding for stdout line decode
> > [AIRFLOW-856] Make sure execution date is set for local client
> > [AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity
> > [AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively from
> > settings
> > [AIRFLOW-694] Fix config behaviour for empty envvar
> > [AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
> > [AIRFLOW-931] Do not set QUEUED in TaskInstances
> > [AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI instead
> > of black
> > [AIRFLOW-895] Address Apache release incompliancies
> > [AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has no
> > start date
> > [AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
> > [AIRFLOW-863] Example DAGs should have recent start dates
> > [AIRFLOW-869] Refactor mark success functionality
> > [AIRFLOW-856] Make sure execution date is set for local client
> > [AIRFLOW-814] Fix Presto*CheckOperator.__init__
> > [AIRFLOW-844] Fix cgroups directory creation
> >
> > No known issues anymore.
> >
> > I would also like to raise a VOTE for releasing 1.8.0 based on release
> > candidate 5, i.e. just renaming release candidate 5 to 1.8.0 release.
> >
> > Please respond to this email by:
> >
> > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you
> > are not.
> >
> > Thanks!
> > Bolke
> >
> > My VOTE: +1 (binding)
>


Issue installing airflow on ubuntu 14.04

2017-03-13 Thread Derrick Schneider
Hello!

I'm just getting started with airflow, and I'm trying to spin it up on a
fresh Ubuntu 14.04 VM. I install python and pip, set AIRFLOW_HOME, and then
run "pip install airflow" as stated here:
https://pythonhosted.org/airflow/start.html

When I do that, I get a long stream of messages that starts with

"  Running setup.py install for pandas

package init file 'pandas/io/tests/sas/__init__.py' not found (or not a
regular file)

package init file 'pandas/io/tests/sas/__init__.py' not found (or not a
regular file)

UPDATING build/lib.linux-x86_64-2.7/pandas/_version.py

set build/lib.linux-x86_64-2.7/pandas/_version.py to '0.19.2'

building 'pandas.index' extension

x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv
-O2 -Wall -Wstrict-prototypes -fPIC -Ipandas/src/klib -Ipandas/src
-I/usr/lib/python2.7/dist-packages/numpy/core/include
-I/usr/include/python2.7 -c pandas/index.c -o
build/temp.linux-x86_64-2.7/pandas/index.o -Wno-unused-function

In file included from
/usr/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1761:0,

 from
/usr/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,

 from
/usr/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,

 from pandas/index.c:274:


/usr/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2:
warning: #warning "Using deprecated NumPy API, disable it by " "#defining
NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]

 #warning "Using deprecated NumPy API, disable it by " \

  ^

pandas/index.c: In function
'__pyx_f_6pandas_5index_11IndexEngine_get_loc':

pandas/index.c:9488:13: warning: '__pyx_v_mid' may be used
uninitialized in this function [-Wmaybe-uninitialized]

   __pyx_t_5 = PyInt_FromSsize_t(__pyx_v_mid); if
(unlikely(!__pyx_t_5)) __PYX_ERR(0, 496, __pyx_L1_error)

 ^

pandas/index.c:9263:14: note: '__pyx_v_mid' was declared here

   Py_ssize_t __pyx_v_mid;

"

and gives a final error of "InstallationError: Command /usr/bin/python -c
"import setuptools,
tokenize;__file__='/tmp/pip_build_vagrant/pandas/setup.py';exec(compile(getattr(tokenize,
'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))"
install --record /tmp/pip-Uw_sOI-record/install-record.txt
--single-version-externally-managed --compile failed with error code 1 in
/tmp/pip_build_vagrant/pandas"


Various online recs suggest installing python-dev and python-numpy as apt
packages, but my apt says those are up to date.


Thanks in advance for the help

Derrick


Re: Continuous Dag

2017-03-13 Thread Maxime Beauchemin
Airflow isn't designed to work well with short schedule intervals. The
guarantees that we give in terms of scheduling latency are limited as the
platform isn't optimized for that specifically.

What is the type of operation that you are performing every 2 minutes?

If you're doing data processing in microbatches you should look into data
streaming solutions like Spark Streaming, Flink, Samza, or Storm.

Max

On Mon, Mar 13, 2017 at 10:17 AM, Edgardo Vega 
wrote:

> Max,
>
> Sorry for so many ambiguous antecedents.
>
> I want to create a dag that does an operation, waits 2 minutes, and then
> runs again, over and over, for all time. I don't know if that is possible
> to do with airflow, or whether airflow can somehow be tricked into doing it.
>
> I hope that clears things up.
>
> Cheers,
>
> Edgardo


Re: Continuous Dag

2017-03-13 Thread Alex Guziel
FWIW, for our streaming jobs, we run a 5 minute schedule interval with
max_active_runs=1
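
For readers following along, the setup described above might look roughly
like the sketch below. It is not taken from anyone's production code: the
dag_id, task_id, start date, and echo command are placeholders, and the
import paths assume an Airflow 1.8-era install (later releases moved
BashOperator). Note that schedule_interval spaces out the scheduled runs;
it does not mean "wait N minutes after the previous run finishes".
max_active_runs=1 is what keeps a new run from starting while the previous
one is still active.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # 1.x import path (assumption)

# Hypothetical DAG: dag_id, start_date, and the command are placeholders.
dag = DAG(
    dag_id="near_continuous_job",
    start_date=datetime(2017, 3, 13),
    # One run is scheduled every 5 minutes.
    schedule_interval=timedelta(minutes=5),
    # At most one active run of this DAG at a time, so runs do not overlap.
    max_active_runs=1,
)

process = BashOperator(
    task_id="process_requests",
    bash_command="echo 'replace with the real job'",
    dag=dag,
)
```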

On Mon, Mar 13, 2017 at 2:00 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Airflow isn't designed to work well with short schedule intervals. The
> guarantees that we give in terms of scheduling latency are limited as the
> platform isn't optimized for that specifically.
>
> What is the type of operation that you are performing every 2 minutes?
>
> If you're doing data processing in microbatches you should look into data
> streaming solutions like Spark Streaming, Flink, Samza, or Storm.
>
> Max


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-13 Thread Dan Davydov
I'll test this on staging as soon as I get a chance (the testing is
non-blocking on the rc5). Bolke very much in particular :).

On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin  wrote:

> +1 (binding) extremely impressed by the work and diligence all contributors
> have put in to getting these blockers fixed, Bolke in particular.


Re: Continuous Dag

2017-03-13 Thread Edgardo Vega
We have a very quick job that processes requests for the entirety of the
current hour in an idempotent way, so we can get results to the user a bit
quicker than every hour. Then, at the end of the hour, we run another job to
make sure we didn't miss anything for the whole hour. Maybe there is a better
way to accomplish the same thing. All recommendations welcome ;-)
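
One way to picture the "whole current hour, idempotently" idea is the small
sketch below. It only shows how the hour window could be derived so that
every run within the same hour covers the same range; the function name is
made up for illustration and the actual job is not shown in this thread.

```python
from datetime import datetime, timedelta


def current_hour_window(now):
    """Return (start, end) of the clock hour containing `now`.

    `start` is inclusive and `end` is exclusive. Hypothetical helper,
    not part of Airflow.
    """
    start = now.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1)


# Two runs a few minutes apart in the same hour get the same window, so
# re-processing that window replaces earlier results instead of adding to them.
first = current_hour_window(datetime(2017, 3, 13, 17, 5, 0))
later = current_hour_window(datetime(2017, 3, 13, 17, 42, 10))
print(first == later)
```

The end-of-hour job could use the same window, computed from a timestamp in
the hour that just closed.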

I can try the 5-minute schedule interval with max_active_runs=1.

Thanks for all the help.

Cheers,

Edgardo

On Mon, Mar 13, 2017 at 5:12 PM, Alex Guziel  wrote:

> FWIW, for our streaming jobs, we run a 5 minute schedule interval with
> max_active_runs=1


-- 
Cheers,

Edgardo


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-13 Thread siddharth anand
I'm going to deploy this to staging now. Fab work Bolke!
-s

On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov  wrote:

> I'll test this on staging as soon as I get a chance (the testing is
> non-blocking on the rc5). Bolke very much in particular :).


BashOperator Templated Command with Quotes

2017-03-13 Thread Edgardo Vega
I am trying to template a bash command, and I need to have quotes in the
command, like so:

bash_command = 'command opts="startTime={{ts}} endTime={{ (execution_date + macros.timedelta(hours=1)).isoformat() }}"'

but Jinja treats the quotes as literals.

Is there any workaround for this problem that this noob hasn't figured out?
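
A quick way to check what bash would see, without running Airflow at all,
is sketched below. It rests on two assumptions: that {{ts}} renders to the
ISO-8601 form of the execution date, and that the example date is simply
made up. The double quotes are ordinary characters inside the single-quoted
Python string, so they reach the shell unchanged, and the shell then uses
them to keep the two key=value pairs together as one argument.

```python
import shlex
from datetime import datetime, timedelta

# Hypothetical execution date standing in for the one Airflow would supply.
execution_date = datetime(2017, 3, 13, 9, 0, 0)

# What the two template expressions are assumed to render to.
start_time = execution_date.isoformat()
end_time = (execution_date + timedelta(hours=1)).isoformat()

# The command text after templating; the double quotes are kept as-is.
rendered = 'command opts="startTime={} endTime={}"'.format(start_time, end_time)

# shlex.split follows POSIX shell quoting rules, which approximates how
# bash would split this command line into arguments.
argv = shlex.split(rendered)

print(rendered)
print(argv)
```

If the second element of argv holds both startTime and endTime, the quotes
did their job at the shell level.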

-- 
Cheers,

Edgardo