[jira] [Commented] (AIRFLOW-161) Redirection to external url

2016-05-23 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297652#comment-15297652
 ] 

Sumit Maheshwari commented on AIRFLOW-161:
--

Hi Chris,

Yes, for offline things like mail or Slack this is what we are using as well, 
but for real-time cases like mine it doesn't fit. Please take a look at this 
PR (https://github.com/apache/incubator-airflow/pull/1538) and let me know if 
I can achieve something similar some other way.
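
For context, here's a rough sketch of the kind of thing I mean (this is not the 
actual PR, just an illustration on top of the existing plugin mechanism, and 
every name in it is made up):

{code}
from airflow.plugins_manager import AirflowPlugin
from flask import redirect, request
from flask_admin import BaseView, expose


class ExternalLinkView(BaseView):
    """Hypothetical view that bounces the browser to an external service."""

    @expose('/')
    def index(self):
        # In the real case the target URL would be built from information
        # stored in airflow (e.g. looked up per task instance), not taken
        # straight from the query string.
        target = request.args.get('url', 'http://example.com')
        return redirect(target)


class ExternalLinkPlugin(AirflowPlugin):
    name = "external_link_plugin"
    admin_views = [ExternalLinkView(category="Browse", name="External Link")]
{code}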

Thanks,
Sumit

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront) to 
> redirect someone to an external service URL using the information stored in 
> airflow. There could be many use cases, like downloading a signed file from 
> S3, redirecting to the Hadoop job tracker, or the direct case I am working 
> on, which is linking airflow tasks to Qubole commands.
> I already have a working model and will open a PR soon. Please let me know if 
> there are existing ways to do this already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [1/3] incubator-airflow git commit: use targetPartitionSize as the default partition spec

2016-05-23 Thread Chris Riccomini
s/two/too .. sigh

On Mon, May 23, 2016 at 8:29 PM, Chris Riccomini 
wrote:

> Ah, yea. I get bitten by that two. It's annoying to have to ask people to
> add a JIRA to their commit message. And we can't squash through GitHub
> anymore. :( Wonder if the airflow-pr script allows us to edit it? I think
> it might
>
> On Mon, May 23, 2016 at 5:50 PM, Dan Davydov <
> dan.davy...@airbnb.com.invalid> wrote:
>
>> Yep sorry will check the versions in the future. My own commits have JIRA
>> labels but I haven't validated that other users have done this for theirs
>> when I merge their commits (as the LGTM is delegated to either another
>> committer or the owner of a particular operator). Will be more vigilant in
>> the future.
>>
>> On Mon, May 23, 2016 at 5:07 PM, Chris Riccomini 
>> wrote:
>>
>> > Hey Dan,
>> >
>> > Could you please file JIRAs, and put the JIRA name as the prefix to your
>> > commits?
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Mon, May 23, 2016 at 5:01 PM,  wrote:
>> >
>> >> Repository: incubator-airflow
>> >> Updated Branches:
>> >>   refs/heads/airbnb_rb1.7.1_4 1d0d8681d -> 6f7ea90ae
>> >>
>> >>
>> >> use targetPartitionSize as the default partition spec
>> >>
>> >>
>> >> Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
>> >> Commit:
>> >>
>> http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b58b5e09
>> >> Tree:
>> >> http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b58b5e09
>> >> Diff:
>> >> http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b58b5e09
>> >>
>> >> Branch: refs/heads/airbnb_rb1.7.1_4
>> >> Commit: b58b5e09578d8a0df17b4de12fe3b49792e9feda
>> >> Parents: 1d0d868
>> >> Author: Hongbo Zeng 
>> >> Authored: Sat May 14 17:00:42 2016 -0700
>> >> Committer: Dan Davydov 
>> >> Committed: Mon May 23 16:59:52 2016 -0700
>> >>
>> >> --
>> >>  airflow/hooks/druid_hook.py| 23 ---
>> >>  airflow/operators/hive_to_druid.py |  8 +---
>> >>  2 files changed, 21 insertions(+), 10 deletions(-)
>> >> --
>> >>
>> >>
>> >>
>> >>
>> http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b58b5e09/airflow/hooks/druid_hook.py
>> >> --
>> >> diff --git a/airflow/hooks/druid_hook.py b/airflow/hooks/druid_hook.py
>> >> index b6cb231..7c80c7c 100644
>> >> --- a/airflow/hooks/druid_hook.py
>> >> +++ b/airflow/hooks/druid_hook.py
>> >> @@ -10,7 +10,7 @@ from airflow.hooks.base_hook import BaseHook
>> >>  from airflow.exceptions import AirflowException
>> >>
>> >>  LOAD_CHECK_INTERVAL = 5
>> >> -
>> >> +TARGET_PARTITION_SIZE = 500
>> >>
>> >>  class AirflowDruidLoadException(AirflowException):
>> >>  pass
>> >> @@ -52,13 +52,22 @@ class DruidHook(BaseHook):
>> >>
>> >>      def construct_ingest_query(
>> >>              self, datasource, static_path, ts_dim, columns, metric_spec,
>> >> -            intervals, num_shards, hadoop_dependency_coordinates=None):
>> >> +            intervals, num_shards, target_partition_size, hadoop_dependency_coordinates=None):
>> >>  """
>> >>  Builds an ingest query for an HDFS TSV load.
>> >>
>> >>  :param datasource: target datasource in druid
>> >>  :param columns: list of all columns in the TSV, in the right
>> >> order
>> >>  """
>> >> +
>> >> +        # backward compatibility for num_shards, but target_partition_size is the default setting
>> >> +        # and overwrites the num_shards
>> >> +        if target_partition_size == -1:
>> >> +            if num_shards == -1:
>> >> +                target_partition_size = TARGET_PARTITION_SIZE
>> >> +        else:
>> >> +            num_shards = -1
>> >> +
>> >>          metric_names = [
>> >>              m['fieldName'] for m in metric_spec if m['type'] != 'count']
>> >>          dimensions = [c for c in columns if c not in metric_names and c != ts_dim]
>> >> @@ -100,7 +109,7 @@ class DruidHook(BaseHook):
>> >>  },
>> >>  "partitionsSpec" : {
>> >>  "type" : "hashed",
>> >> -"targetPartitionSize" : -1,
>> >> +"targetPartitionSize" : target_partition_size,
>> >>  "numShards" : num_shards,
>> >>  },
>> >>  },
>> >> @@ -121,10 +130,10 @@ class DruidHook(BaseHook):
>> >>
>> >>      def send_ingest_query(
>> >>              self, datasource, static_path, ts_dim, columns, metric_spec,
>> >> -            intervals, num_shards, hadoop_dependency_coordinates=None):
>> >> +            intervals, num_shards, target_partition_size, hadoop_dependency_coordinates=None):
>> 

[jira] [Commented] (AIRFLOW-163) Running multiple LocalExecutor schedulers makes system load skyrocket

2016-05-23 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296980#comment-15296980
 ] 

Bolke de Bruin commented on AIRFLOW-163:


Not sure. Bence, are you able to test the PR attached to AIRFLOW-128 or provide 
a sample DAG that exposes the issue?

> Running multiple LocalExecutor schedulers makes system load skyrocket
> -
>
> Key: AIRFLOW-163
> URL: https://issues.apache.org/jira/browse/AIRFLOW-163
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: EC2 t2.medium instance, 
> Docker `version 1.11.1, build 5604cbe`, 
> Host is `Linux ip-172-31-44-140 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 
> 20:50:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux`, 
> Docker containers are built upon the `python:3.5` image, 
> LocalExecutor is used with two scheduler containers running
>Reporter: Bence Nagy
>Priority: Minor
>  Labels: scheduler
>
> I've been told on Gitter that this is expected currently, but thought I'd 
> create an issue for it anyway.
> See this screenshot of a task duration chart — I launched a second scheduler 
> for the 8:50 execution. The orange line represents a PostgresOperator task 
> (i.e. processing happens independently of airflow), while the other lines 
> represent data-copying tasks that go through a temp file on the airflow host: 
> https://i.imgur.com/2tDKgKj.png
> I'm seeing a system load of around 4.0-5.0 when processing tasks with one 
> scheduler running, and 20.0-30.0 with two.
> Running {{airflow scheduler --num_runs 3}} under yappi got me these results 
> when ordered by total time: http://pastebin.com/8TiEG4P3. I still have the 
> raw profiling data, let me know if another data extract would be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-165) Add a description/metadata field to the Task

2016-05-23 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-165.
---
Resolution: Information Provided

> Add a description/metadata field to the Task
> 
>
> Key: AIRFLOW-165
> URL: https://issues.apache.org/jira/browse/AIRFLOW-165
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Richard Davison
>Priority: Minor
>
> I think that, to help facilitate self-documentation, we should add either a 
> `description` field or a `metadata` field at the Task level so we can add an 
> arbitrary blob of information to describe it.
> On the UI side, we could put the description in the alt text, a mouseover 
> popup, in the onclick overlay popup, or in a link inside of that, similar to 
> the optional 'subdag' link.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-165) Add a description/metadata field to the Task

2016-05-23 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296952#comment-15296952
 ] 

Chris Riccomini commented on AIRFLOW-165:
-

I think this is already done, right?

https://pythonhosted.org/airflow/concepts.html#task-documentation-notes

You can do:

{code}
t = BashOperator(task_id="foo", bash_command="echo foo", dag=dag)
t.doc_md = """\
# Title
Here's a [url](http://www.airbnb.com)
"""
{code}

> Add a description/metadata field to the Task
> 
>
> Key: AIRFLOW-165
> URL: https://issues.apache.org/jira/browse/AIRFLOW-165
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Richard Davison
>Priority: Minor
>
> I think that, to help facilitate self-documentation, we should add either a 
> `description` field or a `metadata` field at the Task level so we can add an 
> arbitrary blob of information to describe it.
> On the UI side, we could put the description in the alt text, a mouseover 
> popup, in the onclick overlay popup, or in a link inside of that, similar to 
> the optional 'subdag' link.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-161) Redirection to external url

2016-05-23 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296944#comment-15296944
 ] 

Chris Riccomini commented on AIRFLOW-161:
-

What context is this under? You mean like you want to use the EmailOperator or 
SlackOperator to notify people to download a file that's been created as part 
of the DAG?

We do this using XCom+EmailOperator. XCom variables can be accessed via 
templates. We store the file in a blob store (like S3). The file location is 
stored in XCom, which the EmailOperator references when it sends the email.
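
For illustration, here's a minimal sketch of that pattern (the DAG id, task ids, 
bucket, file path, and email address below are all made up):

{code}
from datetime import datetime

from airflow import DAG
from airflow.operators.email_operator import EmailOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG('report_example', start_date=datetime(2016, 5, 1))


def upload_report(**context):
    # ... generate the file and upload it to the blob store here ...
    s3_url = 's3://my-bucket/reports/2016-05-23/report.csv'  # made-up location
    # The return value is pushed to XCom under the key 'return_value'.
    return s3_url


upload = PythonOperator(
    task_id='upload_report',
    python_callable=upload_report,
    provide_context=True,
    dag=dag)

# html_content is templated, so the EmailOperator can pull the file location
# straight back out of XCom when it sends the mail.
notify = EmailOperator(
    task_id='notify',
    to='data-team@example.com',
    subject='Report ready',
    html_content="Download: {{ ti.xcom_pull(task_ids='upload_report') }}",
    dag=dag)

notify.set_upstream(upload)
{code}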

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront) to 
> redirect someone to an external service URL using the information stored in 
> airflow. There could be many use cases, like downloading a signed file from 
> S3, redirecting to the Hadoop job tracker, or the direct case I am working 
> on, which is linking airflow tasks to Qubole commands.
> I already have a working model and will open a PR soon. Please let me know if 
> there are existing ways to do this already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AIRFLOW-58) Add bulk_dump abstract method to DbApiHook

2016-05-23 Thread Bence Nagy (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bence Nagy reassigned AIRFLOW-58:
-

Assignee: Bence Nagy

> Add bulk_dump abstract method to DbApiHook
> --
>
> Key: AIRFLOW-58
> URL: https://issues.apache.org/jira/browse/AIRFLOW-58
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: Airflow 1.7.0
>Reporter: Bence Nagy
>Assignee: Bence Nagy
>Priority: Trivial
>
> I just see no reason for having a method for bulk loading but not for the 
> inverse.
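
For reference, a sketch of what the addition might look like, mirroring the 
existing bulk_load(table, tmp_file); this is just a guess at the shape, not the 
final signature:

{code}
# Sketch only: the proposed bulk_dump mirrors the existing bulk_load.
from airflow.hooks.base_hook import BaseHook


class DbApiHook(BaseHook):
    # ... existing methods elided ...

    def bulk_load(self, table, tmp_file):
        """Loads a tab-delimited file into a database table."""
        raise NotImplementedError()

    def bulk_dump(self, table, tmp_file):
        """Dumps a database table into a tab-delimited file (the inverse)."""
        raise NotImplementedError()
{code}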



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-159) Documentation: Cloud integration : GCP

2016-05-23 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296939#comment-15296939
 ] 

Chris Riccomini commented on AIRFLOW-159:
-

Sounds good!

> Documentation: Cloud integration : GCP
> --
>
> Key: AIRFLOW-159
> URL: https://issues.apache.org/jira/browse/AIRFLOW-159
> Project: Apache Airflow
>  Issue Type: Task
>  Components: gcp
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>
> Start documenting all GCP operators and hooks. 
> I propose a new top-level documentation section that's called "Integration". 
> Under that section I would make a sub-section "Google Cloud Platform".
> This way, other cloud integrations can be documented in the Integration 
> section as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-10) Migrate GH issues to Apache JIRA

2016-05-23 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-10.
--
Resolution: Done

> Migrate GH issues to Apache JIRA
> 
>
> Key: AIRFLOW-10
> URL: https://issues.apache.org/jira/browse/AIRFLOW-10
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: project-management
>Reporter: Chris Riccomini
>Assignee: Bolke de Bruin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-164) Disable the web UI's page load animations

2016-05-23 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296927#comment-15296927
 ] 

Chris Riccomini commented on AIRFLOW-164:
-

I'm +1 on removing it, personally. I suspect [~maxime.beauche...@apache.org] 
will have the strongest preference.

> Disable the web UI's page load animations
> -
>
> Key: AIRFLOW-164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-164
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: Airflow 1.7.1
>Reporter: Bence Nagy
>
> Alright, some people might disagree, looking forward to the discussion here. 
> Basically my qualm is that opening almost any page (even the DAGs list) will 
> trigger an animation where the content kinda swoops in from the left/the 
> top/the top-left corner. This gets pretty annoying for a few reasons:
> - It takes around half a second before the content is visually parsable, and 
> these half seconds accumulate pretty quickly when doing lots of administration.
> - This makes visual diffing when refreshing or editing the URL impossible. If 
> the animations weren't firing, it would be possible to refresh for instance 
> the tree view of a complicated DAG and just see the treemap change, making 
> the differences obvious. Currently you need to commit the state to memory and 
> then recall it after the animation has finished to try and figure out what 
> has changed.
> - I think it just makes no sense from a design point of view anyway to have 
> all this data sliding around the screen. It's not like it passes off as a 
> transition animation or anything.
> What does everyone else think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


incubator-airflow git commit: docfix: Fix a couple of minor typos.

2016-05-23 Thread sanand
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 88f895aa6 -> 8d7297573


docfix: Fix a couple of minor typos.


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8d729757
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8d729757
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8d729757

Branch: refs/heads/master
Commit: 8d72975734e66d6efa775cec62dd0aea87575c0d
Parents: 88f895a
Author: Mark Reid 
Authored: Mon May 23 09:16:03 2016 -0300
Committer: Mark Reid 
Committed: Mon May 23 09:16:38 2016 -0300

--
 docs/concepts.rst  | 2 +-
 docs/configuration.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8d729757/docs/concepts.rst
--
diff --git a/docs/concepts.rst b/docs/concepts.rst
index 405048a..6e15ff8 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -172,7 +172,7 @@ functionally equivalent:
 
 When using the bitshift to compose operators, the relationship is set in the
 direction that the bitshift operator points. For example, ``op1 >> op2`` means
-that ``op1`` runs first and ``op2`` runs seconds. Multiple operators can be
+that ``op1`` runs first and ``op2`` runs second. Multiple operators can be
 composed -- keep in mind the chain is executed left-to-right and the rightmost
 object is always returned. For example:
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8d729757/docs/configuration.rst
--
diff --git a/docs/configuration.rst b/docs/configuration.rst
index 2d8a9fb..3eed553 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -107,7 +107,7 @@ Here are a few imperative requirements for your workers:
   ``MySqlOperator``, the required Python library needs to be available in
   the ``PYTHONPATH`` somehow
 - The worker needs to have access to its ``DAGS_FOLDER``, and you need to
-  synchronize the filesystems by your own mean. A common setup would be to
+  synchronize the filesystems by your own means. A common setup would be to
   store your DAGS_FOLDER in a Git repository and sync it across machines using
   Chef, Puppet, Ansible, or whatever you use to configure machines in your
   environment. If all your boxes have a common mount point, having your



[jira] [Closed] (AIRFLOW-138) Airflow improperly shows task status as 'up for retry' for a task that failed on re-run

2016-05-23 Thread Siddharth Anand (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand closed AIRFLOW-138.
---

> Airflow improperly shows task status as 'up for retry' for a task that failed 
> on re-run
> ---
>
> Key: AIRFLOW-138
> URL: https://issues.apache.org/jira/browse/AIRFLOW-138
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.0
>Reporter: Tomasz Bartczak
>Assignee: Siddharth Anand
>Priority: Minor
>
> Migrated from https://github.com/apache/incubator-airflow/issues/1441
> Dear Airflow Maintainers,
> *Environment*
> Before I tell you about my issue, let me describe my Airflow environment:
> {panel}
> Airflow version: 1.7.0
> Airflow components: webserver, mysql, scheduler with celery executor
> Python Version: 2.7.6
> Operating System: Linux Ubuntu 3.19.0-26-generic
> {panel}
> *Description of Issue*
> Now that you know a little about me, let me tell you about the issue I am 
> having:
> *What I expect:*
> If I do a re-run and it fails, the task should either be retried again 
> (resetting the retry count) and marked accordingly in the GUI, OR not retried 
> and marked in the GUI as 'failed'.
> *What happened instead?* The task in the GUI was presented as 'up_for_retry'; 
> however, it was not retried, even after retry_delay had passed.
> *Reproducing the Issue*
> The DAG does not have any strange settings:
> {code}
> concurrency= 3,
> max_active_runs = 2,
> start_date = datetime(2016,04,03,01),
> default_args={
> 'depends_on_past': False,
> 'retries': 2,
> 'retry_delay': timedelta(minutes=3) }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-151) trigger_rule='one_success' not allowing tasks downstream of a BranchPythonOperator to be executed

2016-05-23 Thread Siddharth Anand (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand closed AIRFLOW-151.
---

> trigger_rule='one_success' not allowing tasks downstream of a 
> BranchPythonOperator to be executed
> -
>
> Key: AIRFLOW-151
> URL: https://issues.apache.org/jira/browse/AIRFLOW-151
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
> Attachments: DAG_Problem.png, DAG_Problem_Resolved.PNG, 
> DAG_Solution_Example.png
>
>
> Porting from https://github.com/apache/incubator-airflow/issues/1521
> Dear Airflow Maintainers,
> *Environment*
> {panel}
> Airflow version: 1.7.0rc3
> Airflow components: webserver, scheduler, worker, postgres database, 
> CeleryExecutor
> Relevant airflow.cfg settings: nothing special here; mostly defaults
> Python Version: 3.4.3
> Operating System: Centos 6.7
> Python packages: virtualenv with standard airflow install
> {panel}
> *Background*
> We are constructing a workflow to automate standard business processes around 
> the creation and maintenance of virtual machines. After creation, we verify 
> several information points on the VM to ensure that it is a viable machine 
> and that no configuration errors occurred. If it fails verification and is 
> not running, then it should be deleted. If it fails verification and is 
> running, then we stop it first, then delete it.
> *What did you expect to happen?*
> After researching the BranchPythonOperator, I found that I should be using 
> trigger_rule='one_success' to allow a task at a join point downstream of the 
> branch(es) to be triggered, as mentioned in #1078. So, I defined the task as 
> follows:
> {code}
> delete_vm = PythonOperator(
>  task_id='delete_vm',
>  trigger_rule=TriggerRule.ONE_SUCCESS,
>  python_callable=_delete_vm,
>  provide_context=True,
>  dag=dag)
> delete_vm.set_upstream({poll_vm_stop, verify_vm})
> {code}
> *What happened instead?*
> Rather than executing correctly, the delete_vm task is marked as skipped and 
> is not re-evaluated following poll_vm_stop. There is no stack trace 
> available, as the task simply does not execute. Sidenote: the PythonSensor 
> you see in the picture below is a sensor which evaluates the truthy- or 
> falsey-ness of a Python callable. It has been tested extensively and works as 
> intended.
> !DAG_Problem.png!
> Any help would be greatly appreciated. I've tested various ways of linking 
> the dag, providing DummyOperators as buffers, using a second 
> BranchPythonOperator to explicitly call the task; all of these have failed. 
> Am I missing something obvious here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-151) trigger_rule='one_success' not allowing tasks downstream of a BranchPythonOperator to be executed

2016-05-23 Thread William Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296493#comment-15296493
 ] 

William Clark commented on AIRFLOW-151:
---

[~sanand], thank you so much for the assistance! With those changes, the DAG is 
now functioning as expected.

!DAG_Problem_Resolved.PNG!

> trigger_rule='one_success' not allowing tasks downstream of a 
> BranchPythonOperator to be executed
> -
>
> Key: AIRFLOW-151
> URL: https://issues.apache.org/jira/browse/AIRFLOW-151
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
> Attachments: DAG_Problem.png, DAG_Problem_Resolved.PNG, 
> DAG_Solution_Example.png
>
>
> Porting from https://github.com/apache/incubator-airflow/issues/1521
> Dear Airflow Maintainers,
> *Environment*
> {panel}
> Airflow version: 1.7.0rc3
> Airflow components: webserver, scheduler, worker, postgres database, 
> CeleryExecutor
> Relevant airflow.cfg settings: nothing special here; mostly defaults
> Python Version: 3.4.3
> Operating System: Centos 6.7
> Python packages: virtualenv with standard airflow install
> {panel}
> *Background*
> We are constructing a workflow to automate standard business processes around 
> the creation and maintenance of virtual machines. After creation, we verify 
> several information points on the VM to ensure that it is a viable machine 
> and that no configuration errors occurred. If it fails verification and is 
> not running, then it should be deleted. If it fails verification and is 
> running, then we stop it first, then delete it.
> *What did you expect to happen?*
> After researching the BranchPythonOperator, I found that I should be using 
> trigger_rule='one_success' to allow a task at a join point downstream of the 
> branch(es) to be triggered, as mentioned in #1078. So, I defined the task as 
> follows:
> {code}
> delete_vm = PythonOperator(
>  task_id='delete_vm',
>  trigger_rule=TriggerRule.ONE_SUCCESS,
>  python_callable=_delete_vm,
>  provide_context=True,
>  dag=dag)
> delete_vm.set_upstream({poll_vm_stop, verify_vm})
> {code}
> *What happened instead?*
> Rather than executing correctly, the delete_vm task is marked as skipped and 
> is not re-evaluated following poll_vm_stop. There is no stack trace 
> available, as the task simply does not execute. Sidenote: the PythonSensor 
> you see in the picture below is a sensor which evaluates the truthy- or 
> falsey-ness of a Python callable. It has been tested extensively and works as 
> intended.
> !DAG_Problem.png!
> Any help would be greatly appreciated. I've tested various ways of linking 
> the dag, providing DummyOperators as buffers, using a second 
> BranchPythonOperator to explicitly call the task; all of these have failed. 
> Am I missing something obvious here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-165) Add a description/metadata field to the Task

2016-05-23 Thread Richard Davison (JIRA)
Richard Davison created AIRFLOW-165:
---

 Summary: Add a description/metadata field to the Task
 Key: AIRFLOW-165
 URL: https://issues.apache.org/jira/browse/AIRFLOW-165
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Richard Davison
Priority: Minor


I think that, to help facilitate self-documentation, we should add either a 
`description` field or a `metadata` field at the Task level so we can add an 
arbitrary blob of information to describe it.

On the UI side, we could put the description in the alt text, a mouseover 
popup, in the onclick overlay popup, or in a link inside of that, similar to 
the optional 'subdag' link.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-140) DagRun state not updated

2016-05-23 Thread dud (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292963#comment-15292963
 ] 

dud edited comment on AIRFLOW-140 at 5/23/16 9:49 AM:
--

Hello.
I tried with the LocalExecutor as requested and I observed the same behaviour:
{code}
airflow=> SELECT * FROM task_instance WHERE dag_id = :dag_id ORDER BY execution_date ; SELECT * FROM dag_run WHERE dag_id = :dag_id ; SELECT * FROM job ORDER BY start_date DESC LIMIT 5;
  task_id  |       dag_id       |   execution_date    |         start_date         |          end_date          | duration  |  state  | try_number | hostname  | unixname | job_id | pool |  queue  | priority_weight |    operator    | queued_dttm
-----------+--------------------+---------------------+----------------------------+----------------------------+-----------+---------+------------+-----------+----------+--------+------+---------+-----------------+----------------+-------------
 alt_sleep | dagrun_not_updated | 2016-05-20 07:45:00 | 2016-05-20 07:46:54.372843 |                            |           | running |          1 | localhost | airflow  |   3203 |      | default |               1 | PythonOperator | 
 alt_sleep | dagrun_not_updated | 2016-05-20 07:46:00 | 2016-05-20 07:47:19.317705 | 2016-05-20 07:47:29.453316 | 10.135611 | success |          1 | localhost | airflow  |   3204 |      | default |               1 | PythonOperator | 
 alt_sleep | dagrun_not_updated | 2016-05-20 07:47:00 | 2016-05-20 07:48:01.724885 |                            |           | running |          1 | localhost | airflow  |   3205 |      | default |               1 | PythonOperator | 
 alt_sleep | dagrun_not_updated | 2016-05-20 07:48:00 | 2016-05-20 07:49:12.031225 | 2016-05-20 07:49:22.083763 | 10.052538 | success |          1 | localhost | airflow  |   3206 |      | default |               1 | PythonOperator | 
(4 rows)

  id  |       dag_id       |   execution_date    |  state  |             run_id             | external_trigger | conf | end_date |         start_date
------+--------------------+---------------------+---------+--------------------------------+------------------+------+----------+----------------------------
 1485 | dagrun_not_updated | 2016-05-20 07:45:00 | running | scheduled__2016-05-20T07:45:00 | f                |      |          | 2016-05-20 07:46:38.30924
 1486 | dagrun_not_updated | 2016-05-20 07:46:00 | running | scheduled__2016-05-20T07:46:00 | f                |      |          | 2016-05-20 07:47:01.563541
 1487 | dagrun_not_updated | 2016-05-20 07:47:00 | running | scheduled__2016-05-20T07:47:00 | f                |      |          | 2016-05-20 07:48:00.016718
 1488 | dagrun_not_updated | 2016-05-20 07:48:00 | running | scheduled__2016-05-20T07:48:00 | f                |      |          | 2016-05-20 07:49:00.203204
(4 rows)

  id  | dag_id |  state  |   job_type   |         start_date         |          end_date          |      latest_heartbeat      | executor_class | hostname  | unixname
------+--------+---------+--------------+----------------------------+----------------------------+----------------------------+----------------+-----------+----------
 3206 |        | success | LocalTaskJob | 2016-05-20 07:49:08.691714 | 2016-05-20 07:49:23.706144 | 2016-05-20 07:49:08.691725 | LocalExecutor  | localhost | airflow
 3205 |        | running | LocalTaskJob | 2016-05-20 07:48:01.155988 |                            | 2016-05-20 07:50:51.312164 | LocalExecutor  | localhost | airflow
 3204 |        | success | LocalTaskJob | 2016-05-20 07:47:16.153078 | 2016-05-20 07:47:31.168997 | 2016-05-20 07:47:16.153091 | LocalExecutor  | localhost | airflow
 3203 |        | running | LocalTaskJob | 2016-05-20 07:46:48.198379 |                            | 2016-05-20 07:50:53.42636  | LocalExecutor  | localhost | airflow
 3202 |        | running | SchedulerJob | 2016-05-20 07:45:31.43799  |                            | 2016-05-20 07:50:55.061958 | LocalExecutor  | localhost | airflow
{code}

Extract of database logs:
{code}
2016-05-20 07:47:31 UTC [24003-36] airflow@airflow LOG:  duration: 38.731 ms  statement: UPDATE job SET state='success', end_date='2016-05-20T07:47:31.168997'::timestamp, latest_heartbeat='2016-05-20T07:47:16.153091'::timestamp WHERE job.id = 3204
2016-05-20 07:49:23 UTC [24107-36] airflow@airflow LOG:  duration: 0.179 ms  statement: UPDATE job SET state='success', end_date='2016-05-20T07:49:23.706144'::timestamp, latest_heartbeat='2016-05-20T07:49:08.691725'::timestamp WHERE job.id = 3206
2016-05-20 07:52:03 UTC [23971-336] airflow@airflow LOG:  duration: 0.291 ms  statement: UPDATE job SET state='success', end_date='2016-05-20T07:52:03.526927'::timestamp, latest_heartbeat='2016-05-20T07:46:48.198389'::timestamp WHERE job.id = 3203
2016-05-20 07:53:06 UTC [24047-326] airflow@airflow LOG:  duration: 0.179 

[jira] [Created] (AIRFLOW-164) Disable the web UI's page load animations

2016-05-23 Thread Bence Nagy (JIRA)
Bence Nagy created AIRFLOW-164:
--

 Summary: Disable the web UI's page load animations
 Key: AIRFLOW-164
 URL: https://issues.apache.org/jira/browse/AIRFLOW-164
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ui
Affects Versions: Airflow 1.7.1
Reporter: Bence Nagy


Alright, some people might disagree, looking forward to the discussion here. 
Basically my qualm is that opening almost any page (even the DAGs list) will 
trigger an animation where the content kinda swoops in from the left/the 
top/the top-left corner. This gets pretty annoying for a few reasons:

- It takes around half a second before the content is visually parsable, and 
these half seconds accumulate pretty quickly when doing lots of administration.
- This makes visual diffing when refreshing or editing the URL impossible. If 
the animations weren't firing, it would be possible to refresh for instance the 
tree view of a complicated DAG and just see the treemap change, making the 
differences obvious. Currently you need to commit the state to memory and then 
recall it after the animation has finished to try and figure out what has 
changed.
- I think it just makes no sense from a design point of view anyway to have all 
this data sliding around the screen. It's not like it passes off as a 
transition animation or anything.

What does everyone else think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-162) Allow variables to be exposed in the templates

2016-05-23 Thread Alex Van Boxel (JIRA)
Alex Van Boxel created AIRFLOW-162:
--

 Summary: Allow variables to be exposed in the templates
 Key: AIRFLOW-162
 URL: https://issues.apache.org/jira/browse/AIRFLOW-162
 Project: Apache Airflow
  Issue Type: Improvement
  Components: core
Reporter: Alex Van Boxel
Assignee: Alex Van Boxel
Priority: Trivial


Allow variables to be exposed in the templates. This makes it possible to 
access them in the following way, for example:

{var.gcp_dataflow_base}/test-pipeline.jar

In this example the base path is configured in Variables. This makes it 
possible to make some parts configurable (for example, differences between 
prod/staging/test).
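
A rough sketch of how this could look from a DAG file once variables are 
exposed (the DAG, task, Variable key, and paths below are only illustrative, 
not the final API):

{code}
from datetime import datetime

from airflow import DAG
from airflow.operators import BashOperator

# Assumes a Variable named "gcp_dataflow_base" has been created via the UI or
# CLI, holding e.g. "gs://my-staging-bucket/dataflow".
dag = DAG("dataflow_example", start_date=datetime(2016, 5, 1))

run_pipeline = BashOperator(
    task_id="run_pipeline",
    # With variables exposed to the template context, the environment-specific
    # base path can be pulled straight into any templated field.
    bash_command="java -jar {{ var.gcp_dataflow_base }}/test-pipeline.jar",
    dag=dag)
{code}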



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-161) Redirection to external url

2016-05-23 Thread Sumit Maheshwari (JIRA)
Sumit Maheshwari created AIRFLOW-161:


 Summary: Redirection to external url
 Key: AIRFLOW-161
 URL: https://issues.apache.org/jira/browse/AIRFLOW-161
 Project: Apache Airflow
  Issue Type: Improvement
  Components: webserver
Reporter: Sumit Maheshwari


Hi,

I am not able to find a good way (apart from loading everything upfront) to 
redirect someone to an external service URL using the information stored in 
airflow. There could be many use cases, like downloading a signed file from S3, 
redirecting to the Hadoop job tracker, or the direct case I am working on, 
which is linking airflow tasks to Qubole commands.

I already have a working model and will open a PR soon. Please let me know if 
there are existing ways to do this already.

Thanks,
Sumit





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)