StackOverflow synonym proposal

2018-01-15 Thread Baghino, Stefano(AWF)
Hi everybody,

on StackOverflow many people regularly come to get help with Airflow.
Currently the site has both the `airflow` and `apache-airflow` tags, which
mean the same thing, with the former having more tagged questions and
history behind it. To make it easier for people to search and tag posts
regarding Airflow, I proposed making `apache-airflow` a synonym of
`airflow`. If you are active on the site and have a score of at least 5 on
the `airflow` tag, I'd love it if you could consider approving the synonym
proposal here:
https://stackoverflow.com/tags/airflow/synonyms

Once the proposal reaches 4 upvotes, the synonym is created.

Best,
Stefano


Close SqlSensor Connection

2018-01-15 Thread Alexis Rolland
Hello everyone,

I'm reaching out to discuss / suggest a small improvement in the class 
SqlSensor:
https://pythonhosted.org/airflow/_modules/airflow/operators/sensors.html

We are currently using SqlSensors on top of Teradata in several DAGs. When the
DAGs execute, we receive the following error message from the Teradata engine:
Error 8024: All virtual circuits are currently in use.
This error typically appears when we reach the maximum number of simultaneous
connections to the database.

I suspect the SqlSensor task creates a new connection every time it
(re)tries, and that these connections end up in an idle state.
Would closing the connection at the end of the SqlSensor poke method be
feasible?
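A minimal sketch of the suggested fix, with sqlite3 standing in for the
Teradata hook (the function name and shape here are illustrative, not
Airflow's actual SqlSensor code) — the real change would wrap the hook's
query call the same way, so the connection is released even when the sensor
reschedules:

```python
import sqlite3


def poke(sql):
    """Sketch of a poke() that always closes its connection.

    sqlite3 stands in for the Teradata hook; the point is the
    try/finally, which releases the virtual circuit on every
    poke, including failed or retried ones.
    """
    conn = sqlite3.connect(":memory:")
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        records = cursor.fetchall()
        # Sensor semantics: succeed only when the query returns rows.
        return bool(records)
    finally:
        conn.close()  # release the connection even on retry/failure
```

For example, `poke("SELECT 1")` returns True while still closing its
connection before returning.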

I'd like to take this opportunity as well to thank you for the awesome work 
you've been doing with Airflow.
Keep it up!

Best,




Re: Airflow + Kubernetes + Git Sync

2018-01-15 Thread jordan.zuc...@gmail.com


On 2018-01-12 16:17, Anirudh Ramanathan  wrote: 
> > Any good way to debug this?
> 
> One way might be reading the events from "kubectl get events". That should
> reveal some information about the pod removal event.
> This brings up another question - should errored pods be persisted for
> debugging?
> 
> On Fri, Jan 12, 2018 at 3:07 PM, jordan.zuc...@gmail.com <
> jordan.zuc...@gmail.com> wrote:
> 
> > I'm trying to use Airflow and Kubernetes and having trouble using git sync
> > to pull DAGs into workers.
> >
> > I use a git sync init container on the scheduler to pull in DAGs initially
> > and that works. But when worker pods are spawned, the workers terminate
> > almost immediately because they cannot find the DAGs. But since the workers
> > terminate so quickly, I can't even inspect the file structure to see where
> > the DAGs ended up during the workers git sync init container.
> >
> > I noticed that the git sync init container for the workers is hard coded
> > into /tmp/dags and there is a git_subpath config setting as well. But I
> > can't understand how the git synced DAGs ever end up in /root/airflow/dags
> >
> > I am successfully using a git sync init container for the scheduler, so I
> > know my git credentials are valid. Any good way to debug this? Or an
> > example of how to set this up correctly?
> >
> 
> 
> 
> -- 
> Anirudh Ramanathan
Anirudh, in my case, the pods are persisted because I set
`delete_worker_pods` to False in the airflow configmap; however, I cannot
exec into them because they terminated on an error and are no longer running.
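Even though exec is impossible once the containers have exited, the logs and
events of a failed pod are still retrievable. A sketch of the usual
inspection commands, with a placeholder pod name (the container names `base`
and `git-sync-clone` are taken from the describe output later in this
thread):

```shell
# Recent cluster events, oldest first — shows why a pod was killed/evicted.
kubectl get events --sort-by=.metadata.creationTimestamp

# Full status of the failed worker pod, including init container results.
kubectl describe pod <worker-pod-name>

# Logs survive container exit; inspect both the main and init containers.
kubectl logs <worker-pod-name> -c base
kubectl logs <worker-pod-name> -c git-sync-clone
```

This is how the `kubectl describe` output below was obtained.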


Re: Airflow + Kubernetes + Git Sync

2018-01-15 Thread jordan.zuc...@gmail.com


On 2018-01-13 08:12, Daniel Imberman  wrote: 
> @jordan can you turn delete mode off and post the kubectl describe results
> for the workers?
Already had delete mode turned off. This was a really useful command. I can see 
the basic logs in the k8s dashboard:
+ airflow run jordan_dag_3 run_this_1 2018-01-15T10:00:00 --local -sd 
/root/airflow/dags/jordan3.py
[2018-01-15 19:40:52,978] {__init__.py:46} INFO - Using executor LocalExecutor
[2018-01-15 19:40:53,012] {models.py:187} INFO - Filling up the DagBag from 
/root/airflow/dags/jordan3.py
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 27, in <module>
args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 350, 
in run
dag = get_dag(args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 128, 
in get_dag
'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: jordan_dag_3. 
Either the dag did not exist or it failed to parse.

I know the DAG is there in the scheduler and the webserver. I have reason to 
believe the git-sync init container in the worker isn't checking out the files 
in a way that the worker can use. Here's the info you requested:

Name: jordandag3runthis1-8df809f80c874d6ca50acb0d0480307c
Namespace:default
Node: minikube/192.168.99.100
Start Time:   Mon, 15 Jan 2018 11:59:57 -0800
Labels:   airflow-slave=
  dag_id=jordan_dag_3
  execution_date=2018-01-15T19_59_54.838835
  task_id=run_this_1
Annotations:  
pod.alpha.kubernetes.io/init-container-statuses=[{"name":"git-sync-clone","state":{"terminated":{"exitCode":0,"reason":"Completed","startedAt":"2018-01-15T19:59:58Z","finishedAt":"2018-01-15T19:59:59Z...
  
pod.alpha.kubernetes.io/init-containers=[{"name":"git-sync-clone","image":"gcr.io/google-containers/git-sync-amd64:v2.0.5","env":[{"name":"GIT_SYNC_REPO","value":"...
  
pod.beta.kubernetes.io/init-container-statuses=[{"name":"git-sync-clone","state":{"terminated":{"exitCode":0,"reason":"Completed","startedAt":"2018-01-15T19:59:58Z","finishedAt":"2018-01-15T19:59:59Z"...
  
pod.beta.kubernetes.io/init-containers=[{"name":"git-sync-clone","image":"gcr.io/google-containers/git-sync-amd64:v2.0.5","env":[{"name":"GIT_SYNC_REPO","value":"https://github.com/pubnub/caravan.git;...
Status:   Failed
IP:   
Init Containers:
  git-sync-clone:
Container ID:   
docker://c3dcc435d18362271fe5ab8098275d082c01ab36fc451d695e6e0e54ad71132a
Image:  gcr.io/google-containers/git-sync-amd64:v2.0.5
Image ID:   
docker-pullable://gcr.io/google-containers/git-sync-amd64@sha256:904833aedf3f14373e73296240ed44d54aecd4c02367b004452dfeca2465e5bf
Port:   
State:  Terminated
  Reason:   Completed
  Exit Code:0
  Started:  Mon, 15 Jan 2018 11:59:58 -0800
  Finished: Mon, 15 Jan 2018 11:59:59 -0800
Ready:  True
Restart Count:  0
Environment:
  GIT_SYNC_REPO:  
  GIT_SYNC_BRANCH:master
  GIT_SYNC_ROOT:  /tmp
  GIT_SYNC_DEST:  dags
  GIT_SYNC_ONE_TIME:  true
  GIT_SYNC_USERNAME:  jzucker2
  GIT_SYNC_PASSWORD:  
Mounts:
  /root/airflow/airflow.cfg from airflow-config (ro)
  /root/airflow/dags/ from airflow-dags (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from default-token-0bq1k 
(ro)
Containers:
  base:
Container ID:  
Image: 
Image ID:  
Port:  
Command:
  bash
  -cx
  --
Args:
  airflow run jordan_dag_3 run_this_1 2018-01-15T19:59:54.838835 --local 
-sd /root/airflow/dags/jordan3.py
State:  Terminated
  Reason:   Error
  Exit Code:1
  Started:  Mon, 15 Jan 2018 12:00:00 -0800
  Finished: Mon, 15 Jan 2018 12:00:01 -0800
Ready:  False
Restart Count:  0
Environment:
  AIRFLOW__CORE__AIRFLOW_HOME:   /root/airflow
  AIRFLOW__CORE__EXECUTOR:   LocalExecutor
  AIRFLOW__CORE__DAGS_FOLDER:/tmp/dags
  SQL_ALCHEMY_CONN:Optional: false
  GIT_SYNC_USERNAME: Optional: false
  GIT_SYNC_PASSWORD: Optional: false
  AIRFLOW_CONN_PORTAL_DB_URI:  Optional: false
  AIRFLOW_CONN_OVERMIND_DB_URI:Optional: false
Mounts:
  /root/airflow/airflow.cfg from airflow-config (ro)
  /root/airflow/dags/ from airflow-dags (ro)
  /var/run/secrets/kubernetes.io/serviceaccount from default-token-0bq1k 
(ro)
Conditions:
  Type   Status
  InitializedTrue 
  Ready  False 
  PodScheduled   True 
Volumes:
  airflow-dags:
Type:EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:  
  airflow-config:
Type:  ConfigMap (a volume populated by a ConfigMap)
Name:  

Re: Credentials for accessing endpoints

2018-01-15 Thread Shah Altaf
It might depend on who you're trying to secure it from.  My understanding
is (correct me if I'm wrong) that Airflow connections are encrypted in the
DB, so that should be good enough -- just make sure you've installed the
crypto package and generated a Fernet key.  See:
http://airflow.readthedocs.io/en/latest/configuration.html#connections
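Generating the Fernet key is a one-liner with the `cryptography` package
(this is the standard library call, not Airflow-specific code); the printed
value goes into `fernet_key` under `[core]` in airflow.cfg:

```python
# Requires: pip install cryptography
from cryptography.fernet import Fernet

# Fernet keys are 32 random bytes, URL-safe base64-encoded (44 characters).
key = Fernet.generate_key()
print(key.decode())  # paste into airflow.cfg's [core] fernet_key
```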

If you want to go beyond that, and suppose you're using containers, you
could have a look at environment variables.  There are other external
secret holders like Hashicorp Vault, AWS Parameter Store... but you'll need
to evaluate whether it's worth the extra setup/overhead.
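For the environment-variable route: Airflow resolves a connection id from a
variable named `AIRFLOW_CONN_<CONN_ID>` holding a connection URI. A sketch
with placeholder credentials and a made-up connection id:

```shell
# Airflow looks up the connection id "my_endpoint" (case-insensitive)
# in AIRFLOW_CONN_MY_ENDPOINT before falling back to the metadata DB.
# The host and credentials below are placeholders.
export AIRFLOW_CONN_MY_ENDPOINT='https://user:s3cr3t@api.example.com'
```

In a container setup, the same variable would be injected via the pod spec
or an orchestrator secret rather than exported by hand.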



On Mon, Jan 15, 2018 at 7:05 PM Veeranagouda Mukkanagoudar <
mukkanagou...@gmail.com> wrote:

> Hi,
>
> We have been using .conf/.ini files to store the credentials to access
> endpoints. Wondering if there is a better and secure way to store the
> credentials besides as Airflow connections in Admin console.
>
> -Veera
>


Credentials for accessing endpoints

2018-01-15 Thread Veeranagouda Mukkanagoudar
Hi,

We have been using .conf/.ini files to store the credentials to access
endpoints. Wondering if there is a better and secure way to store the
credentials besides as Airflow connections in Admin console.

-Veera


Re: Fix on_kill command for operators

2018-01-15 Thread EKC (Erik Cederstrand)
I'm not sure if it's related, but there's an additional issue where
attempting to kill a process doesn't always kill it:
https://issues.apache.org/jira/browse/AIRFLOW-949


Kind regards,

Erik Cederstrand


From: Milan van der Meer 
Sent: Friday, January 12, 2018 9:23:24 PM
To: dev@airflow.incubator.apache.org
Subject: Re: Fix on_kill command for operators

Currently, on_kill does not get triggered when you clear from the UI.
As you mention, applying the 'fix' mentioned in the first comment of the
issue does not fix the problem, as it does not trigger in the right
operator's context.

I'm not sure what exact changes are planned for the next 1.10 release, but
if the whole UI change is planned, this could be a good opportunity to also
fix this bug.
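For context, the contract under discussion can be sketched without Airflow:
an operator that launches an external process implements on_kill() to reap
it, and the bug is that the UI's "clear" path never calls that hook. The
class below is a stand-alone illustration (not a real Airflow operator;
`sleep` stands in for e.g. a spark-submit child):

```python
import subprocess


class LongJobOperator:
    """Illustration of the on_kill contract.

    Airflow's task runner is supposed to call on_kill() when a running
    task is externally stopped, so external jobs don't outlive the task.
    """

    def __init__(self):
        self._proc = None

    def execute(self):
        # Stand-in for launching an external job (e.g. spark-submit).
        self._proc = subprocess.Popen(["sleep", "60"])

    def on_kill(self):
        # Clean up the external job if it is still running.
        if self._proc is not None and self._proc.poll() is None:
            self._proc.terminate()  # SIGTERM on POSIX
            self._proc.wait()


op = LongJobOperator()
op.execute()
op.on_kill()  # without this call, the child keeps running for 60s
```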

On Mon, Jan 8, 2018 at 2:55 PM, Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

> Without this change does on_kill ever get triggered? It seems like this
> change is desired behaviour.
>
> As per the first comment on
> https://issues.apache.org/jira/browse/AIRFLOW-1623?focusedCommentId=16171819&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16171819
> I'm not sure this is the right fix. It also seems like this would end up
> running the on_kill in a different process to the rest of the operator.
>
> I wonder if a signal handler is missing somewhere in one of the
> `run --local` or `run --raw` paths. I tried to follow all the paths through
> from the UI to the signal handlers but got stuck in a twisty maze of
> classes (and was attempting to do it just from reading the code).
>
>
> > On 8 Jan 2018, at 13:15, Driesprong, Fokko  wrote:
> >
> > Yes, for Spark this should work. Depending on the operator and the
> > implementation:
> > https://github.com/apache/incubator-airflow/blob/3e6babe8ed8f8f281b67aa3f4e03bf3cfc1bcbaa/airflow/contrib/hooks/spark_submit_hook.py#L412-L428
> >
> > However this is a big change in behaviour. I'm curious about the opinion
> of
> > others.
> >
> > Cheers,
> > Fokko
> >
> >
> > 2018-01-08 14:12 GMT+01:00 Milan van der Meer <
> > milan.vanderm...@realimpactanalytics.com>:
> >
> >> Any help? :)
> >>
> >> On Thu, Dec 14, 2017 at 8:12 PM, Milan van der Meer <
> >> milan.vanderm...@realimpactanalytics.com> wrote:
> >>
> >>> I recently opened the following PR:
> >>> https://github.com/apache/incubator-airflow/pull/2877
> >>>
> >>> The problem is that on_kill is not called for operators when you clear
> >>> a task from the UI.
> >>> That's problematic when working with e.g. Spark clusters, as the jobs
> >>> on the cluster need to be killed.
> >>>
> >>> The issue is in the core code of Airflow, and I'm not familiar enough
> >>> with the inner workings there, so I could use some directions on this
> >>> one from people who are familiar.
> >>>
> >>> For more info, check out the PR.
> >>>
> >>> Kind regards,
> >>> Milan
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> *Milan van der Meer*
> >>
> >> *Real**Impact* Analytics *| *Big Data Consultant
> >> www.realimpactanalytics.com
> >>
> >> *BE *+32 498 45 96 22 <0032498459622>* | Skype *milan.vandermeer.ria
> >>
>
>

