Re: S3keysonsor

2018-05-21 Thread Joe Napolitano
Great, I think we're in agreement on your definition of static.

In my own experience, working with S3 keys can be painful if you can't
anticipate the key name. I don't think the S3KeySensor will work as it's
written.

There's another sensor, S3PrefixSensor, that's not in the docs but can be
seen just below the S3KeySensor in the source here:
https://airflow.apache.org/_modules/sensors.html#S3KeySensor

That may work for you. Overall your question was whether Airflow suits
your needs. I think the answer to that is YES, but in the worst case you'll
have to write a custom operator to handle your needs precisely, e.g. by
processing all files that match a prefix "s3a://mybucket/{{date}}*".
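
As a rough sketch of the selection logic such a custom operator or sensor
might use (not an existing Airflow API): given the keys already listed under
the bucket, pick the newest timestamped folder whose _SUCCESS marker falls
within a tolerance window. The names pick_success_key, KEY_PATTERN, and
max_age are hypothetical, the folder layout is assumed from the sample
locations in this thread, and the S3 listing call itself is left out.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, taken from the sample keys in this thread:
#   <YYYYMMDD>_<HHMMSS>_data1/_SUCCESS  (relative to the bucket root)
KEY_PATTERN = re.compile(r"^(\d{8}_\d{6})_data1/_SUCCESS$")


def pick_success_key(keys, now, max_age=timedelta(days=1)):
    """Return the newest matching _SUCCESS key no older than max_age.

    keys    -- iterable of key names, as a prior S3 listing would return them
    now     -- reference time (e.g. the DAG run's execution date)
    max_age -- tolerance window; folders older than this are ignored
    Returns None when nothing qualifies, so a sensor can keep poking.
    """
    best = None
    for key in keys:
        match = KEY_PATTERN.match(key)
        if not match:
            continue  # not a _SUCCESS marker in the expected layout
        stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        if stamp > now or now - stamp > max_age:
            continue  # in the future, or outside the tolerance window
        if best is None or stamp > best[0]:
            best = (stamp, key)
    return best[1] if best else None
```

A sensor's poke method could return True once this yields a key, and the
downstream HTTP task could then run against that folder.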

On Mon, May 21, 2018 at 2:59 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> + Joe
>
>
>
> On Mon, May 21, 2018 at 2:56 PM purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> I do know only to some extent , I mean If you see my sample s3 locations
>>
>> s3a://mybucket/20180425_111447_data1/_SUCCESS
>>
>> s3a://mybucket/20180424_111241_data1/_SUCCESS
>>
>>
>>
>> The only values which are static in above location are
>>
>> s3a://mybucket/
>>
>> data1/_SUCCESS
>>
>> Now I want to configure tolerance for _SUCCESS file as latest or 1 day
>> older based on this configuration it should pick the right time stamp
>> folder which has _SUCCESS file
>>
>> On Mon, May 21, 2018 at 2:35 PM Joe Napolitano <joe.napolit...@wework.com>
>> wrote:
>>
>>> Purna, with regards to "this path is not completely static," can you
>>> clarify what you mean?
>>>
>>> Do you mean that you don't know the actual key name beforehand? E.g.
>>> pertaining to "111447", "111241", and "111035" in your example?
>>>
>>> On Mon, May 21, 2018 at 2:23 PM, Brian Greene <
>>> br...@heisenbergwoodworking.com> wrote:
>>>
>>> > I suggest it’ll work for your needs.
>>> >
>>> > Sent from a device with less than stellar autocorrect
>>> >
>>> > > On May 21, 2018, at 10:16 AM, purna pradeep <purna2prad...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > Hi ,
>>> > >
>>> > > I’m trying to evaluate airflow to see if it suits my needs.
>>> > >
>>> > > Basically i can have below steps in a DAG
>>> > >
>>> > >
>>> > >
>>> > > 1)Look for a file arrival on given s3 location (this path is not
>>> > > completely static) (i can use S3Keysensor in this step)
>>> > >
>>> > >  i should be able to specify to look either for latest folder or
>>> > > 24hrs or n number of days older folder which has _SUCCESS file as
>>> > > mentioned below
>>> > >
>>> > >  sample file location(s):
>>> > >
>>> > >  s3a://mybucket/20180425_111447_data1/_SUCCESS
>>> > >
>>> > >
>>> > >  s3a://mybucket/20180424_111241_data1/_SUCCESS
>>> > >
>>> > >  s3a://mybucket/20180424_111035_data1/_SUCCESS
>>> > >
>>> > >
>>> > >
>>> > > 2)invoke a simple restapi using HttpSimpleOperator once the above
>>> > > dependency is met ,i can set upstream for step2 as step1
>>> > >
>>> > >
>>> > >
>>> > > Does S3keysensor supports step1 out of the box?
>>> > >
>>> > > Also in some cases i may to have a DAG without start date & end date
>>> > > it just needs to be triggered once file is available in a given s3
>>> > > location
>>> > >
>>> > >
>>> > >
>>> > > *Please suggest !*
>>> >
>>>
>>


Re: S3keysonsor

2018-05-21 Thread Joe Napolitano
Purna, with regards to "this path is not completely static," can you
clarify what you mean?

Do you mean that you don't know the actual key name beforehand? E.g.
pertaining to "111447", "111241", and "111035" in your example?

On Mon, May 21, 2018 at 2:23 PM, Brian Greene <
br...@heisenbergwoodworking.com> wrote:

> I suggest it’ll work for your needs.
>
> Sent from a device with less than stellar autocorrect
>
> > On May 21, 2018, at 10:16 AM, purna pradeep 
> wrote:
> >
> > Hi ,
> >
> > I’m trying to evaluate airflow to see if it suits my needs.
> >
> > Basically i can have below steps in a DAG
> >
> >
> >
> > > 1)Look for a file arrival on given s3 location (this path is not
> > > completely static) (i can use S3Keysensor in this step)
> >
> >  i should be able to specify to look either for latest folder or 24hrs or
> > n number of days older folder which has _SUCCESS file as mentioned below
> >
> >  sample file location(s):
> >
> >  s3a://mybucket/20180425_111447_data1/_SUCCESS
> >
> >  s3a://mybucket/20180424_111241_data1/_SUCCESS
> >
> >  s3a://mybucket/20180424_111035_data1/_SUCCESS
> >
> >
> >
> > 2)invoke a simple restapi using HttpSimpleOperator once the above
> > dependency is met ,i can set upstream for step2 as step1
> >
> >
> >
> > Does S3keysensor supports step1 out of the box?
> >
> > Also in some cases i may to have a DAG without start date & end date it
> > just needs to be triggered once file is available in a given s3 location
> >
> >
> >
> > *Please suggest !*
>


Re: Airflow Docker Container

2018-05-14 Thread Joe Napolitano
You may consider this base image we put together at Blue Apron. My fork
fixes a build issue by pinning to pip < 10.

https://github.com/joenap/airflow-base

Joe Nap

On Mon, May 14, 2018 at 4:37 PM, Daniel Imberman 
wrote:

> @Fokko
>
> I definitely agree with that. I think that having a "super lightweight"
> image for just running a basic airflow instance makes sense. We could even
> name the image something like  airflow-k8s so people know it's ONLY meant
> to work in a k8s cluster. I'm trying to figure out what methods besides
> helm we should be considering (Helm doesn't really have full saturation in
> the k8s world so wanna see if there are other deployment tools we should
> consider).
>
> @Scott Dang quite a bit is definitely an understatement :). Would anyone on
> your team have some cycles to work with @jzucker or @sedwards on the
> helm/deployment stuff?
>
> On Mon, May 14, 2018 at 1:18 PM Driesprong, Fokko 
> wrote:
>
> > Hi Daniel,
> >
> > My dear colleague from GoDataDriven, Bas Harenslak, started on building
> > an official Docker container on the Dockerhub. I've put him in the CC.
> > In the end I strongly believe the image should end up in the official
> > Docker repository: https://github.com/docker-library/official-images
> >
> > Right now, the excellent images provided by Puckel are widely used for
> > running Airflow in Docker. For the Kubernetes build we need to pull in
> > some additional dependencies. Maybe a good idea to do this separately
> > from the one from Puckel, to keep his images lightweight. Any thoughts?
> >
> > Kind regards,
> > Fokko Driesprong
> >
> >
> > 2018-05-14 22:09 GMT+02:00 Anirudh Ramanathan <
> > ramanath...@google.com.invalid>:
> >
> >> @Erik Erlandson has had conversations about publishing docker images
> >> with the ASF Legal team.
> >> Adding him to the thread.
> >>
> >> On Mon, May 14, 2018 at 1:07 PM Daniel Imberman <
> >> daniel.imber...@gmail.com>
> >> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > I've started looking into creating an official airflow docker
> >> > container s.t. users of the KubernetesExecutor could auto-pull from
> >> > helm charts/deployment yamls/etc. I was wondering what everyone thinks
> >> > the best way to do this would be? Is there an official apache docker
> >> > repo? Is there a preferred linux distro?
> >> >
> >> > cc: @anirudh since this was something you had to deal with for
> >> > spark-on-k8s.
> >> >
> >>
> >>
> >> --
> >> Anirudh Ramanathan
> >>
> >
>