Who's hiring, December 2016

2016-12-16 Thread Kostas Tzoumas
Hi folks,

As promised, here is the first thread for Flink-related job positions. If
your organization is hiring for Flink-related positions, do reply to
this thread with a link for applications.

data Artisans is hiring for multiple technical positions. Help us build
Flink, and help our customers be successful in their Flink projects:

- Senior distributed systems engineer:
https://data-artisans.workable.com/jobs/396284

- Software engineer (Java/Scala and/or Python):
https://data-artisans.workable.com/jobs/396286

- QA/DevOps engineer: https://data-artisans.workable.com/jobs/396288

- UI/UX engineer: https://data-artisans.workable.com/jobs/396287

- Senior data engineer (EU and USA):
https://data-artisans.workable.com/jobs/325667

Best regards,
Kostas

PS: As mentioned in the original DISCUSS thread, I am cc'ing the dev and
user lists in the first few emails to remind folks to subscribe to the new
commun...@flink.apache.org mailing list.

Instructions to subscribe are here:
http://flink.apache.org/community.html#mailing-lists


[ANNOUNCE] New Flink community mailing list

2016-12-14 Thread Kostas Tzoumas
Hi everyone,

We have created a new Flink mailing list, commun...@flink.apache.org, where
we can post everything related to the broader Flink community, including job
offers, upcoming meetups and conferences, exciting reads, and everything
else that is deemed worthy of the greater Flink community. We will be
running a monthly "Who's hiring" thread there for job positions on Flink.

The scope of this list is restricted to Flink-related content, i.e., please
do not post content or job offers on Big Data in general.

You can subscribe to this list as usual by sending an email to

community-subscr...@flink.apache.org

We will be adding this list to the Flink website soon.

Best,
Kostas


Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-13 Thread Kostas Tzoumas
It seems that several folks are excited about the idea - but there is still
a concern (which I share) about whether this would be spam for the dev@ and
user@ lists.

As a compromise, I propose to request a new mailing list (
commun...@flink.apache.org) which we can use for this purpose, and also to
post upcoming meetups, conferences, etc. In order to inform the community
about this mailing list, we can cc the dev@ and user@ lists in the first
months until the new mailing list has ramped up.

On Fri, Dec 9, 2016 at 4:55 PM, Greg Hogan  wrote:

> Google indexes the mailing list. Anyone can filter the messages to trash
> in a few clicks.
>
> This will also be a means for the community to better understand which and
> how companies are using Flink.
>
> On Fri, Dec 9, 2016 at 8:27 AM, Felix Neutatz 
> wrote:
>
>> Hi,
>>
>> I wonder whether a mailing list is a good choice for this in general. If
>> I am looking for a job, I won't register for a mailing list or browse
>> through its archive, but rather search via Google. So what about
>> putting it on a dedicated page on the website? This feels more intuitive
>> to me and gives a better overview.
>>
>> Best regards,
>> Felix
>>
>> On Dec 9, 2016 14:20, "Ufuk Celebi"  wrote:
>>
>> On 9 December 2016 at 14:13:14, Robert Metzger (rmetz...@apache.org)
>> wrote:
>> > I'm against using the news@ list for that.
>> > The promise of the news@ list is that it's low-traffic and only for
>> > news. If we now start having job offers (and potentially some questions
>> > on them etc.) it'll be a list with more than some announcements.
>> > That's also the reason why the news@ list is completely moderated.
>>
>> I agree with Robert. I would consider that to be spam if posted to news@.
>>
>


Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Kostas Tzoumas
I appreciate the concern Kanstantsin!

We do have a news@ mailing list, but it has been under-utilized so far.
Perhaps revamping that one would do it?

My only concern is that subscribing to a new mailing list is an overhead.
As a temp solution, we could cc the dev and user list in the first few
(say, 3) threads and encourage folks in these threads to sign up for the
news@ list.

On Thu, Dec 8, 2016 at 10:07 AM, Robert Metzger  wrote:

> Thank you for speaking up Kanstantsin. I really don't want to downgrade
> the experience on the user@ list.
>
> I wonder if jobs@flink would be a too narrowly-scoped mailing list.
> Maybe we could also start a community@flink (alternatively also general@)
> mailing list for everything relating to the broader Flink community,
> including job offers, meetups, conferences and everything else that is
> important for the community to grow.
>
> On Thu, Dec 8, 2016 at 3:10 AM, Radu Tudoran 
> wrote:
>
>> Hi,
>>
>>
>>
>> I think the idea of having such a monthly thread is very good and it
>> might even help to further attract new people in the community.
>>
>> At the same time, I do not think that 1 extra mail per month is
>> necessarily spam :)
>>
>> At the same time – we can also consider a jobs@flink mailing list
>>
>>
>>
>>
>>
>> Dr. Radu Tudoran
>>
>> Senior Research Engineer - Big Data Expert
>>
>> IT R&D Division
>>
>>
>>
>>
>> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>>
>> European Research Center
>>
>> Riesstrasse 25, 80992 München
>>
>>
>>
>> E-mail: *radu.tudo...@huawei.com *
>>
>> Mobile: +49 15209084330
>>
>> Telephone: +49 891588344173
>>
>>
>>
>> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>> Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
>> Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
>> Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
>> Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
>> Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
>>
>>
>> *From:* Kanstantsin Kamkou [mailto:kkam...@gmail.com]
>> *Sent:* Wednesday, December 07, 2016 9:57 PM
>> *To:* user@flink.apache.org
>> *Subject:* Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the
>> mailing lists?
>>
>>
>>
>> Is it possible to avoid such spam here? If I need a new job, I could
>> search for it. In the same way, I might want to subscribe to a different
>> thread, like jobs@flink. * The idea itself is great.
>>
>>
>>
>> On Tue, 6 Dec 2016 at 14:04, Kostas Tzoumas  wrote:
>>
>> yes, of course!
>>
>>
>>
>> On Tue, Dec 6, 2016 at 12:54 PM, Márton Balassi > > wrote:
>>
>> +1. It keeps it both organized and to a reasonable minimum overhead.
>>
>> Would you volunteer for starting the mail thread each month then, Kostas?
>>
>> Best,
>>
>> Marton
>>
>> On Tue, Dec 6, 2016 at 6:42 AM, Kostas Tzoumas 
>> wrote:
>>
>> Hi folks,
>>
>> I'd like to see how the community feels about a monthly "Who is hiring on
>> Flink" email thread on the dev@ and user@ mailing lists where folks can
>> post job positions related to Flink.
>>
>> I personally think that posting individual job offerings in the mailing
>> list is off-topic (hence I have refrained to do that wearing my company
>> hat, and I have discouraged others when they asked for my opinion on
>> this), but I thought that a monthly thread like this would be both helpful
>> to the community and not cause overhead.
>>
>> Cheers,
>> Kostas
>


Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-06 Thread Kostas Tzoumas
yes, of course!

On Tue, Dec 6, 2016 at 12:54 PM, Márton Balassi 
wrote:

> +1. It keeps it both organized and to a reasonable minimum overhead.
>
> Would you volunteer for starting the mail thread each month then, Kostas?
>
> Best,
>
> Marton
>
> On Tue, Dec 6, 2016 at 6:42 AM, Kostas Tzoumas 
> wrote:
>
>> Hi folks,
>>
>> I'd like to see how the community feels about a monthly "Who is hiring on
>> Flink" email thread on the dev@ and user@ mailing lists where folks can
>> post job positions related to Flink.
>>
>> I personally think that posting individual job offerings in the mailing
>> list is off-topic (hence I have refrained to do that wearing my company
>> hat, and I have discouraged others when they asked for my opinion on
>> this),
>> but I thought that a monthly thread like this would be both helpful to the
>> community and not cause overhead.
>>
>> Cheers,
>> Kostas
>>
>
>


[DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-06 Thread Kostas Tzoumas
Hi folks,

I'd like to see how the community feels about a monthly "Who is hiring on
Flink" email thread on the dev@ and user@ mailing lists where folks can
post job positions related to Flink.

I personally think that posting individual job offerings in the mailing
list is off-topic (hence I have refrained to do that wearing my company
hat, and I have discouraged others when they asked for my opinion on this),
but I thought that a monthly thread like this would be both helpful to the
community and not cause overhead.

Cheers,
Kostas


Flink survey by data Artisans

2016-11-18 Thread Kostas Tzoumas
Hi everyone!

The Apache Flink community has evolved quickly over the past 2+ years, and
there are now many production Flink deployments in organizations of all
sizes.  This is both exciting and humbling :-)

data Artisans is running a brief survey to understand Apache Flink usage
and the needs of the community. We are hoping that this survey will help
identify common usage patterns, as well as pinpoint what are the most
needed features for Flink.

We'll share a report with a summary of findings at the conclusion of the
survey with the community. All of the responses will remain confidential,
and only aggregate statistics will be shared.

I expect the survey to take 5-10 minutes, and all questions are
optional--we appreciate any feedback that you're willing to provide.

As a thank you, respondents will be entered in a drawing to win one of 10
tickets to Flink Forward 2017 (your choice of Berlin or the first-ever San
Francisco edition).

The survey is available here:
http://www.surveygizmo.com/s3/3166399/181bdb611f22

Looking forward to hearing back from you!

Best,
Kostas


[ANNOUNCE] Flink Forward 2016: First round of speakers and sessions is out

2016-07-25 Thread Kostas Tzoumas
Hi everyone,

I wanted to share this with the community: we have announced the first
round of speakers and sessions of Flink Forward 2016, and it looks amazing!

Check it out here: http://flink-forward.org/program/sessions/

This year we have a great mix of use case talks (e.g., by Netflix, Alibaba,
Intel, Cisco, King, Zalando, etc), in-depth developer-oriented talks on
Flink existing and upcoming features by committers and contributors, and
several talks on the wider stream processing landscape, including Apache
Beam (incubating), streaming SQL, and more.

As a reminder, the last day for early bird tickets is this Sunday, July 31.
I'm really looking forward to seeing as many of us there as possible!

Best,
Kostas


Re: Flink on Azure HDInsight

2016-05-04 Thread Kostas Tzoumas
As far as I know, Azure HDInsight is based on Hortonworks HDP, on top of
which Flink has been used extensively.

On Mon, May 2, 2016 at 10:42 AM, Brig Lamoreaux <
brig.lamore...@microsoft.com> wrote:

> Thanks Stephan,
>
>
>
> Turns out Azure Table is slightly different than Azure HDInsight. Both use
> Azure Storage however, HDInsight allows HDFS over Azure Storage.
>
>
>
> I’d be curious if anyone has tried to use Flink on top of Azure HDInsight.
>
>
>
> Thanks,
>
> Brig
>
>
>
> *From:* ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] *On Behalf
> Of *Stephan Ewen
> *Sent:* Saturday, April 30, 2016 9:36 PM
> *To:* user@flink.apache.org
> *Subject:* Re: Flink on Azure HDInsight
>
>
>
> Hi!
>
>
>
> As far as I know, some people have been using Flink together with Azure,
> and we try and do some release validation on Azure as well.
>
>
>
> There is even a section in the docs that describes how to use Hadoop's
> Azure Table formats with Flink
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/connectors.html#access-microsoft-azure-table-storage
> 
>
>
>
> I am not aware of any Azure specific issues at this point...
>
>
>
> Greetings,
>
> Stephan
>
>
>
>
>
>
>
> On Fri, Apr 29, 2016 at 11:18 AM, Brig Lamoreaux <
> brig.lamore...@microsoft.com> wrote:
>
> Hi All,
>
>
>
> Are there any issues with Flink on Azure HDInsight?
>
>
>
> Thanks,
>
> Brig Lamoreaux
>
> Data Solution Architect
>
> US Desert/Mountain Tempe
>


[ANNOUNCE] Flink 1.0.0 has been released

2016-03-08 Thread Kostas Tzoumas
Hi everyone!

As you might have noticed, Apache Flink 1.0.0 has been released and
announced!

You can read more about the release at the ASF blog and the Flink blog
-
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces88
- http://flink.apache.org/news/2016/03/08/release-1.0.0.html

Don't forget to retweet and spread the news :-)
- https://twitter.com/TheASF/status/707174116969857024
- https://twitter.com/ApacheFlink/status/707175973482012672

Check out the changelog and the migration guide, download the release, and
check out the documentation
- http://flink.apache.org/blog/release_1.0.0-changelog_known_issues.html
-
https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x
- https://cwiki.apache.org/confluence/display/FLINK/Stability+Annotations
- http://flink.apache.org/downloads.html
- https://ci.apache.org/projects/flink/flink-docs-release-1.0/

Many congratulations to the Flink community for making this happen!

Best,
Kostas


[DISCUSS] Flink roadmap for 2016

2015-12-16 Thread Kostas Tzoumas
Hi everyone,

I think it is very interesting to both developers and users of Flink to
define a roadmap for future development. Together with Stephan, we started
a draft containing a couple of areas that we think are important to focus
on next.

https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit?usp=sharing


It would be great if we could get your feedback on this document, and use
it to get the discussion on the roadmap started. I am especially including
the users@ list to see what would be the next priorities for users (for
example, which systems should Flink integrate with next for data input and
output).

The document is editable by anyone, so feel free to comment and suggest
directly there.

Happy roadmapping!

Best,
Kostas


Community choice for Hadoop Summit Europe 2016

2015-12-09 Thread Kostas Tzoumas
Hi everyone,

Just a reminder, the community vote for the Hadoop Summit Europe 2016 talks
in Dublin is still open until December 15.

There is a very good number of talks around Flink submitted, here are the
ones that mention "flink" in their abstract:
https://hadoopsummit.uservoice.com/search?filter=merged&query=flink

Vote away :-)

Best,
Kostas


Re: Fold vs Reduce in DataStream API

2015-11-18 Thread Kostas Tzoumas
Granted, both are presented with the same example in the docs. They are
modeled after reduce and fold in functional programming. Perhaps we should
have some more enlightening examples.
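To make the distinction concrete, here is a plain-Java sketch of the two aggregation styles. This is illustrative only, not the Flink DataStream API; the class and method names are made up:

```java
import java.util.Arrays;
import java.util.List;

public class FoldVsReduce {

    // reduce: combines two values of the SAME type T into one T,
    // so the result type must equal the input type.
    public static int reduce(List<Integer> in) {
        int acc = in.get(0);
        for (int i = 1; i < in.size(); i++) {
            acc = acc + in.get(i);           // (T, T) -> T
        }
        return acc;
    }

    // fold: starts from an explicit initial value whose type may differ
    // from the input type; the result type is the initial value's type.
    public static String fold(List<Integer> in) {
        String acc = "";                     // initial value: a String
        for (int x : in) {
            acc = acc + x;                   // (ACC, T) -> ACC
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3);
        System.out.println(reduce(xs));      // 6 (an Integer, like the input)
        System.out.println(fold(xs));        // "123" (a String, like the seed)
    }
}
```

In the DataStream API the same contrast shows up (if I recall the signatures correctly) as `stream.reduce(reduceFunction)` versus `stream.fold(initialValue, foldFunction)`: only fold takes an initial value, and that value fixes the output type.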

On Wed, Nov 18, 2015 at 6:39 PM, Fabian Hueske  wrote:

> Hi Ron,
>
> Have you checked:
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#transformations
> ?
>
> Fold is like reduce, except that you define a start element (possibly of a
> different type than the input type), and the result type is the type of the
> initial value. In reduce, the result type must be identical to the input
> type.
>
> Best, Fabian
>
> 2015-11-18 18:32 GMT+01:00 Ron Crocker :
>
>> Is there a succinct description of the distinction between these
>> transforms?
>>
>> Ron
>> —
>> Ron Crocker
>> Principal Engineer & Architect
>> ( ( •)) New Relic
>> rcroc...@newrelic.com
>> M: +1 630 363 8835
>>
>>
>


Re: Apache Flink Operator State as Query Cache

2015-11-15 Thread Kostas Tzoumas
Hi Wally,

This version adds support for specifying and switching between time
semantics - processing time, ingestion time, or event time.

When working with event time, you can specify watermarks to track the
progress of event time. So, even if events arrive out of order, windows
will be specified on the event time (not arrival time), and the computation
will be triggered on watermark arrival.

You can see the API reference and an example here:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#working-with-time

Is this what you are looking for?

Kostas
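As a toy illustration of the mechanism described above (plain Java, not the Flink API; all names are made up): events carry their own event-time timestamps, a window is assigned by those timestamps, and its result is computed only when a watermark passes the window's end, so out-of-order arrivals that beat the watermark are still counted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy event-time window: sums values whose timestamps fall in [start, end),
// firing only when a watermark says event time has passed the window end.
public class WatermarkSketch {

    public static class Event {
        public final long ts;     // event time, not arrival time
        public final int value;
        public Event(long ts, int value) { this.ts = ts; this.value = value; }
    }

    private final long start, end;
    private final List<Event> buffer = new ArrayList<>();
    public Integer result = null; // set exactly once, when the window fires

    public WatermarkSketch(long start, long end) {
        this.start = start;
        this.end = end;
    }

    public void onEvent(Event e) {
        // assignment uses the event's own timestamp, so an event arriving
        // late (but before the watermark) still lands in the right window
        if (e.ts >= start && e.ts < end) {
            buffer.add(e);
        }
    }

    public void onWatermark(long watermark) {
        // a watermark w asserts "event time has progressed to w";
        // the window [start, end) is complete once w >= end
        if (result == null && watermark >= end) {
            int sum = 0;
            for (Event e : buffer) sum += e.value;
            result = sum;
        }
    }
}
```

For example, with a window [0, 10), the events (ts=3, v=1), (ts=9, v=2), and the out-of-order (ts=1, v=4) all contribute; a watermark of 5 does not fire the window, while a watermark of 10 fires it with the sum 7.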


On Sat, Nov 14, 2015 at 1:54 AM, Welly Tambunan  wrote:

> Hi Robert,
>
> Does this version already handle stream imperfections, i.e., out-of-order
> events?
>
> Any resources on how this works, and the API reference?
>
>
> Cheers
>
> On Fri, Nov 13, 2015 at 4:00 PM, Welly Tambunan  wrote:
>
>> Awesome !
>>
>> This is really the best weekend gift ever. :)
>>
>> Cheers
>>
>> On Fri, Nov 13, 2015 at 3:54 PM, Robert Metzger 
>> wrote:
>>
>>> Hi Welly,
>>> Flink 0.10.0 is out, its just not announced yet.
>>> Its available on maven central and the global mirrors are currently
>>> syncing it. This mirror for example has the update already:
>>> http://apache.mirror.digionline.de/flink/flink-0.10.0/
>>>
>>> On Fri, Nov 13, 2015 at 9:50 AM, Welly Tambunan 
>>> wrote:
>>>
 Hi Aljoscha,

 Thanks for this one. Looking forward for 0.10 release version.

 Cheers

 On Thu, Nov 12, 2015 at 5:34 PM, Aljoscha Krettek 
 wrote:

> Hi,
> I don’t know yet when the operator state will be transitioned to
> managed memory, but it could happen for 1.0 (which will come after 0.10).
> The good thing is that the interfaces won’t change, so state can be used
> as it is now.
>
> For 0.10, the release vote is winding down right now, so you can
> expect the release to happen today or tomorrow. I think the streaming is
> production ready now; we expect mostly hardening and some
> infrastructure changes (for example, annotations that specify API
> stability) for the 1.0 release.
>
> Let us know if you need more information.
>
> Cheers,
> Aljoscha
> > On 12 Nov 2015, at 02:42, Welly Tambunan  wrote:
> >
> > Hi Stephan,
> >
> > >Storing the model in OperatorState is a good idea, if you can. On
> the roadmap is to migrate the operator state to managed memory as well, so
> that should take care of the GC issues.
> > Is this using off the heap memory ? Which version we expect this one
> to be available ?
> >
> > Another question is when will the release version of 0.10 will be
> out ? We would love to upgrade to that one when it's available. That
> version will be a production ready streaming right ?
> >
> >
> >
> >
> >
> > On Wed, Nov 11, 2015 at 4:49 PM, Stephan Ewen 
> wrote:
> > Hi!
> >
> > In general, if you can keep state in Flink, you get better
> throughput/latency/consistency and have one less system to worry about
> (external k/v store). State outside means that the Flink processes can be
> slimmer and need fewer resources and as such recover a bit faster. There
> are use cases for that as well.
> >
> > Storing the model in OperatorState is a good idea, if you can. On
> the roadmap is to migrate the operator state to managed memory as well, so
> that should take care of the GC issues.
> >
> > We are just adding functionality to make the Key/Value operator
> state usable in CoMap/CoFlatMap as well (currently it only works in 
> windows
> and in Map/FlatMap/Filter functions over the KeyedStream).
> > Until then, you should be able to use a simple Java HashMap and use
> > the "Checkpointed" interface to make it persistent.
> >
> > Greetings,
> > Stephan
> >
> >
> > On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan 
> wrote:
> > Thanks for the answer.
> >
> > Currently the approach that i'm using right now is creating a
> base/marker interface to stream different type of message to the same
> operator. Not sure about the performance hit about this compare to the
> CoFlatMap function.
> >
> > Basically this one is providing query cache, so i'm thinking instead
> of using in memory cache like redis, ignite etc, i can just use operator
> state for this one.
> >
> > I just want to gauge do i need to use memory cache or operator state
> would be just fine.
> >
> > However i'm concern about the Gen 2 Garbage Collection for caching
> our own state without using operator state. Is there any clarification on
> that one ?
> >
> >
> >
> > On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal 
> wrote:
> >
> > Let me understand your case better here. You have a stream of model
> and stream 

Re: Powered by Flink

2015-10-19 Thread Kostas Tzoumas
yes, definitely. How about a link under the Community drop-down that points
to the wiki page?

On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske  wrote:

> Thanks for starting this Kostas.
>
> I think the list is quite hidden in the wiki. Should we link from
> flink.apache.org to that page?
>
> Cheers, Fabian
>
> 2015-10-19 14:50 GMT+02:00 Kostas Tzoumas :
>
>> Hi everyone,
>>
>> I started a "Powered by Flink" wiki page, listing some of the
>> organizations that are using Flink:
>>
>> https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
>>
>> If you would like to be added to the list, just send me a short email
>> with your organization's name and a description and I will add you to the
>> wiki page.
>>
>> Best,
>> Kostas
>>
>
>


Powered by Flink

2015-10-19 Thread Kostas Tzoumas
Hi everyone,

I started a "Powered by Flink" wiki page, listing some of the organizations
that are using Flink:

https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink

If you would like to be added to the list, just send me a short email with
your organization's name and a description and I will add you to the wiki
page.

Best,
Kostas


Volunteers needed for Flink Forward 2015 (and they get a free ticket)

2015-09-07 Thread Kostas Tzoumas
Hi folks,

The Flink Forward 2015 organizers are looking for volunteers (and they are
offering free tickets in exchange).

Sign up here if you are interested (or send me an email):
http://flink-forward.org/?page_id=495

Best,
Kostas


Re: Hardware requirements and learning resources

2015-09-03 Thread Kostas Tzoumas
Well hidden.

I have now added a link to the menu of http://data-artisans.com/. This
material is provided for free by data Artisans, but it is not part of the
official Apache Flink project.

On Thu, Sep 3, 2015 at 2:20 PM, Stefan Winterstein <
stefan.winterst...@dfki.de> wrote:

>
> > Answering to myself, I have found some nice training material at
> > http://dataartisans.github.io/flink-training.
>
> Excellent resources! Somehow, I managed not to stumble over them by
> myself - either I was blind, or they are well hidden... :)
>
>
> Best,
> -Stefan
>
>


Re: Hardware requirements and learning resources

2015-09-02 Thread Kostas Tzoumas
Hi Juan,

Flink is quite nimble with hardware requirements; people have run it in
old-ish laptops and also the largest instances available in cloud
providers. I will let others chime in with more details.

I am not aware of something along the lines of a cheatsheet that you
mention. If you actually try to do this, I would love to see it, and it
might be useful to others as well. Both use similar abstractions at the API
level (i.e., parallel collections), so if you stay true to the functional
paradigm and not try to "abuse" the system by exploiting knowledge of its
internals things should be straightforward. These apply to the batch APIs;
the streaming API in Flink follows a true streaming paradigm, where you get
an unbounded stream of records and operators on these streams.

Funny that you ask about a video for the DataStream slides. There is a
Flink training happening as we speak, and a video is being recorded right
now :-) Hopefully it will be made available soon.

Best,
Kostas


On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com> wrote:

> Answering to myself, I have found some nice training material at
> http://dataartisans.github.io/flink-training. There are even videos at
> youtube for some of the slides
>
>   - http://dataartisans.github.io/flink-training/overview/intro.html
> https://www.youtube.com/watch?v=XgC6c4Wiqvs
>
>   - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
> https://www.youtube.com/watch?v=0EARqW15dDk
>
> The third lecture
> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html
> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU
> but not exactly, and there are more lessons at
> http://dataartisans.github.io/flink-training, for stream processing and
> the table API for which I haven't found a video. Does anyone have pointers
> to the missing videos?
>
> Greetings,
>
> Juan
>
> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <
> juan.rodriguez.hort...@gmail.com>:
>
>> Hi list,
>>
>> I'm new to Flink, and I find this project very interesting. I have
>> experience with Apache Spark, and from what I've seen so far I find that
>> Flink provides an API at a similar abstraction level but based on single
>> record processing instead of batch processing. I've read on Quora that Flink
>> extends stream processing to batch processing, while Spark extends batch
>> processing to streaming. Therefore I find Flink especially attractive for
>> low latency stream processing. Anyway, I would appreciate if someone could
>> give some indication about where I could find a list of hardware
>> requirements for the slave nodes in a Flink cluster. Something along the
>> lines of https://spark.apache.org/docs/latest/hardware-provisioning.html.
>> Spark is known for having quite high minimal memory requirements (8GB RAM
>> and 8 cores minimum), and I was wondering if it is also the case for Flink.
>> Lower memory requirements would be very interesting for building small
>> Flink clusters for educational purposes, or for small projects.
>>
>> Apart from that, I wonder if there is some blog post by the comunity
>> about transitioning from Spark to Flink. I think it could be interesting,
>> as there are some similarities in the APIs, but also deep differences in
>> the underlying approaches. I was thinking in something like Breeze's
>> cheatsheet comparing its matrix operations with those available in Matlab
>> and Numpy
>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or
>> like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also,
>> any pointer to some online course, book or training for Flink besides the
>> official programming guides would be much appreciated
>>
>> Thanks in advance for help
>>
>> Greetings,
>>
>> Juan
>>
>>
>


Re: About exactly once question?

2015-08-27 Thread Kostas Tzoumas
Oops, seems that Stephan's email covers my answer plus the plans to provide
transactional sinks :-)

On Thu, Aug 27, 2015 at 1:25 PM, Kostas Tzoumas  wrote:

> Note that the definition of "exactly-once" means that records are
> guaranteed to be processed exactly once by Flink operators, and thus state
> updates to operator state happen exactly once (e.g., if C had a counter
> that x1, x2, and x3 incremented, the counter would have a value of 3 and
> not a value of 6). This is not specific to Flink, but the most accepted
> definition, and applicable to all stream processing systems. The reason is
> that the stream processor cannot by itself guarantee what happens to the
> outside world (the outside world is in this case the data sink).
>
> See the docs (
> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
> ):
>
> "Apache Flink offers a fault tolerance mechanism to consistently recover
> the state of data streaming applications. The mechanism ensures that even
> in the presence of failures, the program’s state will eventually reflect
> every record from the data stream exactly once."
>
> Guaranteeing exactly once delivery to the sink is possible, as Marton
> above suggests, but the sink implementation needs to be aware and take part
> in the checkpointing mechanism.
>
>
> On Thu, Aug 27, 2015 at 1:14 PM, Márton Balassi 
> wrote:
>
>> Dear Zhangrucong,
>>
>> From your explanation it seems that you have a good general understanding
>> of Flink's checkpointing algorithm. Your concern is valid: by default a
>> sink C emits tuples to the "outside world" potentially multiple times.
>> A neat trick to solve this issue for your user-defined sinks is to use the
>> CheckpointNotifier interface to output records only after the corresponding
>> checkpoint has been completely processed by the system, so sinks can also
>> provide exactly-once guarantees in Flink.
>>
>> This would mean that your SinkFunction has to implement both the
>> Checkpointed and the CheckpointNotifier interfaces. The idea is to mark the
>> output tuples with the corresponding checkpoint ID, so that they can be
>> emitted in a "consistent" manner when the checkpoint is globally
>> acknowledged by the system. You buffer your output records in a collection
>> of your choice and whenever a snapshotState of the Checkpointed interface
>> is invoked you mark your fresh output records with the current
>> checkpointID. Whenever the notifyCheckpointComplete is invoked you emit
>> records with the corresponding ID.
>>
>> Note that this adds latency to your processing, and as you potentially
>> need to checkpoint a lot of data in the sinks, I would recommend using
>> HDFS as the state backend instead of the default solution.
>>
>> Best,
>>
>> Marton
>>
>> On Thu, Aug 27, 2015 at 12:32 PM, Zhangrucong 
>> wrote:
>>
>>> Hi:
>>>
>>>   The document said Flink can guarantee processing each tuple
>>> exactly-once, but I can not understand how it works.
>>>
>>>For example, In Fig 1, C is running between snapshot n-1 and snapshot
>>> n(snapshot n hasn’t been generated). After snapshot n-1, C has processed
>>> tuple x1, x2, x3 and already outputted to user,  then C failed and it
>>> recovers from snapshot n-1. In my opinion, x1, x2, x3 will be processed
>>> and outputted to user again. My question is how Flink guarantee x1,x2,x3
>>> are processed and outputted to user only once?
>>>
>>>
>>>
>>>
>>>
>>> Fig 1.
>>>
>>> Thanks for answering.
>>>
>>
>>
>
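The buffering-sink pattern Marton describes above can be sketched as follows. This is plain Java for illustration; a real implementation would implement Flink's SinkFunction, Checkpointed, and CheckpointNotifier interfaces, whose exact signatures are omitted here:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Records are held back and released to the outside world only once the
// checkpoint that covers them has been globally acknowledged, so a replay
// after a failure cannot emit them a second time.
public class BufferingSinkSketch {

    private final List<String> pending = new ArrayList<>();   // not yet covered by a snapshot
    private final Map<Long, List<String>> byCheckpoint = new HashMap<>();
    public final List<String> emitted = new ArrayList<>();    // stands in for the external system

    // called once per incoming record (cf. SinkFunction.invoke)
    public void invoke(String record) {
        pending.add(record);
    }

    // called when checkpoint n is taken (cf. Checkpointed.snapshotState):
    // mark the buffered records as belonging to checkpoint n
    public void snapshotState(long checkpointId) {
        byCheckpoint.put(checkpointId, new ArrayList<>(pending));
        pending.clear();
    }

    // called when checkpoint n is globally complete
    // (cf. CheckpointNotifier.notifyCheckpointComplete): now it is safe to emit
    public void notifyCheckpointComplete(long checkpointId) {
        List<String> ready = byCheckpoint.remove(checkpointId);
        if (ready != null) {
            emitted.addAll(ready);
        }
    }
}
```

As Marton notes, the price of this pattern is latency: nothing reaches the outside world until its checkpoint completes, and the buffered records themselves become sink state that must be checkpointed.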


Re: About exactly once question?

2015-08-27 Thread Kostas Tzoumas
Note that the definition of "exactly-once" means that records are
guaranteed to be processed exactly once by Flink operators, and thus state
updates to operator state happen exactly once (e.g., if C had a counter
that x1, x2, and x3 incremented, the counter would have a value of 3 and
not a value of 6). This is not specific to Flink, but the most accepted
definition, and applicable to all stream processing systems. The reason is
that the stream processor cannot by itself guarantee what happens to the
outside world (the outside world is in this case the data sink).

See the docs (
https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
):

"Apache Flink offers a fault tolerance mechanism to consistently recover
the state of data streaming applications. The mechanism ensures that even
in the presence of failures, the program’s state will eventually reflect
every record from the data stream exactly once."

Guaranteeing exactly once delivery to the sink is possible, as Marton above
suggests, but the sink implementation needs to be aware and take part in
the checkpointing mechanism.
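
Marton's buffering-sink idea, quoted in full below, can be sketched as a stand-alone simulation. The method names loosely mirror the Checkpointed/CheckpointNotifier interfaces he mentions, but this class and its signatures are invented for illustration and are not Flink's actual API:

```python
# Sketch of a checkpoint-aware sink (loosely modeled on the pattern described
# in the thread; the class itself is illustrative only, not Flink code).
class BufferingSink:
    def __init__(self):
        self.pending = []        # records not yet assigned to a checkpoint
        self.by_checkpoint = {}  # checkpoint id -> records to emit on ack
        self.emitted = []        # the "outside world"

    def invoke(self, record):
        self.pending.append(record)          # hold back instead of emitting

    def snapshot_state(self, checkpoint_id):
        # Mark everything received so far with the current checkpoint id.
        self.by_checkpoint[checkpoint_id] = self.pending
        self.pending = []

    def notify_checkpoint_complete(self, checkpoint_id):
        # Only now is it safe to release these records to the outside world.
        self.emitted.extend(self.by_checkpoint.pop(checkpoint_id, []))


sink = BufferingSink()
sink.invoke("x1")
sink.invoke("x2")
sink.snapshot_state(7)               # checkpoint 7 covers x1, x2
sink.invoke("x3")                    # belongs to a later checkpoint

sink.notify_checkpoint_complete(7)   # checkpoint 7 globally acknowledged
print(sink.emitted)                  # ['x1', 'x2'] -- x3 is still buffered
```

Records only reach the outside world once their checkpoint is globally acknowledged, so a failure before the acknowledgement replays them into the buffer without duplicating external output.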


On Thu, Aug 27, 2015 at 1:14 PM, Márton Balassi 
wrote:

> Dear Zhangrucong,
>
> From your explanation it seems that you have a good general understanding
> of Flink's checkpointing algorithm. Your concern is valid: by default a
> sink C may emit tuples to the "outside world" multiple times.
> A neat trick to solve this issue for your user defined sinks is to use the
> CheckpointNotifier interface to output records only after the corresponding
> checkpoint has been totally processed by the system, so sinks can also
> provide exactly-once guarantees in Flink.
>
> This would mean that your SinkFunction has to implement both the
> Checkpointed and the CheckpointNotifier interfaces. The idea is to mark the
> output tuples with the corresponding checkpoint id, so that they can be
> emitted in a "consistent" manner when the checkpoint is globally
> acknowledged by the system. You buffer your output records in a collection
> of your choice and whenever a snapshotState of the Checkpointed interface
> is invoked you mark your fresh output records with the current
> checkpointID. Whenever the notifyCheckpointComplete is invoked you emit
> records with the corresponding ID.
>
> Note that this adds latency to your processing, and as you potentially need
> to checkpoint a lot of data in the sinks, I would recommend using HDFS as
> the state backend instead of the default solution.
>
> Best,
>
> Marton
>
> On Thu, Aug 27, 2015 at 12:32 PM, Zhangrucong 
> wrote:
>
>> Hi:
>>
>>   The documentation says Flink can guarantee processing each tuple
>> exactly once, but I cannot understand how it works.
>>
>> For example, in Fig 1, C is running between snapshot n-1 and snapshot
>> n (snapshot n hasn’t been generated yet). After snapshot n-1, C has processed
>> tuples x1, x2, x3 and already output them to the user; then C failed and
>> recovers from snapshot n-1. In my opinion, x1, x2, x3 will be processed
>> and output to the user again. My question is: how does Flink guarantee that
>> x1, x2, x3 are processed and output to the user only once?
>>
>>
>>
>>
>>
>> Fig 1.
>>
>> Thanks for answering.
>>
>
>


[ANNOUNCE] Flink Forward 2015 program is online

2015-08-25 Thread Kostas Tzoumas
Hi everyone,

Just a shoutout that we have posted the program of Flink Forward 2015 here:
http://flink-forward.org/?post_type=day

You can expect a few changes here and there, but the majority of the talks
are in.

Thanks again to the speakers and the reviewers!

If you have not registered yet, now is the time to do it :-) (here:
http://flink-forward.org/?page_id=96)

Kostas


Re: Flink 0.9 built with Scala 2.11

2015-06-10 Thread Kostas Tzoumas
Please do ping this list if you encounter any problems with Flink during
your project (you have done so already :-), but also if you find that the
Flink API needs additions to map Pig well to Flink.

On Wed, Jun 10, 2015 at 3:47 PM, Philipp Goetze <
philipp.goe...@tu-ilmenau.de> wrote:

> Done. Can be found here: https://issues.apache.org/jira/browse/FLINK-2200
>
> Best Regards,
> Philipp
>
>
>
> On 10.06.2015 15:29, Chiwan Park wrote:
>
>> But I think uploading Flink API with scala 2.11 to maven repository is
>> nice idea.
>> Could you create a JIRA issue?
>>
>> Regards,
>> Chiwan Park
>>
>>  On Jun 10, 2015, at 10:23 PM, Chiwan Park  wrote:
>>>
>>> No. Currently, there are no downloadable Flink binaries built with
>>> Scala 2.11.
>>>
>>> Regards,
>>> Chiwan Park
>>>
>>>  On Jun 10, 2015, at 10:18 PM, Philipp Goetze <
 philipp.goe...@tu-ilmenau.de> wrote:

 Thank you Chiwan!

 I did not know the master has a 2.11 profile.

 But there is no pre-built Flink with 2.11 which I could refer to in
 sbt or maven, is there?

 Best Regards,
 Philipp

 On 10.06.2015 15:03, Chiwan Park wrote:

> Hi. You can build Flink with Scala 2.11 with scala-2.11 profile in
> master branch.
> `mvn clean install -DskipTests -P \!scala-2.10,scala-2.11` command
> builds Flink with Scala 2.11.
>
> Regards,
> Chiwan Park
>
>  On Jun 10, 2015, at 9:56 PM, Flavio Pompermaier 
>> wrote:
>>
>> Nice!
>>
>> On 10 Jun 2015 14:49, "Philipp Goetze" 
>> wrote:
>> Hi community!
>>
>> We started a new project called Piglet (
>> https://github.com/ksattler/piglet).
>> For that we use, among others, Flink as a backend. The project is based on
>> Scala 2.11. Thus we need a 2.11 build of Flink.
>>
>> Until now we used the 2.11 branch of the stratosphere project and
>> built Flink ourselves. Unfortunately this branch is not up-to-date.
>>
>> Do you have an official repository for Flink 0.9 (built with Scala
>> 2.11)?
>>
>> Best Regards,
>> Philipp
>>
>
>
>
>
>>>
>>>
>>>
>>
>>
>>
>


Re: CoGroup Operator Data Sink

2015-04-14 Thread Kostas Tzoumas
Each operator has only one output (which can be consumed by multiple
downstream operators), so you cannot branch out to two different directions
from inside the user code with many collectors. The reasoning is that you
can have the same effect with what Robert suggested.

But perhaps your use case is different; can you not achieve the same result
with branching out to two different DataSets as per Robert's suggestion? If
this is the case, posting some details on the function would be helpful.

On Tue, Apr 14, 2015 at 11:37 AM, Mustafa Elbehery <
elbeherymust...@gmail.com> wrote:

> Thanks for prompt reply.
>
> Maybe the expression "Sink" is not suitable for what I need. What if I want
> to *Collect* two data sets directly from the coGroup operator? Is there
> any way to do so?
>
> As far as I know, the operator has only one Collector object, but I wonder if
> there is another feature in Flink that supports what I need.
>
> Thanks.
>
> On Tue, Apr 14, 2015 at 11:27 AM, Robert Metzger 
> wrote:
>
>> Hi,
>>
>> you can write the output of a coGroup operator to two sinks:
>>
>> --\           /-->Sink1
>>     \         /
>>     (CoGroup)
>>     /         \
>> --/           \-->Sink2
>>
>> You can actually write to as many sinks as you want.
>> Note that the data written to Sink1 and Sink2 will be identical.
>> If you want to write different data to S1 and S2, you can use a Tuple2
>> where the first field contains a tag, and the second field contains your
>> data.
>> Then, you use a filter in front of your Sinks to select the data based on
>> the tag.
>>
>> --\           /--(Filter)-->Sink1
>>     \         /
>>     (CoGroup)
>>     /         \
>> --/           \--(Filter)-->Sink2
>>
>> So the output of CoGroup could be a Tuple2<Integer, YourData>: when the
>> integer is 1, it is only written to Sink1; when the integer is 2, it is only
>> written to Sink2.
>>
>>
>>
>>
>> On Tue, Apr 14, 2015 at 10:20 AM, Mustafa Elbehery <
>> elbeherymust...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I wonder if the coGroup operator have the ability to sink two output
>>> simultaneously. I am trying to mock it by calling a function inside the
>>> operator, in which I sink the first output, and get the second output
>>> myself.
>>>
>>> I am not sure if this is the best way, and I would like to hear your
>>> suggestions,
>>>
>>> Regards.
>>>
>>> --
>>> Mustafa Elbehery
>>> EIT ICT Labs Master School 
>>> +49(0)15750363097
>>> skype: mustafaelbehery87
>>>
>>>
>>
>
>
> --
> Mustafa Elbehery
> EIT ICT Labs Master School 
> +49(0)15750363097
> skype: mustafaelbehery87
>
>
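
The tag-and-filter routing Robert describes above can be mimicked outside Flink with plain collections (a sketch only; `cogroup_output` stands in for the real CoGroup user function and is invented here):

```python
# Stand-alone sketch of the tag-and-filter pattern: the "CoGroup" emits
# (tag, value) pairs, and each sink keeps only its own tag.

def cogroup_output():
    # Imagine this is the CoGroup user function emitting tagged records:
    # tag 1 routes to sink 1, tag 2 routes to sink 2.
    return [(1, "a"), (2, "b"), (1, "c"), (2, "d")]

tagged = cogroup_output()
sink1 = [value for tag, value in tagged if tag == 1]   # filter before sink 1
sink2 = [value for tag, value in tagged if tag == 2]   # filter before sink 2

print(sink1)  # ['a', 'c']
print(sink2)  # ['b', 'd']
```

In Flink the same shape would be a single CoGroup whose output is consumed by two filter operators, each feeding one sink, so no second output channel from the user function is needed.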


Flink 0.9.0-milestone1 released

2015-04-13 Thread Kostas Tzoumas
We are very excited to announce Flink 0.9.0-milestone1, a preview release
to give users early access to some Flink 0.9.0 features, including:

- A Table API for SQL-like queries embedded in Java and Scala
- Gelly, Flink's graph processing API
- A Machine Learning library on Flink inspired by scikit-learn
- A mode to run Flink on YARN leveraging Apache Tez as the execution engine
- Exactly-once delivery guarantees for streaming Flink programs backed by
persistent Kafka sources
- Improved YARN support
- A rewrite of Flink's RPC service which is now built on Akka

and a lot more!

See http://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html

Thank you everyone for all the work, looking forward to the next step,
Flink 0.9.0.


Re: Flink meetup group in Stockholm

2015-04-08 Thread Kostas Tzoumas
Super cool!!! I'm sure you will make it a huge success :-)

On Wed, Apr 8, 2015 at 5:44 PM, Till Rohrmann  wrote:

> Really cool :-)
>
> On Wed, Apr 8, 2015 at 5:09 PM, Maximilian Michels  wrote:
>
>> Love the purple. Have fun! :)
>>
>> On Wed, Apr 8, 2015 at 5:05 PM, Henry Saputra 
>> wrote:
>>
>>> Nice, congrats!
>>>
>>> On Wed, Apr 8, 2015 at 7:39 AM, Gyula Fóra  wrote:
>>> > Hey Everyone!
>>> >
>>> > We are proud to announce the first Apache Flink meetup group in
>>> Stockholm.
>>> >
>>> > Join us at http://www.meetup.com/Apache-Flink-Stockholm/
>>> >
>>> > We are looking forward to organise our first event in May!
>>> >
>>> > Cheers,
>>> > Gyula
>>>
>>
>>
>


Re: Flink Forward 2015

2015-04-07 Thread Kostas Tzoumas
Anwar, I will publish this soon on the FF website, we are looking at
mid/late summer.

Kostas

On Tue, Apr 7, 2015 at 5:04 PM, Anwar Rizal  wrote:

> Look great. Any dates for the abstract deadline already ?
>
> On Tue, Apr 7, 2015 at 2:38 PM, Kostas Tzoumas 
> wrote:
>
>> Ah, thanks Sebastian! :-)
>>
>> On Tue, Apr 7, 2015 at 2:33 PM, Sebastian 
>> wrote:
>>
>>> There are still some "Berlin Buzzwords" snippets in your texts ;)
>>>
>>> http://flink-forward.org/?page_id=294
>>>
>>>
>>> On 07.04.2015 14:24, Kostas Tzoumas wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> The folks at data Artisans and the Berlin Big Data Center are organizing
>>>> the first physical conference all about Apache Flink in Berlin the
>>>> coming October:
>>>>
>>>> http://flink-forward.org
>>>>
>>>> The conference will be held in a beautiful spot, an old brewery turned
>>>> event space (the same space where Berlin Buzzwords took place last year).
>>>> We are soliciting technical talks on Flink, talks on how you are using
>>>> Flink to solve real world problems, as well as talks on Big Data
>>>> technology in general that relate to Apache Flink's general direction.
>>>> And of course, there will be enough social and networking events to get
>>>> the community together :-)
>>>>
>>>> The website and the call for abstracts are live, but the ticket
>>>> registration is not yet open.
>>>>
>>>> At this point, I would like to ask the community to mark your calendars
>>>> if you'd like to attend, submit an abstract, and forward the event to
>>>> your friends and family. If you can help us market the event, help in
>>>> any other way, or have any other inquiries, please get in touch with me!
>>>>
>>>> I will also announce this via our social media channels this week.
>>>>
>>>> I am looking forward to gathering the community in a great conference!
>>>>
>>>> Best,
>>>> Kostas
>>>>
>>>>
>>
>


Re: Flink Forward 2015

2015-04-07 Thread Kostas Tzoumas
Ah, thanks Sebastian! :-)

On Tue, Apr 7, 2015 at 2:33 PM, Sebastian  wrote:

> There are still some "Berlin Buzzwords" snippets in your texts ;)
>
> http://flink-forward.org/?page_id=294
>
>
> On 07.04.2015 14:24, Kostas Tzoumas wrote:
>
>> Hi everyone,
>>
>> The folks at data Artisans and the Berlin Big Data Center are organizing
>> the first physical conference all about Apache Flink in Berlin the
>> coming October:
>>
>> http://flink-forward.org
>>
>> The conference will be held in a beautiful spot, an old brewery turned
>> event space (the same space where Berlin Buzzwords took place last year).
>> We are soliciting technical talks on Flink, talks on how you are using
>> Flink to solve real world problems, as well as talks on Big Data
>> technology in general that relate to Apache Flink's general direction.
>> And of course, there will be enough social and networking events to get
>> the community together :-)
>>
>> The website and the call for abstracts are live, but the ticket
>> registration is not yet open.
>>
>> At this point, I would like to ask the community to mark your calendars
>> if you'd like to attend, submit an abstract, and forward the event to
>> your friends and family. If you can help us market the event, help in
>> any other way, or have any other inquiries, please get in touch with me!
>>
>> I will also announce this via our social media channels this week.
>>
>> I am looking forward to gathering the community in a great conference!
>>
>> Best,
>> Kostas
>>
>>


Flink Forward 2015

2015-04-07 Thread Kostas Tzoumas
Hi everyone,

The folks at data Artisans and the Berlin Big Data Center are organizing
the first physical conference all about Apache Flink in Berlin the coming
October:

http://flink-forward.org

The conference will be held in a beautiful spot, an old brewery turned event
space (the same space where Berlin Buzzwords took place last year). We are
soliciting technical talks on Flink, talks on how you are using Flink to
solve real world problems, as well as talks on Big Data technology in
general that relate to Apache Flink's general direction. And of course,
there will be enough social and networking events to get the community
together :-)

The website and the call for abstracts are live, but the ticket
registration is not yet open.

At this point, I would like to ask the community to mark your calendars if
you'd like to attend, submit an abstract, and forward the event to your
friends and family. If you can help us market the event, help in any other
way, or have any other inquiries, please get in touch with me!

I will also announce this via our social media channels this week.

I am looking forward to gathering the community in a great conference!

Best,
Kostas


Fwd: External Talk: Apache Flink - Speakers: Kostas Tzoumas (CEO dataArtisans), Stephan Ewen (CTO dataArtisans)

2015-04-07 Thread Kostas Tzoumas
Hi everyone,

I'm forwarding a private conversation to the list with Mats' approval.

The problem is how to compute correlation between time series in Flink. We
have two time series, U and V, and need to compute 1000 correlation
measures between the series, each measure shifts one series by one more
item: corr(U[0:N], V[n:N+n]) for n=0 to n=1000.

Any ideas on how one can do that without a Cartesian product?

Best,
Kostas

-- Forwarded message --
From: Mats Zachrison 
Date: Tue, Mar 31, 2015 at 9:21 AM
Subject:
To: Kostas Tzoumas , Stefan Avesand <
stefan.aves...@ericsson.com>
Cc: "step...@data-artisans.com" 


As Stefan said, what I’m trying to achieve is basically a nice way to do a
correlation between two large time series. Since I’m looking for an optimal
delay between the two series, I’d like to delay one of the series by x
observations when doing the correlation, stepping x from 1 to 1000.



Some pseudo code:



  For (x = 1 to 1000)
      Shift Series A ‘x-1’ steps
      Correlation[x] = Correlate(Series A and Series B)
  End For



In R, using cor() and apply(), this could look like:



  shift <- as.array(c(1:1000))
  corrAB <- apply(shift, 1, function(x)
    cor(data[x:nrow(data), ]$ColumnA,
        data[1:(nrow(data) - (x - 1)), ]$ColumnB))

Since this basically is 1000 independent correlation calculations, it is
fairly easy to parallelize. Here is an R example using foreach() and
package doParallel:



  cl <- makeCluster(3)
  registerDoParallel(cl)
  corrAB <- foreach(step = c(1:1000)) %dopar% {
    cor(data[step:nrow(data), ]$ColumnA,
        data[1:(nrow(data) - (step - 1)), ]$ColumnB)
  }
  stopCluster(cl)



So I guess the question is – how to do this in a Flink environment? Do we
have to define how to parallelize the algorithm, or can the cluster take
care of that for us?



And of course this is most interesting on a generic level – given the
environment of a multi-core or –processor setup running Flink, how hard is
it to take advantage of all the clock cycles? Do we have to split the
algorithm, and data, and distribute the processing, or can the system do
much of that for us?
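
Since the 1000 shifted correlations are independent, one option (an assumption, not an official recipe) is to treat the shift values themselves as the parallel data set — e.g. map over a DataSet of shifts while both series are made available to each task — which avoids any Cartesian product. The per-shift computation itself can be sketched in plain Python; `pearson` and `shifted_correlations` are invented helpers mirroring the R code above:

```python
from math import sqrt

def pearson(xs, ys):
    # Plain Pearson correlation, equivalent to R's cor() for two vectors.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def shifted_correlations(a, b, max_shift):
    # Correlate A delayed by `shift` observations against B, one value per
    # shift -- each iteration is independent and trivially parallelizable.
    return [pearson(a[shift:], b[:len(b) - shift]) for shift in range(max_shift)]

# Toy data: B lags A by exactly 2 observations, so shift 2 should win.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 4.0, 3.0, 2.0, 1.0]
b = a[2:] + [0.5, 0.25]
corrs = shifted_correlations(a, b, 4)
best = max(range(len(corrs)), key=lambda s: corrs[s])
print(best)   # 2 -- the shift with the highest correlation
```

Because each shift touches only slices of the two series, the cluster can partition the shift range across workers and broadcast (or otherwise share) the series, much like the doParallel example but without materializing all pairs of observations.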


Fwd: Contact from site - Lou

2015-03-27 Thread Kostas Tzoumas
Hi,

I am forwarding this question from Lou so that others can benefit as well.

Kostas

-- Forwarded message --
From: 
Date: Thu, Mar 26, 2015 at 4:07 PM


Hi there,

I am Lou, a Ph.D. research scientist working at Ericsson Research (ER),
Stockholm, Sweden.

First, many thanks for the presentation given by Kostas and Stephan at
ER this Tuesday, which I attended via telephone conference.

Next, may I ask you a short question please? We are currently working with
the development of an in-house cluster/job manager, and my question is: is
it easy to test our own job manager on Apache Flink, and do you have any
guidelines about doing it? Moreover, is the current standalone cluster
manager called “direct” on Flink?

Thanks in advance,

Cheers,
Lou


February 2015 in the Flink community

2015-03-02 Thread Kostas Tzoumas
Hi everyone

February might be the shortest month of the year, but the community has
been pretty busy:

- Flink 0.8.1, a bugfix release, has been made available

- The project added a new committer

- Flink contributors developed a Flink adapter for Apache SAMOA

- Flink committers contributed to Google's bdutil. Starting from release
1.2, users of bdutil can deploy Flink clusters on Google Cloud Platform

- Flink was mentioned on several articles online, and many large features
have been merged to the master repository.

You can read the full blog post here:
http://flink.apache.org/news/2015/03/02/february-2015-in-flink.html


January 2015 in the Flink community

2015-02-04 Thread Kostas Tzoumas
Here is a digestible read on some January activity in the Flink community:

http://flink.apache.org/news/2015/02/04/january-in-flink.html

Highlights:

- Flink 0.8.0 was released

- The Flink community published a technical roadmap for 2015

- Flink was used to scale matrix factorization to extreme scale

- Flink's graduation was advertised and picked up by several articles in
the press

- Flink Streaming won a community vote at the Hadoop Summit Europe

- Flink was presented at meetups and events in Paris, the Bay Area, Berlin
and FOSDEM in Brussels.

- Notable code contributions include an off-heap memory mode, Gelly,
Flink's new Graph API, semantic annotations, and a new YARN client.


[ANNOUNCE] Apache Flink 0.8.0 released

2015-01-22 Thread Kostas Tzoumas
The Apache Flink team is proud to announce the release of Apache Flink 0.8.0

This is the first release of Apache Flink as an Apache Top-Level Project

Flink is building a system for distributed batch and real-time streaming
data analysis that offers familiar collection-based programming APIs in
Java and Scala. The Flink API is backed by a robust execution backend with
true streaming capabilities, a custom memory manager, native iteration
execution, and a cost-based optimizer.

This release adds a Scala API and flexible window definitions to Flink
Streaming, several performance and usability improvements, updates to the
HBase module, and a lot more.

See the blog post for more details on the 0.8.0 release:
http://flink.apache.org/news/2015/01/21/release-0.8.html

The release can be downloaded at:
http://flink.apache.org/downloads.html

We would like to thank all the contributors that made this release possible!

Regards,
The Apache Flink Team


Fwd: The Apache Software Foundation Announces Apache™ Flink™ as a Top-Level Project

2015-01-12 Thread Kostas Tzoumas
-- Forwarded message --
From: *Sally Khudairi* 
Date: Monday, January 12, 2015
Subject: The Apache Software Foundation Announces Apache™ Flink™ as a
Top-Level Project
To: Apache Announce List 


>> this announcement is available online at http://s.apache.org/YrZ

Open Source distributed Big Data system for expressive, declarative, and
efficient batch and streaming data processing and analysis

Forest Hill, MD –12 January 2015– The Apache Software Foundation (ASF), the
all-volunteer developers, stewards, and incubators of more than 350 Open
Source projects and initiatives, announced today that Apache™ Flink™ has
graduated from the Apache Incubator to become a Top-Level Project (TLP),
signifying that the project's community and products have been
well-governed under the ASF's meritocratic process and principles.

Apache Flink is an Open Source distributed data analysis engine for batch
and streaming data. It offers programming APIs in Java and Scala, as well
as specialized APIs for graph processing, with more libraries in the making.

"I am very happy that the ASF has become the home for Flink," said Stephan
Ewen, Vice President of Apache Flink. "For a community-driven effort, I can
think of no better umbrella. It is great to see the project is maturing and
many new people are joining the community."

Flink uses a unique combination of streaming/pipelining and batch
processing techniques to create a platform that covers and unifies a broad
set of batch and streaming data analytics use cases. The project has put
significant efforts into making a system that runs reliably and fast in a
wide variety of scenarios. For that reason, Flink contained its own type
serialization, memory management, and cost-based query optimization
components from the early days of the project.

Apache Flink has its roots in the Stratosphere research project that
started in 2009 at TU Berlin together with the Berlin and later the
European data management communities, including HU Berlin, Hasso Plattner
Institute, KTH (Stockholm), ELTE (Budapest), and others. Several Flink
committers recently started data Artisans, a Berlin-based startup committed
to growing Flink both in code and community as 100% Open Source. More than
70 people have by now contributed to Flink.

"Becoming a Top-Level Project in such short time is a great milestone for
Flink and reflects the speed with which the community has been growing,"
said Kostas Tzoumas, co-founder and CEO of data Artisans. "The community is
currently working on some exciting new features that make Flink even more
powerful and accessible to a wider audience, and several companies around
the world are including Flink in their data infrastructure."

"We use Apache Flink as part of our production data infrastructure," said
Ijad Madisch, co-founder and CEO of ResearchGate. "We are happy all around
and excited that Flink provides us with the opportunity for even better
developer productivity and testability, especially for complex data flows.
It’s with good reason that Flink is now a top-level Apache project."

"I have been experimenting with Flink, and we are very excited to hear that
Flink is becoming a top-level Apache project," said Anders Arpteg,
Analytics Machine Learning Manager at Spotify.

Denis Arnaud, Head of Data Science Development of Travel Intelligence at
Amadeus said, "At Amadeus, we continually seek for better improvement in
our analytic platform and our experiments with Apache Flink for analytics
on our travel data show a lot of potential in the system for our production
use."

"Flink was a pleasure to mentor as a new Apache project," said Alan Gates,
Apache Flink Incubator champion at the ASF, and architect/co-founder at
Hortonworks. "The Flink team learned The Apache Way very quickly. They
worked hard at being open in their decision making and including new
contributors. Those of us mentoring them just needed to point them in the
right direction and then let them get to work."

Availability and Oversight
As with all Apache products, Apache Flink software is released under the
Apache License v2.0, and is overseen by a self-selected team of active
contributors to the project. A Project Management Committee (PMC) guides
the Project's day-to-day operations, including community development and
product releases. For documentation and ways to become involved with Apache
Flink, visit http://flink.apache.org/ and @ApacheFlink on Twitter.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350
leading Open Source projects, including Apache HTTP Server --the world's
most popular Web server software. Through the ASF's meritocratic process
known as "The Apache Way," more than 500 individual Members and 4,500
Committers successfully collaborate to develop freely available
enterprise-grade software.

December in Flink

2015-01-09 Thread Kostas Tzoumas
Hi folks,

I started a new blog series along the lines of "This month in Flink" to
give a summary of the community activity for those that feel overwhelmed by
the traffic in the mailing lists but would still like to follow what's
happening in the project.

Here is the first post I wrote with input from Stephan, Ufuk, Aljoscha,
Robert, and Till:

http://flink.apache.org/news/2015/01/06/december-in-flink.html

Let me know if you'd like me to add something.

Do you think it would make sense to create a separate mailing list along
the lines of news@flink where we try to keep the traffic low (e.g., such
summaries and important news)?

Best,
Kostas