Re: Signing off

2019-02-15 Thread Michael Luckey
Hi Scott,

yes, thanks for all your time and all the best!

michel

On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles  wrote:

> +1
>
> Thanks for the contributions to community & code, and enjoy the new
> chapter!
>
> Kenn
>
> On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise  wrote:
>
>> Hi Scott,
>>
>> Thank you for the many contributions to Beam and best of luck with the
>> new endeavor!
>>
>> Thomas
>>
>>
>> On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner  wrote:
>>
>>> I wanted to let you all know that I've decided to pursue a new adventure
>>> in my career, which will take me away from Apache Beam development.
>>>
>>> It's been a fun and fulfilling journey. Apache Beam has been my first
>>> significant experience working in open source. I'm inspired observing how
>>> the community has come together to deliver something great.
>>>
>>> Thanks for everything. If you're curious what's next: I'll be working on
>>> Federated Learning at Google:
>>> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>>>
>>> Take care,
>>> Scott
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>


Re: Hazelcast Jet Runner

2019-02-15 Thread Robert Bradshaw
On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
>
> We at Hazelcast are looking into writing a Beam runner for Hazelcast Jet 
> (https://github.com/hazelcast/hazelcast-jet). I wanted to introduce myself as 
> we'll likely have questions as we start development.

Welcome!

Hazelcast looks interesting; a Beam runner for it would be very cool.

> Some of the things I'm wondering about currently:
>
> * Currently there seems to be a guide available at 
> https://beam.apache.org/contribute/runner-guide/ , is this up to date? Is 
> there anything specific to be aware of when starting with a new runner 
> that's not covered here?

That looks like a pretty good starting point. At a quick glance, I
don't see anything that looks out of date. Another resource that might
be helpful is a talk from last year on writing an SDK (as it
mostly covers the runner-SDK interaction, it's also quite useful for
understanding the runner side):
https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
And please feel free to ask any questions on this list as well; we'd
be happy to help.

> * Should we be targeting the latest master which is at 2.12-SNAPSHOT or a 
> stable version?

I would target the latest master.

> * After a runner is developed, how is the maintenance typically handled, as 
> the runners seem to be part of the Beam codebase?

Either is possible. Several runner adapters are part of the Beam
codebase, but for example the IBM Streams Beam runner is not. There
are certainly pros and cons (certainly early on when the APIs
themselves were under heavy development it was easier to keep things
in sync in the same codebase, but things have mostly stabilized now).
A runner only becomes part of the Beam codebase if there are members
of the community committed to maintaining it (which could include
you). Both approaches are fine.

- Robert


Re: Signing off

2019-02-15 Thread Maximilian Michels

Thank you for your contributions Scott. Best of luck!

On 15.02.19 10:48, Michael Luckey wrote:

Hi Scott,

yes, thanks for all your time and all the best!

michel

On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles wrote:


+1

Thanks for the contributions to community & code, and enjoy the new
chapter!

Kenn

On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise <t...@apache.org> wrote:

Hi Scott,

Thank you for the many contributions to Beam and best of luck
with the new endeavor!

Thomas


On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner <sc...@apache.org> wrote:

I wanted to let you all know that I've decided to pursue a
new adventure in my career, which will take me away from
Apache Beam development.

It's been a fun and fulfilling journey. Apache Beam has been
my first significant experience working in open source. I'm
inspired observing how the community has come together to
deliver something great.

Thanks for everything. If you're curious what's next: I'll
be working on Federated Learning at Google:

https://ai.googleblog.com/2017/04/federated-learning-collaborative.html

Take care,
Scott



Got feedback? tinyurl.com/swegner-feedback




Re: Signing off

2019-02-15 Thread Alexey Romanenko
Good luck, Scott, with your new adventure! 

> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
> 
> Thank you for your contributions Scott. Best of luck!
> 
> On 15.02.19 10:48, Michael Luckey wrote:
>> Hi Scott,
>> yes, thanks for all your time and all the best!
>> michel
>> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles wrote:
>>+1
>>Thanks for the contributions to community & code, and enjoy the new
>>chapter!
>>Kenn
>>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise wrote:
>>Hi Scott,
>>Thank you for the many contributions to Beam and best of luck
>>with the new endeavor!
>>Thomas
>>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner wrote:
>>I wanted to let you all know that I've decided to pursue a
>>new adventure in my career, which will take me away from
>>Apache Beam development.
>>It's been a fun and fulfilling journey. Apache Beam has been
>>my first significant experience working in open source. I'm
>>inspired observing how the community has come together to
>>deliver something great.
>>Thanks for everything. If you're curious what's next: I'll
>>be working on Federated Learning at Google:
>>
>> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>>Take care,
>>Scott
>>Got feedback? tinyurl.com/swegner-feedback



Re: Signing off

2019-02-15 Thread Łukasz Gajowy
Good luck!

On Fri, 15 Feb 2019 at 11:24, Alexey Romanenko wrote:

> Good luck, Scott, with your new adventure!
>
> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>
> Thank you for your contributions Scott. Best of luck!
>
> On 15.02.19 10:48, Michael Luckey wrote:
>
> Hi Scott,
> yes, thanks for all your time and all the best!
> michel
> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles <k...@apache.org> wrote:
>+1
>Thanks for the contributions to community & code, and enjoy the new
>chapter!
>Kenn
>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise wrote:
>Hi Scott,
>Thank you for the many contributions to Beam and best of luck
>with the new endeavor!
>Thomas
>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner wrote:
>I wanted to let you all know that I've decided to pursue a
>new adventure in my career, which will take me away from
>Apache Beam development.
>It's been a fun and fulfilling journey. Apache Beam has been
>my first significant experience working in open source. I'm
>inspired observing how the community has come together to
>deliver something great.
>Thanks for everything. If you're curious what's next: I'll
>be working on Federated Learning at Google:
>
> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>Take care,
>Scott
>Got feedback? tinyurl.com/swegner-feedback
>
>
>
>


Re: Signing off

2019-02-15 Thread Ismaël Mejía
Your work and willingness to make Beam better will be missed.
Good luck for the next phase!

On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
>
> Good luck!
>
> On Fri, 15 Feb 2019 at 11:24, Alexey Romanenko wrote:
>>
>> Good luck, Scott, with your new adventure!
>>
>> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>>
>> Thank you for your contributions Scott. Best of luck!
>>
>> On 15.02.19 10:48, Michael Luckey wrote:
>>
>> Hi Scott,
>> yes, thanks for all your time and all the best!
>> michel
>> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles wrote:
>>+1
>>Thanks for the contributions to community & code, and enjoy the new
>>chapter!
>>Kenn
>>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise wrote:
>>Hi Scott,
>>Thank you for the many contributions to Beam and best of luck
>>with the new endeavor!
>>Thomas
>>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner wrote:
>>I wanted to let you all know that I've decided to pursue a
>>new adventure in my career, which will take me away from
>>Apache Beam development.
>>It's been a fun and fulfilling journey. Apache Beam has been
>>my first significant experience working in open source. I'm
>>inspired observing how the community has come together to
>>deliver something great.
>>Thanks for everything. If you're curious what's next: I'll
>>be working on Federated Learning at Google:
>>
>> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>>Take care,
>>Scott
>>Got feedback? tinyurl.com/swegner-feedback
>>
>>
>>


Re: Thoughts on a reference runner to invest in?

2019-02-15 Thread Ismaël Mejía
Just a minor point: building a community around the Java-based ULR has
not happened, in part because there has not been much effort to try
to do so. The focus has been on creating production-ready runners (which
makes sense), but the opportunity is still there.

Kenn is right: a more 'formalized' and 'readable' runner would be
amazing, but probably hard to get right without good gRPC support.

On Fri, Feb 15, 2019 at 6:05 AM Kenneth Knowles  wrote:
>
> Interesting point about community and the fact that it didn't build a 
> Java-based ULR even though it has been a possibility for a long time.
>
> It makes sense to me. A non-Java SDK needs portability to run on Beam's 
> distributed runners, so building the portable SDK harness is key, unlike for 
> Java. And to build it, a local portability-based runner is a great help 
> (can't really imagine doing it without one). And of course building it in 
> Python makes sense if you are steeped in Python.
>
> Joking-but-not-Joking the best reference runner would probably be in some 
> less popular but very readable functional language so it is different from 
> every SDK :-). I've looked into it and discovered that gRPC support is not 
> great...
>
> Kenn
>
> On Thu, Feb 14, 2019 at 5:47 AM Robert Bradshaw  wrote:
>>
>> I think it's good to distinguish between direct runners (which would
>> be good to have in every language, and can grow in sophistication with
>> the userbase) and a fully universal reference runner. We should of
>> course continue to grow and maintain the java-runners-core shared
>> library, possibly as driven by the various production runners which
>> has been the most productive to date. (The point about community is a
>> good one. Unfortunately over the past 1.5 years the bigger Java
>> community has not resulted in a more complete Java ULR (in terms of
>> number of contributors or features/maturity), and it's unclear what
>> would change that in the future.)
>>
>> It would be really great to have (at least) two completely separate
>> implementations, but (at the moment at least) I see that as lower
>> value than accelerating the efforts to get existing production runners
>> onto portability.
>>
>> On Thu, Feb 14, 2019 at 2:01 PM Ismaël Mejía  wrote:
>> >
>> > This is a really interesting and important discussion. Having multiple
>> > reference runners can have its pros and cons. It is all about
>> > tradeoffs. From the end user point of view it can feel weird to deal
>> > with tools and packaging of a different ecosystem, e.g. Python devs
>> > dealing with all the quirkiness of Java packaging, or vice versa,
>> > Java developers dealing with pip and friends. So having a reference
>> > runner per language would be more natural and would also help validate
>> > the portability concept; however, having multiple reference runners
>> > sounds harder from the maintenance point of view.
>> >
>> > Most of the software in the domain of Beam has traditionally been
>> > written in Java, so there is a BIG advantage in ready-to-use (and
>> > mature) libraries and reusable components (the reference runner
>> > may also profit from the libraries that Thomas and others in the
>> > community have developed for multiple runners). This is a big win, but
>> > more importantly, we can have more eyes looking and contributing
>> > improvements and fixes that will benefit the reference runner and others.
>> >
>> > Having a reference runner per language would be nice, but if we must
>> > choose only one language I prefer it to be Java, just because we have a
>> > bigger community that can contribute and improve it. We may work on
>> > making the distribution of such a runner easier or friendlier for
>> > users of different languages.
>> >
>> > On Wed, Feb 13, 2019 at 3:47 AM Robert Bradshaw  
>> > wrote:
>> > >
>> > > I agree, it's useful for runners that are used for tests (including 
>> > > testing SDKs) to push into the dark corners of what's allowed by the 
>> > > spec. I think this can be added (where they don't already exist) to 
>> > > existing non-production runners. (Whether a direct runner should be 
>> > > considered production or not depends on who you ask...)
>> > >
>> > > On Wed, Feb 13, 2019 at 2:49 AM Daniel Oliveira  
>> > > wrote:
>> > >>
>> > >> +1 to Kenn's point. Regardless of whether we go with a Python runner or 
>> > >> a Java runner, I think we should have at least one portable runner that 
>> > >> isn't a production runner for the reasons he outlined.
>> > >>
>> > >> As for the rest of the discussion, it sounds like people are generally 
>> > >> supportive of having the Python FnApiRunner as that runner, and using 
>> > >> Flink as a reference implementation for portability in Java.
>> > >>
>> > >> On Tue, Feb 12, 2019 at 4:37 PM Kenneth Knowles  wrote:
>> > >>>
>> > >>>
>> > >>> On Tue, Feb 12, 2019 at 8:59 AM Thomas Weise  wrote:
>> > 
>> >  The Java ULR initially provided some value for the portability effort 
>> >  as Max mentions. It helpe

Re: Signing off

2019-02-15 Thread Etienne Chauchot
Thank you for your contributions Scott! Your new project seems very fun. Enjoy!
Etienne
On Friday, 15 February 2019 at 15:01 +0100, Ismaël Mejía wrote:
> Your work and willingness to make Beam better will be missed.
> Good luck for the next phase!
> On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
> 
> Good luck!
> On Fri, 15 Feb 2019 at 11:24, Alexey Romanenko wrote:
> 
> Good luck, Scott, with your new adventure!
> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
> Thank you for your contributions Scott. Best of luck!
> On 15.02.19 10:48, Michael Luckey wrote:
> Hi Scott,
> yes, thanks for all your time and all the best!
> michel
> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles <k...@apache.org> wrote:
>    +1
>    Thanks for the contributions to community & code, and enjoy the new
>    chapter!
>    Kenn
>    On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise <t...@apache.org> wrote:
>    Hi Scott,
>    Thank you for the many contributions to Beam and best of luck
>    with the new endeavor!
>    Thomas
>    On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner <sc...@apache.org> wrote:
>    I wanted to let you all know that I've decided to pursue a
>    new adventure in my career, which will take me away from
>    Apache Beam development.
>    It's been a fun and fulfilling journey. Apache Beam has been
>    my first significant experience working in open source. I'm
>    inspired observing how the community has come together to
>    deliver something great.
>    Thanks for everything. If you're curious what's next: I'll
>    be working on Federated Learning at Google:
>    https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>    Take care,
>    Scott
>    Got feedback? tinyurl.com/swegner-feedback
>    <https://tinyurl.com/swegner-feedback>
> 


Re: [RESULT] [VOTE] Release 2.10.0, release candidate #3

2019-02-15 Thread Alexey Romanenko
I just wanted to confirm that, finally, it has been released? 
I can see 2.10 artifacts in maven repo and release notes on website but I don’t 
recall if it was announced on mailing list.

> On 11 Feb 2019, at 18:42, Kenneth Knowles  wrote:
> 
> Thank you everyone for voting.
> 
> The vote has passed with 9 supportive +1 votes, 6 of which are binding PMC 
> votes:
> 
> * Ahmet Altay
> * Robert Bradshaw
> * Etienne Chauchot
> * Kenneth Knowles
> * Reuven Lax
> * Maximilian Michels
> 
> I will proceed with release finalization steps.
> 
> Kenn
> 
> On Mon, Feb 11, 2019 at 9:41 AM Kenneth Knowles wrote:
> +1
> 
> On Fri, Feb 8, 2019 at 12:37 PM Chamikara Jayalath wrote:
> +1. Verified that leaderboard passes with Dataflow streaming engine (which 
> was broken for 2.9.0).
> 
> Thanks,
> Cham
> 
> On Fri, Feb 8, 2019 at 9:58 AM Ahmet Altay wrote:
> +1. I verified python quick start examples.
> 
> On Fri, Feb 8, 2019 at 8:11 AM Etienne Chauchot wrote:
> Thanks Robert !
> 
> Etienne
> 
> On Friday, 8 February 2019 at 16:42 +0100, Robert Bradshaw wrote:
>> +1 (binding)
>> 
>> I have verified that the artifacts and their checksums/signatures look good, 
>> and also checked the Python wheels against simple pipelines. 
>> 
>> On Fri, Feb 8, 2019 at 4:29 PM Etienne Chauchot wrote:
>>> Hi,
>>> I did the same visual checks of Nexmark that I did on RC2 for both 
>>> functional regressions (output size) and performance regressions (execution 
>>> time) on all the runners/modes for RC3 cut date (02/06) and I saw no 
>>> regression except the one that I already mentioned (end of october perf 
>>> degradation on Q7 in spark batch mode) but it was already in previous 
>>> version.
>>> 
>>> Though I did not have time to check the artifacts. +1 (binding) provided 
>>> that artifacts are correct
>>> 
>>> Etienne
>>> 
>>> On Thursday, 7 February 2019 at 10:25 -0800, Scott Wegner wrote:
 +1
 
 I validated running:
 * Java Quickstart (Direct)
 * Java Quickstart (Apex local)
 * Java Quickstart (Flink local)
 * Java Quickstart (Spark local)
 * Java Quickstart (Dataflow)
 * Java Mobile Game (Dataflow) 
 
 On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #3 for the version 
> 2.10.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org 
>  [2], which is signed with the key with 
> fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.10.0-RC3" [5],
> * website pull request listing the release [6] and publishing the API 
> reference manual [7].
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org  [2].
> * Validation sheet with a tab for 2.10.0 release to help with validation 
> [8].
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Kenn
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344540
>  
> 
> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/ 
> 
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
> 
> [4] 
> https://repository.apache.org/content/repositories/orgapachebeam-1058/ 
> 
> [5] https://github.com/apache/beam/tree/v2.10.0-RC3 
> 
> [6] https://github.com/apache/beam/pull/7651/files 
> 
> [7] https://github.com/apache/beam-site/pull/586 
> 
> [8] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>  
> 
 



Re: [RESULT] [VOTE] Release 2.10.0, release candidate #3

2019-02-15 Thread Kenneth Knowles
Still drafting the announcements and whatnot. Sorry for the delay.

Kenn

On Fri, Feb 15, 2019 at 8:29 AM Alexey Romanenko 
wrote:

> I just wanted to confirm that, finally, it has been released?
> I can see 2.10 artifacts in maven repo and release notes on website but I
> don’t recall if it was announced on mailing list.
>
> On 11 Feb 2019, at 18:42, Kenneth Knowles  wrote:
>
> Thank you everyone for voting.
>
> The vote has passed with 9 supportive +1 votes, 6 of which are binding PMC
> votes:
>
> * Ahmet Altay
> * Robert Bradshaw
> * Etienne Chauchot
> * Kenneth Knowles
> * Reuven Lax
> * Maximilian Michels
>
> I will proceed with release finalization steps.
>
> Kenn
>
> On Mon, Feb 11, 2019 at 9:41 AM Kenneth Knowles  wrote:
>
>> +1
>>
>> On Fri, Feb 8, 2019 at 12:37 PM Chamikara Jayalath 
>> wrote:
>>
>>> +1. Verified that leaderboard passes with Dataflow streaming engine
>>> (which was broken for 2.9.0).
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Fri, Feb 8, 2019 at 9:58 AM Ahmet Altay  wrote:
>>>
 +1. I verified python quick start examples.

 On Fri, Feb 8, 2019 at 8:11 AM Etienne Chauchot 
 wrote:

> Thanks Robert !
>
> Etienne
>
> On Friday, 8 February 2019 at 16:42 +0100, Robert Bradshaw wrote:
>
> +1 (binding)
>
> I have verified that the artifacts and their checksums/signatures look
> good, and also checked the Python wheels against simple pipelines.
>
> On Fri, Feb 8, 2019 at 4:29 PM Etienne Chauchot 
> wrote:
>
> Hi,
> I did the same visual checks of Nexmark that I did on RC2 for both
> functional regressions (output size) and performance regressions
> (execution time) on all the runners/modes for RC3 cut date (02/06)
> and I saw no
> regression except the one that I already mentioned (end of october perf
> degradation on Q7 in spark batch mode) but it was already in previous
> version.
>
> Though I did not have time to check the artifacts. +1 (binding)
> provided that artifacts are correct
>
> Etienne
>
> On Thursday, 7 February 2019 at 10:25 -0800, Scott Wegner wrote:
>
> +1
>
> I validated running:
> * Java Quickstart (Direct)
> * Java Quickstart (Apex local)
> * Java Quickstart (Flink local)
> * Java Quickstart (Spark local)
> * Java Quickstart (Dataflow)
> * Java Mobile Game (Dataflow)
>
> On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles 
> wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #3 for the version
> 2.10.0, as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
>  [2], which is signed with the key with
> fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.10.0-RC3" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.10.0 release to help with
> validation [8].
>
> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Kenn
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344540
> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1058/
> [5] https://github.com/apache/beam/tree/v2.10.0-RC3
> [6] https://github.com/apache/beam/pull/7651/files
> [7] https://github.com/apache/beam-site/pull/586
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>
>
>
>
>


Re: Signing off

2019-02-15 Thread Alex Amato
Thanks for your contributions Scott. We will miss you.

On Fri, Feb 15, 2019 at 7:08 AM Etienne Chauchot 
wrote:

> Thank you for your contributions Scott ! Your new project seems very fun.
> Enjoy !
>
> Etienne
>
> On Friday, 15 February 2019 at 15:01 +0100, Ismaël Mejía wrote:
>
> Your work and willingness to make Beam better will be missed.
>
> Good luck for the next phase!
>
>
> On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
>
>
> Good luck!
>
>
> On Fri, 15 Feb 2019 at 11:24, Alexey Romanenko wrote:
>
>
> Good luck, Scott, with your new adventure!
>
>
> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>
>
> Thank you for your contributions Scott. Best of luck!
>
>
> On 15.02.19 10:48, Michael Luckey wrote:
>
>
> Hi Scott,
>
> yes, thanks for all your time and all the best!
>
> michel
>
> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles wrote:
>
>+1
>
>Thanks for the contributions to community & code, and enjoy the new
>
>chapter!
>
>Kenn
>
>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise wrote:
>
>Hi Scott,
>
>Thank you for the many contributions to Beam and best of luck
>
>with the new endeavor!
>
>Thomas
>
>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner wrote:
>
>I wanted to let you all know that I've decided to pursue a
>
>new adventure in my career, which will take me away from
>
>Apache Beam development.
>
>It's been a fun and fulfilling journey. Apache Beam has been
>
>my first significant experience working in open source. I'm
>
>inspired observing how the community has come together to
>
>deliver something great.
>
>Thanks for everything. If you're curious what's next: I'll
>
>be working on Federated Learning at Google:
>
>
> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>
>Take care,
>
>Scott
>
>Got feedback? tinyurl.com/swegner-feedback
>
>
>
>
>
>


Re: Signing off

2019-02-15 Thread Udi Meiri
Good luck Scott!

On Fri, Feb 15, 2019 at 9:32 AM Alex Amato  wrote:

> Thanks for your contributions Scott. We will miss you.
>
> On Fri, Feb 15, 2019 at 7:08 AM Etienne Chauchot 
> wrote:
>
>> Thank you for your contributions Scott ! Your new project seems very fun.
>> Enjoy !
>>
>> Etienne
>>
>> On Friday, 15 February 2019 at 15:01 +0100, Ismaël Mejía wrote:
>>
>> Your work and willingness to make Beam better will be missed.
>>
>> Good luck for the next phase!
>>
>>
>> On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
>>
>>
>> Good luck!
>>
>>
>> On Fri, 15 Feb 2019 at 11:24, Alexey Romanenko wrote:
>>
>>
>> Good luck, Scott, with your new adventure!
>>
>>
>> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>>
>>
>> Thank you for your contributions Scott. Best of luck!
>>
>>
>> On 15.02.19 10:48, Michael Luckey wrote:
>>
>>
>> Hi Scott,
>>
>> yes, thanks for all your time and all the best!
>>
>> michel
>>
>> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles wrote:
>>
>>+1
>>
>>Thanks for the contributions to community & code, and enjoy the new
>>
>>chapter!
>>
>>Kenn
>>
>>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise wrote:
>>
>>Hi Scott,
>>
>>Thank you for the many contributions to Beam and best of luck
>>
>>with the new endeavor!
>>
>>Thomas
>>
>>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner wrote:
>>
>>I wanted to let you all know that I've decided to pursue a
>>
>>new adventure in my career, which will take me away from
>>
>>Apache Beam development.
>>
>>It's been a fun and fulfilling journey. Apache Beam has been
>>
>>my first significant experience working in open source. I'm
>>
>>inspired observing how the community has come together to
>>
>>deliver something great.
>>
>>Thanks for everything. If you're curious what's next: I'll
>>
>>be working on Federated Learning at Google:
>>
>>
>> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>>
>>Take care,
>>
>>Scott
>>
>>Got feedback? tinyurl.com/swegner-feedback
>>
>>
>>
>>
>>
>>




Re: [RESULT] [VOTE] Release 2.10.0, release candidate #3

2019-02-15 Thread Alexey Romanenko
Thanks, Kenn! 
I just wanted to confirm if I missed something.

> On 15 Feb 2019, at 17:53, Kenneth Knowles  wrote:
> 
> Still drafting the announcements and whatnot. Sorry for the delay.
> 
> Kenn
> 
> On Fri, Feb 15, 2019 at 8:29 AM Alexey Romanenko wrote:
> I just wanted to confirm that, finally, it has been released? 
> I can see 2.10 artifacts in maven repo and release notes on website but I 
> don’t recall if it was announced on mailing list.
> 
>> On 11 Feb 2019, at 18:42, Kenneth Knowles wrote:
>> 
>> Thank you everyone for voting.
>> 
>> The vote has passed with 9 supportive +1 votes, 6 of which are binding PMC 
>> votes:
>> 
>> * Ahmet Altay
>> * Robert Bradshaw
>> * Etienne Chauchot
>> * Kenneth Knowles
>> * Reuven Lax
>> * Maximilian Michels
>> 
>> I will proceed with release finalization steps.
>> 
>> Kenn
>> 
>> On Mon, Feb 11, 2019 at 9:41 AM Kenneth Knowles wrote:
>> +1
>> 
>> On Fri, Feb 8, 2019 at 12:37 PM Chamikara Jayalath wrote:
>> +1. Verified that leaderboard passes with Dataflow streaming engine (which 
>> was broken for 2.9.0).
>> 
>> Thanks,
>> Cham
>> 
>> On Fri, Feb 8, 2019 at 9:58 AM Ahmet Altay wrote:
>> +1. I verified python quick start examples.
>> 
>> On Fri, Feb 8, 2019 at 8:11 AM Etienne Chauchot wrote:
>> Thanks Robert !
>> 
>> Etienne
>> 
>> On Friday, 8 February 2019 at 16:42 +0100, Robert Bradshaw wrote:
>>> +1 (binding)
>>> 
>>> I have verified that the artifacts and their checksums/signatures look 
>>> good, and also checked the Python wheels against simple pipelines. 
>>> 
>>> On Fri, Feb 8, 2019 at 4:29 PM Etienne Chauchot wrote:
 Hi,
 I did the same visual checks of Nexmark that I did on RC2 for both 
 functional regressions (output size) and performance regressions 
 (execution time) on all the runners/modes for RC3 cut date (02/06) and I 
 saw no regression except the one that I already mentioned (end of october 
 perf degradation on Q7 in spark batch mode) but it was already in previous 
 version.
 
 Though I did not have time to check the artifacts. +1 (binding) provided 
 that artifacts are correct
 
 Etienne
 
 On Thursday, 7 February 2019 at 10:25 -0800, Scott Wegner wrote:
> +1
> 
> I validated running:
> * Java Quickstart (Direct)
> * Java Quickstart (Apex local)
> * Java Quickstart (Flink local)
> * Java Quickstart (Spark local)
> * Java Quickstart (Dataflow)
> * Java Mobile Game (Dataflow) 
> 
> On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles wrote:
>> Hi everyone,
>> 
>> Please review and vote on the release candidate #3 for the version 
>> 2.10.0, as follows:
>> 
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org 
>>  [2], which is signed with the key with 
>> fingerprint 6ED551A8AE02461C [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.10.0-RC3" [5],
>> * website pull request listing the release [6] and publishing the API 
>> reference manual [7].
>> * Python artifacts are deployed along with the source release to the 
>> dist.apache.org  [2].
>> * Validation sheet with a tab for 2.10.0 release to help with validation 
>> [8].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> Thanks,
>> Kenn
>> 
>> [1] 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344540
>>  
>> 
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/ 
>> 
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS 
>> 
>> [4] 
>> https://repository.apache.org/content/repositories/orgapachebeam-1058/ 
>> 
>> [5] https://github.com/apache/beam/tree/v2.10.0-RC3 
>> 
>> [6] https://github.com/apache/beam/pull/7651/files 
>> 
>> [7] https://github.com/apache/beam-site/pull/586 
>> 
>> [8] 

[ANNOUNCE] Apache Beam 2.10.0 released!

2019-02-15 Thread Kenneth Knowles
The Apache Beam team is pleased to announce the release of version 2.10.0!

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bugfixes, features, and improvements detailed on the
Beam blog: https://beam.apache.org/blog/2019/02/15/beam-2.10.0.html

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.10.0.

-- Kenneth Knowles, on behalf of The Apache Beam team


Re: Signing off

2019-02-15 Thread Robin Qiu
Thanks Scott and good luck in your next adventure!

Best,
Robin

On Fri, Feb 15, 2019 at 9:35 AM Udi Meiri  wrote:

> Good luck Scott!
>
> On Fri, Feb 15, 2019 at 9:32 AM Alex Amato  wrote:
>
>> Thanks for your contributions, Scott. We will miss you.
>>
>> On Fri, Feb 15, 2019 at 7:08 AM Etienne Chauchot 
>> wrote:
>>
>>> Thank you for your contributions, Scott! Your new project seems very
>>> fun. Enjoy!
>>>
>>> Etienne
>>>
>>> On Friday, 15 February 2019 at 15:01 +0100, Ismaël Mejía wrote:
>>>
>>> Your work and willingness to make Beam better will be missed.
>>>
>>> Good luck for the next phase!
>>>
>>>
>>> On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
>>>
>>>
>>> Good luck!
>>>
>>>
>>> On Fri, 15 Feb 2019 at 11:24, Alexey Romanenko wrote:
>>>
>>>
>>> Good luck, Scott, with your new adventure!
>>>
>>>
>>> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>>>
>>>
>>> Thank you for your contributions Scott. Best of luck!
>>>
>>>
>>> On 15.02.19 10:48, Michael Luckey wrote:
>>>
>>>
>>> Hi Scott,
>>>
>>> yes, thanks for all your time and all the best!
>>>
>>> michel
>>>
>>> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles wrote:
>>>
>>> +1
>>>
>>> Thanks for the contributions to community & code, and enjoy the new
>>> chapter!
>>>
>>> Kenn
>>>
>>> On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise wrote:
>>>
>>> Hi Scott,
>>>
>>> Thank you for the many contributions to Beam and best of luck
>>> with the new endeavor!
>>>
>>> Thomas
>>>
>>> On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner wrote:
>>>
>>> I wanted to let you all know that I've decided to pursue a
>>> new adventure in my career, which will take me away from
>>> Apache Beam development.
>>>
>>> It's been a fun and fulfilling journey. Apache Beam has been
>>> my first significant experience working in open source. I'm
>>> inspired observing how the community has come together to
>>> deliver something great.
>>>
>>> Thanks for everything. If you're curious what's next: I'll
>>> be working on Federated Learning at Google:
>>> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>>>
>>> Take care,
>>> Scott
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>


Beam Jenkins job summary available in .test-infra/jenkins/README.md

2019-02-15 Thread Mark Liu
TL;DR: Check out .test-infra/jenkins/README.md for the Beam Jenkins job
summary!

Hi folks,

I find it difficult to quickly locate a particular Jenkins job link or PR
trigger phrase during development and PR review, so I collected some
useful job information from the groovy files and put it in
.test-infra/jenkins/README.md. I also linked this file from the PR
template. Because of the large number of jobs we currently run, I grouped
them into a few tables: PreCommit, PostCommit, Performance, Inventory and
Others. Hopefully this is clear and helpful to other contributors.

Since the README was generated from the current state of the Jenkins groovy
files, any further changes unfortunately won't be reflected there without
a manual update.

Thanks,
Mark


Dependency management for multiple IOs

2019-02-15 Thread Anton Kedin
Hi dev@,

I have a problem: I don't know a good way to approach the dependency
management between Beam SQL and Beam IOs, and want to collect thoughts
about it.

Beam SQL depends on specific IOs so that users can query them. The IOs need
their dependencies to work. Sometimes the IOs also leak their transitive
dependencies (e.g. HCatRecord leaked from HCatalogIO). So if in SQL we want
to build abstractions on top of these IOs we risk having to bundle the
whole IOs or the leaked dependencies. Overall we can probably avoid it by
making the IOs `provided` dependencies, and by refactoring the code that
leaks. In this case things can be made to build, simple tests will run, and
we won't need to bundle the IOs within SQL.

But as soon as there's a need to actually work with multiple IOs at the
same time the conflicts appear. For example, for testing of Hive/HCatalog
IOs in SQL we need to create an embedded Hive Metastore instance. It is a
very Hive-specific thing that requires its own dependencies that have to be
loaded during testing as part of SQL project. And some other IOs (e.g.
KafkaIO) can bring similar but conflicting dependencies which means that we
cannot easily work with or test both IOs at the same time within SQL. I
think it will become insane as the number of IOs supported in SQL grows.

So the question is how to avoid conflicts between IOs within SQL?

One approach is to create separate packages for each of the SQL-specific IO
wrappers, e.g. `beam-sdks-java-extensions-sql-hcatalog`,
`beam-sdks-java-extensions-sql-kafka`,
etc. These projects will compile-depend on Beam SQL and on specific IO.
Beam SQL will load these either from user-specified configuration or
something like @AutoService at runtime. This way Beam SQL doesn't know
about the details of the IOs and their dependencies, and they can be easily
tested in isolation without conflicting with each other. This should also
be relatively simple to manage if things change; the build logic should be
straightforward and easy to update. On the negative side, each of the
projects will require its own separate build logic, it will not be easy to
test multiple IOs together within SQL, and users will have to manage the
conflicting dependencies by themselves.
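
The runtime discovery part of this approach can be sketched roughly as
follows. This is only an illustration under stated assumptions:
`SqlIoPlugin` and `IoPluginRegistry` are hypothetical names and not existing
Beam APIs. It relies on the standard `java.util.ServiceLoader`, which reads
the `META-INF/services` entries that @AutoService generates.

```java
import java.util.Map;
import java.util.ServiceLoader;
import java.util.TreeMap;

// Hypothetical names throughout: SqlIoPlugin and IoPluginRegistry are not Beam APIs.
final class IoPluginRegistry {

    /** SPI that each per-IO wrapper module (e.g. a sql-kafka module) would implement. */
    public interface SqlIoPlugin {
        /** Table type this plugin serves, e.g. "kafka" or "hcatalog". */
        String tableType();
    }

    /** Indexes plugins by table type and rejects duplicate registrations. */
    public static Map<String, SqlIoPlugin> index(Iterable<? extends SqlIoPlugin> plugins) {
        Map<String, SqlIoPlugin> byType = new TreeMap<>();
        for (SqlIoPlugin plugin : plugins) {
            SqlIoPlugin previous = byType.putIfAbsent(plugin.tableType(), plugin);
            if (previous != null) {
                throw new IllegalStateException(
                    "Duplicate plugin for table type: " + plugin.tableType());
            }
        }
        return byType;
    }

    /** Finds whichever plugins happen to be registered on the classpath. */
    public static Map<String, SqlIoPlugin> discover() {
        return index(ServiceLoader.load(SqlIoPlugin.class));
    }
}
```

With this shape, the core SQL module only compiles against the small
interface, and each wrapper module brings its own IO dependencies.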

Another approach is to keep things roughly as they are but create separate
configurations within the main `build.gradle` in SQL project, where
configurations will correspond to separate IOs or use cases (e.g. testing
of Hive-related IOs). The benefit is that everything related to SQL IOs
stays roughly in one place (including build logic) and can be built and
tested together when possible. The negative side is that it will probably
involve some groovy magic and classpath manipulation within Gradle tasks to
make the configurations work, plus it may be brittle if we change our
top-level Beam build logic. And this approach also doesn't make it easier
for the users to manage the conflicts.

Longer term we could probably also reduce the abstraction thickness on top
of the IOs, so that Beam SQL can work directly with IOs. For this to work
the supported IOs will need to expose things like `readRows()` and get/set
the schema on the PCollection. This is probably aligned with the Schema
work that's happening at the moment but I don't know whether it makes sense
to focus on this right now. The problem of the dependencies is not solved
here either, but I think it will be at least the same problem that users
already have if they see conflicts when using multiple IOs with Beam
pipelines.

Thoughts, ideas? Did anyone ever face a problem like this or am I
completely misunderstanding something in the Beam build logic?

Regards,
Anton


Re: Dependency management for multiple IOs

2019-02-15 Thread Chamikara Jayalath
I think the underlying problem is two modules of Beam transitively
depending on conflicting dependencies (a.k.a. the diamond dependency
problem)?

I think the general solution for this is twofold (at least the way we
have formulated it in https://beam.apache.org/contribute/dependencies/):

(1) Keep Beam dependencies up to date as much as possible, hoping that transitive
dependencies stay compatible (we rely on semantic versioning here to not
cause problems for differences in minor/patch versions. Might not be the
case in practice for some dependencies).
(2) For modules with outdated dependencies that we cannot upgrade due to
some reason, we'll vendor those modules.
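
The semantic-versioning assumption behind point (1) can be illustrated with
a small sketch. `SemverCheck` is a hypothetical helper, not part of Beam or
of any build tool, and it only handles plain "major.minor.patch" strings.

```java
// Hypothetical helper, not part of Beam or any build tool.
final class SemverCheck {

    /** Returns the major component of a "major.minor.patch" version string. */
    public static int major(String version) {
        int dot = version.indexOf('.');
        String head = dot < 0 ? version : version.substring(0, dot);
        return Integer.parseInt(head);
    }

    /** Treats two versions as compatible when they share a non-zero major number. */
    public static boolean assumedCompatible(String a, String b) {
        int majorA = major(a);
        if (majorA != major(b)) {
            return false;
        }
        // Semantic versioning makes no stability promise for 0.x releases,
        // so only identical 0.x versions are treated as compatible here.
        return majorA != 0 || a.equals(b);
    }
}
```

As noted in (1), a library may break this assumption in practice, so a check
like this is a heuristic rather than a guarantee.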

Not sure if your specific problem needs something more.

Thanks,
Cham

On Fri, Feb 15, 2019 at 4:48 PM Anton Kedin  wrote:

> Hi dev@,
>
> I have a problem: I don't know a good way to approach the dependency
> management between Beam SQL and Beam IOs, and want to collect thoughts
> about it.
>
> Beam SQL depends on specific IOs so that users can query them. The IOs
> need their dependencies to work. Sometimes the IOs also leak their
> transitive dependencies (e.g. HCatRecord leaked from HCatalogIO). So if in
> SQL we want to build abstractions on top of these IOs we risk having to
> bundle the whole IOs or the leaked dependencies. Overall we can probably
> avoid it by making the IOs `provided` dependencies, and by refactoring the
> code that leaks. In this case things can be made to build, simple tests
> will run, and we won't need to bundle the IOs within SQL.
>
> But as soon as there's a need to actually work with multiple IOs at the
> same time the conflicts appear. For example, for testing of Hive/HCatalog
> IOs in SQL we need to create an embedded Hive Metastore instance. It is a
> very Hive-specific thing that requires its own dependencies that have to be
> loaded during testing as part of SQL project. And some other IOs (e.g.
> KafkaIO) can bring similar but conflicting dependencies which means that we
> cannot easily work with or test both IOs at the same time within SQL. I
> think it will become insane as the number of IOs supported in SQL grows.
>
> So the question is how to avoid conflicts between IOs within SQL?
>
> One approach is to create separate packages for each of the SQL-specific
> IO wrappers, e.g. `beam-sdks-java-extensions-sql-hcatalog`, 
> `beam-sdks-java-extensions-sql-kafka`,
> etc. These projects will compile-depend on Beam SQL and on specific IO.
> Beam SQL will load these either from user-specified configuration or
> something like @AutoService at runtime. This way Beam SQL doesn't know
> about the details of the IOs and their dependencies, and they can be easily
> tested in isolation without conflicting with each other. This should also
> be relatively simple to manage if things change; the build logic should be
> straightforward and easy to update. On the negative side, each of the
> projects will require its own separate build logic, it will not be easy to
> test multiple IOs together within SQL, and users will have to manage the
> conflicting dependencies by themselves.
>
> Another approach is to keep things roughly as they are but create separate
> configurations within the main `build.gradle` in SQL project, where
> configurations will correspond to separate IOs or use cases (e.g. testing
> of Hive-related IOs). The benefit is that everything related to SQL IOs
> stays roughly in one place (including build logic) and can be built and
> tested together when possible. The negative side is that it will probably
> involve some groovy magic and classpath manipulation within Gradle tasks to
> make the configurations work, plus it may be brittle if we change our
> top-level Beam build logic. And this approach also doesn't make it easier
> for the users to manage the conflicts.
>
> Longer term we could probably also reduce the abstraction thickness on top
> of the IOs, so that Beam SQL can work directly with IOs. For this to work
> the supported IOs will need to expose things like `readRows()` and get/set
> the schema on the PCollection. This is probably aligned with the Schema
> work that's happening at the moment but I don't know whether it makes sense
> to focus on this right now. The problem of the dependencies is not solved
> here either, but I think it will be at least the same problem that users
> already have if they see conflicts when using multiple IOs with Beam
> pipelines.
>
> Thoughts, ideas? Did anyone ever face a problem like this or am I
> completely misunderstanding something in the Beam build logic?
>
> Regards,
> Anton
>


Re: Dependency management for multiple IOs

2019-02-15 Thread Kenneth Knowles
I'm not totally convinced Beam's dep versions are the issue here. A user
may have an organizational requirement of a particular version of, say,
Kafka and Hive. So when they depend on Beam they probably pin those
versions of Kafka and Hive which they have determined work together, and
they hope that the Beam IOs work together.

I see this as a choice between two scenarios for users:

1. SQL <--- KafkaTable (@AutoService) ---> KafkaIO ---provided---> Kafka
2. SQL (includes KafkaTable) ---optional---> KafkaIO ---provided---> Kafka

For users of 1, they depend on Beam Java, Beam SQL, SQL Kafka Table, and
pin a version of Kafka.
For users of 2, they depend on Beam Java, Beam SQL, KafkaIO, and pin a
version of Kafka.

To be honest it is really hard to see which is preferable. I think number 1
has fewer funky dependency edges, more simple "compile + runtime"
dependencies.
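
One consequence of scenario 2 is that the SQL module would need to cope with
the optional IO being absent at runtime. A minimal sketch of such a guard
follows, assuming a hypothetical helper named `OptionalDependency`; the
class name to probe is supplied by the caller and nothing here is an
existing Beam API.

```java
// Hypothetical sketch; the class names passed in are up to the caller.
final class OptionalDependency {

    /** Returns true if the named class can be loaded, without running its static initializers. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing or incompatible.
            return false;
        }
    }
}
```

Scenario 1 avoids this kind of check, since a table module that is present
always brings its IO along as an ordinary dependency.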

Kenn





On Fri, Feb 15, 2019 at 6:06 PM Chamikara Jayalath 
wrote:

> I think the underlying problem is two modules of Beam transitively
> depending on conflicting dependencies (a.k.a. the diamond dependency
> problem)?
>
> I think the general solution for this is twofold (at least the way we
> have formulated it in https://beam.apache.org/contribute/dependencies/):
>
> (1) Keep Beam dependencies up to date as much as possible, hoping that transitive
> dependencies stay compatible (we rely on semantic versioning here to not
> cause problems for differences in minor/patch versions. Might not be the
> case in practice for some dependencies).
> (2) For modules with outdated dependencies that we cannot upgrade due to
> some reason, we'll vendor those modules.
>
> Not sure if your specific problem needs something more.
>
> Thanks,
> Cham
>
> On Fri, Feb 15, 2019 at 4:48 PM Anton Kedin  wrote:
>
>> Hi dev@,
>>
>> I have a problem: I don't know a good way to approach the dependency
>> management between Beam SQL and Beam IOs, and want to collect thoughts
>> about it.
>>
>> Beam SQL depends on specific IOs so that users can query them. The IOs
>> need their dependencies to work. Sometimes the IOs also leak their
>> transitive dependencies (e.g. HCatRecord leaked from HCatalogIO). So if in
>> SQL we want to build abstractions on top of these IOs we risk having to
>> bundle the whole IOs or the leaked dependencies. Overall we can probably
>> avoid it by making the IOs `provided` dependencies, and by refactoring the
>> code that leaks. In this case things can be made to build, simple tests
>> will run, and we won't need to bundle the IOs within SQL.
>>
>> But as soon as there's a need to actually work with multiple IOs at the
>> same time the conflicts appear. For example, for testing of Hive/HCatalog
>> IOs in SQL we need to create an embedded Hive Metastore instance. It is a
>> very Hive-specific thing that requires its own dependencies that have to be
>> loaded during testing as part of SQL project. And some other IOs (e.g.
>> KafkaIO) can bring similar but conflicting dependencies which means that we
>> cannot easily work with or test both IOs at the same time within SQL. I
>> think it will become insane as the number of IOs supported in SQL grows.
>>
>> So the question is how to avoid conflicts between IOs within SQL?
>>
>> One approach is to create separate packages for each of the SQL-specific
>> IO wrappers, e.g. `beam-sdks-java-extensions-sql-hcatalog`, 
>> `beam-sdks-java-extensions-sql-kafka`,
>> etc. These projects will compile-depend on Beam SQL and on specific IO.
>> Beam SQL will load these either from user-specified configuration or
>> something like @AutoService at runtime. This way Beam SQL doesn't know
>> about the details of the IOs and their dependencies, and they can be easily
>> tested in isolation without conflicting with each other. This should also
>> be relatively simple to manage if things change; the build logic should be
>> straightforward and easy to update. On the negative side, each of the
>> projects will require its own separate build logic, it will not be easy to
>> test multiple IOs together within SQL, and users will have to manage the
>> conflicting dependencies by themselves.
>>
>> Another approach is to keep things roughly as they are but create
>> separate configurations within the main `build.gradle` in SQL project,
>> where configurations will correspond to separate IOs or use cases (e.g.
>> testing of Hive-related IOs). The benefit is that everything related to SQL
>> IOs stays roughly in one place (including build logic) and can be built and
>> tested together when possible. The negative side is that it will probably
>> involve some groovy magic and classpath manipulation within Gradle tasks to
>> make the configurations work, plus it may be brittle if we change our
>> top-level Beam build logic. And this approach also doesn't make it easier
>> for the users to manage the conflicts.
>>
>> Longer term we could probably also reduce the abstraction thickness on
>> top of the IOs, so that Beam SQL can work directly

Re: Hazelcast Jet Runner

2019-02-15 Thread Kenneth Knowles
Elaborating on what Robert alluded to: when I wrote that runner author
guide, portability was in its infancy. Now Beam Python can be run on Flink.
So that guide is primarily focused on the "deserialize a Java DoFn and call
its methods" approach. A decent amount of it is still really important to
know, but is now the responsibility of the "SDK harness", aka
language-specific coprocessor. For Python, Go, and other non-Java SDKs, you
really want to use the portability protos, and the portable Flink
runner is the best model.

Kenn


On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw  wrote:

> On Fri, Feb 15, 2019 at 7:36 AM Can Gencer  wrote:
> >
> > We at Hazelcast are looking into writing a Beam runner for Hazelcast Jet
> (https://github.com/hazelcast/hazelcast-jet). I wanted to introduce
> myself as we'll likely have questions as we start development.
>
> Welcome!
>
> Hazelcast looks interesting, a Beam runner for it would be very cool.
>
> > Some of the things I'm wondering about currently:
> >
> > * Currently there seems to be a guide available at
> https://beam.apache.org/contribute/runner-guide/ , is this up to date? Is
> there anything in specific to be aware of when starting with a new runner
> that's not covered here?
>
> That looks like a pretty good starting point. At a quick glance, I
> don't see anything that looks out of date. Another resource that might
> be helpful is a talk from last year on writing an SDK (but as it
> mostly covers the runner-sdk interaction, it's also quite useful for
> understanding the runner side):
>
> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> And please feel free to ask any questions on this list as well; we'd
> be happy to help.
>
> > * Should we be targeting the latest master which is at 2.12-SNAPSHOT or
> a stable version?
>
> I would target the latest master.
>
> > * After a runner is developed, how is the maintenance typically handled,
> as the runners seem to be part of the Beam codebase?
>
> Either is possible. Several runner adapters are part of the Beam
> codebase, but for example the IBM Streams Beam runner is not. There
> are certainly pros and cons (certainly early on when the APIs
> themselves were under heavy development it was easier to keep things
> in sync in the same codebase, but things have mostly stabilized now).
> A runner only becomes part of the Beam codebase if there are members
> of the community committed to maintaining it (which could include
> you). Both approaches are fine.
>
> - Robert
>