Re: [Proposal] Requesting PMC approval to start planning for Beam Summits 2019

2019-01-19 Thread Davor Bonaci
I'd say these matters are generally private between the organizer(s) and
the PMC. This thread should continue on the PMC-private mailing list.

On Fri, Jan 18, 2019 at 4:06 PM Ahmet Altay  wrote:

> Thank you Joana.
>
> Kenn and PMC members could you comment on what needs to be done to move
> this forward?
>
> On Thu, Jan 17, 2019 at 3:40 PM joanafil...@google.com <
> joanafil...@google.com> wrote:
>
>> Dear Project Management Committee,
>>
>>
>> The Beam Summits are community events funded by a Sponsoring Committee
>> and organized by an Organizing Committee. I’d like to get the following
>> approvals:
>>
>> To organize and host the Summits under the name of Beam Summit 
>> , i.e. Beam Summit North America 2019, Beam Summit Europe 2019 and
>> Beam Summit Asia 2019.
>>
>> To form organizing committees for each edition
>>
>> Approval to host each edition on the following dates and locations:
>>
>> - Beam Summit North America, on April 3rd, San Francisco, CA. (150
>> attendees)
>>
>> - Beam Summit Europe, on June 19th, Berlin, Germany. (150 attendees)
>>
>> - Beam Summit Asia, October (exact date tbc), Tokyo, Japan. (150
>> attendees)
>>
>> The events will provide educational content selected by the Organizing
>> Committee, and will be a not-for-profit event, however, we might charge a
>> fee to support the organization and logistics costs. This matter will be
>> decided by the Organizing Committee and will be brought back to the PMC if
>> needed. The events will be advertised on various channels, including the
>> Apache Beam’s and Summit sponsor’s social media accounts.
>>
>>
>> The Organizing Committee will acknowledge the Apache Software
>> Foundation's ownership of the Apache Beam trademark and will add
>> attribution required by the foundation’s policy on all marketing channels.
>> The Apache Beam branding will be used in accordance with the foundation’s
>> trademark and events policies specifically as outlined in Third Party Event
>> Branding Policy. The Organizing Committee does not request the ASF to
>> become a Community Partner of the event.
>>
>>
>> Please feel free to request further information if needed.
>>
>>
>> Kind Regards,
>>
>> Joana Carrasqueira
>>
>


Re: [Proposal] Requesting PMC approval to start planning for Beam Summits 2019

2019-01-20 Thread Davor Bonaci
I'm sorry, but certain things require special care. General organizing
discussions can proceed on dev@, excluding this specific one.

On Sun, Jan 20, 2019 at 8:43 PM Gris Cuevas  wrote:

>
>
> On 2019/01/19 16:58:52, Davor Bonaci  wrote:
> > I'd say these matters are generally private between the organizer(s) and
> > the PMC. This thread should continue on the PMC-private mailing list.
>
> Last time we checked (back in October), people who are not part of the PMC
> couldn't send emails to the private mailing list nor read the responses.
> The organization of the last London Summit occurred in the private list
> with me and Matthias directly cc'ed in our individual emails. Additionally
> a few of the other organizers didn't get visibility on how decisions were
> formed and this obstructed the planning.
>
> I'd suggest we continue with the initial discussion here in order to offer
> transparency. If a sensitive matter needs to be discussed, we can call it
> out here and set context that a specific conversation will be moved to
> private, bringing the decision made back to this thread.
>
> Also, I'd recommend we keep the discussion focused on the main matter:
> Summit planning. If you have a proposal to revise what communications need
> to be had in specific channels, you can start the conversation in a new
> separate thread; I see a lot of value in having it.
>
> @PMC members, we'd love to get an outline of what needs to be done to
> continue with the planning. We have a few people interested in helping out
> and we'd love to keep the momentum going.
>
> >
> > On Fri, Jan 18, 2019 at 4:06 PM Ahmet Altay  wrote:
> >
> > > Thank you Joana.
> > >
> > > Kenn and PMC members could you comment on what needs to be done to move
> > > this forward?
> > >
> > > On Thu, Jan 17, 2019 at 3:40 PM joanafil...@google.com <
> > > joanafil...@google.com> wrote:
> > >
> > >> Dear Project Management Committee,
> > >>
> > >>
> > >> The Beam Summits are community events funded by a Sponsoring Committee
> > >> and organized by an Organizing Committee. I’d like to get the following
> > >> approvals:
> > >>
> > >> To organize and host the Summits under the name of Beam Summit
> > >> , i.e. Beam Summit North America 2019, Beam Summit Europe 2019 and
> > >> Beam Summit Asia 2019.
> > >>
> > >> To form organizing committees for each edition
> > >>
> > >> Approval to host each edition on the following dates and locations:
> > >>
> > >> - Beam Summit North America, on April 3rd, San Francisco, CA. (150
> > >> attendees)
> > >>
> > >> - Beam Summit Europe, on June 19th, Berlin, Germany. (150 attendees)
> > >>
> > >> - Beam Summit Asia, October (exact date tbc), Tokyo, Japan. (150
> > >> attendees)
> > >>
> > >> The events will provide educational content selected by the Organizing
> > >> Committee, and will be a not-for-profit event, however, we might charge a
> > >> fee to support the organization and logistics costs. This matter will be
> > >> decided by the Organizing Committee and will be brought back to the PMC if
> > >> needed. The events will be advertised on various channels, including the
> > >> Apache Beam’s and Summit sponsor’s social media accounts.
> > >>
> > >>
> > >> The Organizing Committee will acknowledge the Apache Software
> > >> Foundation's ownership of the Apache Beam trademark and will add
> > >> attribution required by the foundation’s policy on all marketing channels.
> > >> The Apache Beam branding will be used in accordance with the foundation’s
> > >> trademark and events policies specifically as outlined in Third Party Event
> > >> Branding Policy. The Organizing Committee does not request the ASF to
> > >> become a Community Partner of the event.
> > >>
> > >>
> > >> Please feel free to request further information if needed.
> > >>
> > >>
> > >> Kind Regards,
> > >>
> > >> Joana Carrasqueira
> > >>
> > >
> >
>


A personal update

2017-12-12 Thread Davor Bonaci
My dear friends,
As many of you have noticed, I’ve been visibly absent from the project for
a little while. During this time, a great number of you kept reaching out,
and for that I’m deeply humbled and grateful to each and every one of you.

I needed some time for personal reflection, which led to a transition in my
professional life. As things have settled, I’m happy to again be working
among all of you, as we propel this project forward. I plan to be active in
the future, but perhaps not quite full-time as I was before.

In the near term, I’m working on getting the report to the Board completed,
as well as framing the discussion about the project state and vision going
forwards. Additionally, I’ll make sure that we foster healthy community
culture and operate in the Apache Way.

For those who are curious, I’m happy to say that I’m starting a company
building products related to Beam, along with several other members of this
community and authors of this technology. I’ll share more on this next
year, but until then if you have a data processing problem or an Apache
Beam question, I’d love to hear from you ;-).

Thanks -- and so happy to be back!

Davor


Report to the Board, December 2017 edition

2017-12-12 Thread Davor Bonaci
We are expected to submit a project report to the ASF Board of Directors
ahead of its next meeting. The report is due on Wednesday, 12/13.

If interested, please take a look at the draft [1], and comment or
contribute content, as appropriate. I'll submit the report sometime in the
next 24 hours.

Thanks!

Davor

[1]
https://docs.google.com/document/d/17AFM3iLXH8YyUc7FiYZS5YFIOHymFWbbEJ5Ea2ySnkA/


Re: Report to the Board, December 2017 edition

2017-12-13 Thread Davor Bonaci
Thanks everyone for your comments and feedback. The report is now submitted.

On Wed, Dec 13, 2017 at 2:06 PM, Kenneth Knowles  wrote:

> LGTM. Thanks for your comment raising the point about adding the release
> via reporter.apache.org. It helps raise awareness of this step to the
> whole PMC so we can raise the issue sooner, during the release. We should
> also move it to the correct place in the release guide.
>
> On Wed, Dec 13, 2017 at 2:00 PM, Lukasz Cwik  wrote:
>
>> LGTM as well
>>
>> On Wed, Dec 13, 2017 at 1:58 PM, Griselda Cuevas  wrote:
>>
>>> LGTM thanks for sharing.
>>>
>>>
>>>
>>> On 12 December 2017 at 20:54, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> It looks good to me.
>>>>
>>>> Thanks !
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 12/13/2017 05:51 AM, Davor Bonaci wrote:
>>>>
>>>>> We are expected to submit a project report to the ASF Board of
>>>>> Directors ahead of its next meeting. The report is due on Wednesday, 
>>>>> 12/13.
>>>>>
>>>>> If interested, please take a look at the draft [1], and comment or
>>>>> contribute content, as appropriate. I'll submit the report sometime in the
>>>>> next 24 hours.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Davor
>>>>>
>>>>> [1] https://docs.google.com/document/d/17AFM3iLXH8YyUc7FiYZS5YFIOHymFWbbEJ5Ea2ySnkA/
>>>>>
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>>
>>
>


Re: Euphoria Java 8 DSL - proposal

2017-12-17 Thread Davor Bonaci
Hi David,
As JB noted, merging of these two projects is a great idea. In fact, some
of us have had those discussions in the past.

Legally, nothing particular is strictly necessary as the code seems to
already be Apache 2.0 licensed. We don't, however, want to be perceived as
making hostile forks, so it would be great to file a Software Grant
Agreement with the ASF Secretary. I can help with the process, as necessary.

Project alignment-wise, there aren't any particular blockers that I am
aware of. We welcome DSLs.

Technically, the code would start in a feature branch. During this stage,
we'd need to validate a few things, including confirmation the code and
dependencies match the ASF policy, automate testing in Beam's tooling, etc.
At that point, we'd take a community vote to accept the component into
master, and consider author(s) for committership in the overall project.

Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
helps, and please reach out if anybody on our end can help, including JB or
myself.

Davor


On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré 
wrote:

> Hi David,
>
> Generally speaking, having different fluent DSLs on top of the Beam SDK is
> great.
>
> I would like to take a look at your wordcount examples to give you
> complete feedback. I like the idea and a fluent Java DSL is valuable.
>
> Let's wait for feedback from others. If we have a consensus, then I would be
> more than happy to help you with the donation (I worked on the Camel Java
> DSL a while ago, so I have some experience here).
>
> Thanks !
> Regards
> JB
>
> On 12/17/2017 07:00 PM, David Morávek wrote:
>
>> Hello,
>>
>>
>> First of all, thanks for the amazing work the Apache Beam community is
>> doing!
>>
>>
>> In 2014, we started development of a runtime-independent Java 8 API
>> that helps us create unified big-data processing flows. It has been used
>> as a core building block of the Seznam.cz web crawler data infrastructure
>> ever since. Its design principles and execution model are very similar to
>> Apache Beam.
>>
>>
>> This API was open sourced in 2016, under the name Euphoria API:
>>
>> https://github.com/seznam/euphoria
>>
>>
>> As it is very similar to Apache Beam, we feel that it is not worth
>> duplicating effort in terms of development of new runtimes and fine-tuning
>> of current ones.
>>
>>
>> The main blocker for us to switch to Apache Beam is the lack of a Java 8
>> API. We propose the integration of Euphoria API into Apache Beam as a
>> Java 8 DSL, in order to share our effort with the community.
>>
>>
>> A simple example of Euphoria API usage can be found here:
>>
>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>
>>
>> If you feel that the Beam community could benefit from our work, we would
>> love to start working on Euphoria integration into Apache Beam (we already
>> have a working POC, with a few basic operators implemented).
>>
>>
>> I look forward to hearing from you,
>>
>> David
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Happy new year

2018-01-01 Thread Davor Bonaci
Hi everyone --
As we begin the new year, I wanted to send the best wishes in 2018 to
everyone in the Beam community -- users, contributors and observers alike!

There's so much to be proud of in 2017; including graduation to a top-level
project and the availability of the first stable release. Thanks to
everyone for making this possible!

Finally, I'd also like to pass along some fun facts compiled by others [1].
Beam mailing lists had the 9th highest volume among all user@+dev@ lists.
Our very own, Jean-Baptiste Onofre, has once again finished in the top 5
committers across all projects in the Apache Software Foundation. This
year, JB finished as #3, with 2,142 commits, among 6,504 committers.
Congrats JB!

Happy New Year -- and I hope to see you out and about in the next few
months!

Davor

[1] https://blogs.apache.org/foundation/entry/apache-in-2017-by-the

On Mon, Jan 1, 2018 at 8:41 AM, Jesse Anderson 
wrote:

> Happy New Year!
>
> On Sun, Dec 31, 2017, 11:09 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi beamers,
>>
>> I wish you a great and happy new year !
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


[DISCUSS] State of the project

2018-01-12 Thread Davor Bonaci
Hi everyone --
Apache Beam was established as a top-level project a year ago (on December
21, to be exact). This first anniversary is a great opportunity for us to
look back at the past year, celebrate its successes, learn from any
mistakes we have made, and plan for the next 1+ years.

I’d like to invite everyone in the community, particularly users and
observers on this mailing list, to participate in this discussion. Apache
Beam is your project and I, for one, would much appreciate your candid
thoughts and comments. Just as some other projects do, I’d like to make
this “state of the project” discussion an annual tradition in this
community.

In terms of successes, the availability of the first stable release,
version 2.0.0, was the biggest and most important milestone last year.
Additionally, we have expanded the project’s breadth with new components,
including several new runners, SDKs, and DSLs, and interconnected a large
number of storage/messaging systems with new Beam IOs. In terms of
community growth, crossing 200 lifetime individual contributors and
achieving 76 contributors to a single release were other highlights. We
have doubled the number of committers, and invited a handful of new PMC
members. Thanks to each and every one of you for making all of this
possible in our first year.

On the other hand, in such a young project as Beam, there are naturally
many areas for improvement. This is the principal purpose of this thread
(and any of its forks). To organize the separate discussions, I’d suggest
forking separate threads for different discussion areas:
* Culture and governance (anything related to people and their behavior)
* Community growth (what can we do to further grow a diverse and vibrant
community)
* Technical execution (anything related to releases, their frequency,
website, infrastructure)
* Feature roadmap for 2018 (what can we do to make the project more
attractive to users, Beam 3.0, etc.).

I know many passionate folks who particularly care about each of these
areas, but let me call on some folks from the community to get things
started: Ismael for culture, Gris for community, JB for technical
execution, and Ben for feature roadmap.

Perhaps we can use this thread to discuss project-wide vision. To seed that
discussion, I’d start somewhat provocatively -- we aren’t doing so well on
the diversity of users across runners, which is very important to the
realization of the project’s vision. Would you agree, and would you be
willing to make it the project’s #1 priority for the next 1-2 years?

Thanks -- and please join us in what would hopefully be a productive and
informative discussion that shapes the future of this project!

Davor


Fwd: Google Summer of Code 2018 is coming

2018-01-21 Thread Davor Bonaci
This is a great way to grow the community. If you are interested in
mentoring students contributing to Beam, please apply!

-- Forwarded message --
From: Ulrich Stärk 
Date: Sun, Jan 21, 2018 at 1:22 PM
Subject: Google Summer of Code 2018 is coming
To: ment...@community.apache.org


Hello PMCs (incubator Mentors, please forward this email to your podlings),

Google Summer of Code [1] is a program sponsored by Google allowing
students to spend their summer
working on open source software. Students will receive stipends for
developing open source software
full-time for three months. Projects will provide mentoring and project
ideas, and in return have
the chance to get new code developed and - most importantly - to identify
and bring in new committers.

The ASF will apply as a participating organization meaning individual
projects don't have to apply
separately.

If you want to participate with your project we ask you to do the following
things as soon as
possible but please no later than 2017-01-30:

1. understand what it means to be a mentor [2].

2. record your project ideas.

Just create issues in JIRA, label them with gsoc2018, and they will show up
at [3]. Please be as
specific as possible when describing your idea. Include the programming
language, the tools and
skills required, but try not to scare potential students away. They are
supposed to learn what's
required before the program starts.

Use labels, e.g. for the programming language (java, c, c++, erlang,
python, brainfuck, ...) or
technology area (cloud, xml, web, foo, bar, ...) and record them at [5].

Please use the COMDEV JIRA project for recording your ideas if your project
doesn't use JIRA (e.g.
httpd, ooo). Contact d...@community.apache.org if you need assistance.

[4] contains some additional information (will be updated for 2017 shortly).

3. subscribe to ment...@community.apache.org; restricted to potential
mentors, meant to be used as a
private list - general discussions on the public d...@community.apache.org
list as much as possible
please). Use a recognized address when subscribing (@apache.org or one of
your alias addresses on
record).

Note that the ASF isn't accepted as a participating organization yet,
nevertheless you *have to*
start recording your ideas now or we will not get accepted.

Over the years we were able to complete hundreds of projects successfully.
Some of our prior
students are active contributors now! Let's make this year a success again!

Cheers,

Uli

[1] https://summerofcode.withgoogle.com/
[2] http://community.apache.org/guide-to-being-a-mentor.html
[3] http://s.apache.org/gsoc2018ideas
[4] http://community.apache.org/gsoc.html


Re: Euphoria Java 8 DSL - proposal

2018-02-18 Thread Davor Bonaci
 straightforward path to
>> do it).
>>
>> If I understand well if this gets merged into Apache this means that
>> Euphoria's current implementation would be superseded by this DSL? I
>> am curious because I would like to understand your level of investment
>> on supporting the future of this DSL.
>>
>> Thanks and congrats again !
>> Ismaël
>>
>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>> <j...@nanthrax.net> wrote:
>>
>> Depending on the donation, you would need ICLA for each contributor, and
>> CCLA in addition to SGA.
>>
>> We can sync with Davor and I for the legal stuff.
>> However, I would wait a little bit just to have feedback from the whole
>> team and start a formal vote.
>>
>> I would be happy to start the formal vote.
>>
>> Regards
>> JB
>>
>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>
>> Hello,
>>
>> Thanks for the awesome feedback!
>>
>> Romain:
>>
>> We already use Java Stream API in all operators where it makes sense
>> (eg.: ReduceByKey). Still not sure if it was a good choice, but it can
>> be easily converted to iterator anyway.
>>
>> Side outputs support is coming soon, we already made an initial work on
>> this.
>>
>> Side inputs are not supported in a way you are used to from beam,
>> because it can be replaced by Join operator on the same key (if
>> annotated with broadcastHashJoin, it will be turned into map side join).
>>
>> Only significant difference from Beam is, that we decided not to
>> abstract serialization, so we need to add support for Type Hints,
>> because of type erasure.
>>
>> Fluent API:
>>
>> API is fluent within one operator. It is designed to "lead the
>> programmer", which means that he will only be offered methods that make
>> sense after the last method he used (eg.: in ReduceByKey, we know that
>> after keyBy either reduceBy method should come). It is implemented as a
>> series of builders.
>>
>> Davor:
>>
>> Thanks, I'll contact you, and will start the
Re: [YouTube channel] Add video: Apache Beam meetup London 2: use case in finance + IO in Beam and Splittable DoFns

2018-02-23 Thread Davor Bonaci
(That's not the question for Matthias, it is for the PMC. Thankfully, JB
volunteered to help formalize things.)

On Fri, Feb 23, 2018 at 7:56 AM, Kenneth Knowles  wrote:

> +1 to the proposal, and same question as Gris.
>
> On Wed, Feb 21, 2018 at 9:17 AM, Griselda Cuevas  wrote:
>
>> +1 to the proposal
>>
>> @Matthias could you share more about the dynamics in how people in the
>> community would be able to add videos and who would be owning the curation
>> of the channel?
>>
>> G
>>
>>
>>
>> On 21 February 2018 at 09:08, Matthias Baetens <
>> matthias.baet...@datatonic.com> wrote:
>>
>>> Hi Gaurav,
>>>
>>> Thanks for pointing that out. The channel can be found here
>>> .
>>> No videos there yet though, waiting for approval to publish the first
>>> one!
>>>
>>> Cheers,
>>> Matthias
>>>
>>> On Wed, Feb 21, 2018 at 12:55 AM, Gaurav Thakur 
>>> wrote:
>>>
>>>> How can people get access to the channel?
>>>>
>>>> Thanks, Gaurav
>>>>
>>>> On Wed, Feb 21, 2018 at 1:34 PM, Matthias Baetens <
>>>> matthias.baet...@datatonic.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> This is a proposal to launch the Apache Beam YouTube channel and at
>>>>> the same time add the first video to the channel.
>>>>>
>>>>> We would like to use the channel to centralize all videos / recordings
>>>>> related to Apache Beam and make it a community driven channel, so people
>>>>> have a one stop shop for learnings about Beam.
>>>>>
>>>>> The first video would be the recording of the second Beam meetup in
>>>>> London:
>>>>> Apache Beam meetup London 2: use case in finance + IO in Beam and
>>>>> Splittable DoFns. The video can be seen on the channel if you have login
>>>>> details (it is currently set to private).
>>>>>
>>>>> Please let me know if there are any questions or comments!
>>>>>
>>>>> Best regards,
>>>>> Matthias
>>>>>
>>>>
>>>>

>>>
>>>
>>> --
>>>
>>>
>>> *Matthias Baetens*
>>>
>>>
>>> *datatonic | data power unleashed*
>>>
>>> office +44 203 668 3680 | mobile +44 74 918 20646
>>>
>>> Level24 | 1 Canada Square | Canary Wharf | E14 5AB London
>>>
>>>
>>> We've been announced as
>>> one of the top global Google Cloud Machine Learning partners.
>>>
>>
>>
>


Re: Euphoria Java 8 DSL - proposal

2018-02-27 Thread Davor Bonaci
(Sounds good, thanks! We'll follow-up there.)

On Tue, Feb 27, 2018 at 10:49 AM, David Morávek 
wrote:

> Hi Davor,
>
> sorry for the delay, we were blocked by our legal department. I've sent
> both SGA and CCLA to priv...@apache.beam.org, please let me know if you
> need anything else.
>
> Regards,
> David
>
> On Mon, Feb 19, 2018 at 6:13 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Davor,
>>
>> We still have some discussion/paperwork on Euphoria side (SGA, ...).
>>
>> So, it's on track but it takes a little more time than expected.
>>
>> Regards
>> JB
>>
>> On 02/19/2018 05:40 AM, Davor Bonaci wrote:
>> > I may have missed things, but any update on the progress of this
>> > donation?
>> >
>> > On Tue, Jan 2, 2018 at 10:52 PM, Jean-Baptiste Onofré
>> > <j...@nanthrax.net> wrote:
>> >
>> > Great !
>> >
>> > Thanks !
>> > Regards
>> > JB
>> >
>> > On 01/03/2018 07:29 AM, David Morávek wrote:
>> >
>> > Hello JB,
>> >
>> > Perfect! I'm already on the Beam Slack workspace, I'll contact you
>> > once I get to the office.
>> >
>> > Thanks!
>> > D.
>> >
>> > On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré
>> > <j...@nanthrax.net> wrote:
>> >
>> > Hi David,
>> >
>> > absolutely !! Let's move forward on the preparation steps.
>> >
>> > Are you on Slack and/or hangout to plan this ?
>> >
>> > Thanks,
>> > Regards
>> > JB
>> >
>> > On 01/02/2018 05:35 PM, David Morávek wrote:
>> >
>> > Hello JB,
>> >
>> > can we help in any way to move things forward?
>> >
>> > Thanks,
>> > D.
>> >
>> > On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré
>> > <j...@nanthrax.net> wrote:
>> >
>> >  Thanks Jan,
>> >
>> >  It makes sense.
>> >
>> >  Let me take a look at the code to understand the "interaction".
>> >
>> >  Regards
>> >  JB
>> >
>> >
>> >  On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>> >
>> >  Hi JB,
>> >
>> >  basically you are not wrong. The project started about three or
>> >  four years ago with a goal to unify batch and streaming processing
>> >  into single portable, executor independent API. Because of that, it
>> >  is currently "close" to Beam in this sense. But we don't see much
>> >  added value keeping this as a separate project, with one of the key
>> >  differences to be the API (not the model itself), so we would like
>> >  to focus on translation from Euphoria API to Beam's SDK. That's why
>> >  we would like to see it as a DSL, so that it would be possible to
>> >  use Euphoria API with Beam's runners as much natively as possible.
>> >
>> >  I hope I didn't make the subject even more unclear, if so, I'll be
>> >  happy to explain anything in more detail. :-)

Re: The Go SDK got accidentally merged - options to deal with the pain

2018-03-08 Thread Davor Bonaci
I support leaving things as they stand now -- thanks for finding a good way
out of an uncomfortable situation.

That said, two things need to happen:
(1) SGA needs to be filed asap, per Board feedback in the last report, and
(2) releases cannot contain any code from the Go SDK before formally voted
on the new component and accepted. This includes source releases that are
created through "assembly", so manual exclusion in the configuration is
likely needed.

On Wed, Mar 7, 2018 at 1:54 PM, Kenneth Knowles  wrote:

> Re-reading the old thread, I see these desiderata:
>
>  - "enough IO to write end-to-end examples such as WordCount and
> demonstrate what IOs would look like"
>  - "accounting and tracking the fact that each element has an associated
> window and timestamp"
>  - "test suites and test utilities"
>
> Browsing the code, it looks like these each exist to some level of
> completion.
>
> Kenn
>
>
> On Wed, Mar 7, 2018 at 1:38 PM Robert Bradshaw 
> wrote:
>
>> I was actually thinking along the same lines: what was yet lacking to
>> "officially" merge the Go branch in? The thread we started on this seems to
>> have fizzled out over the holidays, but windowing support is the only
>> must-have missing technical feature in my book (assuming documentation and
>> testing are, or are brought up to snuff).
>>
>>
>> On Wed, Mar 7, 2018 at 1:35 PM Henning Rohde  wrote:
>>
>>> One thought: the Go SDK is actually not that far away from satisfying
>>> the guidelines for merging to master anyway (as discussed here [1]). If
>>> we decide to simply leave the code in master -- which seems to be what this
>>> thread is leaning towards -- I'll gladly sign up to do the remaining
>>> aspects (I believe it's only windowing, validation tests and documentation)
>>> reasonably quickly to get to an official vote for accepting it and in turn
>>> get master into a sound state. It would seem like the path of least hassle.
>>> Of course, I'm happy to go with whatever the community is comfortable with
>>> -- just trying to make lemonade out of the merge lemon.
>>>
>>> Henning
>>>
>>> [1] https://lists.apache.org/thread.html/fd4201980d7a6e67248b1f183ee06b0ff1305bd46f1291495679fc0a@%3Cdev.beam.apache.org%3E
>>>
>>> On Tue, Mar 6, 2018 at 3:40 PM, Kenneth Knowles  wrote:
>>>
>>>> I think a very easy fix to unblock everyone is
>>>> https://github.com/apache/beam/pull/4809. It just updates one line of
>>>> a pom.
>>>>
>>>>
>>>> On Tue, Mar 6, 2018 at 3:33 PM Robert Bradshaw
>>>> wrote:
>>>>
>>>>> I'm not sure what value there is in preserving this accidental merge
>>>>> in history, but all options proposed seem fine to me. We should resolve
>>>>> this (or at least unblock other dev work) quickly though.
>>>>>
>>>>>
>>>>> On Tue, Mar 6, 2018 at 3:16 PM Kenneth Knowles wrote:
>>>>>
>>>>>> My own vote is for leaving the history immutable, which is the case
>>>>>> for the full rollback or leaving it there disabled.
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 6, 2018 at 3:01 PM Thomas Weise wrote:
>>>>>>
>>>>>>> +1 for (1), assuming it is straightforward to exclude from the build
>>>>>>> and eventually will end up in master anyways.
>>>>>>>
>>>>>>> On Tue, Mar 6, 2018 at 2:59 PM, Robert Bradshaw wrote:
>>>>>>>
>>>>>>>> I would opt for (2), but I'm not sure who has permissions to do
>>>>>>>> that. It should be easy to re-merge the couple of things that have
>>>>>>>> gone in since then.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 6, 2018 at 2:43 PM Kenneth Knowles
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> You may have noticed that our tests are red. A pull request that
>>>>>>>>> was meant for the Go SDK branch accidentally got merged onto the
>>>>>>>>> master branch. Things have been merged to master since then.
>>>>>>>>>
>>>>>>>>> I've opened a revert at https://github.com/apache/beam/pull/4808
>>>>>>>>>
>>>>>>>>> The next time there is a master to go-sdk merge it will need to be
>>>>>>>>> re-reverted.
>>>>>>>>>
>>>>>>>>> Two other options are (1) leave it there and disable it in
>>>>>>>>> whatever way and (2) rebase dropping the commit and force push master
>>>>>>>>> (breaks all checkouts that are past it).
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>
>>>
>>>


Re: slack @the-asf?

2018-03-14 Thread Davor Bonaci
Go for it!

Thanks Romain!

On Wed, Mar 14, 2018 at 1:35 AM, Romain Manni-Bucau 
wrote:

> Hi guys,
>
> What do you think about migrating to the standard ASF Slack? It would make
> it a bit easier to find the Beam channel IMHO and it would stay consistent
> with others. It also allows auto-join for ASF folks.
>
> If you think it is the way to go we can do:
>
> 1. put a message on current slack channel saying "we are moving to the-asf
> #beam, this channel will be closed on the XXX"
> 2. notify it on the general (current) channel
> 3. update the website
>
> Personally I think a transition period of 10 days is enough then channels
> can be archived.
>
> Wdyt?
>
> Side note: asf is in the process to try to get history on slack as well
> which would be beneficial for beam too.
>
> Romain Manni-Bucau
>


Re: Board report - March '18

2018-03-15 Thread Davor Bonaci
Thanks JB for starting the report.

If interested, please take a look at the complete draft [1], and comment or
contribute content, as appropriate. I'll submit the report sometime in the
next 24 hours.

Thanks!

Davor

[1]
https://docs.google.com/document/d/1ngoI27CQ25TcxjJlyLBa6Qbj3t7inzf1_20UPN5hBiE/

On Sun, Mar 4, 2018 at 10:36 AM, Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> In order to help Davor, I started the template and draft for Board Report
> (March '18):
>
> https://docs.google.com/document/d/16VZSlG24wfkFfG2Jdou0B_AG5Z4I-sD4dw3nO1F3Lj8/edit?usp=sharing
>
> I will add more content. Feel free to do the same.
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Board report - March '18

2018-03-15 Thread Davor Bonaci
The report is now submitted. Thanks to everyone who provided comments and
feedback.

On Thu, Mar 15, 2018 at 11:41 AM, Lukasz Cwik  wrote:

> +1 I also took a pass over it.
>
>
> On Thu, Mar 15, 2018 at 9:29 AM Jean-Baptiste Onofré 
> wrote:
>
>> +1
>>
>> It looks good to me.
>>
>> Thanks !
>>
>> Regards
>> JB
>> On Mar 15, 2018, at 08:40, Davor Bonaci wrote:
>>>
>>> Thanks JB for starting the report.
>>>
>>> If interested, please take a look at the complete draft [1], and comment
>>> or contribute content, as appropriate. I'll submit the report sometime in
>>> the next 24 hours.
>>>
>>> Thanks!
>>>
>>> Davor
>>>
>>> [1] https://docs.google.com/document/d/1ngoI27CQ25TcxjJlyLBa6Qbj3t7inzf1_20UPN5hBiE/
>>>
>>> On Sun, Mar 4, 2018 at 10:36 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> In order to help Davor, I started the template and draft for Board
>>>> Report (March '18):
>>>>
>>>> https://docs.google.com/document/d/16VZSlG24wfkFfG2Jdou0B_AG5Z4I-sD4dw3nO1F3Lj8/edit?usp=sharing
>>>>
>>>> I will add more content. Feel free to do the same.
>>>>
>>>> Regards
>>>> JB
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>>


[ANNOUNCEMENT] New Foundation members!

2018-03-30 Thread Davor Bonaci
Now that this is public... please join me in welcoming three newly elected
members of the Apache Software Foundation with ties to this community, who
were elected during the most recent Members' Meeting.

* Ismaël Mejía (Beam PMC)

* Josh Wills (Crunch Chair; Beam, DataFu PMC)

* Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
contributor)

These individuals demonstrated merit in the Foundation's growth, evolution, and
progress. They were recognized, nominated, and elected by the existing
membership for their significant impact on the Foundation as a whole, rooted
in project-related and cross-project activities.

As members, they now become legal owners and shareholders of the
Foundation. They can vote for the Board, incubate new projects, nominate
new members, participate in any PMC-private discussions, and contribute to
any project.

(For the Beam community, this election nearly doubles the number of
Foundation members. The new members are joining Jean-Baptiste Onofré,
Stephan Ewen, Romain Manni-Bucau and myself in this role.)

I'm happy to be able to call all three of you my fellow members.
Congratulations!

Davor


Re: [PROPOSAL] JIRA notification recipient changes

2018-04-18 Thread Davor Bonaci
An INFRA ticket is needed.

(As long as the reporter is automatically a watcher, the change is a strict
improvement. If not, it's a strict degradation.)

On Tue, Apr 17, 2018 at 1:55 PM, Kenneth Knowles  wrote:

> Hi all,
>
> Currently, every touch on a JIRA causes four notifications:
>
>  - all watchers
>  - current assignee
>  - reporter
>  - comm...@beam.apache.org
>
> I think there are redundancies and imperfections here. If you report a bug
> you cannot later unsubscribe from it! I may be disproportionately spammed
> here... I try to compensate with filters, but it is not easy to get the
> effect I want except by ignoring all JIRA.
>
> I propose:
>
>  - all watchers
>  - current assignee
>  - iss...@beam.apache.org (any ASF member or PMC chair can set up
> quickly, yes?)
>
> This way, you can report bugs and unwatch them. And if you do want the
> firehose of issues it is trivial to separate from commits.
>
> I think this goes through INFRA ticket or maybe any ASF member or PMC
> chair can do the whole thing?
>
> Kenn
>


Fwd: Apache Beam - jenkins question

2018-04-26 Thread Davor Bonaci
Hi Kamil --
Thanks for reaching out.

This is a great question for the dev@ mailing list. You may want to share a
little more about why you need it, for how long, how often the secret will
be updated, etc., so the community is aware of how things work.

Hopefully others on the mailing list can help you by manually putting the
necessary secret into the cloud settings related to the executors.

Davor

-- Forwarded message --
From: Kamil Szewczyk 
Date: Tue, Apr 24, 2018 at 12:21 PM
Subject: Apache Beam - jenkins question
To: da...@apache.org


Dear Davor

I sent you a message on asf slack, wasn't sure how can I reach you.

Anyway are you able to add secret (environment variable) to jenkins. ??
Or point me to a person that would be able to do that ?

Kind Regards
Kamil Szewczyk


Re: Apache Beam - jenkins question

2018-04-27 Thread Davor Bonaci
Jason, you should now have all the permissions needed. (You should,
however, evaluate whether this is a good place for it. Executors
themselves, for example, might be an alternative.)

On Fri, Apr 27, 2018 at 7:42 PM, Jason Kuster 
wrote:

> See
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/common_job_properties.groovy#L119
> for an example of this being done in practice to add the coveralls repo
> token as an environment variable.
>
> On Fri, Apr 27, 2018 at 12:41 PM Jason Kuster 
> wrote:
>
>> Hi Kamil, Davor,
>>
>> I think what you want is the Jenkins secrets feature (see
>> https://support.cloudbees.com/hc/en-us/articles/203802500-Injecting-Secrets-into-Jenkins-Build-Jobs).
>> Davor, I believe you are the only one with enough karma on Jenkins to
>> access the credentials UI; once the credential is created in Jenkins, it
>> should be possible to set it as an environment variable through the Jenkins
>> job configuration (groovy files in $BEAM_ROOT/.test-infra/jenkins). Hope
>> this helps.
>>
>> Jason
>>
>> On Thu, Apr 26, 2018 at 8:43 PM Davor Bonaci  wrote:
>>
>>> Hi Kamil --
>>> Thanks for reaching out.
>>>
>>> This is a great question for the dev@ mailing list. You may want to
>>> share a little more about why you need it, for how long, how often the
>>> secret will be updated, etc., so the community is aware of how things work.
>>>
>>> Hopefully others on the mailing list can help you by manually putting
>>> the necessary secret into the cloud settings related to the executors.
>>>
>>> Davor
>>>
>>> -- Forwarded message --
>>> From: Kamil Szewczyk 
>>> Date: Tue, Apr 24, 2018 at 12:21 PM
>>> Subject: Apache Beam - jenkins question
>>> To: da...@apache.org
>>>
>>>
>>> Dear Davor
>>>
>>> I sent you a message on asf slack, wasn't sure how can I reach you.
>>>
>>> Anyway are you able to add secret (environment variable) to jenkins. ??
>>> Or point me to a person that would be able to do that ?
>>>
>>> Kind Regards
>>> Kamil Szewczyk
>>>
>>>
>>
>> --
>> ---
>> Jason Kuster
>> Apache Beam / Google Cloud Dataflow
>>
>> See something? Say something. go/jasonkuster-feedback
>>
>
>
> --
> ---
> Jason Kuster
> Apache Beam / Google Cloud Dataflow
>
> See something? Say something. go/jasonkuster-feedback
>
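
For readers following the thread above: once a credential is exposed to a
job as an environment variable, the consuming side can stay very small. The
following is only a minimal sketch in Python, not what the Beam Jenkins jobs
actually run. The variable name EXAMPLE_REPO_TOKEN and both helper functions
are hypothetical and chosen purely for illustration.

```python
import os


def read_secret(name, environ=None):
    """Return the value of environment variable `name`; raise if unset or empty."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"required secret {name!r} is not set")
    return value


def redact(value, keep=4):
    """Mask a secret for logging, keeping only the last `keep` characters."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


if __name__ == "__main__":
    # EXAMPLE_REPO_TOKEN is a hypothetical name, not a real Beam setting.
    try:
        token = read_secret("EXAMPLE_REPO_TOKEN")
        print("token loaded:", redact(token))
    except RuntimeError as err:
        print("cannot continue:", err)
```

Failing early on a missing variable and never printing the raw value are the
two design choices worth keeping, whatever mechanism injects the secret.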


Re: Graal instead of docker?

2018-05-11 Thread Davor Bonaci
This thread is extremely valuable. It poses hard questions. It strengthens
good arguments. It teaches a way of thinking. It gives feedback. I want to
thank Romain in particular for driving it, and everyone who has
participated thus far.

That being said, the exchange has crossed the line on the part of multiple
actors. I request a pause of 72 hours from *everyone*. It will help to cool
down, and digest the conversation so far. In addition, the PMC would
appreciate that time to process things and potentially advise/steer the
conversation.

On Fri, May 11, 2018 at 12:42 PM, Kenneth Knowles  wrote:

> Romain,
>
> You probably did not mean to, but I think this message crosses outside the
> expected code of conduct.
>
> On Fri, May 11, 2018 at 11:48 AM Romain Manni-Bucau 
> wrote:
>
>>
>> Also beam community is java - dont answer it is python or go without
>> checking ;). Not sure adding a new language will help and give a face
>> people will like to contribute or use the project.
>>
>
> The Beam community includes contributors and users of the Java, Python,
> and Go SDKs. This remark denigrates the work of people building and using
> Beam in Python and Go. Please be careful in the words that you choose and,
> most importantly, please be open, empathetic, and welcoming to these
> members of our community.
>
> Kenn
>
>


Re: [VOTE] Go SDK

2018-05-21 Thread Davor Bonaci
+1 (binding), with the following caveats:

* [before closing the vote] Completion of IP clearance process, as we've
been requested. It is easier to do it than having to argue why it is not
necessary.
* [at any time, possibly later] Figuring out the release mechanics.

Great work; across the board!

On Mon, May 21, 2018 at 6:29 PM, Jason Kuster 
wrote:

> +1! So excited to have gotten to this point -- congrats to all. I've been
> excited to do some reviews of the Go SDK since becoming a committer; really
> happy about this.
>
> On Mon, May 21, 2018 at 6:03 PM Henning Rohde  wrote:
>
>> Hi everyone,
>>
>> Now that the remaining issues have been resolved as discussed, I'd like
>> to propose a formal vote on accepting the Go SDK into master. The main
>> practical difference is that the Go SDK would be part of the Apache Beam
>> release going forward.
>>
>> Highlights of the Go SDK:
>>  * Go user experience with natively-typed DoFns with (simulated) generic
>> types
>>  * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten, Combine,
>> Windowing, ..
>>  * Includes several IO connectors: Datastore, BigQuery, PubSub,
>> extensible textio.
>>  * Supports the portability framework for both batch and streaming,
>> notably the upcoming portable Flink runner
>>  * Supports a direct runner for small batch workloads and testing.
>>  * Includes pre-commit tests and post-commit integration tests.
>>
>> And last but not least
>>  *  includes contributions from several independent users and developers,
>> notably an IO connector for Datastore!
>>
>> Website: https://beam.apache.org/documentation/sdks/go/
>> Code: https://github.com/apache/beam/tree/master/sdks/go
>> Design: https://s.apache.org/beam-go-sdk-design-rfc
>>
>> Please vote:
>> [ ] +1, Approve that the Go SDK becomes an official part of Beam
>> [ ] -1, Do not approve (please provide specific comments)
>>
>> Thanks,
>>  The Gophers of Apache Beam
>>
>>
>>
>
> --
> ---
> Jason Kuster
> Apache Beam / Google Cloud Dataflow
>
> See something? Say something. go/jasonkuster-feedback
>


Re: [VOTE] Go SDK

2018-05-22 Thread Davor Bonaci
>
>   * Robert mentioned that "SGA should have probably already been filed"
> in the previous thread. I got the impression that nothing further was
> needed. I'll follow up.
>

Please just follow: http://incubator.apache.org/ip-clearance/. Simple.
Quick.

Perhaps relevant: I saw some golang license determinations as Category A
fly by earlier in the week. Reuse/quote anything already available.

>   * The standard Go tooling basically always pulls directly from github, so
> there is no real urgency here.
>

No urgency. That said, we'll probably want a copy of whatever GitHub is
serving, to be served also by dist.apache.org (and considered as the source
of truth).

(Great work, again!)


Re: Missing copyright notices for shaded packages

2018-05-22 Thread Davor Bonaci
Thanks for the report!

Could you please comment more as to: (1) what artifacts are impacted and
where are they distributed, (2) the external dependency being distributed,
(3) license and/or term not adhered to, and (4) any proposed fix?

Any such information would be helpful in triaging the problem -- thanks so
much!

(If confirmed, this would be release blocking.)

On Tue, May 22, 2018 at 2:37 PM, Lukasz Cwik  wrote:

> Does it have to be part of the jar or is it good enough to be part of the
> sources jar (as 2.4.0 had it part of the beam-parent-2.4.0-source.zip)?
>
> On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud 
> wrote:
>
>> I was digging around in the SQL jar trying to debug some packaging issues
>> and noticed that we aren't including the copyright notices from the
>> packages we are shading. I also looked at our previously released jars and
>> they are the same (so this isn't a regression). Should we be including the
>> copyright notice from packages we are redistributing?
>>
>> Andrew
>>
>


Re: Missing copyright notices for shaded packages

2018-05-22 Thread Davor Bonaci
This analysis looks correct. Great find!

The recommended fix would be different. I'd suggest appending this sentence
to the end of the LICENSE file: "A part of several convenience binary
distributions of this software is licensed as follows", followed by the
full license text (including its copyright, clauses and disclaimer) -- for
each such case separately. Don't edit the NOTICE file.

I'd suggest keeping things simple: no per-artifact license/notice, etc.
Just two project-wide files, but I'd suggest including it/attaching it
"everywhere". Opinions on this part may vary, but, for me, "everywhere"
includes every jar file.

Standard disclaimers apply.

Any volunteers? Thanks so much!
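
A minimal sketch of the suggested LICENSE edit, in Python, assuming the
bundled license texts are already available as strings. The function name
and the blank-line separators are assumptions made for illustration; only
the qualifying sentence comes from the message above.

```python
QUALIFIER = (
    "A part of several convenience binary distributions of this software "
    "is licensed as follows"
)


def append_bundled_licenses(license_text, bundled_license_texts):
    """Return LICENSE content with each bundled license appended separately.

    Each bundled text (copyright, clauses and disclaimer included) is
    preceded by its own copy of the qualifying sentence.
    """
    parts = [license_text.rstrip("\n")]
    for text in bundled_license_texts:
        parts.append(QUALIFIER)
        parts.append(text.strip("\n"))
    # Blank-line separation is an assumption, not a stated requirement.
    return "\n\n".join(parts) + "\n"
```

The NOTICE file is deliberately left untouched here, in line with the
suggestion above.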

On Tue, May 22, 2018 at 4:02 PM, Andrew Pilloud  wrote:

> Here is what I think might be missing:
>
> (1) what artifacts are impacted and where are they distributed
>
> http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-core/2.4.0/beam-sdks-java-core-2.4.0.jar
> http://central.maven.org/maven2/org/apache/beam/beam-runners-direct-java/2.4.0/beam-runners-direct-java-2.4.0.jar
> http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-harness/2.4.0/beam-sdks-java-harness-2.4.0.jar
> http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-extensions-sql/2.4.0/beam-sdks-java-extensions-sql-2.4.0.jar
>
> (2) the external dependency being distributed
>
> beam-sdks-java-core: protobuf
> beam-runners-direct-java: protobuf
> beam-runners-direct-java: jsr-305
> beam-sdks-java-extensions-sql: janino-compiler
>
> (3) license and/or term not adhered to
>
> BSD 3 Clause: Redistributions in binary form must reproduce the above
> copyright notice, this list of conditions and the following disclaimer in
> the documentation and/or other materials provided with the distribution.
>
> (4) any proposed fix
>
> NOTICE file in the jar.
>
> I am not a lawyer, this is not legal advice.
>
> On Tue, May 22, 2018 at 2:55 PM Davor Bonaci  wrote:
>
>> Thanks for the report!
>>
>> Could you please comment more as to: (1) what artifacts are impacted and
>> where are they distributed, (2) the external dependency being distributed,
>> (3) license and/or term not adhered to, and (4) any proposed fix?
>>
>> Any such information would be helpful in triaging the problem -- thanks
>> so much!
>>
>> (If confirmed, this would be release blocking.)
>>
>> On Tue, May 22, 2018 at 2:37 PM, Lukasz Cwik  wrote:
>>
>>> Does it have to be part of the jar or is it good enough to be part of
>>> the sources jar (as 2.4.0 had it part of the
>>> beam-parent-2.4.0-source.zip
>>> <http://central.maven.org/maven2/org/apache/beam/beam-parent/2.4.0/beam-parent-2.4.0-source.zip>
>>> )?
>>>
>>> On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud 
>>> wrote:
>>>
>>>> I was digging around in the SQL jar trying to debug some packaging
>>>> issues and noticed that we aren't including the copyright notices from the
>>>> packages we are shading. I also looked at our previously released jars and
>>>> they are the same (so this isn't a regression). Should we be including the
>>>> copyright notice from packages we are redistributing?
>>>>
>>>> Andrew
>>>>
>>>
>>


Re: [VOTE] Go SDK

2018-05-22 Thread Davor Bonaci
Always happy to help. I'm sure JB is as well, others too!

Please draft/collect any relevant data -- thanks!

On Tue, May 22, 2018 at 3:17 PM, Kenneth Knowles  wrote:

> The process has to be done by an officer or member. Can you help us with
> this, Davor?
>
> On Tue, May 22, 2018 at 3:14 PM Robert Bradshaw 
> wrote:
>
>> On Tue, May 22, 2018 at 2:42 PM Davor Bonaci  wrote:
>>
>> >>   * Robert mentioned that "SGA should have probably already been filed"
>> >> in the previous thread. I got the impression that nothing further was
>> >> needed. I'll follow up.
>>
>> > Please just follow: http://incubator.apache.org/ip-clearance/. Simple.
>> > Quick.
>>
>> +1, let's put this question behind us.
>>
>> > Perhaps relevant: I saw some golang license determinations as Category A
>> > fly by earlier in the week. Reuse/quote anything already available.
>>
>> >>   * The standard Go tooling basically always pulls directly from github,
>> >> so there is no real urgency here.
>>
>> > No urgency. That said, we'll probably want a copy of whatever GitHub is
>> > serving, to be served also by dist.apache.org (and considered as the
>> > source of truth).
>>
>> Yes, we should continue mirroring $(wget
>> https://github.com/apache/beam/archive/release-${VERSION}.zip) there.
>>
>


Re: Missing copyright notices for shaded packages

2018-05-24 Thread Davor Bonaci
>
> jars generated off of current master contain no LICENSE, NOTICE, or other
> metadata
>

We should fix this. This may be an issue.

> suggested approach of overapproximating
>

The purpose of the quoted sentence above ("A part of...") is to qualify when a
particular portion applies. It contains extra content in some cases, sure,
but such content is qualified that it may not apply / can be ignored. (More
specificity welcome.)

Of course, anybody is welcome to go above and beyond and maintain this
content per module, per distribution, per source jar, per test jar, per
source test jar, etc. This would be ideal, but I don't remember seeing
anybody doing it at this granularity. (Obviously, when shading, contents of
jar and source jar differ in this regard for the same module.) Also,
welcome to find a different balance, perhaps a more reasonable one.

> Did you look through all our jars or is that just a sample?
>

Perhaps noteworthy: please don't generalize the fix to other circumstances
(if there are any, or if such come up). This fix applies to this type of
bundling of MIT-licensed and (some) BSD-licensed content *only*. There are
many quirks between different license combinations, which file to modify,
and in what way -- this space is quite deep. When in doubt, please ask here
to get the changes reviewed by senior folks (or referred to other
authoritative ASF resources).

A friendly nearby project recently has gone through public shaming because
they got this wrong. Thanks for working hard to avoid a situation like that
for us!
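
Checking whether a built jar carries its legal files can be automated. The
sketch below is a minimal Python example, assuming the files are expected
under META-INF/ (a common convention, not something this thread specifies).
The function name and the REQUIRED tuple are hypothetical.

```python
import zipfile

# Assumed locations; adjust to wherever the build actually places the files.
REQUIRED = ("META-INF/LICENSE", "META-INF/NOTICE")


def missing_legal_files(jar_path, required=REQUIRED):
    """Return the entries from `required` that are absent from the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        present = set(jar.namelist())
    return [name for name in required if name not in present]
```

Run against every published jar, an empty list means the artifact contains
both files; anything else names what is missing.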


Re: [VOTE] Go SDK

2018-05-25 Thread Davor Bonaci
ETA: weekend.

On Fri, May 25, 2018 at 9:35 AM Henning Rohde  wrote:

> RESULT: the vote passed with only +1s! Thank you all for the kind
> comments.
>
> The only pending item is the IP clearance form (draft:
> https://web.tresorit.com/l#nUkKlgi3cBYxYAOyhCMXIw).
> Are there any ASF members who can help getting it recorded?
>
> Thanks,
>  Henning
>
>
> On Wed, May 23, 2018 at 2:45 PM Henning Rohde  wrote:
>
>> Thanks Davor! I filled out the form to the best of my ability and placed
>> it here (avoiding attachments on the list):
>>
>> https://web.tresorit.com/l#nUkKlgi3cBYxYAOyhCMXIw
>>
>> Please take a look and let me know if you need anything more from me.
>>
>> Thanks,
>>  Henning
>>
>> On Wed, May 23, 2018 at 8:51 AM Thomas Groh  wrote:
>>
>>> +1!
>>>
>>> I, for one, could not be more excited about our glorious portable future.
>>>
>>> On Mon, May 21, 2018 at 6:03 PM Henning Rohde 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Now that the remaining issues have been resolved as discussed, I'd like
>>>> to propose a formal vote on accepting the Go SDK into master. The main
>>>> practical difference is that the Go SDK would be part of the Apache Beam
>>>> release going forward.
>>>>
>>>> Highlights of the Go SDK:
>>>>  * Go user experience with natively-typed DoFns with (simulated)
>>>> generic types
>>>>  * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten, Combine,
>>>> Windowing, ..
>>>>  * Includes several IO connectors: Datastore, BigQuery, PubSub,
>>>> extensible textio.
>>>>  * Supports the portability framework for both batch and streaming,
>>>> notably the upcoming portable Flink runner
>>>>  * Supports a direct runner for small batch workloads and testing.
>>>>  * Includes pre-commit tests and post-commit integration tests.
>>>>
>>>> And last but not least
>>>>  *  includes contributions from several independent users and
>>>> developers, notably an IO connector for Datastore!
>>>>
>>>> Website: https://beam.apache.org/documentation/sdks/go/
>>>> Code: https://github.com/apache/beam/tree/master/sdks/go
>>>> Design: https://s.apache.org/beam-go-sdk-design-rfc
>>>>
>>>> Please vote:
>>>> [ ] +1, Approve that the Go SDK becomes an official part of Beam
>>>> [ ] -1, Do not approve (please provide specific comments)
>>>>
>>>> Thanks,
>>>>  The Gophers of Apache Beam





[ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Davor Bonaci
Please join me and the rest of Beam PMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Griselda Cuevas
* Pablo Estrada
* Jason Kuster

(Apologies for the delayed announcement, and the lack of the usual paragraph
summarizing individual contributions.)

Congratulations to all three! Welcome!


Re: [VOTE] Go SDK

2018-05-31 Thread Davor Bonaci
The IP clearance document has been filed into Foundation records, and is
currently under review. No further action necessary, unless we hear back.

On Fri, May 25, 2018 at 10:31 AM, Henning Rohde  wrote:

> Thanks a lot, Davor! Much appreciated.
>
> Thanks,
>  Henning
>
> On Fri, May 25, 2018 at 10:26 AM Davor Bonaci  wrote:
>
>> ETA: weekend.
>>
>> On Fri, May 25, 2018 at 9:35 AM Henning Rohde  wrote:
>>
>>> RESULT: the vote passed with only +1s! Thank you all for the kind
>>> comments.
>>>
>>> The only pending item is the IP clearance form (draft:
>>> https://web.tresorit.com/l#nUkKlgi3cBYxYAOyhCMXIw).
>>> Are there any ASF members who can help getting it recorded?
>>>
>>> Thanks,
>>>  Henning
>>>
>>>
>>> On Wed, May 23, 2018 at 2:45 PM Henning Rohde 
>>> wrote:
>>>
>>>> Thanks Davor! I filled out the form to the best of my ability and
>>>> placed it here (avoiding attachments on the list):
>>>>
>>>> https://web.tresorit.com/l#nUkKlgi3cBYxYAOyhCMXIw
>>>>
>>>> Please take a look and let me know if you need anything more from me.
>>>>
>>>> Thanks,
>>>>  Henning
>>>>
>>>> On Wed, May 23, 2018 at 8:51 AM Thomas Groh  wrote:
>>>>
>>>>> +1!
>>>>>
>>>>> I, for one, could not be more excited about our glorious portable
>>>>> future.
>>>>>
>>>>> On Mon, May 21, 2018 at 6:03 PM Henning Rohde 
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Now that the remaining issues have been resolved as discussed, I'd
>>>>>> like to propose a formal vote on accepting the Go SDK into master. The
>>>>>> main practical difference is that the Go SDK would be part of the Apache
>>>>>> Beam release going forward.
>>>>>>
>>>>>> Highlights of the Go SDK:
>>>>>>  * Go user experience with natively-typed DoFns with (simulated)
>>>>>> generic types
>>>>>>  * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten,
>>>>>> Combine, Windowing, ..
>>>>>>  * Includes several IO connectors: Datastore, BigQuery, PubSub,
>>>>>> extensible textio.
>>>>>>  * Supports the portability framework for both batch and streaming,
>>>>>> notably the upcoming portable Flink runner
>>>>>>  * Supports a direct runner for small batch workloads and testing.
>>>>>>  * Includes pre-commit tests and post-commit integration tests.
>>>>>>
>>>>>> And last but not least
>>>>>>  *  includes contributions from several independent users and
>>>>>> developers, notably an IO connector for Datastore!
>>>>>>
>>>>>> Website: https://beam.apache.org/documentation/sdks/go/
>>>>>> Code: https://github.com/apache/beam/tree/master/sdks/go
>>>>>> Design: https://s.apache.org/beam-go-sdk-design-rfc
>>>>>>
>>>>>> Please vote:
>>>>>> [ ] +1, Approve that the Go SDK becomes an official part of Beam
>>>>>> [ ] -1, Do not approve (please provide specific comments)
>>>>>>
>>>>>> Thanks,
>>>>>>  The Gophers of Apache Beam
>>>>>>
>>>>>>
>>>>>>


Re: Beam Cookbook?

2018-06-09 Thread Davor Bonaci
Hi Austin --
It would be great to see this materialize.

I've been pursued by publishers a lot in the last year, so I might be able
to facilitate introductions if you need them. You'll probably want to have
a few sample draft chapters, however.

I'm aware of 3 different groups creating related content in this space, but
I think nobody is doing it from this angle. They are lurking on the mailing
list, so they may have reached out already.

All the best, and I hope you finish it with the same enthusiasm you
started with!

Davor

On Thu, Jun 7, 2018 at 11:44 AM, Austin Bennett  wrote:

> I'm looking at assembling a physical book along the lines of "Apache Beam
> Cookbook", though might take a different approach to topic (if realize
> there is a better hole to fill or something that needs more attention
> before that).
>
> I believe many could benefit from more substantive write-ups and
> explanations on use-cases, and specific bits in code (ex: to accomplish x
> you might want to use recipe Y, pay special attention to this function,
> with associated paragraphs of text and noting specific lines in the code,
> etc etc).  While this can be done on a website and GitHub, I do believe the
> more concrete nature of a book (esp. with reputable publisher) gives
> additional signaling to others that the subject is sufficiently mature.  My
> aim will be for this book to be freely available at least in an e-book
> version, an example of that I have is:
> https://www.confluent.io/resources/kafka-the-definitive-guide/ and surely
> you've come across other examples.
>
> I see many cookbook examples of code already exist, but the associated
> writeup I know could be useful to others; as well as the overall
> presentation/bundling to make it even easier to find and use.
>
> Wondering thoughts from the group and if there are others with a strong
> interest in collaborating on such an undertaking.
>
>
>


Re: [VOTE] Apache Beam, version 2.5.0, release candidate #2

2018-06-20 Thread Davor Bonaci
Please take a peek at LEGAL-288 [1], which I learned about recently in the
context of another project. Looks like an issue, requiring a new RC, but I
didn't have a chance to look closely.

Thanks.

Davor

[1] https://issues.apache.org/jira/browse/LEGAL-288

On Wed, Jun 20, 2018 at 9:17 AM, Pablo Estrada  wrote:

> +1 (binding)
>
> On Wed, Jun 20, 2018 at 9:08 AM Lukasz Cwik  wrote:
>
>> +1 (binding)
>>
>> On Tue, Jun 19, 2018 at 10:39 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Regards
>>> JB
>>>
>>> On 17/06/2018 07:18, Jean-Baptiste Onofré wrote:
>>> > Hi everyone,
>>> >
>>> > Please review and vote on the release candidate #2 for the version
>>> > 2.5.0, as follows:
>>> >
>>> > [ ] +1, Approve the release
>>> > [ ] -1, Do not approve the release (please provide specific comments)
>>> >
>>> > NB: this is the first release using Gradle, so don't be too harsh ;) A
>>> > PR about the release guide will follow thanks to this release.
>>> >
>>> > The complete staging area is available for your review, which includes:
>>> > * JIRA release notes [1],
>>> > * the official Apache source release to be deployed to dist.apache.org
>>> > [2], which is signed with the key with fingerprint C8282E76 [3],
>>> > * all artifacts to be deployed to the Maven Central Repository [4],
>>> > * source code tag "v2.5.0-RC2" [5],
>>> > * website pull request listing the release and publishing the API
>>> > reference manual [6].
>>> > * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
>>> > JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
>>> > * Python artifacts are deployed along with the source release to the
>>> > dist.apache.org [2].
>>> >
>>> > The vote will be open for at least 72 hours. It is adopted by majority
>>> > approval, with at least 3 PMC affirmative votes.
>>> >
>>> > Thanks,
>>> > JB
>>> >
>>> > [1]
>>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12342847
>>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
>>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1043/
>>> > [5] https://github.com/apache/beam/tree/v2.5.0-RC2
>>> > [6] https://github.com/apache/beam-site/pull/463
>>> >
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>> --
> Got feedback? go/pabloem-feedback
>
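
The adoption rule quoted in the vote e-mail above ("majority approval, with
at least 3 PMC affirmative votes") can be written down as a small check.
This is a sketch under one stated assumption: only binding (PMC) votes count
toward both the minimum and the majority. The function and parameter names
are hypothetical.

```python
def release_vote_passes(votes, min_binding_plus=3):
    """Decide a release vote.

    `votes` is an iterable of (value, binding) pairs, where value is +1 or -1
    and binding is True for PMC votes. Assumption: non-binding votes are
    advisory and are ignored in the tally.
    """
    votes = list(votes)
    plus = sum(1 for value, binding in votes if binding and value > 0)
    minus = sum(1 for value, binding in votes if binding and value < 0)
    # At least three binding +1s, and more binding +1s than binding -1s.
    return plus >= min_binding_plus and plus > minus
```

For example, three binding +1 votes with no -1 would pass, while two binding
+1 votes plus any number of non-binding +1 votes would not.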


Re: [VOTE] Apache Beam, version 2.5.0, release candidate #2

2018-06-20 Thread Davor Bonaci
Sorry, no -1, at least not at this time. This is only a suspected issue.
Even the referenced issue is not a formal ruling, with some dissenting
opinions, and a programmatic suggestion that most are comfortable with.
(Personally, I probably lean on the side that inclusion of this file is
fine.)

But, it is something that warrants a discussion...

On Wed, Jun 20, 2018 at 9:47 AM, Lukasz Cwik  wrote:

> Davor, please -1 the release if you believe LEGAL-288 applies.
>
> On Wed, Jun 20, 2018 at 9:37 AM Davor Bonaci  wrote:
>
>> Please take a peek at LEGAL-288 [1], which I learned about recently in
>> the context of another project. Looks like an issue, requiring a new RC,
>> but I didn't have a chance to look closely.
>>
>> Thanks.
>>
>> Davor
>>
>> [1] https://issues.apache.org/jira/browse/LEGAL-288
>>
>> On Wed, Jun 20, 2018 at 9:17 AM, Pablo Estrada 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> On Wed, Jun 20, 2018 at 9:08 AM Lukasz Cwik  wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> On Tue, Jun 19, 2018 at 10:39 PM Jean-Baptiste Onofré 
>>>> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 17/06/2018 07:18, Jean-Baptiste Onofré wrote:
>>>>> > Hi everyone,
>>>>> >
>>>>> > Please review and vote on the release candidate #2 for the version
>>>>> > 2.5.0, as follows:
>>>>> >
>>>>> > [ ] +1, Approve the release
>>>>> > [ ] -1, Do not approve the release (please provide specific comments)
>>>>> >
>>>>> > NB: this is the first release using Gradle, so don't be too harsh ;) A
>>>>> > PR about the release guide will follow thanks to this release.
>>>>> >
>>>>> > The complete staging area is available for your review, which includes:
>>>>> > * JIRA release notes [1],
>>>>> > * the official Apache source release to be deployed to dist.apache.org
>>>>> > [2], which is signed with the key with fingerprint C8282E76 [3],
>>>>> > * all artifacts to be deployed to the Maven Central Repository [4],
>>>>> > * source code tag "v2.5.0-RC2" [5],
>>>>> > * website pull request listing the release and publishing the API
>>>>> > reference manual [6].
>>>>> > * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
>>>>> > JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
>>>>> > * Python artifacts are deployed along with the source release to the
>>>>> > dist.apache.org [2].
>>>>> >
>>>>> > The vote will be open for at least 72 hours. It is adopted by majority
>>>>> > approval, with at least 3 PMC affirmative votes.
>>>>> >
>>>>> > Thanks,
>>>>> > JB
>>>>> >
>>>>> > [1]
>>>>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12342847
>>>>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
>>>>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1043/
>>>>> > [5] https://github.com/apache/beam/tree/v2.5.0-RC2
>>>>> > [6] https://github.com/apache/beam-site/pull/463
>>>>> >
>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>> --
>>> Got feedback? go/pabloem-feedback
>>> <https://goto.google.com/pabloem-feedback>
>>>
>>
>>
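
The adoption rule quoted in the vote email above (majority approval, with at least 3 PMC affirmative votes) can be sketched as a small tally. This is only an illustration: the `Vote` record and `release_vote_passes` helper are hypothetical names, and the sketch assumes that only binding votes count toward the majority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 or -1
    binding: bool   # True when cast by a PMC member


def release_vote_passes(votes, min_binding_approvals=3):
    """Check the quoted rule: at least 3 binding +1s and more binding +1s than -1s."""
    # Assumption: non-binding votes are advisory and are ignored here.
    binding_plus = sum(1 for v in votes if v.binding and v.value > 0)
    binding_minus = sum(1 for v in votes if v.binding and v.value < 0)
    return binding_plus >= min_binding_approvals and binding_plus > binding_minus


# The three binding +1 votes visible in the thread above.
votes = [
    Vote("Jean-Baptiste Onofré", +1, True),
    Vote("Lukasz Cwik", +1, True),
    Vote("Pablo Estrada", +1, True),
]
print(release_vote_passes(votes))
```

The tally says nothing about whether a concern such as LEGAL-288 should block the release; that remains a judgment call for the voters.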


Re: Community Examples Repository

2018-08-01 Thread Davor Bonaci
>
> it makes sense to modularize


It certainly does, but somebody just had another proposal to move the
website into the main repository ;-). That proposal was also good for
~everyone. Fun times...

(I have my opinions, of course, but I'm fine with any approach.)

On Wed, Aug 1, 2018 at 4:37 PM, Ahmet Altay  wrote:

> Thank you for this initiative.
>
> How about keeping a set of core examples in the main repository as a way
> of 1) convenient testing at a PR level 2) Testing with end to end tests
> against Beam head rather than a released Beam version 3) I think there is
> some educational value in having wordcount as a simple example living along
> with the code.
>
> For anything else examples repository would be a great idea.
>
> For testing, I would also like to understand how could we test examples
> against both released versions of Beam and the code currently being
> developed in master.
>
> Ahmet
>
> On Wed, Aug 1, 2018 at 3:36 PM, Jesse Anderson 
> wrote:
>
>> The examples have to be separate from the main beam repository. This way,
>> they serve as an example of how to use them in your code instead of how to
>> do it as part of Beam. It would also allow you to show the dependencies in sbt or
>> Maven.
>>
>>
>> On Wed, Aug 1, 2018, 3:16 PM Charles Chen  wrote:
>>
>>> The examples we have right now serve both as examples to users and along
>>> with their unit tests, as tests of functionality.  If we move the examples
>>> out, what is a good way to make sure that we continue to have visibility
>>> and test coverage?  Can we address this in a section of the doc?
>>>
>>> On Wed, Aug 1, 2018 at 3:12 PM David Cavazos 
>>> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> We wanted to migrate the examples from the core repository to a new
>>>> Beam community examples repository. As the number of examples grows, it
>>>> makes sense to modularize and decouple the core functionality from the
>>>> examples.
>>>>
>>>> We will also create some guidelines with the best practices for new
>>>> examples to be submitted.
>>>>
>>>> For more details, feel free to take a look and comment on the proposal.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>
>


Re: Apache Beam Python Wheels Repository

2018-08-04 Thread Davor Bonaci
New repository is not a ticket, it is a self-serve thing.

That said, you probably want to develop the proposal a bit further,
understanding/educating others about the benefits of what you are
proposing, any alternatives, why a repository is needed, why the sample
repository has Travis CI when everything else is on Jenkins, how this fits
into other decisions about repository management, and so on. Anything can
be done, of course, but I'd suggest developing (or communicating, or
educating) a bit more.

(I'm fine with any approach.)

On Fri, Aug 3, 2018 at 3:29 PM, Ahmet Altay  wrote:

> This LGTM, also greatly simplifies the creation of wheel files for
> multiple platforms.
>
> I can file an INFRA ticket to create a new repo to host wheel setup. Does
> anybody have experience with setting up a new repo similar to this?
>
> Ahmet
>
> On Fri, Aug 3, 2018 at 1:16 PM, Boyuan Zhang  wrote:
>
>> Hey all,
>>
>> I'm Boyuan Zhang from Google Dataflow Team, currently helping Release
>> Manager(Pablo Estrada) with 2.6.0 release. Since Beam decided to release
>> python wheels since 2.5.0, we need to create a wrapper repository(sample
>> repo ) under apache to
>> build and stage released python wheels for each release. Can anyone help
>> create this repository?
>>
>> Thanks for all your help! Happy Friday~
>>
>> Boyuan Zhang
>>
>
>


Re: [Proposal] Beam Mascot

2018-08-23 Thread Davor Bonaci
>
> I'd like to get input from the PMC on how this should be pursued (people
>> at the foundation who need to be involved, permissions needed, etc.), could
>> you advise?
>>
>
Please proceed as you see fit. Nothing is needed, no restrictions.

NB: the moment vote passes (and/or the mascot is contributed to the
website), its usage by third parties requires written approval (in most
cases).

Thanks for asking.

>


Re: JIRA permissions request

2018-09-12 Thread Davor Bonaci
It would be great to spec out "the necessary dashboards, reports, and
views", get feedback and agreement, and make it better along the way.

AFAIK, requesting this from INFRA should be fine; they can double check
what that specific group "all-developers" entails. That said, any tickets
filed should get an appropriate endorsement from the PMC as soon as they
are filed. If they aren't, most often things get difficult. Happy to
endorse modulo understanding what that group of privileges means.

On Wed, Sep 12, 2018 at 11:12 AM Andrew Pilloud  wrote:

> Hi Connell,
>
> Sounds like you want to contribute your project management skills to Beam?
> That sounds fantastic. Luke and I poked around JIRA for a bit and it
> appears access is only at the global level. You can file an INFRA ticket to
> request access, here is an example:
> https://issues.apache.org/jira/browse/INFRA-13700
>
> Andrew
>
> On Tue, Sep 11, 2018 at 10:03 AM Connell O'Callaghan 
> wrote:
>
>> Hi dev@,
>>
>> There are quite a few efforts in flight that have a lot of identified
>> work that needs a bit of project management to better communicate what is
>> being worked on and in what order across the community -- Portability
>> framework, portable runners, and SQL being examples that come to mind.
>> Rafael, Henning, and I want to work with JIRA's tools to produce (and
>> publish) the necessary dashboards, reports, and views. We appear to be
>> unable to share dashboards we create with the entire project due to a lack
>> of permissions. Can someone explain to us how we can create and then share
>> them? Otherwise, if it's just a permissions issue, is it possible to be
>> given the necessary permissions?
>>
>> Thank you in advance,
>> - Connell
>>
>


[ANNOUNCEMENT] New Beam chair: Kenneth Knowles

2018-09-19 Thread Davor Bonaci
Hi everyone --
It is with great pleasure that I announce that at today's meeting of the
Foundation's Board of Directors, the Board has appointed Kenneth Knowles as
the second chair of the Apache Beam project.

Kenn has served on the PMC since its inception, and is very active and
effective in growing the community. His exemplary posts have been cited in
other projects. I'm super happy that Kenn has accepted the nomination, and
I'm confident that he'll serve with distinction.

As for myself, I'm not going anywhere. I'm still around and will be as
active as I have recently been. Thrilled to be able to pass the baton to
such a key member of this community and to have less administrative work to
do ;-).

Please join me in welcoming Kenn to his new role, and I ask that you
support him as much as possible. As always, please let me know if you have
any questions.

Davor


Re: Please make sure you unincubate

2016-12-22 Thread Davor Bonaci
The ASF has not made any public announcement. As far as I understand, that
is delayed to the first half of January due to the holidays. That puts us
in an awkward state for a few weeks -- I'll reach out to Incubator and ASF
folks to clarify how we should be behaving in that time.

On Thu, Dec 22, 2016 at 9:13 AM, John D. Ament 
wrote:

> Eagle, Beam PMCs
>
> Congratulations on graduating.  Please complete your post-incubation steps
> within the incubator.  These can be done regardless of infra status.
>
> http://incubator.apache.org/guides/graduation.html#unincubate
>
> Regards,
>
> John
>


Re: Please make sure you unincubate

2016-12-27 Thread Davor Bonaci
The Incubator website has been updated [1, 2], and INFRA-13177 is tracking
the infrastructure work [3]. Special thanks to JB and Dan for handling some
parts of the work.

John, please kindly let us know if you spot something we may have missed.

Davor

[1] http://incubator.apache.org/projects/
[2] http://incubator.apache.org/projects/beam.html
[3] https://issues.apache.org/jira/browse/INFRA-13177

On Tue, Dec 27, 2016 at 9:02 AM, Jean-Baptiste Onofré 
wrote:

> Yup. It's done locally, ready to commit. Just want to sync with Davor
> first.
>
> Regards
> JB
>
> On Dec 27, 2016, 18:01, at 18:01, "John D. Ament" 
> wrote:
> >Ok, and just to be clear my ask is that you update content/podlings.xml
> >and
> >content/projects/beam.xml to reflect your TLP status.
> >
> >John
> >
> >On Tue, Dec 27, 2016 at 11:58 AM Jean-Baptiste Onofré 
> >wrote:
> >
> >> Not yet: we have to create the Jira using infrareq. It's what I would
> >> like to check with Davor (to split the workload and follow).
> >>
> >> It should be done today or tomorrow.
> >>
> >> Regards
> >> JB
> >>
> >> On 12/27/2016 05:53 PM, John D. Ament wrote:
> >> > Thanks JB.  I reached out to Davor about his other questions, are
> >you
> >> guys
> >> > all set?  Infra is eagerly awaiting your tickets.
> >> >
> >> > John
> >> >
> >> > On Tue, Dec 27, 2016 at 11:47 AM Jean-Baptiste Onofré
> >
> >> > wrote:
> >> >
> >> >> Hi John,
> >> >>
> >> >> Davor and I will take care of the post graduation tasks,
> >especially with
> >> >> the Infra.
> >> >>
> >> >> Thanks for the reminder.
> >> >>
> >> >> Regards
> >> >> JB
> >> >>
> >> >> On 12/27/2016 04:59 PM, John D. Ament wrote:
> >> >>> All,
> >> >>>
> >> >>> Second reminder.  Please complete your post graduation steps.
> >Eagle,
> >> >> infra
> >> >>> has almost finished your graduation, so it's a bit awkward for a
> >TLP to
> >> >>> appear on the incubator list as well.
> >> >>>
> >> >>> John
> >> >>>
> >> >>> On Thu, Dec 22, 2016 at 12:13 PM John D. Ament
> >
> >> >>> wrote:
> >> >>>
> >> >>>> Eagle, Beam PMCs
> >> >>>>
> >> >>>> Congratulations on graduating.  Please complete your post-incubation steps
> >> >>>> within the incubator.  These can be done regardless of infra status.
> >> >>>>
> >> >>>> http://incubator.apache.org/guides/graduation.html#unincubate
> >> >>>>
> >> >>>> Regards,
> >> >>>>
> >> >>>> John
> >> >>>>
> >> >>>
> >> >>
> >> >> --
> >> >> Jean-Baptiste Onofré
> >> >> jbono...@apache.org
> >> >> http://blog.nanthrax.net
> >> >> Talend - http://www.talend.com
> >> >>
> >> >
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
>


Re: [DISCUSS] Graduation to a top-level project

2016-12-27 Thread Davor Bonaci
As part of the process, Apache Infra should be reconfiguring our DNS
records, repository URLs, mailing list addresses, etc. shortly. Please
expect some instability over the next several days while this is sorted
out. This work is tracked in INFRA-13177 [1] -- please report any issues
there.

The announcement is still planned for the second week in January.

Davor

[1] https://issues.apache.org/jira/browse/INFRA-13177

On Tue, Dec 20, 2016 at 1:22 PM, Davor Bonaci  wrote:

> A quick update: a meeting of the ASF Board of Directors is scheduled for
> later this week, at which the Board may consider taking action on our
> graduation proposal!
>
> That said, even if the Board does enact it, any public announcement is
> expected to be delayed to the first half of January due to the holidays.
>
> In the meanwhile, we are still an Incubator podling and should continue to
> operate as such until the ASF announces otherwise... so, please hold your
> speculation and enthusiasm for another few weeks ;-)
>
> Davor
>
> On Thu, Dec 8, 2016 at 11:09 PM, Jean-Baptiste Onofré 
> wrote:
>
>> Congrats all !
>>
>> Regards
>> JB
>>
>>
>> On 12/09/2016 12:42 AM, Davor Bonaci wrote:
>>
>>> A quick update: the Apache Incubator has adopted the proposed graduation
>>> resolution [1], and it is now presented to the ASF Board of Directors for
>>> their consideration.
>>>
>>> Davor
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/71a1c63837a7d1506a10af9
>>> c70af1c24db988451ac5b53fa2467b9b8@%3Cgeneral.incubator.apache.org%3E
>>>
>>> On Mon, Dec 5, 2016 at 10:35 AM, Neelesh Salian 
>>> wrote:
>>>
>>> Quite an interesting discussion. Looking forward to the graduation. :)
>>>> Thanks for putting this together.
>>>>
>>>> On Mon, Dec 5, 2016 at 10:30 AM, Davor Bonaci  wrote:
>>>>
>>>> A quick update: the vote within the Incubator has been started [1].
>>>>>
>>>>> Davor
>>>>>
>>>>> [1]
>>>>> https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761
>>>>> ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E
>>>>>
>>>>> On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci 
>>>>> wrote:
>>>>>
>>>>> A quick update on the progress: the PPMC is nearly complete drafting
>>>>>>
>>>>> the
>>>>
>>>>> proposed resolution, and I've just kicked off the discussion within the
>>>>>> Incubator community [1].
>>>>>>
>>>>>> I'd encourage everyone to participate in the discussion and carry your
>>>>>> enthusiasm there. Thanks!
>>>>>>
>>>>>> Davor
>>>>>>
>>>>>> [1] https://lists.apache.org/thread.html/b9c1071b355588468368145
>>>>>>
>>>>> 75ada3c
>>>>
>>>>> dca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E
>>>>>>
>>>>>> On Thu, Nov 24, 2016 at 1:52 AM, Maximilian Michels 
>>>>>> wrote:
>>>>>>
>>>>>> +1
>>>>>>>
>>>>>>> I see a healthy project which deserves to graduate.
>>>>>>>
>>>>>>> On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci 
>>>>>>>
>>>>>> wrote:
>>>>
>>>>> Thanks everyone for the enthusiastic support!
>>>>>>>>
>>>>>>>> Please keep the thread going, as we kick off the process on private@
>>>>>>>>
>>>>>>> .
>>>>
>>>>> Please don’t forget to bring up any data points that might help
>>>>>>>>
>>>>>>> strengthen
>>>>>>>
>>>>>>>> our case.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner
>>>>>>>>
>>>>>>> 
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 (beaming)
>>>>>>>>>
>>>>>>>>> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>

Re: [DISCUSS] Graduation to a top-level project

2016-12-29 Thread Davor Bonaci
A quick update: the Apache Infra process, tracked in INFRA-13177 [1],
should now be complete. As far as I see, everything should be operational
-- please reply back if you encounter something unexpected.

The announcement is still planned for the second week in January.

Davor

[1] https://issues.apache.org/jira/browse/INFRA-13177

On Tue, Dec 27, 2016 at 11:32 PM, Jean-Baptiste Onofré 
wrote:

> FYI, updating remotes on my git local and changing the parent pom version
> to 0.5.0-SNAPSHOT work smoothly. For the test, I also prune my local Maven
> repo.
>
> Regards
> JB
>
> On 12/28/2016 07:22 AM, Davor Bonaci wrote:
>
>> As part of the process, Apache Infra should be reconfiguring our DNS
>> records, repository URLs, mailing list addresses, etc. shortly. Please
>> expect some instability over the next several days while this is sorted
>> out. This work is tracked in INFRA-13177 [1] -- please report any issues
>> there.
>>
>> The announcement is still planned for the second week in January.
>>
>> Davor
>>
>> [1] https://issues.apache.org/jira/browse/INFRA-13177
>>
>> On Tue, Dec 20, 2016 at 1:22 PM, Davor Bonaci  wrote:
>>
>> A quick update: a meeting of the ASF Board of Directors is scheduled for
>>> later this week, at which the Board may consider taking action on our
>>> graduation proposal!
>>>
>>> That said, even if the Board does enact it, any public announcement is
>>> expected to be delayed to the first half of January due to the holidays.
>>>
>>> In the meanwhile, we are still an Incubator podling and should continue
>>> to
>>> operate as such until the ASF announces otherwise... so, please hold your
>>> speculation and enthusiasm for another few weeks ;-)
>>>
>>> Davor
>>>
>>> On Thu, Dec 8, 2016 at 11:09 PM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> Congrats all !
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 12/09/2016 12:42 AM, Davor Bonaci wrote:
>>>>
>>>> A quick update: the Apache Incubator has adopted the proposed graduation
>>>>> resolution [1], and it is now presented to the ASF Board of Directors
>>>>> for
>>>>> their consideration.
>>>>>
>>>>> Davor
>>>>>
>>>>> [1]
>>>>> https://lists.apache.org/thread.html/71a1c63837a7d1506a10af9
>>>>> c70af1c24db988451ac5b53fa2467b9b8@%3Cgeneral.incubator.apache.org%3E
>>>>>
>>>>> On Mon, Dec 5, 2016 at 10:35 AM, Neelesh Salian 
>>>>> wrote:
>>>>>
>>>>> Quite an interesting discussion. Looking forward to the graduation. :)
>>>>>
>>>>>> Thanks for putting this together.
>>>>>>
>>>>>> On Mon, Dec 5, 2016 at 10:30 AM, Davor Bonaci 
>>>>>> wrote:
>>>>>>
>>>>>> A quick update: the vote within the Incubator has been started [1].
>>>>>>
>>>>>>>
>>>>>>> Davor
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761
>>>>>>> ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E
>>>>>>>
>>>>>>> On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci 
>>>>>>> wrote:
>>>>>>>
>>>>>>> A quick update on the progress: the PPMC is nearly complete drafting
>>>>>>>
>>>>>>>>
>>>>>>>> the
>>>>>>>
>>>>>>
>>>>>> proposed resolution, and I've just kicked off the discussion within
>>>>>>> the
>>>>>>>
>>>>>>>> Incubator community [1].
>>>>>>>>
>>>>>>>> I'd encourage everyone to participate in the discussion and carry
>>>>>>>> your
>>>>>>>> enthusiasm there. Thanks!
>>>>>>>>
>>>>>>>> Davor
>>>>>>>>
>>>>>>>> [1] https://lists.apache.org/thread.html/b9c1071b355588468368145
>>>>>>>>
>>>>>>>> 75ada3c
>>>>>>>
>>>>>>
>>>>>> dca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E
>>>>>>>

Re: [VOTE] Release 0.4.0, release candidate #1

2016-12-29 Thread Davor Bonaci
+1

A passing set of Jenkins suites:
* https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6336/
* https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/2245/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Apex/138/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/1256/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/575/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Dataflow/1928/

Davor

On Thu, Dec 29, 2016 at 4:29 PM, Dan Halperin 
wrote:

> * mvn verify passes with and without network enabled
> * mvn apache-rat:check passes
> * mvn verify passes with -Prelease
> * release signature properly signed by JB (using the KEYS file as the
> keyring)
> * No binary files [one false positive empty file
> in ./runners/core-java/src/test/java/.placeholder we should plausibly
> delete in future]
> (osx: find . -type f -exec file -I '{}' \; | grep 'charset=binary')
>
> * Module changes are as expected (microbenchmarks had a licensing issue and
> was removed). Licensing for dependencies of new modules is okay (all
> Apache).
>
>new:
>> apache-beam/runners/apex/pom.xml
>> apache-beam/sdks/java/extensions/sorter/pom.xml
>> apache-beam/sdks/java/maven-archetypes/examples-java8/pom.xml
>>
> apache-beam/sdks/java/maven-archetypes/examples-java8/src/ma
> in/resources/archetype-resources/pom.xml
>
>removed:
>< apache-beam/sdks/java/microbenchmarks/pom.xml
>
> * No occurrences of the substring `incub` in the source zip.
>
> * Ran all additional postcommits in Jenkins against the release tag, and
> all passed.
>
> So, looks good to me!
>
> +1
>
> Dan
>
> On Wed, Dec 28, 2016 at 11:39 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Minor fix & update: the source code tag is obviously v0.4.0-RC1
> >
> > https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=r
> > efs/tags/v0.4.0-RC1
> >
> > I launched Jenkins on the tag and it passed:
> >
> > https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> > _MavenInstall/2245/
> >
> > Regards
> > JB
> >
> >
> > On 12/29/2016 08:33 AM, Jean-Baptiste Onofré wrote:
> >
> >> Hi everyone,
> >>
> >> Please review and vote on the release candidate #1 for the version
> >> 0.4.0, as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> >> [2], which is signed with the key with fingerprint C8282E76 [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag "v1.2.3-RC3" [5],
> >> * website pull request listing the release and publishing the API
> >> reference manual [6].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> >> approval, with at least 3 PPMC affirmative votes.
> >>
> >> Thanks,
> >> JB
> >>
> >> [1]
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> >> ctId=12319527&version=12338590
> >>
> >> [2] https://dist.apache.org/repos/dist/dev/beam/0.4.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4] https://repository.apache.org/content/repositories/orgapache
> >> beam-1009/
> >> [5]
> >> https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=a
> >> b73a243ccfdae18f81435bfcf9de21c195fef4d
> >>
> >> [6] https://github.com/apache/beam-site/pull/117
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
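
Two of the release checks quoted above (no occurrences of the substring `incub` in the source zip, and no binary files) can be sketched in Python as follows. This is a rough illustration, not the procedure the voters actually used: the helper names are hypothetical, and the NUL-byte test is only a simple stand-in for what `file -I` reports.

```python
import io
import zipfile


def find_substring_in_zip(zip_file, needle):
    """Return sorted member names whose path or content contains needle (case-insensitive)."""
    needle_text = needle.lower()
    needle_bytes = needle_text.encode("utf-8")
    hits = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info.filename)
            if needle_text in info.filename.lower() or needle_bytes in data.lower():
                hits.append(info.filename)
    return sorted(hits)


def looks_binary(data):
    """Heuristic assumption: content containing a NUL byte is treated as binary."""
    return b"\x00" in data


# Build a tiny in-memory archive standing in for the source release zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("pom.xml", "<artifactId>beam-parent</artifactId>")
    archive.writestr("README.md", "Apache Beam (incubating)")
buffer.seek(0)

print(find_substring_in_zip(buffer, "incub"))
```

With this heuristic an empty file is not flagged as binary, which avoids the `.placeholder` false positive mentioned in the quoted checklist.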


Re: Hosting data stores for IO Transform testing

2016-12-29 Thread Davor Bonaci
Just a quick drive-by comment: how tests are laid out has non-trivial
tradeoffs on how/where continuous integration runs, and how results are
integrated into the tooling. The current state is certainly not ideal
(e.g., due to multiple test executions some links in Jenkins point where
they shouldn't), but most other alternatives had even bigger drawbacks at
the time. If someone has great ideas that don't explode the number of
modules, please share ;-)

On Mon, Dec 26, 2016 at 6:30 AM, Etienne Chauchot 
wrote:

> Hi Stephen,
>
> Thanks for taking the time to comment.
>
> My comments are below in the email:
>
>
> Le 24/12/2016 à 00:07, Stephen Sisk a écrit :
>
>> hey Etienne -
>>
>> thanks for your thoughts and thanks for sharing your experiences. I
>> generally agree with what you're saying. Quick comments below:
>>
>> IT are stored alongside with UT in src/test directory of the IO but they
>>>
>> might go to dedicated module, waiting for a consensus
>> I don't have a strong opinion or feel that I've worked enough with maven
>> to
>> understand all the consequences - I'd love for someone with more maven
>> experience to weigh in. If this becomes blocking, I'd say check it in, and
>> we can refactor later if it proves problematic.
>>
> Sure, not a blocking point, it could be refactored afterwards. Just as a
> reminder, JB mentioned that storing IT in separate module allows to have
> more coherence between all IT (same behavior) and to do cross IO
> integration tests. JB, have you experienced some long term drawbacks of
> storing IT in a separate module, like, for example, more difficult
> maintenance due to "distance" with production code?
>
>
>>   Also IMHO, it is better that tests load/clean data than doing some
>>>
>> assumptions about the running order of the tests.
>> I definitely agree that we don't want to make assumptions about the
>> running
>> order of the tests - that way lies pain. :) It will be interesting to see
>> how the performance tests work out since they will need more data (and
>> thus
>> loading data can take much longer.)
>>
> Yes, performance testing might push in the direction of data loading from
> outside the tests due to loading time.
>
>>   This should also be an easier problem
>> for read tests than for write tests - if we have long running instances,
>> read tests don't really need cleanup. And if write tests only write a
>> small
>> amount of data, as long as we are sure we're writing to uniquely
>> identifiable locations (ie, new table per test or something similar), we
>> can clean up the write test data on a slower schedule.
>>
> I agree
>
>>
>> this will tend to go to the direction of long running data store
>>>
>> instances rather than data store instances started (and optionally loaded)
>> before tests.
>> It may be easiest to start with a "data stores stay running"
>> implementation, and then if we see issues with that move towards tests
>> that
>> start/stop the data stores on each run. One thing I'd like to make sure is
>> that we're not manually tweaking the configurations for data stores. One
>> way we could do that is to destroy/recreate the data stores on a slower
>> schedule - maybe once per week. That way if the script is changed or the
>> data store instances are changed, we'd be able to detect it relatively
>> soon
>> while still removing the need for the tests to manage the data stores.
>>
> I agree. In addition to manual tweaking of configuration, there might be
> cases in which a data store re-partitions data during a test or after some
> tests while the dataset changes. The IO must be tolerant to that but the
> asserts (number of bundles for example) in test must not fail in that case.
> I would also prefer if possible that the tests do not manage data stores
> (not set them up, not start them, not stop them)
>
>
>> as a general note, I suspect many of the folks in the states will be on
>> holiday until Jan 2nd/3rd.
>>
>> S
>>
>> On Fri, Dec 23, 2016 at 7:48 AM Etienne Chauchot 
>> wrote:
>>
>> Hi,
>>>
>>> Recently we had a discussion about integration tests of IOs. I'm
>>> preparing a PR for integration tests of the elasticSearch IO
>>> (
>>> https://github.com/echauchot/incubator-beam/tree/BEAM-1184-E
>>> LASTICSEARCH-IO
>>> as a first shot) which are very important IMHO because they helped catch
>>> some bugs that UT could not (volume, data store instance sharing, real
>>> data store instance ...)
>>>
>>> I would like to have your thoughts/remarks about points below. Some of
>>> these points are also discussed here
>>>
>>> https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-Np
>>> rQ7vbf1jNVRgdqeEE8I/edit#heading=h.7ly6e7beup8a
>>> :
>>>
>>> - UT and IT have a similar architecture, but while UT focus on testing
>>> the correct behavior of the code including corner cases and use embedded
>>> in memory data store, IT assume that the behavior is correct (strong UT)
>>> and focus on higher volume testing and testing against real data store
>>> instance(s)
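
The idea discussed in this thread, that write tests target uniquely identifiable locations (for example a new table per test) and that leftover data is cleaned up on a slower schedule, can be sketched as below. This is a minimal illustration under stated assumptions: the naming scheme and both helper functions are hypothetical and are not part of Beam or of any data store client.

```python
import uuid
from datetime import datetime, timedelta, timezone


def unique_table_name(test_name, now=None):
    """Build a table name embedding a UTC timestamp and a random suffix."""
    now = now or datetime.now(timezone.utc)
    return f"{test_name}_{now:%Y%m%d%H%M%S}_{uuid.uuid4().hex[:8]}"


def tables_to_clean(table_names, max_age, now=None):
    """Return the names whose embedded timestamp is older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for name in table_names:
        # The name ends with "_<timestamp>_<suffix>", so split from the right.
        stamp = name.rsplit("_", 2)[1]
        created = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        if now - created > max_age:
            stale.append(name)
    return stale


old = unique_table_name("write_it", datetime(2016, 12, 23, tzinfo=timezone.utc))
new = unique_table_name("write_it", datetime(2016, 12, 29, tzinfo=timezone.utc))
cleanup_time = datetime(2016, 12, 30, tzinfo=timezone.utc)
print(tables_to_clean([old, new], timedelta(days=3), cleanup_time) == [old])
```

Because the creation time lives in the name itself, a periodic cleanup job would not need to know which test run created which table.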

Re: Azure Blob IO for Apache Beam

2016-12-30 Thread Davor Bonaci
Hi Saulo,
I've responded to your Stack Overflow question, but I can give a few more
comments here.

Pei (cc'd) is working on this. See this JIRA issue [1], its sub-tasks, and
several design documents [2, 3].

It would be awesome to also have native Azure Storage Blob support in Beam
that builds on top of current work -- we'd love that contribution!

Thanks,
Davor

[1] https://issues.apache.org/jira/browse/BEAM-59
[2]
https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M
[3]
https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs

On Fri, Dec 30, 2016 at 5:42 PM, Saulo Ricci  wrote:

> Hi,
>
> I had posted this question
>  support-in-apache-beam>
> @
> stack overflow. Basically I'm trying to run Apache Beam on a spark cluster
> hosted in a MS Azure environment. It seems Apache Beam doesn't have support
> for Azure Blob IO, right? Would implementing an Azure Blob IO be an
> alternative solution for this case?
>
> Best
> Saulo
> --
> Saulo
>


Apache News Round-up

2016-12-30 Thread Davor Bonaci
I stumbled across Apache News Round-up for this week [1], and our own
Jean-Baptiste Onofré is noted as one of the top five committers in 2016
with 1,825 commits across all Apache projects.

Congratulations JB -- this is awesome!

Davor

[1] https://blogs.apache.org/foundation/entry/the_apache_news_round_up118


Re: Testing Metrics

2017-01-02 Thread Davor Bonaci
Sounds like we should do both, right?

> 1. Test the metrics API without accounting for the various sink types, i.e.
> against the SDK.
>

Metrics API is a runner-independent SDK concept. I'd imagine we'd want to
have runner-independent tests that interact with the API, outside of any
specific transform implementation, execute them on all runners, and query
the results. Goal: make sure Metrics work.

> 2. Have the sink types, or at least some of them, tested as part of
> integration tests, e.g., have an in-memory Graphite server to test Graphite
> metrics and so on.
>

This is valid too -- this is testing *usage* of Metrics API in the given
IO. If a source/sink, or a transform in general, is exposing a metric, that
metric should be tested in its own right as a part of the transform
implementation.
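
The first kind of test described above (one runner-independent check, executed on every runner, that queries the resulting metrics) might look roughly like the sketch below. Everything here is a toy stand-in: `SequentialRunner`, `BundledRunner`, and `check_metrics_on` are hypothetical names, and the sketch deliberately does not use the real Beam Metrics API.

```python
from collections import Counter


class SequentialRunner:
    """Toy runner: processes elements one at a time."""

    def run(self, elements, fn):
        metrics = Counter()
        for element in elements:
            fn(element, metrics)
        return metrics


class BundledRunner:
    """Toy runner: processes elements in bundles and merges per-bundle counters."""

    def __init__(self, bundle_size=2):
        self.bundle_size = bundle_size

    def run(self, elements, fn):
        total = Counter()
        for start in range(0, len(elements), self.bundle_size):
            bundle_metrics = Counter()
            for element in elements[start:start + self.bundle_size]:
                fn(element, bundle_metrics)
            total.update(bundle_metrics)
        return total


def count_words(line, metrics):
    """User code under test: counts words and empty lines."""
    words = line.split()
    metrics["words"] += len(words)
    if not words:
        metrics["empty_lines"] += 1


def check_metrics_on(runner):
    """Runner-independent check: same input, same expected counter values."""
    result = runner.run(["a b", "", "c"], count_words)
    return result["words"] == 3 and result["empty_lines"] == 1


ALL_RUNNERS = [SequentialRunner(), BundledRunner(bundle_size=2)]
print(all(check_metrics_on(runner) for runner in ALL_RUNNERS))
```

The second kind of test, checking that a specific source or sink reports its own metrics, would live next to that transform's implementation rather than in a shared suite like this one.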


Graduation!

2017-01-10 Thread Davor Bonaci
The ASF has publicly announced our graduation!


https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces

https://beam.apache.org/blog/2017/01/10/beam-graduates.html

Graduation is a recognition of the community that we have built together. I
am humbled to be part of this group and this project, and so excited for
what we can accomplish together going forward.

Davor


Re: Starter issue

2017-01-11 Thread Davor Bonaci
Welcome Tim -- it's great to have you join our community!

I found your name in JIRA and assigned the issue to you. Thanks for your
(future) contribution.

On Wed, Jan 11, 2017 at 11:27 PM, Jean-Baptiste Onofré 
wrote:

> Hi Tim
>
> What's your Jira id ?
>
> Thanks
> Regards
> JB
>
> On Jan 12, 2017, 06:48, at 06:48, Tim Taschke  wrote:
> >Hi,
> >
> >I would like to get started with contributing and thought I'd start
> >with this, if that is ok:
> >https://issues.apache.org/jira/browse/BEAM-1056
> >
> >Could somebody please assign it to me?
> >
> >Best regards,
> >Tim
>


Re: On my activity at the project

2017-01-16 Thread Davor Bonaci
Thanks Max -- enjoy the time off!

Davor

On Sun, Jan 15, 2017 at 10:25 AM, Jesse Anderson 
wrote:

> Thanks for all your hard work.
>
> On Sun, Jan 15, 2017, 10:16 AM Jean-Baptiste Onofré 
> wrote:
>
> > Hi Max,
> >
> > thanks for your commitment and your work on the project.
> >
> > Enjoy your time off.
> >
> > Regards
> > JB
> >
> > On 01/14/2017 09:04 AM, Maximilian Michels wrote:
> > > Dear Beamers,
> > >
> > > Thank you for the past year where we built this amazing community! It's
> > > been exciting times.
> > >
> > > For the beginning of this year, I decided to take some time off. I'd
> > > love to stay with the project and I think I'm going to be committing
> > > more in the future. For the meantime, I'd like to pass on the component
> > > lead of the Flink Runner to either Aljoscha or Stephan who are the most
> > > experienced Flink committers of the Beam community.
> > >
> > > Please feel free to reach out to me in case anything pops up. It's
> great
> > > to see Beam as an established top level project. Everyone at the Beam
> > > community can be really proud!
> > >
> > > Best,
> > > Max
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Davor Bonaci
+1. I think merging to master would be an awesome next step for the Python
SDK.

And, thanks for a great summary of the current state, roadmap, and impact
to the project as a whole -- awesome!

Process-wise, I'd suggest starting a formal vote once this discussion seems
to be trending towards a conclusion, and complete the merge as soon as the
next release (0.5.0) is cut. This would enable additional time before 0.6.0
to figure out compliance, release process impact, etc.

Great work everyone!

On Tue, Jan 17, 2017 at 8:26 AM, Jean-Baptiste Onofré 
wrote:

> Hi
>
> I didn't try the Python SDK recently but you provided a clear "state of
> the art". Anyway I'm in favor of merging things as quick as possible
> (assuming it's in a good shape in term of build, test, ...): it would
> potentially grow up the "external" contributions.
>
> So +1 from my side.
>
> Regards
> JB
>
> On Jan 17, 2017, 08:22, at 08:22, Ahmet Altay 
> wrote:
> >Hi all,
> >
> >tl;dr: I would like to start a discussion about merging python-sdk
> >branch
> >to master branch. Python SDK is mature enough and merging it to master
> >will
> >accelerate its development and adoption.
> >
> >With a great effort from a lot of contributors(*), Python SDK [1] is
> >now a
> >mostly complete, tested, performant Python implementation of the Beam
> >model. Since June, when we first started with Python SDK in Apache Beam
> >we
> >have been continuously improving it.
> >
> >** Python SDK currently supports:
> >
> >* Model: All main concepts are present (ParDo, GroupByKey, Windowing
> >etc.).
> >* IO: There are extensible APIs for writing new bounded sources and
> >sinks.
> >Implementations are provided for Text, Avro, BigQuery, and Datastore.
> >* Runners: Python SDK has an extensible base runner module that allows
> >building specific runners on top of it. The SDK comes with two pipeline
> >runners: DirectRunner and DataflowRunner; and it is possible to add
> >more.
> >The existing runners are currently limited to bounded execution and
> >otherwise equivalent to their Java SDK counterparts in functionality.
> >* Testing: Python SDK implements ValidatesRunner test framework for
> >implementing integration test for current and future runners. There is
> >unit
> >test coverage for all modules, and a number of integrations test for
> >validating existing runners.
> >* Documentation and examples: Documentation work has started on Python
> >SDK.
> >Beam Programming Guide page has been updated to include Python [2]. The
> >code comes with many ready to use examples and we are in a good place
> >to
> >start documenting those on the website.
> >
> >** We are not done yet, next on the roadmap we have:
> >
> >* Streaming: Both of the existing runners lack support for streaming
> >execution, and currently there is work going on for adding streaming
> >support to DirectRunner [3].
> >* Documentation: Filling the rest of the Beam documentations with
> >Python
> >SDK specific information and examples.
> >* SDK consistency: Making Python SDK consistent with the Java SDK. We
> >have
> >come a long way on this and have only a few items left [4].
> >* Beamifying: We have been working on removing Dataflow-specific
> >references
> >both from the documentation and from the code. There is some work left,
> >and
> >we are currently working on those as well [5].
> >
> >** Steps and implications of merging to master:
> >
> >* Master branch is merged to python-sdk branch at regular intervals and
> >the
> >last merge was on 12/22. All the past merges were uneventful because
> >there
> >is a minimal overlap in modified files between branches. Integrating
> >python-sdk to master will similarly touch a small number of existing
> >files.
> >
> >* Python SDK is using the same tools for building and testing. It is
> >already integrated with Maven, Jenkins and Travis. Specifically the
> >impact
> >to the testing infrastructure would be:
> >- There will be two additional test configurations in Travis. Since
> >Travis
> >runs all configurations in parallel there should not be a noticeable
> >change
> >in the Travis run time.
> >- Jenkins pre-commit test will start running the Python SDK tests. It
> >will
> >add an additional 5 minutes to the completion time of pre-commit test.
> >Historically Python SDK tests were not flaky and did not cause any
> >random
> >failures.
> >- Jenkins Python post-commit test is already separated from the other
> >post-commit tests and will continue to exist. It would not change the
> >testing time for any other test.
> >
> >* The release process needs to be updated to accommodate releasing
> >Python
> >artifacts. Python SDK would fit in the existing release schedule and
> >could
> >be released along with the Java SDK. The additional steps would
> >include:
> >- Generating Python artifacts. This could be done with a single command
> >using Maven today.
> >- Publishing the artifacts to a central repository such as PyPI.
> >- Updating the release guide to reflect t

Re: [VOTE] Merge Python SDK to the master branch

2017-01-20 Thread Davor Bonaci
[X] +1, Merge python-sdk branch to master after the 0.5.0 release, and
release it in the subsequent minor release.


On Fri, Jan 20, 2017 at 9:03 AM, Ahmet Altay 
wrote:

> Hi all,
>
>
> Please review the earlier discussion on the status of the Python SDK [1]
> and vote on merging the python-sdk branch to the master branch, as follows:
>
> [ ] +1, Merge python-sdk branch to master after the 0.5.0 release, and
> release it in the subsequent minor release.
>
> [ ] -1, Continue development in python-sdk branch (please provide specific
> comments).
>
> The vote will be open for at least 72 hours. This is a procedural vote; it
> is adopted by majority approval of qualified votes with no minimums [2].
>
> Thank you,
>
> Ahmet
>
> [1]
> https://lists.apache.org/thread.html/84a36cf0ad95a76e6bc444603ae87e
> 7312023bc167a6ff3c57a956f1@%3Cdev.beam.apache.org%3E
> [2] http://apache.org/foundation/voting.html
>


Re: Runner-provided ValueProviders

2017-01-20 Thread Davor Bonaci
Expecting runners to populate, or override, SDK-level pipeline options
isn't a great thing, particularly in a scenario that would affect
correctness.

The main thing is discoverability of a subtle API like this -- there's
little chance somebody writing a new runner would stumble across this and
do the right thing. It would be much better to make expectations from a
runner clear, say, via a runner-provided "context" API. I'd stay away from
a pipeline option with a default value.

The other contentious topic here is the usage of a job-level or
execution-level identifier. This easily becomes ambiguous in the presence
of Flink's savepoints, Dataflow's update, fast re-execution, canary vs.
production pipeline, cross-job optimizations, etc. I think we'd be better
off with a transform-level nonce than a job-level one.

Finally, the real solution is to enhance the model and make such a
functionality available to everyone, e.g., roughly "init" + "checkpoint" +
"side-input to source / splittabledofn / composable io".

--

Practically, to solve the problem at hand quickly, I'd be in favor of a
context-based approach.
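
To make the context idea a bit more concrete, here is a minimal sketch in Python, chosen only for brevity. Every name in it (RunnerContext, transform_nonce, temp_table_name) is hypothetical and not part of any Beam SDK. It only illustrates a runner constructing an explicit context that hands out a transform-level nonce, instead of overriding a pipeline option that has a default value.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RunnerContext:
    """Hypothetical context that a runner has to construct explicitly."""

    runner_name: str
    # One nonce per transform, created on first use and then kept stable.
    _nonces: Dict[str, str] = field(default_factory=dict)

    def transform_nonce(self, transform_name: str) -> str:
        """Return a nonce scoped to one transform, not to the whole job."""
        if transform_name not in self._nonces:
            self._nonces[transform_name] = uuid.uuid4().hex
        return self._nonces[transform_name]


def temp_table_name(context: RunnerContext, transform_name: str) -> str:
    # The transform asks the context for its nonce; no pipeline option
    # with a default value is involved.
    return f"{transform_name}_{context.transform_nonce(transform_name)}"


ctx = RunnerContext(runner_name="example-runner")
write_name = temp_table_name(ctx, "WriteResults")
```

Since the context is a required argument, a new runner author cannot miss it, which is the discoverability point made above.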

On Thu, Jan 19, 2017 at 10:22 AM, Sam McVeety 
wrote:

> Hi folks, I'm looking for feedback on whether the following is a reasonable
> approach to handling ValueProviders that are intended to be populated at
> runtime by a given Runner (e.g. a Dataflow job ID, which is a GUID from the
> service).  Two potential pieces of a solution:
>
> 1. Annotate such parameters with @RunnerProvided, which results in an
> Exception if the user manually tries to set the parameter.
>
> 2. Allow for a DefaultValueFactory to be present for the set of Runners
> that do not override the parameter.
>
> Best,
> Sam
>


Re: Subscription to to beam project

2017-01-22 Thread Davor Bonaci
Welcome! Please check out the support page [1] with all mailing lists and
subscribe links.

[1] https://beam.apache.org/get-started/support/

On Sat, Jan 21, 2017 at 11:59 PM, Ritesh Kasat 
wrote:

> Hello,
> Please add me to the beam mailing list.
> Thanks
> Ritesh
>


Release 0.5.0

2017-01-23 Thread Davor Bonaci
It's been about a month since the last release, so we should start
preparing the next one. We have plenty of great content to share with our
users!

We have a couple of release blocking bugs [1], but they seem to be heading
towards a conclusion. Is anybody aware of any other release blocking issues?

Finally, does anybody want to volunteer to manage this release? I'd be
happy to help if you hit any issues along the way.

Thanks!

Davor

[1]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%200.5.0%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC


[ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Davor Bonaci
Please join me and the rest of Beam PMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Stas Levin
Stas has contributed across the breadth of the project, from the Spark
runner to the core pieces and Java SDK. Looking at code contributions
alone, he authored 43 commits and reported 25 issues. Stas is very active
on the mailing lists too, contributing to good discussions and proposing
improvements to the Beam model.

* Ahmet Altay
Ahmet is a major contributor to the Python SDK, both in terms of design and
code contribution. Looking at code contributions alone, he authored 98
commits and reviewed dozens of pull requests. With Python SDK’s imminent
merge to the master branch, Ahmet contributed towards establishing a new
major component in Beam.

* Pei He
Pei has been contributing to Beam since its inception, accumulating a total
of 118 commits since February. He has made several major contributions,
most recently by redesigning IOChannelFactory / FileSystem APIs (in
progress), which would extend Beam’s portability to many additional file
systems and cloud providers.

Congratulations to all three! Welcome!

Davor


Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-30 Thread Davor Bonaci
It looks good to me, but let's hear Aljoscha's opinion on BEAM-1346.

A passing suite of Jenkins jobs:
* https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6870/
* https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/2474/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Apex/336/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/1470/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/786/
*
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Dataflow/2130/

On Mon, Jan 30, 2017 at 4:40 PM, Dan Halperin  wrote:

> I am worried about https://issues.apache.org/jira/browse/BEAM-1346 for RC1
> and would at least wait for resolution there before proceeding.
>
> On Mon, Jan 30, 2017 at 3:48 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Good catch for the PPMC, I'm upgrading the email template in the release
> > guide (it was a copy/paste).
> >
> > Regards
> > JB
> >
> >
> > On 01/30/2017 11:50 AM, Sergio Fernández wrote:
> >
> >> +1 (non-binding)
> >>
> >> So far I've successfully checked:
> >> * signatures and digests
> >> * source releases file layouts
> >> * matched git tags and commit ids
> >> * incubator suffix and disclaimer
> >> * NOTICE and LICENSE files
> >> * license headers
> >> * clean build (Java 1.8.0_91, Maven 3.3.9, Debian amd64)
> >>
> >> Two minor comments that do not block the release:
> >> * Usually I like to see the commit id referencing the rc, since git tags
> >> can be changed.
> >> * Just a formality, "PPMC" is not a committee that plays a role anymore,
> >> you're a PMC now ;-)
> >>
> >>
> >>
> >> On Fri, Jan 27, 2017 at 9:55 PM, Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> Hi everyone,
> >>>
> >>> Please review and vote on the release candidate #1 for the version
> 0.5.0
> >>> as follows:
> >>>
> >>> [ ] +1, Approve the release
> >>> [ ] -1, Do not approve the release (please provide specific comments)
> >>>
> >>> The complete staging area is available for your review, which includes:
> >>>
> >>> * JIRA release notes [1],
> >>> * the official Apache source release to be deployed to dist.apache.org
> >>> [2], which is signed with the key with fingerprint C8282E76 [3],
> >>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>> * source code tag "v0.5.0-RC1" [5],
> >>> * website pull request listing the release and publishing the API
> >>> reference manual [6].
> >>>
> >>> The vote will be open for at least 72 hours. It is adopted by majority
> >>> approval, with at least 3 PPMC affirmative votes.
> >>>
> >>> Thanks,
> >>> JB
> >>>
> >>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> >>> ctId=12319527&version=12338859
> >>> [2] https://dist.apache.org/repos/dist/dev/beam/0.5.0/
> >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >>> [4] https://repository.apache.org/content/repositories/orgapache
> >>> beam-1010/
> >>> [5] https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=r
> >>> efs/tags/v0.5.0-RC1
> >>> [6] https://github.com/apache/beam-site/pull/132
> >>>
> >>>
> >>
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [DISCUSS] Python SDK status and next steps

2017-01-30 Thread Davor Bonaci
Great -- congratulations to everyone who has contributed to the Python SDK!

On Mon, Jan 30, 2017 at 11:10 PM, Ahmet Altay 
wrote:

> Hi all,
>
> This merge is completed. Python SDK is now officially part of the master
> branch! Thank you all for the support. Please open an issue, if you notice
> a reference to the now obsolete python-sdk branch in the documentation.
>
> There will not be any more merges to the python-sdk branch. Going forward
> please use the master branch for Python SDK development. There are a few
> existing open PRs to the python-sdk [1]. If you are the author of one of
> those PRs, please rebase them on top of master.
>
> Thank you,
> Ahmet
>
> [1] https://github.com/pulls?utf8=✓&q=is%3Aopen+is%3Apr+base%3Apython-sdk+repo%3Aapache%2Fbeam+
>
> On Fri, Jan 20, 2017 at 10:06 AM, Kenneth Knowles 
> wrote:
>
> > To clarify the implied criteria of that last exchange, it is "An SDK
> should
> > have at least one runner that can execute the complete model (may be a
> > direct runner)"
> >
> > I want to highlight this, because whether an _SDK_ supports unbounded
> data
> > is not particularly well-defined, and will evolve:
> >
> >  - With the Runner API, an SDK will need to support building a graph with
> > unbounded constructs, as today with probably minimal changes.
> >
> >  - With the Fn API, if any part of the Fn API is specific to unbounded
> > data, the SDK will need to implement it. I think right now there is no
> such
> > thing, and we don't want such a thing, so SDKs implementing the Fn API
> > automatically support unbounded data.
> >
> >  - There will also likely be an SDK-specific shim just as there is today,
> > to leverage idiomatic deserialized representations. The richness of this
> > shim will decrease so that it will need to "support" unbounded data but
> > that will be a ~one liner.
> >
> > Getting the Python SDK on master will accelerate our progress towards the
> > Fn API - partly technical, partly community - which is the best path
> > towards support for unbounded data across multiple runners. I think the
> > criteria are written with the completed portability framework in mind. So
> > this exchange makes me actually more convinced we should merge python-sdk
> > to master.
> >
> > On Fri, Jan 20, 2017 at 9:53 AM, Robert Bradshaw <
> > rober...@google.com.invalid> wrote:
> >
> > > On Thu, Jan 19, 2017 at 11:56 PM, Dan Halperin
> > >  wrote:
> > > > I do not think that Python SDK yet meets the bar [1] for implementing
> > the
> > > > Beam model -- supporting Unbounded data is very important. That said,
> > > given
> > > > the committed and sustained set of contributors, it generally makes
> > sense
> > > > to me to make an exception in anticipation of these features being
> > > fleshed
> > > > out soon; including potentially new users/contributors that would
> > arrive
> > > > once in master.
> > > >
> > > > [1] https://lists.apache.org/thread.html/CAAzyFAxcmexUQnbF=Y
> > > > k0plmm3f5e5bqwjz4+c5doruclnxo...@mail.gmail.com
> > >
> > > That is a valid point. The Python SDK supports all the unbounded parts
> > > of the model except for unbounded sources, which was deferred while
> > > seeing how https://s.apache.org/splittable-do-fn played out. I've been
> > > working with the team and merging/reviewing most of their code, and
> > > have full confidence this will be coming (and on that note can vouch
> > > for a healthy community and support which are much harder to add
> > > later).
> > >
> > > In short, I think it has the required maturity, and I'm in favor of
> > > merging soonish.
> > >
> > > > On Wed, Jan 18, 2017 at 12:24 AM, Ahmet Altay wrote:
> > > >
> > > >> Thank you all for the comments so far. I would follow the process as
> > > >> suggested by Davor and others in this thread.
> > > >>
> > > >> Ahmet
> > > >>
> > > > >> On Tue, Jan 17, 2017 at 11:47 PM, Sergio Fernández <wik...@apache.org> wrote:
> > > >>
> > > >> > Hi
> > > >> >
> > > > >> > On Tue, Jan 17, 2017 at 5:22 PM, Ahmet Altay wrote:
> > > >> > >
> > > >> > > tl;dr: I would like to start a discussion about merging
> python-sdk
> > > >> branch
> > > >> > > to master branch. Python SDK is mature enough and merging it to
> > > master
> > > >> > will
> > > >> > > accelerate its development and adoption.
> > > >> > >
> > > >> >
> > > >> > Good point, Ahmet!
> > > >> >
> > > > >> > I've been closely following the development since it was imported in
> June.
> > > For
> > > >> > the prototypes I've implemented so far it works quite well; I
> guess
> > > we'd
> > > >> > just need to focus the next months in bringing more runners
> support.
> > > >> >
> > > >> > With a great effort from a lot of contributors(*), Python SDK [1]
> is
> > > now
> > > >> a
> > > >> > > mostly complete, tested, pe

[VOTE] Apache Beam, version 0.5.0, release candidate #2

2017-02-02 Thread Davor Bonaci
Hi everyone,
With JB leaving for his vacation, I'll try to push the 0.5.0 release across
the finish line. Please review and vote on the release candidate #2 for the
version 0.5.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 8F0D334F [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v0.5.0-RC2" [5],
* website pull request listing the release and publishing the API reference
manual [6].

Compared to release candidate #1, this candidate contains pull requests
#1903 [7] and #1908 [8]; see the discussion for reasoning.

A passing suite of Jenkins jobs:
* PreCommit_Java_MavenInstall [9],
* PostCommit_Java_MavenInstall [10],
* PostCommit_Java_RunnableOnService_Apex [11],
* PostCommit_Java_RunnableOnService_Flink [12],
* PostCommit_Java_RunnableOnService_Spark [13],
* PostCommit_Java_RunnableOnService_Dataflow [14].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Davor

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338859
[2] https://dist.apache.org/repos/dist/dev/beam/0.5.0/RC2/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1011/
[5] https://github.com/apache/beam/tree/v0.5.0-RC2
[6] https://github.com/apache/beam-site/pull/132
[7] https://github.com/apache/beam/pull/1903
[8] https://github.com/apache/beam/pull/1908
[9] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7028/
[10] https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/2514/
[11]
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Apex/386/
[12]
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/1521/
[13]
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/830/
[14]
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Dataflow/2180/


Re: [VOTE] Apache Beam, version 0.5.0, release candidate #2

2017-02-06 Thread Davor Bonaci
This vote is now complete. I'll summarize the results and next steps in a
separate thread.

On Mon, Feb 6, 2017 at 2:51 AM, Sergio Fernández  wrote:

> +1 (non-binding)
>
> So far I've successfully checked:
> * signatures and digests
> * source releases file layouts
> * no binaries included in the source release
> * matched git tag
> * NOTICE and LICENSE files
> * license headers
> * clean build (Java 1.8.0_91, Maven 3.3.9, Debian amd64)
>
> As I already commented on RC1, formally it's better to include commit id
> referencing the release, since git tags can be changed. Just take that into
> account for upcoming releases.
>
>
> On Fri, Feb 3, 2017 at 1:27 AM, Davor Bonaci  wrote:
>
> > Hi everyone,
> > With JB leaving for his vacation, I'll try to push the 0.5.0 release
> across
> > the finish line. Please review and vote on the release candidate #2 for
> the
> > version 0.5.0, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > which is signed with the key with fingerprint 8F0D334F [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v0.5.0-RC2" [5],
> > * website pull request listing the release and publishing the API
> reference
> > manual [6].
> >
> > Compared to release candidate #1, this candidate contains pull requests
> > #1903 [7] and #1908 [8]; see the discussion for reasoning.
> >
> > A passing suite of Jenkins jobs:
> > * PreCommit_Java_MavenInstall [9],
> > * PostCommit_Java_MavenInstall [10],
> > * PostCommit_Java_RunnableOnService_Apex [11],
> > * PostCommit_Java_RunnableOnService_Flink [12],
> > * PostCommit_Java_RunnableOnService_Spark [13],
> > * PostCommit_Java_RunnableOnService_Dataflow [14].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Davor
> >
> > [1]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> > ctId=12319527&version=12338859
> > [2] https://dist.apache.org/repos/dist/dev/beam/0.5.0/RC2/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4] https://repository.apache.org/content/repositories/
> orgapachebeam-1011/
> > [5] https://github.com/apache/beam/tree/v0.5.0-RC2
> > [6] https://github.com/apache/beam-site/pull/132
> > [7] https://github.com/apache/beam/pull/1903
> > [8] https://github.com/apache/beam/pull/1908
> > [9] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7028/
> > [10] https://builds.apache.org/job/beam_PostCommit_Java_
> MavenInstall/2514/
> > [11]
> > https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
> > nService_Apex/386/
> > [12]
> > https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
> > nService_Flink/1521/
> > [13]
> > https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
> > nService_Spark/830/
> > [14]
> > https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
> > nService_Dataflow/2180/
> >
>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>
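
The "signatures and digests" item in the checklist above can be partly sketched in code. This is a minimal illustration of the digest half only, under stated assumptions: the function name digest_matches is made up for this sketch, and the default algorithm (sha256) is an assumption, not necessarily what the release used. Signature verification needs a tool such as GnuPG and is not covered here.

```python
import hashlib


def digest_matches(path: str, expected_hex: str,
                   algorithm: str = "sha256",
                   chunk_size: int = 1 << 20) -> bool:
    """Compare a file's digest with the hex value published next to it."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so that large source archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

A reviewer would call it once per artifact, passing the downloaded file and the contents of its published digest file.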


[RESULT] [VOTE] Apache Beam, version 0.5.0, release candidate #2

2017-02-06 Thread Davor Bonaci
I'm happy to announce that we have unanimously approved this release.

There are 5 approving votes, 4 of which are binding:
* Davor Bonaci
* Sergio Fernández
* Dan Halperin
* Aljoscha Krettek
* Jean-Baptiste Onofré

There are no disapproving votes.

I'll proceed with the release as staged. Thanks everyone!
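
For anyone curious how the tally works out, here is a small sketch in Python. It assumes that the four binding votes are all except Sergio's, which he marked as non-binding in the vote thread, and that only binding votes count toward the majority. The function name release_vote_passes is made up for this illustration.

```python
from typing import Dict, Tuple

# (vote, is_binding) per voter, as reported in this thread.
votes: Dict[str, Tuple[int, bool]] = {
    "Davor Bonaci": (+1, True),
    "Sergio Fernández": (+1, False),
    "Dan Halperin": (+1, True),
    "Aljoscha Krettek": (+1, True),
    "Jean-Baptiste Onofré": (+1, True),
}


def release_vote_passes(votes: Dict[str, Tuple[int, bool]],
                        min_binding_approvals: int = 3) -> bool:
    """Majority approval, with at least `min_binding_approvals` binding +1s."""
    binding = [vote for vote, is_binding in votes.values() if is_binding]
    approvals = sum(1 for vote in binding if vote > 0)
    disapprovals = sum(1 for vote in binding if vote < 0)
    return approvals >= min_binding_approvals and approvals > disapprovals


passed = release_vote_passes(votes)
```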


On Thu, Feb 2, 2017 at 4:27 PM, Davor Bonaci  wrote:

> Hi everyone,
> With JB leaving for his vacation, I'll try to push the 0.5.0 release
> across the finish line. Please review and vote on the release candidate #2
> for the version 0.5.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 8F0D334F [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v0.5.0-RC2" [5],
> * website pull request listing the release and publishing the API
> reference manual [6].
>
> Compared to release candidate #1, this candidate contains pull requests
> #1903 [7] and #1908 [8]; see the discussion for reasoning.
>
> A passing suite of Jenkins jobs:
> * PreCommit_Java_MavenInstall [9],
> * PostCommit_Java_MavenInstall [10],
> * PostCommit_Java_RunnableOnService_Apex [11],
> * PostCommit_Java_RunnableOnService_Flink [12],
> * PostCommit_Java_RunnableOnService_Spark [13],
> * PostCommit_Java_RunnableOnService_Dataflow [14].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Davor
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12319527&version=12338859
> [2] https://dist.apache.org/repos/dist/dev/beam/0.5.0/RC2/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1011/
> [5] https://github.com/apache/beam/tree/v0.5.0-RC2
> [6] https://github.com/apache/beam-site/pull/132
> [7] https://github.com/apache/beam/pull/1903
> [8] https://github.com/apache/beam/pull/1908
> [9] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7028/
> [10] https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/2514/
> [11] https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Apex/386/
> [12] https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Flink/1521/
> [13] https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Spark/830/
> [14] https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Dataflow/2180/
>


Re: BEAM-307(KafkaIO on Kafka 0.10)

2017-02-06 Thread Davor Bonaci
This would be a great contribution, Mingmin!

As a general rule, we'd like the connector to work with as many versions as
possible, with as little code duplication as possible.

Slightly orthogonal -- BigtableIO [1] is an example of a connector that
separates API portions from the underlying service. With a similar
layering, perhaps KafkaIO can support multiple versions of Kafka, while
maintaining as much of the common code as possible. KafkaIO case is a bit
more complicated, given that it needs changes on the API side as well.

[1]
https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable
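
A rough sketch of the layering idea, in Python for brevity. All names here (Record, KafkaService, the two fake services, read_with_event_time) are hypothetical and do not come from KafkaIO or BigtableIO. The fallback to processing time when a message has no timestamp is likewise just one possible assumption about backward-compatible behavior.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    value: bytes
    timestamp_ms: Optional[int]  # None when the version has no message timestamps


class KafkaService(ABC):
    """Hypothetical version-specific layer; shared connector code sees only this."""

    @abstractmethod
    def poll(self) -> Iterator[Record]:
        ...


class FakeKafka09Service(KafkaService):
    """Stand-in for a 0.9 client: payloads only, no timestamps."""

    def __init__(self, payloads: List[bytes]):
        self._payloads = payloads

    def poll(self) -> Iterator[Record]:
        return (Record(p, None) for p in self._payloads)


class FakeKafka010Service(KafkaService):
    """Stand-in for a 0.10 client: each message carries a timestamp."""

    def __init__(self, messages: List[Tuple[bytes, int]]):
        self._messages = messages

    def poll(self) -> Iterator[Record]:
        return (Record(p, ts) for p, ts in self._messages)


def read_with_event_time(service: KafkaService,
                         processing_time_ms: int) -> List[Tuple[bytes, int]]:
    """Shared code: use the message timestamp if present, else fall back."""
    return [
        (r.value, r.timestamp_ms if r.timestamp_ms is not None else processing_time_ms)
        for r in service.poll()
    ]
```

The shared function never checks which Kafka version it is talking to; only the thin service classes differ per version.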

On Mon, Feb 6, 2017 at 12:21 PM, Raghu Angadi 
wrote:

> I see. kafka-clients dependency could also be in 'provided' scope so that
> it is simpler to use different versions at runtime.
>
> On Mon, Feb 6, 2017 at 12:05 PM, Xu Mingmin  wrote:
>
> > The one I meet is external authentication added in 0.10, we use a
> > standalone token-based security service. In 0.9 the SASL-based
> > implementation is fixed with Kerberos.
> > Kafka client 0.10 cannot connect to Kafka server 0.9, that's why I
> mention
> > a separated project.
> >
> > Mingmin
> >
> > On Mon, Feb 6, 2017 at 11:45 AM, Raghu Angadi  >
> > wrote:
> >
> > > Current KafkaIO works just fine with Kafka 0.10. I don't know of any
> > > incompatibilities or regressions.
> > >
> > > It does not take advantage  of message timestamps, of course. It would
> be
> > > good to handle them in a backward compatible way; it might be
> > > required anyway if they are optional in 0.10.
> > >
> > > Not sure of scope of (1) below. I don't think it needs to be a new
> > > implementation.
> > >
> > > On Mon, Feb 6, 2017 at 11:35 AM, Xu Mingmin 
> wrote:
> > >
> > > > Hello,
> > > >
> > > > Is there anybody working on https://issues.apache.org/
> > > jira/browse/BEAM-307
> > > > ?
> > > > The existing KafkaIO is implemented with Kafka 0.9, and not
> compatible
> > > well
> > > > with Kafka 0.10.
> > > >
> > > > I'd like to take this task if not duplicated:
> > > > 1). a new KafkaIO based on Kafka 0.10, suggest a separated project
> for
> > > > easy-to-build;
> > > > 2). use timestamp of Kafka message as default event-timestamp
> > > >
> > > > Thanks!
> > > > Mingmin
> > > >
> > >
> >
>


Report to the Board, February 2017 edition

2017-02-06 Thread Davor Bonaci
We are expected to submit a project report to the ASF Board of Directors
ahead of its next meeting. The report is due on Wednesday, 2/8.

This is the second in the series of three monthly reports required for new
projects. We'll need to report next month as well.

If interested, please take a look at the draft [1], and comment as
appropriate. I'll submit the report sometime on Wednesday.

Thanks!

Davor

[1] https://docs.google.com/document/d/1QXc6lH8Zi6qqp_
tmVkh4lvJRcW18UO3F9EopjPhcJfQ/


Re: Beam connector development for Hive as a data source

2017-02-06 Thread Davor Bonaci
Hi Madhu,
Welcome! I suggest subscribing to the dev@ mailing list and using the same
email address when sending to the list, to avoid your email being caught in
moderation.

It would be great to have a connector for Apache Hive. Keep in mind that
several folks have expressed interest in using and contributing this
connector. As far as I know, nobody is *actively* working on it, so you
should be good to go. Please use BEAM-1158 [1] to coordinate this work with
any other interested contributor.

Note that there are several different ways of connecting Beam and Hive. The
simplest one is to write a HiveIO that would run a Hive query and
process Hive's results in Beam. Another would be to use Beam within Hive to
compute the results of a Hive query. Finally, one could possibly write a
Hive-based DSL on top of a Beam SDK.

All of these approaches are valid and somewhat orthogonal one to another.
I'm assuming you are after the first one. If so, and if you plan to follow
already established patterns in other connectors, you don't necessarily
need a design document. Otherwise, please start with a design document. We
have linked a template in the Contribution Guide [2, 3].
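
To illustrate the first approach only, here is a very rough sketch in Python. The class HiveQuerySource, the QueryRunner type, and the fake_hive function are all hypothetical; nothing here is an existing Beam or Hive API. A real bounded source would also have to deal with splitting and size estimation, which this sketch leaves out.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
# Hypothetical stand-in for a Hive client: takes a query, returns result rows.
QueryRunner = Callable[[str], Iterable[Row]]


class HiveQuerySource:
    """Hypothetical bounded source: run one Hive query and emit its rows."""

    def __init__(self, query: str, run_query: QueryRunner):
        self._query = query
        self._run_query = run_query

    def read(self) -> Iterator[Row]:
        # Each result row becomes one element for downstream processing.
        for row in self._run_query(self._query):
            yield row


def fake_hive(query: str) -> List[Row]:
    # Canned rows standing in for a real Hive connection.
    return [{"word": "beam", "count": 2}, {"word": "hive", "count": 1}]


source = HiveQuerySource("SELECT word, count FROM words", fake_hive)
total = sum(row["count"] for row in source.read())
```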

Once again, welcome and let us know if we can help in any way!

Davor

[1] https://issues.apache.org/jira/browse/BEAM-1158
[2] https://beam.apache.org/contribute/contribution-guide/
[3]
https://docs.google.com/document/d/1qYQPGtabN5-E4MjHsecqqC7PXvJtXvZukPfLXQ8rHJs

On Mon, Feb 6, 2017 at 4:27 PM, Madhusudan Borkar 
wrote:

> Hello,
>
> I am Big Data Architect working at eTouch Systems. We are GCP partners. We
> are planning to contribute to Beam by developing a connector for Apache
> Hive as a data source.
> I understand that before any development work begins, we need to submit our
> design to Beam community.  I would like to request you to please share a
> "design template" document for the same.  We will submit our design
> document, using the template.
>
>
> Thank you.
>
> best regards
> Madhu Borkar
>


Re: Report to the Board, February 2017 edition

2017-02-08 Thread Davor Bonaci
Thanks everyone -- the report has been posted.

On Tue, Feb 7, 2017 at 4:05 AM, Jean-Baptiste Onofré 
wrote:

> Hi
>
> It looks good to me.
>
> Thanks Davor
> Regards
> JB
>
> On Feb 6, 2017, 19:32, at 19:32, Davor Bonaci  wrote:
> >We are expected to submit a project report to the ASF Board of
> >Directors
> >ahead of its next meeting. The report is due on Wednesday, 2/8.
> >
> >This is the second in the series of three monthly reports required for
> >new
> >projects. We'll need to report next month as well.
> >
> >If interested, please take a look at the draft [1], and comment as
> >appropriate. I'll submit the report sometime on Wednesday.
> >
> >Thanks!
> >
> >Davor
> >
> >[1] https://docs.google.com/document/d/1QXc6lH8Zi6qqp_
> >tmVkh4lvJRcW18UO3F9EopjPhcJfQ/
>


Re: [RESULT] [VOTE] Apache Beam, version 0.5.0, release candidate #2

2017-02-08 Thread Davor Bonaci
This release is now complete.

Thanks to everyone who has helped make this release possible!

On Mon, Feb 6, 2017 at 8:36 AM, Davor Bonaci  wrote:

> I'm happy to announce that we have unanimously approved this release.
>
> There are 5 approving votes, 4 of which are binding:
> * Davor Bonaci
> * Sergio Fernández
> * Dan Halperin
> * Aljoscha Krettek
> * Jean-Baptiste Onofré
>
> There are no disapproving votes.
>
> I'll proceed with the release as staged. Thanks everyone!
>
>
> On Thu, Feb 2, 2017 at 4:27 PM, Davor Bonaci  wrote:
>
>> Hi everyone,
>> With JB leaving for his vacation, I'll try to push the 0.5.0 release
>> across the finish line. Please review and vote on the release candidate #2
>> for the version 0.5.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint 8F0D334F [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v0.5.0-RC2" [5],
>> * website pull request listing the release and publishing the API
>> reference manual [6].
>>
>> Compared to release candidate #1, this candidate contains pull requests
>> #1903 [7] and #1908 [8]; see the discussion for reasoning.
>>
>> A passing suite of Jenkins jobs:
>> * PreCommit_Java_MavenInstall [9],
>> * PostCommit_Java_MavenInstall [10],
>> * PostCommit_Java_RunnableOnService_Apex [11],
>> * PostCommit_Java_RunnableOnService_Flink [12],
>> * PostCommit_Java_RunnableOnService_Spark [13],
>> * PostCommit_Java_RunnableOnService_Dataflow [14].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Davor
>>
>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
>> ctId=12319527&version=12338859
>> [2] https://dist.apache.org/repos/dist/dev/beam/0.5.0/RC2/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapache
>> beam-1011/
>> [5] https://github.com/apache/beam/tree/v0.5.0-RC2
>> [6] https://github.com/apache/beam-site/pull/132
>> [7] https://github.com/apache/beam/pull/1903
>> [8] https://github.com/apache/beam/pull/1908
>> [9] https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7028/
>> [10] https://builds.apache.org/job/beam_PostCommit_Java_MavenInst
>> all/2514/
>> [11] https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
>> nService_Apex/386/
>> [12] https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
>> nService_Flink/1521/
>> [13] https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
>> nService_Spark/830/
>> [14] https://builds.apache.org/job/beam_PostCommit_Java_RunnableO
>> nService_Dataflow/2180/
>>
>
>


Re: Better developer instructions for using Maven?

2017-02-10 Thread Davor Bonaci
I think Dan's framework of thinking is right -- what is the probability of
something finding a real issue, vs. the cost of running that all the time.

Obviously, we cannot run *everything* all the time. There's an infinite
number of things to run and infinite matrix of configurations. Many tests
are not even possible to execute locally, such as testing scale, performance,
compatibility across platforms, etc. So, we run locally *the least*,
slightly more in presubmit, slightly more in post-submit, even more
nightly, and as much as possible for each release. Of course, it is great
to catch issues as soon as possible, but the cost is prohibitive in terms
of time and productivity.

This may sound a bit tangential so far... but, I think Aviem's (and others')
point of view comes from a social aspect. As a contributor, nobody wants to
be "embarrassed" by proposing a pull request that is broken and has obvious
problems. One might think of themselves as failing to do a good job in this
particular case. So, instead, I might want to default to running more
things locally to prevent the "embarrassment".

Looking more broadly, however, there's a huge difference in things that the
default local execution *and* pre-commit don't exercise. It is really huge
and many breakages are caught in post commit, nightlies, or, at the very
end, cherrypicked into the release. So, I'd like to argue that nobody
should ever be embarrassed by proposing a pull request. It is normal,
expected, reasonable that Jenkins will catch issues -- that's why we have
it. Instead, think in terms of "error budget" -- increase individual
productivity by sacrificing occasional breakages. It is also completely
normal that a PR causes a post-commit breakage at times, or that a release
is delayed because of some PR -- this is all normal and fits into your
"error budget".

One easy improvement that I know Jason is looking at is to separate the
precommit signal into multiple signals. So, instead of one "red" signal,
contributors can get five "greens" and one "red", which may help decrease
this social impact.

Hopefully, this is a convincing argument to use this framework in deciding
this matter. On the lower-level issue, I think rat and findbugs have a low
probability of finding an issue in most cases.

(Also, an explicit +1 to Kenn's point of view of getting people to the PR
so we can work with them, as opposed to blocking them locally before they
interact with us.)

On Fri, Feb 10, 2017 at 11:50 AM, Kenneth Knowles 
wrote:

> Since the discussion has returned to the thread rather than Dan's PR, I
> want to paraphrase the point I feel strongest about here:
>
> *For a new contributor, I want to minimize the distance between them
> deciding to hack and becoming our friends.*
>
> So I don't want them to have to learn much, if anything, about our idioms
> prior to opening a PR. That includes checkstyle, findbugs, javadoc, RAT,
> (any others?). Once a PR is open, we can welcome them to the community,
> help them understand any failure that happened via Jenkins, tell them the
> command to run the more thorough test, or even issue little fix PRs against
> their branch. If they get blocked by nitpicky or confusing failures in
> their console or while hacking, they might (reasonably, IMO) decide to go
> do something else.
>
> Folks other than newcomers can learn a repertoire of commands, like Robert
> says. So we shouldn't consider them (aka "us") so much when deciding
> whether "fast" or "slow" is the default, as long as we can explicitly
> choose.
>
> Kenn
>
> On Fri, Feb 10, 2017 at 11:00 AM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
> > > 1. Developer productivity -- Jenkins should run many more checks than
> > > developers do. Especially time-, resource-, or setup- intensive tasks.
> > > 2. Automated enforcement -- Jenkins is better at running the right
> > commands
> > > than we are.
> > > 3. Lower the barrier to entry -- individual developers need not have a
> > > running Spark/Flink/Apex/Dataflow setup in order to contribute code.
> > > 4. Focus on the user -- someone checking out the code and using it for
> > the
> > > first time does not care whether the code style checks or has the right
> > > licenses -- that should have been enforced by the Beam team before
> > > committing.
> > >
> > > We should be *very* choosy about what we enforce on every developer
> every
> > > time they go to compile. I probably compile Beam 50x-100x a day.
> > Literally,
> > > the extra minutes you want to add here will cost me an hour daily.
> > >
> >
> > By the same token of having a different bar for the Jenkins presubmit vs.
> > what's run locally, I think it makes a lot of sense to run a different
> > command for iterative development than you run before creating a pull
> > request. E.g. during development I'll often run only one test rather than
> > the entire suite, but do run the entire suite occasionally (often before
> > commit, especially before pushing).
> 

Re: Beam on Kubernetes

2017-02-20 Thread Davor Bonaci
I think these are great ideas for simplifying the getting started
experience across runners -- we'd love a contribution in this space!

On Mon, Feb 20, 2017 at 12:46 AM, Jean-Baptiste Onofré 
wrote:

> Hi Nitin,
>
> It sounds like a good candidate for blog or documentation, or even an
> example.
>
> I have a step-by-step example of a pipeline running on Spark with Mesos
> (not yet Kubernetes) if you are interested.
>
> Regards
> JB
>
>
> On 02/20/2017 08:47 AM, Nitin Lamba wrote:
>
>> Hi,
>>
>> Trying to restart this thread from last November [1]. Packaging an
>> end-2-end Beam example for k8s environment, similar to the one from the
>> TensorFlow team [2], may be interesting to look at. The logical
>> progression
>> is:
>>
>> - Start with an example using the local (java) runner
>> - Build the next one for Spark; k8s repo already has Spark v1.5.x as an
>> example [3] that can be updated/ modified
>> - Other runners to follow using Spark as a template
>>
>> Let me know if there is interest in pursuing/ collaborating on this.
>>
>> Thanks,
>> Nitin
>>
>> [1]
>> https://www.mail-archive.com/user@beam.incubator.apache.org/msg00881.html
>> [2] https://tensorflow.github.io/serving/serving_inception
>> [3] https://github.com/kubernetes/kubernetes/tree/master/examples/spark
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Better developer instructions for using Maven?

2017-02-20 Thread Davor Bonaci
(I'm also in favor of not overloading existing flags; they have some
meaning/semantics that developers have come to expect.)

On Sun, Feb 19, 2017 at 12:34 PM, Jean-Baptiste Onofré 
wrote:

> Thanks Aviem,
>
> Not sure if we should use skipTests as it really means unit tests and
> integration tests (in Karaf, skipTests skips the unit test,  integration
> test, archetype itests and maven plugin invoker test, but it doesn't skip
> checkstyle, findbugs, etc, for that,  we have a fastinstall property).
>
> Maybe it would make more sense to use a specific property like
> -DfastBuild=true.
>
> WDYT ?
>
> Regards
> JB
>
>
> On 02/19/2017 09:27 PM, Aviem Zur wrote:
>
>> I've created a PR which disables slow verifications if '-DskipTests' was
>> specified, otherwise runs them.
>> I think this satisfies all the considerations mentioned in this thread.
>>
>> PTAL: https://github.com/apache/beam/pull/2048
>> Ticket: https://issues.apache.org/jira/browse/BEAM-1513
>>
>> On Thu, Feb 16, 2017 at 3:05 PM Ismaël Mejía wrote:
>>
>>> JB, Maybe I was not clear, when I talked about the tests I was thinking
>>> more about executing them in parallel on the same machine, this is not the
>>> case today for some test suites, and for these the tests need to be refined
>>> to support this, and configured via the pom to execute the tests in
>>> parallel per method, class, etc. Of course we need to check if this is
>>> worth it, because I can imagine that the more expensive time for example in
>>> the IO case comes from starting the embedded versions of the IOs (e.g.
>>> HadoopMiniCluster, MongodExecutable, HBasetestingutility, etc) and not from
>>> the tests themselves but this has to be evaluated.
>>>
>>> On Wed, Feb 15, 2017 at 5:46 PM, Jean-Baptiste Onofré wrote:
>>>
>>>> On Jenkins it's possible to run several jobs at the same time but on
>>>> different executors. That's the easiest way.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Feb 15, 2017, 10:15, at 10:15, "Ismaël Mejía" wrote:
>>>>
>>>>> This question got lost in the discussion, but there is a small
>>>>> improvement that we can do:
>>>>>
>>>>>> Just to check, are we doing parallel builds?
>>>>>
>>>>> We are on jenkins, not in travis, there is an ongoing PR to fix this.
>>>>>
>>>>> What we can improve is to check if we can run some of the test suites in
>>>>> parallel to gain some extra time. For example most of the IOs and some
>>>>> runners don't execute tests in parallel.
>>>>>
>>>>> Ismael
>>>>>
>>>>> (slightly related), is there a way to change the timeout of travis
>>>>> jobs? Because it is failing most of the time now because of this, and it
>>>>> is quite noisy to have so many false positives.
>>>>>
>>>>> On Fri, Feb 10, 2017 at 8:00 PM, Robert Bradshaw <
>>>>> rober...@google.com.invalid> wrote:
>>>>>
>>>>>> On Fri, Feb 10, 2017 at 8:45 AM, Dan Halperin wrote:
>>>>>>
>>>>>>> On Fri, Feb 10, 2017 at 7:42 AM, Kenneth Knowles wrote:
>>>>>>>
>>>>>>>> On Feb 10, 2017 07:36, "Dan Halperin" wrote:
>>>>>>>>
>>>>>>>>> Before we added checkstyle it was under a minute. Now it's over
>>>>>>>>> five? That's awful IMO
>>>>>>>>
>>>>>>>> Checkstyle didn't cause all that, did it?
>>>>>>>
>>>>>>> The "5 minutes" was going with Aviem's numbers after this change. But
>>>>>>> yes, Checkstyle alone substantially increased (>+50%) the time from
>>>>>>> what it was previously to adding it back to the default build.
>>>>>>
>>>>>> Just to check, are we doing parallel builds?
>>>>>>
>>>>>>>> Noting that findbugs takes quite a lot more time. Javadoc and jar
>>>>>>>> are the other two slow ones.
>>>>>>>>
>>>>>>>> RAT is fast. But it has very poor error messages, so we wouldn't
>>>>>>>> want a new contributor trying to figure out what is going on without
>>>>>>>> our help.
>>>>>>>
>>>>>>> There is a larger philosophical issue here: is there a point of Jenkins
>>>>>>> precommit testing? Why not just make `mvn install` run everything that
>>>>>>> Jenkins does? For that matter, why don't committers just push directly
>>>>>>> to master? Wouldn't that make everyone's life easier?
>>>>>>>
>>>>>>> I'd argue that's not true.
>>>>>>>
>>>>>>> 1. Developer productivity -- Jenkins should run many more checks than
>>>>>>> developers do. Especially time-, resource-, or setup- intensive tasks.
>>>>>>> 2. Automated enforcement -- Jenkins is better at running the right
>>>>>>> commands than we are.
>>>>>>> 3. Lower the barrier to entry -- individual developers need not have a
>>>>>>> running Spark/Flink/Apex/Dataflow set

Next major milestone: first stable release

2017-02-21 Thread Davor Bonaci
Graduating from incubation was our single, unifying goal for the past year.
With graduation now behind us, I think it is worth looking ahead towards
the next major milestone: the first stable release.

I think the first stable release is the logical next step. It enables the
growth of our user community by providing necessary guarantees and
confidence to deploy Beam into production. It is our message to the world:
“Beam is ready for prime time”.

With that, I’d like to start a discussion on what the stable release really
means. For me, it is two equally important things:
* Production quality: it “just works”.
* Commitment to the API compatibility for the user-facing APIs.

Production quality is sometimes hard to define, but includes the following:
* No (known) major bugs.
* Polished user experience.
* Good documentation.
* Support for all major operating systems.
* Dependencies hidden from the callers (shading, API surface tests).
* Etc.

On the other hand, the API compatibility aspect includes:
* Proper use of semantic versioning [1]: major.minor.patch.
* No backward-incompatible API changes within a given major version for the
user-facing APIs across the project.
* Exception: APIs marked as experimental.
* Exception: internally-facing APIs, such as APIs between components.
* Any and all work can still proceed; we just need to be careful to do it
in a compatible way, at the worst, by introducing a new API and deprecating
the old one.
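
As a rough illustration of the compatibility rule above, here is a minimal
Python sketch, not project tooling. It assumes plain major.minor.patch
strings with no pre-release suffixes and treats 0.x versions as carrying no
compatibility promise, as semantic versioning [1] does. The helper names
parse and is_compatible_upgrade are made up for this example.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Assumption: only plain "major.minor.patch" strings, no pre-release tags.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse(text: str) -> Version:
    """Parse a "major.minor.patch" string into a Version tuple."""
    match = _SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def is_compatible_upgrade(current: str, candidate: str) -> bool:
    """Return True if candidate should keep user-facing APIs of current working.

    Hypothetical helper: it encodes only the rule "no backward-incompatible
    changes within a major version", and ignores experimental or internal APIs.
    """
    old, new = parse(current), parse(candidate)
    if old.major == 0:
        # Pre-stable releases make no compatibility promise.
        return False
    # Same major version, and not a downgrade (tuples compare field by field).
    return new.major == old.major and new >= old
```

Under these assumptions, an upgrade within one major version is expected to
be safe for user-facing APIs, while a major-version bump is where
incompatible changes may land.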

Time-wise, I think we are not far away from this goal. We do have a
compelling offering. Our APIs are already fairly stable. We just need a
little bit of effort across the project to polish the experience and do
those last few changes we always wanted. With that, I’d suggest targeting:
* One more pre-release in late February/early March.
* The first stable release around the end of March.

I think it is worth noting that we’ll never get to perfection, and we’ll
never be able to finish “everything”. All that work, however, can still
proceed after the first stable release (just with a little extra overhead).

I’d love to hear everyone’s thoughts on this topic. It involves the future
project direction -- I’d like to invite everyone to participate!

If we have a consensus, I’d like to start making progress on this effort
rather quickly. Perhaps we can jointly coordinate a project-wide effort to
polish the last few things and reach the first stable release.

Thanks!

Davor

[1] http://semver.org/


Interest in a (virtual) contributor meeting?

2017-02-21 Thread Davor Bonaci
In the early days of the project, we have held a few meetings for the
initial community to get to know each other. Since then, the community has
grown a huge amount, but we haven't organized any get-togethers.

I wanted to gauge interest in a potential video conference call in the near
future. No specific agenda -- simply a chance for everyone to meet others
and see the faces of people we share a common passion with. Of course, an
open discussion on any topic of interest to the contributor community is
welcome. This would be strictly informal -- any decisions are reserved for
the mailing list discussions.

If you'd be interested in attending, please reply back. If there's
sufficient interest, I'd be happy to try to organize something in the near
future.

Thanks!

Davor


Re: tf.Transform library for using TensorFlow with Beam

2017-02-23 Thread Davor Bonaci
Beam and TensorFlow coming together -- a big deal for us!

On Thu, Feb 23, 2017 at 3:49 PM, Ahmet Altay 
wrote:

> Hi all,
>
> Yesterday, there was an announcement from TensorFlow community about the
> new tf.Transform library [1]. It is a library that allows users to define
> pre-processing pipelines and run using large scale data processing
> frameworks. It is a library specifically designed to work with Apache Beam.
> It is great to see Python SDK getting a larger ecosystem and increased
> usage.
>
> Also worth mentioning is, PMC member Robert Bradshaw was one of the
> contributors to this new library.
>
> Thank you,
> Ahmet
>
> [1] https://research.googleblog.com/2017/02/preprocessing-for-machine-
> learning-with.html
>


Re: Interest in a (virtual) contributor meeting?

2017-02-23 Thread Davor Bonaci
Thanks everyone for your enthusiasm!

I'll try to organize something very shortly.

On Thu, Feb 23, 2017 at 12:48 AM, Jean-Baptiste Onofré 
wrote:

> Agree
>
> An agenda would be useful.
>
> Furthermore, I think it would be great to have minute notes on the
> mailing list: remember, if it's not on the mailing list, it never happened
> ;)
>
> My $0.01 ;)
>
> Regards
> JB
>
>
> On 02/23/2017 09:26 AM, Ismaël Mejía wrote:
>
>> +1 to do it periodically about different subjects.
>>
>> It is a good idea to have a sort of mini agenda, in the sense that the two
>> previous meetings had really different focus, the first one was about
>> contributors meeting each other and discussion of ongoing work just after
>> the project started on Apache, the second one was really focused on the
>> SplittableDoFn proposal, it was more focused on runner writers and IO
>> authors and it was a real 'tour de force' lead by Eugene,
>>
>>
>> On Wed, Feb 22, 2017 at 3:19 PM, Kobi Salant wrote:
>>
>>> +1
>>>
>>> On Feb 22, 2017 2:54 PM, "Aljoscha Krettek" wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, 22 Feb 2017 at 10:08 JingsongLee wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Sent from Alibaba Mail for iPhone
>>>>>
>>>>> -- Original Message --
>>>>> From: Davor Bonaci <da...@apache.org>
>>>>> Date: 2017-02-22 11:19:12
>>>>> To: dev@beam.apache.org <dev@beam.apache.org>
>>>>> Subject: Interest in a (virtual) contributor meeting?
>>>>>
>>>>> In the early days of the project, we have held a few meetings for the
>>>>> initial community to get to know each other. Since then, the community
>>>>> has grown a huge amount, but we haven't organized any get-togethers.
>>>>>
>>>>> I wanted to gauge interest in a potential video conference call in the
>>>>> near future. No specific agenda -- simply a chance for everyone to meet
>>>>> others and see the faces of people we share a common passion with. Of
>>>>> course, an open discussion on any topic of interest to the contributor
>>>>> community is welcome. This would be strictly informal -- any decisions
>>>>> are reserved for the mailing list discussions.
>>>>>
>>>>> If you'd be interested in attending, please reply back. If there's
>>>>> sufficient interest, I'd be happy to try to organize something in the
>>>>> near future.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Davor
>>>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Release 0.6.0

2017-02-27 Thread Davor Bonaci
+1 -- let's get it started!

On Mon, Feb 27, 2017 at 2:01 PM, Ahmet Altay 
wrote:

> Hi all,
>
> It's been about a month since the last release. I would like to propose
> starting the next release. There are no release-blocking bugs in JIRA
> [1]. Are there any release blocking issues I am missing?
>
> Unless there is an objection I will volunteer to manage this release. This
> will be the first release with Python content. In case there are issues
> with that it might be easier for me to resolve and document those as part
> of the release process.
>
> Thank you,
> Ahmet
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%
> 20fixVersion%20%3D%200.6.0%20ORDER%20BY%20due%20ASC%2C%
> 20priority%20DESC%2C%20created%20ASC
>


Re: Introduction

2017-02-27 Thread Davor Bonaci
Welcome -- it's great to have you!

(JIRA and Slack requests done.)

On Mon, Feb 27, 2017 at 2:51 PM, Roberto Bentivoglio <
roberto.bentivog...@radicalbit.io> wrote:

> Hi Everyone,
>
> I am a big fan of distributed streaming engines and I'm currently working at
> Radicalbit.
> My colleagues and I are planning to start contributing actively to Beam in
> the next few weeks; we'd like to give back to the Apache community by
> working on this amazing project.
> Could you please add me to the contributors list / Slack in the meanwhile?
> My Jira ID on Apache is @robbenti and it's bound to this email, many thanks
> in advance!
>
> Looking forward to working with all of you!
>
> Kind Regards,
>
> --
> Roberto Bentivoglio
> CTO
> e. roberto.bentivog...@radicalbit.io
>


Re: tf.Transform library for using TensorFlow with Beam

2017-02-27 Thread Davor Bonaci
The Beam portability framework will enable this in Java too; not sure we
can do much sooner than that!

On Fri, Feb 24, 2017 at 3:33 PM, Amit Sela  wrote:

> That's great! many people have asked me about that and I'm glad to see this
> happening.
> Anyone know if there's something at work for the Java SDK (assuming I don't
> want to wait for Fn API support) ?
>
> On Fri, Feb 24, 2017 at 8:44 AM Jean-Baptiste Onofré 
> wrote:
>
> > Fantastic !
> >
> > That's a great addition and awesome to see that with Beam !
> >
> > Regards
> > JB
> >
> > On 02/24/2017 02:51 AM, Robert Bradshaw wrote:
> > > One thing I'm really excited about this library is that it allows one
> to
> > > more easily express transforms on columnar data (which is useful beyond
> > > just ML). For example, if your input elements have two fields "x" and
> "y"
> > > then you can write functions like
> > >
> > > > def preprocessing_fn(inputs):
> > > >     x_centered = tft.map(lambda x, mean: x - mean, inputs['x'],
> > > >                          tft.mean(inputs['x']))
> > > >     y_normalized = tft.scale_to_0_1(inputs['y'])
> > > >     return {
> > > >         'x_centered': x_centered,
> > > >         'y_normalized': y_normalized,
> > > >         'x_centered_times_y_normalized': tft.map(operations.mul,
> > > >                                                  x_centered, y_normalized)
> > > >     }
> > > >
> > > > # Read PCollection of dicts with 'x' and 'y' keys and numeric values
> > > > input = p | Read(...)
> > > >
> > > > # output will contain dicts with 'x_centered', 'y_normalized', and
> > > > # 'x_centered_times_y_normalized' keys with the expected values, and fn
> > > > # can be used to transform other data using the statistics (mean, mins,
> > > > # and maxes) without re-analysis.
> > > > output, fn = ((input, schema)
> > > >               | beam_impl.AnalyzeAndTransformDataset(preprocessing_fn))
> > >
> > > This automatically injects the relevant global aggregations (which can
> be
> > > interleaved) and builds up tensorflow graphs to apply the
> transformations
> > > very efficiently.
> > >
> > >
> > > On Thu, Feb 23, 2017 at 4:55 PM, Davor Bonaci 
> wrote:
> > >
> > >> Beam and TensorFlow coming together -- a big deal for us!
> > >>
> > >> On Thu, Feb 23, 2017 at 3:49 PM, Ahmet Altay  >
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> Yesterday, there was an announcement from TensorFlow community about
> > the
> > >>> new tf.Transform library [1]. It is a library that allows users to
> > define
> > >>> pre-processing pipelines and run using large scale data processing
> > >>> frameworks. It is a library specifically designed to work with Apache
> > >> Beam.
> > >>> It is great to see Python SDK getting a larger ecosystem and
> increased
> > >>> usage.
> > >>>
> > >>> Also worth mentioning is, PMC member Robert Bradshaw was one of the
> > >>> contributors to this new library.
> > >>>
> > >>> Thank you,
> > >>> Ahmet
> > >>>
> > >>> [1] https://research.googleblog.com/2017/02/preprocessing-for-
> machine-
> > >>> learning-with.html
> > >>>
> > >>
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: Release 0.6.0

2017-02-28 Thread Davor Bonaci
Can we please use JIRA to tag potentially release-blocking issues? Anyone
can just set the 'Fix Versions' field of an open issue to the next scheduled
release -- and it becomes easily visible to everyone in the project.
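
To see what is currently tagged, a JIRA search URL can be built from a JQL
query. The Python sketch below assumes the same search endpoint and query
shape as the release-blocker link shared earlier in this thread; the function
name release_blockers_url is hypothetical.

```python
from urllib.parse import quote

# Search endpoint as it appears in the link quoted in this thread.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="


def release_blockers_url(project: str, fix_version: str) -> str:
    """Build a search URL for unresolved issues tagged with a fix version."""
    jql = (
        f"project = {project} AND resolution = Unresolved "
        f"AND fixVersion = {fix_version} "
        "ORDER BY due ASC, priority DESC, created ASC"
    )
    # Percent-encode everything except unreserved characters.
    return JIRA_SEARCH + quote(jql, safe="")


print(release_blockers_url("BEAM", "0.6.0"))
```

Any open issue whose 'Fix Versions' field is set to the given release should
then show up in that search.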

In general, I'm not a fan of blocking releases for new functionality.
Rushing new features and a lack of baking time usually translates to bugs.
However, I think this time it is totally justified -- on a separate thread
we plan for this to be the last release before the "first stable release";
and picking the new features now will provide additional coverage for it.

So, +1, but please tag in JIRA.

On Tue, Feb 28, 2017 at 2:09 AM, Aljoscha Krettek 
wrote:

> I would like to finish these two:
> https://issues.apache.org/jira/browse/BEAM-1036: Support for new State API
> in FlinkRunner
> https://issues.apache.org/jira/browse/BEAM-1116: Support for new Timer API
> in Flink runner
>
> Both of them are finished for the streaming runner, for the batch runner
> I'm merging the code for the first right now and the second will not take
> long.
>
> There is also this: https://issues.apache.org/jira/browse/BEAM-1517: User
> state in the Flink Streaming Runner is not garbage collected. It's not a
> regression from 0.5.0 where we simply didn't have this feature but I'm
> still somewhat uneasy about this.
>
>
> On Tue, 28 Feb 2017 at 09:44 Jean-Baptiste Onofré  wrote:
>
> > Fair enough.
> >
> > I also try to merge https://github.com/apache/beam/pull/1739 asap.
> >
> > Regards
> > JB
> >
> > On 02/28/2017 09:34 AM, Amit Sela wrote:
> > > I'd prefer we wait to merge https://github.com/apache/beam/pull/2050
> > > Shouldn't take long now..
> > >
> > > On Tue, Feb 28, 2017 at 10:00 AM Sergio Fernández 
> > wrote:
> > >
> > >> Sounds good!
> > >>
> > > >> Ahmet, notice ASF has no current infrastructure to stage Python
> Release
> > >> Candidates. Anyway we left unmanaged the Maven deploy lifecycle for
> the
> > >> Python SDK, but it should be discussed at some point.
> > >>
> > >>
> > >>
> > >> On Mon, Feb 27, 2017 at 11:01 PM, Ahmet Altay
>  > >
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > > >>> It's been about a month since the last release. I would like to propose
> > > >>> starting the next release. There are no release-blocking bugs in
> JIRA
> > >>> [1]. Are there any release blocking issues I am missing?
> > >>>
> > >>> Unless there is an objection I will volunteer to manage this release.
> > >> This
> > >>> will be the first release with Python content. In case there are
> issues
> > >>> with that it might be easier for me to resolve and document those as
> > part
> > >>> of the release process.
> > >>>
> > >>> Thank you,
> > >>> Ahmet
> > >>>
> > >>> [1]
> > >>> https://issues.apache.org/jira/issues/?jql=project%20%
> > >>> 3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%
> > >>> 20fixVersion%20%3D%200.6.0%20ORDER%20BY%20due%20ASC%2C%
> > >>> 20priority%20DESC%2C%20created%20ASC
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Sergio Fernández
> > >> Partner Technology Manager
> > >> Redlink GmbH
> > >> m: +43 6602747925 <+43%20660%202747925> <+43%20660%202747925>
> > >> e: sergio.fernan...@redlink.co
> > >> w: http://redlink.co
> > >>
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: Next major milestone: first stable release

2017-02-28 Thread Davor Bonaci
Alright -- sounds like we have a consensus to proceed with the first stable
release after 0.6.0, targeting end of March / early April. I'll kick off
separate threads for specific decisions we need to make.

On Thu, Feb 23, 2017 at 6:07 AM, Aljoscha Krettek 
wrote:

> I think we're ready for this! The public APIs are in very good shape,
> especially now that we have the new DoFn, user facing state and timers and
> splittable DoFn. Not all Runners support the more advanced features but we
> can work on this after a stable release and there are enough runners that
> support a large part of the features.
>
> Best,
> Aljoscha
>
> On Thu, 23 Feb 2017 at 06:15 Kenneth Knowles 
> wrote:
>
> > On Wed, Feb 22, 2017 at 5:35 PM, Chamikara Jayalath <
> chamik...@apache.org>
> > wrote:
> > >
> > > I think, this point applies to Python SDK as well (though as you
> > mentioned,
> > > API hiding in Python is a mere convention (prefix with underscore) not
> > > enforced. We already have mechanism for marking APIs as deprecated
> which
> > > might be useful here:
> > > https://github.com/apache/beam/blob/master/sdks/python/
> > > apache_beam/utils/annotations.py
> > >
> > > - Cham
> > >
> >
> > Perhaps an explicit @public annotation would fit. I could imagine easily
> > generating a spec to check against from such annotations, though tooling
> is
> > secondary to documentation.
> >
> > Kenn
> >
>
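
A sketch of the explicit @public marker idea from the quoted messages, in
Python. Everything here is hypothetical: public, public_api_spec,
read_records and _open_shard are illustration-only names, not part of
apache_beam or its annotations module.

```python
from typing import Callable, Dict, List, TypeVar

F = TypeVar("F", bound=Callable)

# Registry of everything explicitly marked as user-facing API.
_PUBLIC_API: Dict[str, str] = {}


def public(obj: F) -> F:
    """Hypothetical marker: record obj as part of the user-facing API."""
    name = f"{obj.__module__}.{obj.__qualname__}"
    _PUBLIC_API[name] = (obj.__doc__ or "").strip()
    return obj  # The decorated object itself is left unchanged.


def public_api_spec() -> List[str]:
    """Return a sorted list of public names, usable as a spec to diff against."""
    return sorted(_PUBLIC_API)


@public
def read_records(path):
    """Read records from path (placeholder body for the example)."""
    return []


def _open_shard(path):
    # Underscore prefix: internal by convention, and not registered.
    return None
```

Comparing public_api_spec() between two releases would be one way to spot
accidental removals, though as noted above tooling is secondary to
documentation.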


Re: Travis retest-this-please magic

2017-03-01 Thread Davor Bonaci
It cannot be done at this time.

We should really move all Travis coverage into Jenkins and completely
deprecate Travis. I know Jason is looking into that ;-)

On Wed, Mar 1, 2017 at 3:51 AM, Amit Sela  wrote:

> Hi all,
>
> Recently I've encountered PRs where everything was green in Jenkins but
> Travis was stuck and didn't execute.
> I couldn't (as the committer/reviewer) do the same "retest this please"
> magic we apply to Jenkins, and I don't know of a way to do this
> in Travis.
> I know that on "my" Travis I can "Restart Build" but I'm not sure
> contributors can do so on theirs, and I couldn't (on someone else's PR).
>
> Does anyone know how we can make this easier?
>
> Appreciate the help.
>
> Amit
>


Re: Travis retest-this-please magic

2017-03-01 Thread Davor Bonaci
Use your best judgement. Travis right now provides multi-JDK,
multi-platform coverage not available in Jenkins. If the change is not
sensitive to that, it is probably reasonable to proceed.

On Wed, Mar 1, 2017 at 9:01 AM, Amit Sela  wrote:

> +1
> Can we merge PRs without waiting for Travis as long as it's not working ?
>
> On Wed, Mar 1, 2017 at 6:52 PM Davor Bonaci  wrote:
>
> > It cannot be done at this time.
> >
> > We should really move all Travis coverage into Jenkins and completely
> > deprecate Travis. I know Jason is looking into that ;-)
> >
> > On Wed, Mar 1, 2017 at 3:51 AM, Amit Sela  wrote:
> >
> > > Hi all,
> > >
> > > Recently I've encountered PRs where everything was green in Jenkins but
> > > Travis was stuck and didn't execute.
> > > I couldn't (as the committer/reviewer) do the same "retest this
> > > please" magic we apply to Jenkins, and I don't know of a way to do
> > > this in Travis.
> > > I know that on "my" Travis I can "Restart Build" but I'm not sure
> > > contributors can do so on theirs, and I couldn't (on someone else's PR).
> > >
> > > Does anyone know how we can make this easier?
> > >
> > > Appreciate the help.
> > >
> > > Amit
> > >
> >
>


Re: Next major milestone: first stable release

2017-03-01 Thread Davor Bonaci
We've now moved the discussion into the content of the first stable release.

I've created a version in JIRA called "First stable release". I'd like to
invite everyone to triage the JIRA issues you care about, and set the "Fix
Versions" field to "First stable release" to mark an issue as blocking
the first stable release. This creates a project-wide burndown list and we
can track our progress towards the goal.

I'll try to make a pass over as many JIRA issues as possible over the next day
or two, but it would be great if everyone, particularly component leads in
JIRA, take a pass too!

On Wed, Mar 1, 2017 at 2:51 AM, Jean-Baptiste Onofré 
wrote:

> Yes, fully agree.
>
> As far as I understood/know, BEAM-59 is targeted for Beam 1.0 (it's what
> we discussed with Pei and Davor).
>
> Regards
> JB
>
>
> On 03/01/2017 11:39 AM, Ismaël Mejía wrote:
>
>> Also joining a bit late, I agree with Amit, HDFS improvements are a really
>> good thing to have before the stable release. I will also add the
>> IOChannelFactory refactorings to support things like Read.from(“hdfs://”)
>> aka BEAM-59.
>>
>> In the worst case, particular IOs can still be marked as experimental to
>> show users that they can still evolve, even after the first ‘stable’
>> release, the part that we have to pay more attention is not to break the
>> core SDK. And the question about Data Locality (BEAM-673) is where I am
>> afraid that we can have some breaking changes because there is not a way
>> from the IOs (Source/Sink) to send ‘a hint’ to the runner about Data
>> Locality (please correct me if I am wrong). And this even if not supported
>> in the first stable release by any runner, would be a really great thing
>> to
>> have and I think this is a good moment to do it, to avoid breaking any
>> IO/runner signature because of new methods.
>>
>> What do the others think ?
>> Ismaël
>>
>>
>>
>> On Tue, Feb 28, 2017 at 6:29 PM, Amit Sela  wrote:
>>
>> Joining in just a bit late, I'll be quick and say that IMHO the SDK is
>>> mature enough and so my only point to add is *HDFS support*.
>>> I think that in terms of adoption we have to support HDFS as a
>>> "first-class
>>> citizen" via the FileSystem API, and provide data locality (batch) on top
>>> of it - it serves not only HDFS, but other eco-system IOs such as HBase.
>>> From my experience with talking to people and companies, most are running
>>> batch in production with some streaming POC or even production use, but
>>> batch still takes most of production work. If we give them the same
>>> production results, with the Beam API, we can on-board them faster and
>>> make
>>> it easier for them to adopt streaming as well.
>>>
>>> Thanks,
>>> Amit
>>>
>>> On Tue, Feb 28, 2017 at 7:12 PM Davor Bonaci  wrote:
>>>
>>> Alright -- sounds like we have a consensus to proceed with the first
>>>>
>>> stable
>>>
>>>> release after 0.6.0, targeting end of March / early April. I'll kick off
>>>> separate threads for specific decisions we need to make.
>>>>
>>>> On Thu, Feb 23, 2017 at 6:07 AM, Aljoscha Krettek 
>>>> wrote:
>>>>
>>>> I think we're ready for this! The public APIs are in very good shape,
>>>>> especially now that we have the new DoFn, user facing state and timers
>>>>>
>>>> and
>>>>
>>>>> splittable DoFn. Not all Runners support the more advanced features but
>>>>>
>>>> we
>>>>
>>>>> can work on this after a stable release and there are enough runners
>>>>>
>>>> that
>>>
>>>> support a large part of the features.
>>>>>
>>>>> Best,
>>>>> Aljoscha
>>>>>
>>>>> On Thu, 23 Feb 2017 at 06:15 Kenneth Knowles 
>>>>> wrote:
>>>>>
>>>>> On Wed, Feb 22, 2017 at 5:35 PM, Chamikara Jayalath <
>>>>>>
>>>>> chamik...@apache.org>
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> I think, this point applies to Python SDK as well (though as you
>>>>>>>
>>>>>> mentioned,
>>>>>>
>>>>>>> API hiding in Python is a mere convention (prefix with underscore)
>>>>>>>
>>>>>> not
>>>>
>>>>> enforced. We already have mechanism for marking APIs as deprecated
>>>>>>>
>>>>>> which
>>>>>
>>>>>> might be useful here:
>>>>>>> https://github.com/apache/beam/blob/master/sdks/python/
>>>>>>> apache_beam/utils/annotations.py
>>>>>>>
>>>>>>> - Cham
>>>>>>>
>>>>>>>
>>>>>> Perhaps an explicit @public annotation would fit. I could imagine
>>>>>>
>>>>> easily
>>>>
>>>>> generating a spec to check against from such annotations, though
>>>>>>
>>>>> tooling
>>>>
>>>>> is
>>>>>
>>>>>> secondary to documentation.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


First stable release: version designation?

2017-03-01 Thread Davor Bonaci
The first stable release is our next major project-wide goal; see
discussion in [1]. I've been referring to it as "the first stable release"
for a long time, not "1.0.0" or "2.0.0" or "2017" or something else, to
make sure we have an unbiased discussion and a consensus-based decision on
this matter.

I think that now is the time to consider the appropriate designation for
our first stable release, and formally make a decision on it. Reasonable
choices could be "1.0.0" or "2.0.0"; perhaps there are others.

1.0.0:
* It logically comes after the current series, 0.x.y.
* Most people would expect it, I suppose.
* A possible confusion between Dataflow SDKs and Beam SDKs carrying the
same number.

2.0.0:
* Follows the pattern some other projects have taken -- continuing their
version numbering scheme from their previous origin.
* Better communicates project's roots, and degree of maturity.
* May be unexpected to some users.

I'd invite everyone to share their thoughts and preferences -- names are
important and well correlated with success. Thanks!

Davor

[1] https://lists.apache.org/thread.html/c35067071aec9029d9100ae973c6299aa919c31d0de623ac367128e2@%3Cdev.beam.apache.org%3E


Apache Beam (virtual) contributor meeting @ Tue Mar 7, 2017

2017-03-01 Thread Davor Bonaci
Hi everyone,
Based on the high demand [1], let's try to organize a virtual contributor
meeting on Tuesday, March 7, 2017 at 15:00 UTC. For convenience, calendar
link [2] and an .ics file are attached.

I tried to accommodate as many time zones as possible, but I know it might
be hard for some of us at 7 AM on the US west coast or 11 PM in China.
Sorry about that.
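
The local times quoted above can be checked with a short calculation. This is a minimal sketch that assumes fixed offsets valid for this date only: US Pacific time was still on standard time (UTC-8) on March 7, 2017, and China uses UTC+8. The zone names and the `local_time` helper are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for March 7, 2017 (no daylight saving in effect).
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")
CHINA_STANDARD = timezone(timedelta(hours=8), "CST")

# Meeting start as announced: Tuesday, March 7, 2017 at 15:00 UTC.
meeting_utc = datetime(2017, 3, 7, 15, 0, tzinfo=timezone.utc)


def local_time(moment, zone):
    """Format a timezone-aware datetime in the given zone."""
    return moment.astimezone(zone).strftime("%Y-%m-%d %H:%M")


print(local_time(meeting_utc, PACIFIC_STANDARD))  # 2017-03-07 07:00
print(local_time(meeting_utc, CHINA_STANDARD))    # 2017-03-07 23:00
```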

Let's use Google Hangouts as the video conferencing technology. I think we
may be limited to something like 30 participants, so I'd encourage any
co-located contributors to consider joining together (if appropriate).
Joining the meeting should be straightforward -- please find the link
within. No special requirements that I'm aware of.

Just to re-state the expectations:
* This is totally optional and informal.
* It is simply a chance for everyone to meet others and see the faces of
people we share a common passion with.
* No specific agenda.
* An open discussion on any topic of interest to the contributor community
is
welcome -- please feel free to bring up any topics you care about.
* No formal discussion or decisions should be made.
* We'll keep notes and share them on the mailing list shortly after the
meeting.

Thanks -- and hope to see all of you there!

Davor

[1]
https://lists.apache.org/thread.html/baf057b81c5f6d4127abadac165d923a224d34438fe67b71d73743ad@%3Cdev.beam.apache.org%3E
[2]
https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=a3A2MzdhaWdhdjByNWRibzZrN2ZnOG1kMTAgZGF2b3JAZ29vZ2xlLmNvbQ&tmsrc=davor%40google.com


Re: Apache Beam (virtual) contributor meeting @ Tue Mar 7, 2017

2017-03-02 Thread Davor Bonaci
I'd prefer not to record the video; just to keep things informal. We'll,
however, keep the notes and share anything that may be relevant.

On Thu, Mar 2, 2017 at 2:24 PM, Amit Sela  wrote:

> I'll be there!
>
> On Thu, Mar 2, 2017 at 1:06 PM Aljoscha Krettek 
> wrote:
>
> > Shoot, I can't because I already have another meeting scheduled. Don't
> mind
> > me, though. Will you also maybe produce a video of the meeting?
> >
> > On Wed, 1 Mar 2017 at 21:50 Davor Bonaci  wrote:
> >
> > > Hi everyone,
> > > Based on the high demand [1], let's try to organize a virtual
> contributor
> > > meeting on Tuesday, March 7, 2017 at 15:00 UTC. For convenience,
> calendar
> > > link [2] and an .ics file are attached.
> > >
> > > I tried to accommodate as many time zones as possible, but I know it
> > might
> > > be hard for some of us at 7 AM on the US west coast or 11 PM in China.
> > > Sorry about that.
> > >
> > > Let's use Google Hangouts as the video conferencing technology. I think
> > we
> > > may be limited to something like 30 participants, so I'd encourage any
> > > co-located contributors to consider joining together (if appropriate).
> > > Joining the meeting should be straightforward -- please find the link
> > > within. No special requirements that I'm aware of.
> > >
> > > Just to re-state the expectations:
> > > * This is totally optional and informal.
> > > * It is simply a chance for everyone to meet others and see the faces
> of
> > > people we share a common passion with.
> > > * No specific agenda.
> > > * An open discussion on any topic of interest to the contributor
> > community
> > > is
> > > welcome -- please feel free to bring up any topics you care about.
> > > * No formal discussion or decisions should be made.
> > > * We'll keep notes and share them on the mailing list shortly after the
> > > meeting.
> > >
> > > Thanks -- and hope to see all of you there!
> > >
> > > Davor
> > >
> > > [1]
> > >
> > > https://lists.apache.org/thread.html/baf057b81c5f6d4127abadac165d923a224d34438fe67b71d73743ad@%3Cdev.beam.apache.org%3E
> > > [2]
> > > https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=a3A2MzdhaWdhdjByNWRibzZrN2ZnOG1kMTAgZGF2b3JAZ29vZ2xlLmNvbQ&tmsrc=davor%40google.com
> > >
> >
>


Re: Apache Beam (virtual) contributor meeting @ Tue Mar 7, 2017

2017-03-06 Thread Davor Bonaci
Just a reminder that this is happening about 22 hours from now. Hope
to see all of you there.

On Thu, Mar 2, 2017 at 4:22 PM, Davor Bonaci  wrote:

> I'd prefer not to record the video; just to keep things informal. We'll,
> however, keep the notes and share anything that may be relevant.
>
> On Thu, Mar 2, 2017 at 2:24 PM, Amit Sela  wrote:
>
>> I'll be there!
>>
>> On Thu, Mar 2, 2017 at 1:06 PM Aljoscha Krettek 
>> wrote:
>>
>> > Shoot, I can't because I already have another meeting scheduled. Don't
>> mind
>> > me, though. Will you also maybe produce a video of the meeting?
>> >
>> > On Wed, 1 Mar 2017 at 21:50 Davor Bonaci  wrote:
>> >
>> > > Hi everyone,
>> > > Based on the high demand [1], let's try to organize a virtual
>> contributor
>> > > meeting on Tuesday, March 7, 2017 at 15:00 UTC. For convenience,
>> calendar
>> > > link [2] and an .ics file are attached.
>> > >
>> > > I tried to accommodate as many time zones as possible, but I know it
>> > might
>> > > be hard for some of us at 7 AM on the US west coast or 11 PM in China.
>> > > Sorry about that.
>> > >
>> > > Let's use Google Hangouts as the video conferencing technology. I
>> think
>> > we
>> > > may be limited to something like 30 participants, so I'd encourage any
>> > > co-located contributors to consider joining together (if appropriate).
>> > > Joining the meeting should be straightforward -- please find the link
>> > > within. No special requirements that I'm aware of.
>> > >
>> > > Just to re-state the expectations:
>> > > * This is totally optional and informal.
>> > > * It is simply a chance for everyone to meet others and see the faces
>> of
>> > > people we share a common passion with.
>> > > * No specific agenda.
>> > > * An open discussion on any topic of interest to the contributor
>> > community
>> > > is
>> > > welcome -- please feel free to bring up any topics you care about.
>> > > * No formal discussion or decisions should be made.
>> > > * We'll keep notes and share them on the mailing list shortly after
>> the
>> > > meeting.
>> > >
>> > > Thanks -- and hope to see all of you there!
>> > >
>> > > Davor
>> > >
>> > > [1]
>> > >
>> > > https://lists.apache.org/thread.html/baf057b81c5f6d4127abadac165d923a224d34438fe67b71d73743ad@%3Cdev.beam.apache.org%3E
>> > > [2]
>> > > https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=a3A2MzdhaWdhdjByNWRibzZrN2ZnOG1kMTAgZGF2b3JAZ29vZ2xlLmNvbQ&tmsrc=davor%40google.com
>> > >
>> >
>>
>
>


Re: Apache Beam (virtual) contributor meeting @ Tue Mar 7, 2017

2017-03-06 Thread Davor Bonaci
Link: https://hangouts.google.com/hangouts/_/google.com/beam-dev-mtg

I'll try to be available on Slack shortly before the meeting, just in case
someone has trouble connecting.

On Mon, Mar 6, 2017 at 9:27 AM, Amit Sela  wrote:

> PayPal team will be there joined together.
>
> On Mon, Mar 6, 2017 at 7:23 PM Davor Bonaci  wrote:
>
> > Just a reminder that this is happening about 22 hours from now. Hope
> > to see all of you there.
> >
> > On Thu, Mar 2, 2017 at 4:22 PM, Davor Bonaci  wrote:
> >
> > > I'd prefer not to record the video; just to keep things informal.
> We'll,
> > > however, keep the notes and share anything that may be relevant.
> > >
> > > On Thu, Mar 2, 2017 at 2:24 PM, Amit Sela 
> wrote:
> > >
> > >> I'll be there!
> > >>
> > >> On Thu, Mar 2, 2017 at 1:06 PM Aljoscha Krettek 
> > >> wrote:
> > >>
> > >> > Shoot, I can't because I already have another meeting scheduled.
> Don't
> > >> mind
> > >> > me, though. Will you also maybe produce a video of the meeting?
> > >> >
> > >> > On Wed, 1 Mar 2017 at 21:50 Davor Bonaci  wrote:
> > >> >
> > >> > > Hi everyone,
> > >> > > Based on the high demand [1], let's try to organize a virtual
> > >> contributor
> > >> > > meeting on Tuesday, March 7, 2017 at 15:00 UTC. For convenience,
> > >> calendar
> > >> > > link [2] and an .ics file are attached.
> > >> > >
> > >> > > I tried to accommodate as many time zones as possible, but I know
> it
> > >> > might
> > >> > > be hard for some of us at 7 AM on the US west coast or 11 PM in
> > China.
> > >> > > Sorry about that.
> > >> > >
> > >> > > Let's use Google Hangouts as the video conferencing technology. I
> > >> think
> > >> > we
> > >> > > may be limited to something like 30 participants, so I'd encourage
> > any
> > >> > > co-located contributors to consider joining together (if
> > appropriate).
> > >> > > Joining the meeting should be straightforward -- please find the
> > link
> > >> > > within. No special requirements that I'm aware of.
> > >> > >
> > >> > > Just to re-state the expectations:
> > >> > > * This is totally optional and informal.
> > >> > > * It is simply a chance for everyone to meet others and see the
> > faces
> > >> of
> > >> > > people we share a common passion with.
> > >> > > * No specific agenda.
> > >> > > * An open discussion on any topic of interest to the contributor
> > >> > community
> > >> > > is
> > >> > > welcome -- please feel free to bring up any topics you care about.
> > >> > > * No formal discussion or decisions should be made.
> > >> > > * We'll keep notes and share them on the mailing list shortly
> after
> > >> the
> > >> > > meeting.
> > >> > >
> > >> > > Thanks -- and hope to see all of you there!
> > >> > >
> > >> > > Davor
> > >> > >
> > >> > > [1]
> > >> > >
> > >> > > https://lists.apache.org/thread.html/baf057b81c5f6d4127abadac165d923a224d34438fe67b71d73743ad@%3Cdev.beam.apache.org%3E
> > >> > > [2]
> > >> > > https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=a3A2MzdhaWdhdjByNWRibzZrN2ZnOG1kMTAgZGF2b3JAZ29vZ2xlLmNvbQ&tmsrc=davor%40google.com
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


Re: First stable release: version designation?

2017-03-06 Thread Davor Bonaci
It sounds like we'll end up with two camps on this topic. This issue is
probably best resolved with a vote, but I'll try to rephrase the question
once to see whether a consensus is possible.

Instead of asking which option is better, does anyone think the project
would be negatively impacted if we were to decide on, in your opinion, the
less desirable variant? If so, can you comment on the negative impact of
the less desirable alternative please?

(I understand this may be pushing it a bit, but I think a possible
consensus on this is worth it. Personally, I'll stay away from weighing in
on this topic.)

On Thu, Mar 2, 2017 at 2:57 AM, Aljoscha Krettek 
wrote:

> I prefer 2.0.0 for the first stable release. It totally makes sense for
> people coming from Dataflow 1.x and I can already envision the confusion
> between Beam 1.5 and Dataflow 1.5.
>
> On Thu, 2 Mar 2017 at 07:42 Jean-Baptiste Onofré  wrote:
>
> > Hi Davor,
> >
> >
> > From a Beam community perspective, 1.0.0 would make more sense. We have a
> > fair number of people starting with Beam (without knowing Dataflow).
> >
> > However, as Dataflow SDK (origins of Beam) was in 1.0.0, in order to
> > avoid confusion with users coming to Beam from Dataflow, 2.0.0 could
> help.
> >
> > I have a preference for 1.0.0 anyway, but I would understand starting
> > from 2.0.0.
> >
> > Regards
> > JB
> >
> > On 03/01/2017 07:56 PM, Davor Bonaci wrote:
> > > The first stable release is our next major project-wide goal; see
> > > discussion in [1]. I've been referring to it as "the first stable
> > release"
> > > for a long time, not "1.0.0" or "2.0.0" or "2017" or something else, to
> > > make sure we have an unbiased discussion and a consensus-based decision
> > on
> > > this matter.
> > >
> > > I think that now is the time to consider the appropriate designation
> for
> > > our first stable release, and formally make a decision on it. A
> > reasonable
> > > choices could be "1.0.0" or "2.0.0", perhaps there are others.
> > >
> > > 1.0.0:
> > > * It logically comes after the current series, 0.x.y.
> > > * Most people would expect it, I suppose.
> > > * A possible confusion between Dataflow SDKs and Beam SDKs carrying the
> > > same number.
> > >
> > > 2.0.0:
> > > * Follows the pattern some other projects have taken -- continuing
> their
> > > version numbering scheme from their previous origin.
> > > * Better communicates project's roots, and degree of maturity.
> > > * May be unexpected to some users.
> > >
> > > I'd invite everyone to share their thoughts and preferences -- names
> are
> > > important and well correlated with success. Thanks!
> > >
> > > Davor
> > >
> > > [1] https://lists.apache.org/thread.html/c35067071aec9029d9100ae973c6299aa919c31d0de623ac367128e2@%3Cdev.beam.apache.org%3E
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Fwd: Google Summer of Code 2017 Mentor Registration

2017-03-06 Thread Davor Bonaci
There was a previous discussion and planning around Beam's involvement with
the Summer of Code 2017 -- here's more information on how to get started
and formally register as a mentor.

Davor

-- Forwarded message --
From: Ulrich Stärk 
Date: Mon, Mar 6, 2017 at 11:39 AM
Subject: Google Summer of Code 2017 Mentor Registration
To: ment...@community.apache.org
Cc: "d...@community.apache.org" 


Dear PMCs,

I'm happy to announce that the ASF has made it onto the list of accepted
organizations for
Google Summer of Code 2017! [1,2]

It is now time for mentors to sign up, so please pass this email on to your
community and
podlings. If you aren’t already subscribed to ment...@community.apache.org
you should do so now else
you might miss important information.

Mentor signup requires two steps: mentor signup in Google's system [3] and
PMC acknowledgement.

If you want to mentor a project in this year's SoC you will have to

1. Be an Apache committer.
2. Request an acknowledgement from the PMC for which you want to mentor
projects. Use the below
template and *do not forget to copy ment...@community.apache.org*. We will
use the email address you
indicate to send the invite to be a mentor for Apache.

PMCs, read carefully please.

We request that each mentor is acknowledged by a PMC member. This is to
ensure the mentor is in good
standing with the community. When you receive a request for
acknowledgement, please ACK it and cc
ment...@community.apache.org

Lastly, it is not yet too late to record your ideas in Jira (see my
previous emails for details).
Students will now begin to explore ideas so if you haven’t already done so,
record your ideas
immediately!

Cheers,

Uli

mentor request email template:

to: private@.apache.org
cc: ment...@community.apache.org
subject: GSoC 2017 mentor request for 

 PMC,

please acknowledge my request to become a mentor for Google Summer of Code
2017 projects for Apache
.

I would like to receive the mentor invite to 





[1] https://summerofcode.withgoogle.com/organizations/
[2] https://summerofcode.withgoogle.com/organizations/5416945173135360/
[3] https://summerofcode.withgoogle.com/


Re: found a typographical error in the Beam documentation.

2017-03-07 Thread Davor Bonaci
Indeed -- there's a typo on that page.

Would you be willing to submit a pull request to our website repository [1]
correcting this?

Thanks!

Davor

[1] https://github.com/apache/beam-site

On Tue, Mar 7, 2017 at 12:28 AM, 성준영  wrote:

> In Beam Documentation - Programming Guide - Applying transform - Core Beam
> transform - ParDo section
>
>
> 
>
> When you apply a ParDo transform, you’ll need to provide user code in the
> form of a DoFn object. DoFn is a Beam SDK class that defines a distribured
> processing function.
>
> 
>
>
> It is possible to grasp the meaning, but it would be correct to change
> distribured to distributed.
>


Re: found a typographical error in the Beam documentation.

2017-03-07 Thread Davor Bonaci
Thanks for the PR; it is now merged.

Regarding GSoC, please check the previous threads on this topic [1], [2].
You might be interested in checking some ideas being nominated [3].

Davor

[1]
https://lists.apache.org/thread.html/4e52cf18b2ed5a1bc457a894c04346b8b0ac713e22da069c67dc29ec@%3Cdev.beam.apache.org%3E
[2]
https://lists.apache.org/thread.html/7886fb5e8e024070741712899a18b4251592f9706ad0bc19e36a8226@%3Cdev.beam.apache.org%3E
[3]
https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20gsoc2017

On Tue, Mar 7, 2017 at 7:05 AM, 성준영  wrote:

> A pull-request has been sent to the repository.
>
> I am a student from Korea who is trying to join GSoC. I am very interested
> in Apache Beam and I am working on translating the documentation into
> Korean
> <https://github.com/sungjunyoung/apache_beam_doc_ko>. If you are
> interested, it would be a great honor for me.
>
> Thanks for the quick answer :)
>
>
> On March 7, 2017 at 11:50:58 PM, Davor Bonaci (da...@apache.org) wrote:
>
> Indeed -- there's a typo on that page.
>
> Would you be willing to submit a pull request to our website repository [1]
> correcting this?
>
> Thanks!
>
> Davor
>
> [1] https://github.com/apache/beam-site
>
> On Tue, Mar 7, 2017 at 12:28 AM, 성준영  wrote:
>
> > In Beam Documentation - Programming Guide - Applying transform - Core
> Beam
> > transform - ParDo section
> >
> >
> > 
> >
> > When you apply a ParDo transform, you’ll need to provide user code in the
> > form of a DoFn object. DoFn is a Beam SDK class that defines a
> distribured
> > processing function.
> >
> > 
> >
> >
> > It is possible to grasp the meaning, But It is considered correct to
> change
> > distribured to distributed.
> >
>


Report to the Board, March 2017 edition

2017-03-07 Thread Davor Bonaci
We are expected to submit a project report to the ASF Board of Directors
ahead of its next meeting. The report is due on Wednesday, 3/8.

This is the third in the series of three consecutive monthly reports
required for new projects.

If interested, please take a look at the draft [1], and comment or
contribute content, as appropriate. I'll submit the report sometime on
Wednesday.

Thanks!

Davor

[1]
https://docs.google.com/document/d/1eYBBIafwnbNUZj6Iqk0_kDhnqJ8PYkzVjipNm1v1RJs/


Re: Report to the Board, March 2017 edition

2017-03-08 Thread Davor Bonaci
The report is now submitted; thanks for the comments and improvements!

On Tue, Mar 7, 2017 at 6:14 PM, Davor Bonaci  wrote:

> We are expected to submit a project report to the ASF Board of Directors
> ahead of its next meeting. The report is due on Wednesday, 3/8.
>
> This is the third in the series of three consecutive monthly reports
> required for new projects.
>
> If interested, please take a look at the draft [1], and comment or
> contribute content, as appropriate. I'll submit the report sometime on
> Wednesday.
>
> Thanks!
>
> Davor
>
> [1] https://docs.google.com/document/d/1eYBBIafwnbNUZj6Iqk0_kDhnqJ8PYkzVjipNm1v1RJs/
>


Re: Add GitHub topics to Beam repository

2017-03-09 Thread Davor Bonaci
We certainly cannot do this ourselves, but it is within the realm of
possibility that Apache Infra can assign tags to our GitHub repositories.

Looking at the tags that already exist, perhaps we could go with:
* apache-beam
* big-data
* data-processing
* data-analysis
* data-science
* data-analytics
* data-mining
* apache-spark
* spark
* apache-flink
* flink
* google-cloud-dataflow
* apex

If we are changing things in this space, we could also ask to update text
"Mirror of Apache Beam" to "Apache Beam is a unified programming model for
both batch and streaming data processing, enabling efficient execution
across diverse distributed execution engines and providing extensibility
points for connecting to different technologies and user communities", and
add a link to "https://beam.apache.org/".

I think this would be an improvement, so +1 if this is something Infra can
do easily. (If not, that's fine too.)

On Thu, Mar 9, 2017 at 4:49 AM, Jean-Baptiste Onofré 
wrote:

> Hi Aviem,
>
> thanks for the hint!
>
> I thought about reviewers feature too but it requires "write" access to
> github, which is not possible as github is just a mirror of Apache git.
>
> Regards
> JB
>
>
> On 03/09/2017 01:31 PM, Aviem Zur wrote:
>
>> About a month ago GitHub introduced topics, which let GitHub users query
>> for repositories by topics (domains that the repos deal with).
>> We can leverage these to increase Beam's exposure on GitHub.
>>
>> Example topics we could add: big-data, google-cloud-dataflow, spark,
>> flink,
>> apex, gearpump
>> We can also add the topics which Dataflow added: data-science,
>> data-analysis, data-mining, data-processing
>>
>> [1] https://github.com/blog/2309-introducing-topics
>> [2] https://github.com/GoogleCloudPlatform/DataflowJavaSDK
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [PROPOSAL] Add 2.0.0 version in Jira

2017-03-09 Thread Davor Bonaci
I added "First stable release" in JIRA as a version identifier a while
back, to avoid any prejudice what that version might be called. I think it
makes sense to resolve (newly fixed) issues against that version. Once we
finalize the name of that release, we can easily rename it to either
"1.0.0" or "2.0.0".

I'd suggest keeping 0.7.0-SNAPSHOT in the code until it is
completely clear that the next release is 1.0.0 or 2.0.0. It is easy to
bump the version upward -- typically things continue working well. If the
version has to be adjusted downward for some reason, it gets tricky since
Maven version ordering breaks for the nightly build. This is a good change,
just no reason to rush it in, I think.
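
To illustrate the ordering concern above, here is a minimal sketch in Python. It is not Maven's actual comparison algorithm; version_key is a hypothetical helper that only handles the "X.Y.Z" and "X.Y.Z-SNAPSHOT" forms mentioned in this thread, and it assumes that the "breakage" refers to older, higher-numbered nightlies continuing to sort as newest.

```python
def version_key(version):
    """Simplified ordering key for 'X.Y.Z' or 'X.Y.Z-SNAPSHOT' strings.

    Hypothetical helper for illustration only; other qualifiers are not handled.
    """
    base, _, qualifier = version.partition("-")
    numbers = tuple(int(part) for part in base.split("."))
    # A SNAPSHOT sorts before the release that carries the same numbers.
    is_release = 0 if qualifier.upper() == "SNAPSHOT" else 1
    return (numbers, is_release)


# Bumping upward: each new nightly sorts after the previous ones.
upward = ["0.7.0-SNAPSHOT", "1.0.0-SNAPSHOT"]
assert sorted(upward, key=version_key) == upward

# Adjusting downward: the already published 2.0.0-SNAPSHOT still sorts last,
# so anything that picks the highest version would keep selecting it.
published = ["2.0.0-SNAPSHOT", "1.0.0-SNAPSHOT"]
newest = max(published, key=version_key)
```

Under these assumptions, newest stays at the 2.0.0 nightly even after the code has moved down to 1.0.0-SNAPSHOT, which is one reason to delay the bump until the name is settled.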

On Thu, Mar 9, 2017 at 8:39 AM, Amit Sela  wrote:

> Well, for now at least. We have to use something for fixed issues..
>
> On Thu, Mar 9, 2017 at 6:32 PM Jean-Baptiste Onofré 
> wrote:
>
> > By the way, while waiting for the end of this discussion, we can use
> > "First stable release" as the version in Jira.
> >
> > Regards
> > JB
> >
> > On 03/09/2017 07:21 AM, Jean-Baptiste Onofré wrote:
> > > Hi all,
> > >
> > > > Release branch for 0.6.0 has been created but the next cycle version has
> > > > not been created in Jira.
> > >
> > > I propose to create 2.0.0 version in Jira (it's always possible to
> > > rename the version later).
> > >
> > > No objection ?
> > >
> > > Thanks
> > > Regards
> > > JB
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [VOTE] Release 0.6.0, release candidate #1

2017-03-10 Thread Davor Bonaci
I agree with the sentiment that we should build a new release candidate.

> BEAM-1674 problem. I have a fix for it in this PR:
> https://github.com/apache/beam/pull/2217
>
> I'm afraid we have to cancel the release yet again because this is a real
> bug that people can run into.
>

+1 -- let's take this PR.


> * As I already commented on past release votes, formally it's better to
> include commit id referencing the release, since git tags can be changed.
> Just take that into account for upcoming releases.
>

+1 -- let's do this for the next release candidate. We should update the
release guide, so we don't forget it next time.


> 1. The archive https://dist.apache.org/repos/dist/dev/beam/0.6.0/apache-beam-0.6.0.tar.gz
> should be named apache-beam-0.6.0-python.tar.gz
>

+1 -- let's fix this.


> 2. The distribution archives are not consistent: the source distribution
> is a zip archive, the python distribution is tar.gz. I would provide both
> tar.gz and zip for both.
>

+1 -- let's fix this.

> * Also I got this minor warning on the Maven build:
>
> [WARNING] Some problems were encountered while building the effective model
> for org.apache.beam:beam-runners-google-cloud-dataflow-java:jar:0.6.0
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)'
> must be unique: org.apache.beam:beam-runners-core-construction-java:jar ->
> duplicate declaration of version (?) @
> org.apache.beam:beam-runners-google-cloud-dataflow-java:[unknown-version],
> /home/wikier/tmp/beam/apache-beam-0.6.0/runners/google-cloud-dataflow-java/pom.xml,
> line 360, column 19
>

+1 -- let's fix this.

> 3. Python should contain a README to start with.
>

This would be great, but no reason to block a release for it, right?

> * I've noticed the build takes much more time than previous releases.
>

This is the first time with the Python SDK in the release, as well as
several new IOs, so I'd expect it to take longer.

Do you perhaps have any breakdown where the time is going? No reason to
block a release, right?

> * Notice the current Python SDK may require sudo permissions in some
> environments
>

Ahmet, any thoughts here? Would be good to fix, if feasible.


Re: Beam deploy uploads some artifacts twice

2017-03-13 Thread Davor Bonaci
I haven't seen this specific issue personally, but, generally speaking,
multiple executions are often caused by an incorrect "execution id" in a
pom.xml. Instead of re-configuring the default execution, it creates a new
execution -- and this has previously caused issues elsewhere.

Usually, this can be debugged by running the same command with -X. The
debug output will log all plugin goal executions, along with their
execution id. Then, the unexpected execution can be traced back to the
pom.xml that introduced it. Also, "mvn help:effective-pom" often helps.
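
As a complement to reading the -X output, the declared executions can also be listed straight from a pom.xml. The following is a minimal sketch in Python, not part of the Beam build; list_executions and SAMPLE_POM are hypothetical names, and the sample POM is invented for illustration. It relies on the Maven convention that a lifecycle-bound goal runs under the id "default-<goal>" (for example "default-deploy"), so an execution that declares the same goal under any other id adds a second run instead of re-configuring the default one.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven 4.0.0 POM files.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM fragment, not taken from Beam.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build><plugins><plugin>
    <artifactId>maven-deploy-plugin</artifactId>
    <executions>
      <execution>
        <id>deploy-artifacts</id>
        <phase>deploy</phase>
        <goals><goal>deploy</goal></goals>
      </execution>
    </executions>
  </plugin></plugins></build>
</project>
"""


def list_executions(pom_xml):
    """Return (artifactId, execution id, goals) for every declared execution."""
    root = ET.fromstring(pom_xml)
    found = []
    for plugin in root.iterfind(".//m:plugin", NS):
        artifact = plugin.findtext("m:artifactId", default="?", namespaces=NS)
        for execution in plugin.iterfind("m:executions/m:execution", NS):
            # Maven uses the id "default" when an execution declares none.
            exec_id = execution.findtext("m:id", default="default", namespaces=NS)
            goals = [g.text for g in execution.iterfind("m:goals/m:goal", NS)]
            found.append((artifact, exec_id, goals))
    return found


def extra_executions(pom_xml):
    """Executions whose id does not match 'default-<goal>' for any of their goals."""
    return [
        (artifact, exec_id, goals)
        for artifact, exec_id, goals in list_executions(pom_xml)
        if all(exec_id != "default-" + goal for goal in goals)
    ]


extras = extra_executions(SAMPLE_POM)
```

In the sample, the "deploy-artifacts" execution would be reported, since it binds the deploy goal under an id other than "default-deploy". Running this against the output of "mvn help:effective-pom" rather than a single pom.xml would also cover inherited configuration.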

On Sun, Mar 12, 2017 at 7:57 PM, Jean-Baptiste Onofré 
wrote:

> Hi Amit,
>
> I just arrived in Hong Kong. As I have all setup on my machine to
> reproduce, I will investigate this issue.
>
> I'll keep you posted asap.
>
> Regards
> JB
>
>
> On 03/12/2017 10:32 AM, Amit Sela wrote:
>
>> I've been trying to release an internal fork and found out that trying to
>> release with the Maven release plugin uploads some artifacts twice - for me it
>> was "beam-sdks-java-core", for JB (who helped me investigate this) it was a
>> different artifact.
>> At first I blamed release plugin, but even a simple deploy tries to upload
>> twice.
>> While this does not affect the Apache release process (since it uses a
>> staging Nexus), it is not behaving as it should, and it blocks (advanced)
>> users who want to release/deploy their own fork into a Nexus that actually
>> enforces a single upload for releases.
>>
>> If anyone knows this issue and knows what the problem is - why the deploy goal
>> is executed twice for some artifacts - I'd appreciate their help.
>>
>> Thanks,
>> Amit
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-14 Thread Davor Bonaci
+1 (binding)

Contingent on adding NOTICE and LICENSE files into
"apache-beam-0.6.0-python.zip", just as they are present in the
"apache-beam-0.6.0-source-release.zip".
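
A check like this can be automated before a vote. The sketch below, in Python, is a hedged illustration and not part of the Beam release tooling; missing_legal_files is a hypothetical helper, and the archive contents in the demo are invented placeholders.

```python
import io
import posixpath
import zipfile

REQUIRED = ("LICENSE", "NOTICE")


def missing_legal_files(archive):
    """Return the REQUIRED file names not found anywhere in the zip archive.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        present = {posixpath.basename(name) for name in zf.namelist()}
    return [name for name in REQUIRED if name not in present]


def _demo_zip(names):
    """Build an in-memory zip with placeholder entries (for illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, "placeholder")
    buffer.seek(0)
    return buffer


incomplete = _demo_zip(["apache-beam-0.6.0/setup.py"])
complete = _demo_zip([
    "apache-beam-0.6.0/setup.py",
    "apache-beam-0.6.0/LICENSE",
    "apache-beam-0.6.0/NOTICE",
])
```

Calling missing_legal_files on a real release archive path would list whichever of the two files is absent; an empty list means both are present somewhere in the archive.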

On Tue, Mar 14, 2017 at 10:02 AM, Aljoscha Krettek 
wrote:

> +1 (binding)
>
> - verified release signature and hashes
> - mvn install -Prelease runs smoothly
> - created a Quickstart against the staging repo
>   - ran Quickstart with Flink local mode
>   - ran Quickstart against a Flink 1.2 cluster
>
> On Tue, Mar 14, 2017, at 01:44, Eugene Kirpichov wrote:
> > Conclusion (see JIRA): Not a release blocker (but still a bug in
> > TestPipeline).
> >
> > On Mon, Mar 13, 2017 at 5:40 PM Eugene Kirpichov 
> > wrote:
> >
> > > +Aljoscha Krettek 
> > >
> > > > On Mon, Mar 13, 2017 at 5:30 PM Eugene Kirpichov
> > > wrote:
> > >
> > > +Stas Levin  +Thomas Groh 
> > >
> > > > On Mon, Mar 13, 2017 at 5:30 PM Eugene Kirpichov
> > > wrote:
> > >
> > > https://issues.apache.org/jira/browse/BEAM-1712 might be a release
> > > blocker.
> > >
> > > On Mon, Mar 13, 2017 at 4:53 PM Ahmet Altay 
> > > wrote:
> > >
> > > > Thank you for all the comments so far.
> > >
> > > On Mon, Mar 13, 2017 at 4:23 PM, Ted Yu  wrote:
> > >
> > > > bq.  I would prefer that we have a .tar.gz release
> > > >
> > > > +1
> > > >
> > > > On Mon, Mar 13, 2017 at 4:21 PM, Ismaël Mejía 
> wrote:
> > > >
> > > > > ​+1 (non-binding)
> > > > >
> > > > > - verified signatures + checksums
> > > > > > - ran mvn clean install -Prelease, all artifacts build and the tests run
> > > > > > smoothly (modulo some local issues I had with the installation of tox for
> > > > > > the python sdk, I created a PR to fix those in case other people can have
> > > > > > the same trouble).
> > > > >
> > > > > > Some remarks still to fix from the release, but that I don’t consider
> > > > > > blockers:
> > > > >
> > > > > > 1. The section Getting Started in the main README.md needs to be updated
> > > > > > with the information about creating/activating the virtualenv. At this
> > > > > > moment just running mvn clean install won’t work without this.
> > > >
> > >
> > > > mvn clean install should run without any additional steps, including the
> > > > creation of a virtualenv. tox will manage this process, and it is already
> > > > integrated with Maven.
> > >
> > >
> > > > >
> > > > > > 2.  Both zip files in the current release produce a folder with the same
> > > > > > name ‘apache-beam-0.6.0’. This can be messy if users unzip both files into
> > > > > > the same folder (as happened to me). The compressed files should produce a
> > > > > > directory with the exact same name as the file, so
> > > > > > apache-beam-0.6.0-python.zip will produce apache-beam-0.6.0-python and the
> > > > > > other its respective directory.
> > > >
> > > >
> > > > > > 3. The name of the files of the release probably should be different:
> > > > > >
> > > > > > The source release could be just apache-beam-0.6.0.zip instead of
> > > > > > apache-beam-0.6.0-source-release.zip considering that we don’t have binary
> > > > > > artifacts, or just apache-beam-0.6.0-src.zip following the convention of
> > > > > > other apache projects.
> > > > >
> > > > > > The python release also could be renamed to
> > > > > > apache-beam-0.6.0-bin-python.zip instead of apache-beam-0.6.0-python.zip
> > > > > > so users understand that these are executable files (but well I am not sure
> > > > > > about that one considering that python is a scripting language).
> > > >
> > >
> > > > Python distribution is a source distribution, adding bin to the name would
> > > > be confusing.
> > >
> > >
> > > > >
> > > > > > Finally I would prefer that we have a .tar.gz release as JB mentioned in
> > > > > > the previous vote, and as most apache projects do. In any case if the zip
> > > > > > is somehow a requirement it would be nice to have both a .zip and a .tar.gz
> > > > > > file.
> > > > >
> > > >
> > >
> > > > I think we should move this to a different thread. IMO, having a single
> > > > source of truth is better than having both file formats. Between the two
> > > > formats I don't have a strong opinion, but considering Windows users, zip
> > > > might be the more portable option.
> > >
> > > Thank you,
> > > Ahmet
> > >
> > >
>


Re: Apache Beam (virtual) contributor meeting @ Tue Mar 7, 2017

2017-03-16 Thread Davor Bonaci
I'd like to thank everyone for coming -- notes and summary of the
discussion are below.

If there's any feedback, ideas for improvement, requests to do this again
at some point, etc. -- please comment!

---

Attendees:
* Jason
* Etienne
* Kenn
* Neelesh
* Pramod
* Raghu
* Sergio
* Amit
* Aviem
* Stas
* Koby
* Thomas
* Mingmin
* Ismael
* JB
* Kai
* Frances
* Ahmet
* Robert
* Stephen
* Davor

Discussion topics:
* First stable release
* Upcoming conferences

First stable release:
* The next big milestone for the project -- it means Beam is ready for
prime time
* Timeline: April.
* JB: encourage people to test Beam; test different scenarios; get more
feedback from community
* Davor: more experiences deploying Beam; more polish around user experience
* Thomas: few kinks to be worked out; documentation; easier-to-understand
examples, HDFS in particular
* Mingmin: better documentation / examples.
* Aviem: run pipelines on clusters; had to use Dataflow documentation at
times
* Ismael: examples use too many GCP IOs
* Sergio: docker can help with elaborate setup for examples, including IOs

General:
* Amit: we should position our batch offering better: agility, IOs are
major advantage
* Sergio: performance benchmarking -- it will push the needle because
everyone wants to be on top
* Davor: versioning -- how to support multiple versions of the system we
interconnect with?
* Mingmin: SQL interface in the first version?
* Neelesh: pursue more use case blog posts
* Etienne: should we pursue other projects to maintain their connectors
with Beam?
* Raghu: usage of coders in IO?

Upcoming conferences:
* ApacheCon coming up in May -- schedule to be published shortly
* There'll be Beam talks as well as social gatherings -- everyone's invited!

Action items:
* [Davor] Compare Dataflow and Beam documentation, and report back
* [all] Examples vs. GCP IO
* [JB] Blog: Talend use case
* [Amit] Blog: PayPal use case
* [all] Investigate docker usage in examples

On Tue, Mar 7, 2017 at 1:46 AM, Sergio Fernández  wrote:

> Thanks, Davor!
>
> On Tue, Mar 7, 2017 at 3:20 AM, Davor Bonaci  wrote:
>
> > Link: https://hangouts.google.com/hangouts/_/google.com/beam-dev-mtg
> >
> > I'll try to be available on Slack shortly before the meeting, just in case
> > someone has trouble connecting.
> >
> > On Mon, Mar 6, 2017 at 9:27 AM, Amit Sela  wrote:
> >
> > > PayPal team will be there joined together.
> > >
> > > On Mon, Mar 6, 2017 at 7:23 PM Davor Bonaci  wrote:
> > >
> > > > Just a reminder that this is happening in about 22 hours from now. Hope
> > > > to see all of you there.
> > > >
> > > > On Thu, Mar 2, 2017 at 4:22 PM, Davor Bonaci 
> wrote:
> > > >
> > > > > I'd prefer not to record the video; just to keep things informal. We'll,
> > > > > however, keep the notes and share anything that may be relevant.
> > > > >
> > > > > On Thu, Mar 2, 2017 at 2:24 PM, Amit Sela 
> > > wrote:
> > > > >
> > > > >> I'll be there!
> > > > >>
> > > > >> On Thu, Mar 2, 2017 at 1:06 PM Aljoscha Krettek <
> > aljos...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > Shoot, I can't because I already have another meeting scheduled. Don't mind
> > > > >> > me, though. Will you also maybe produce a video of the meeting?
> > > > >> >
> > > > >> > On Wed, 1 Mar 2017 at 21:50 Davor Bonaci 
> > wrote:
> > > > >> >
> > > > >> > > Hi everyone,
> > > > >> > > Based on the high demand [1], let's try to organize a virtual
> > > > >> contributor
> > > > >> > > meeting on Tuesday, March 7, 2017 at 15:00 UTC. For
> convenience,
> > > > >> calendar
> > > > >> > > link [2] and an .ics file are attached.
> > > > >> > >
> > > > >> > > I tried to accommodate as many time zones as possible, but I
> > know
> > > it
> > > > >> > might
> > > > >> > > be hard for some of us at 7 AM on the US west coast or 11 PM
> in
> > > > China.
> > > > >> > > Sorry about that.
> > > > >> > >
> > > > >> > > Let's use Google Hangouts as the video conferencing
> technology.
> > I
> > > > >> think
> > > > >> > we
> > > >

[ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-17 Thread Davor Bonaci
Please join me and the rest of Beam PMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Chamikara Jayalath
Chamikara has been contributing to Beam since inception, and previously to
Google Cloud Dataflow, accumulating a total of 51 commits (8,301 ++ / 3,892
--) since February 2016 [1]. He contributed broadly to the project, but
most significantly to the Python SDK, building the IO framework in this SDK
[2], [3].

* Eugene Kirpichov
Eugene has been contributing to Beam since inception, and previously to
Google Cloud Dataflow, accumulating a total of 95 commits (22,122 ++ /
18,407 --) since February 2016 [1]. In recent months, he’s been driving the
Splittable DoFn effort [4]. A true expert on the IO subsystem, Eugene has
reviewed nearly every IO contributed to Beam. Finally, Eugene contributed
the Beam Style Guide, and is championing it across the project.

* Ismaël Mejia
Ismaël has been contributing to Beam since mid-2016, accumulating a total
of 35 commits (3,137 ++ / 1,328 --) [1]. He authored the HBaseIO connector,
helped on the Spark runner, and contributed in other areas as well,
including cross-project collaboration with Apache Zeppelin. Ismaël reported
24 Jira issues.

* Aviem Zur
Aviem has been contributing to Beam since early fall, accumulating a total
of 49 commits (6,471 ++ / 3,185 --) [1]. He reported 43 Jira issues, and
resolved ~30 issues. Aviem improved the stability of the Spark runner a
lot, and introduced support for metrics. Finally, Aviem is championing
dependency management across the project.

Congratulations to all four! Welcome!

Davor

[1]
https://github.com/apache/beam/graphs/contributors?from=2016-02-01&to=2017-03-17&type=c
[2]
https://github.com/apache/beam/blob/v0.6.0/sdks/python/apache_beam/io/iobase.py#L70
[3]
https://github.com/apache/beam/blob/v0.6.0/sdks/python/apache_beam/io/iobase.py#L561
[4] https://s.apache.org/splittable-do-fn


Re: Release notes snapshotting to website

2017-03-19 Thread Davor Bonaci
Yes, I think we should do this -- primarily to increase the quality of the
release notes, which often need a reword. JIRA makes it easy by
automatically generating HTML for it.

The first stable release is an excellent time to start doing this!

On Sun, Mar 19, 2017 at 12:37 AM, Tibor Kiss  wrote:

> Hello,
>
> I’d like to propose to put the ‘Release notes’ as a separate document into
> the website.
>
> Currently we are providing the notes through JIRA search. If someone
> (accidentally) sets a ticket’s fix version to an already released version
> it will pop-up in the linked search later.
> Snapshotting to the website would resolve such problems.
>
> What do you think?
>
> - Tibor
>
>
> Begin forwarded message:
>
> From: Ahmet Altay <al...@google.com>
> Subject: Apache Beam, version 0.6.0 with Python SDK
> Date: March 17, 2017 at 6:19:26 AM GMT+1
> To: u...@beam.apache.org
> Reply-To: <u...@beam.apache.org>
>
> The Apache Beam community is pleased to announce the availability of the
> 0.6.0 release [1].
>
> This release introduces a new SDK for the Python programming language [2].
> Additionally, the release adds a new IO connector for Apache HBase in the
> Java SDK, along with a usual batch of bug fixes and improvements. Finally,
> several runners improved their support for the Beam model, including
> support for the recently-introduced State and Timer API, and Beam’s
> connectors to distributed file systems. For all major changes in this
> release, please refer to the release notes [3].
>
> The 0.6.0 release is now the recommended version; we encourage everyone to
> upgrade from any earlier releases.
>
> We thank all users and contributors who have helped make this release
> possible. If you haven't already, we'd like to invite you to join us, as we
> work towards our first release with API stability.
>
> - Ahmet Altay, on behalf of the Apache Beam community.
>
> [1] https://beam.apache.org/get-started/downloads/
> [2] https://beam.apache.org/blog/2017/03/16/python-sdk-release.html
> [3] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12339256
>
>

