[RESULT] Release Apache Beam, version 0.2.0-incubating

2016-08-07 Thread Dan Halperin
I am happy to announce that the Incubator PMC has approved version
0.2.0-incubating-RC2 of Apache Beam for release as version 0.2.0-incubating.

There have been 6 binding approval votes from the IPMC:

* Jean-Baptiste Onofré
* John D. Ament
* Justin Mclean
* P. Taylor Goetz
* Seetharam Venkatesh
* Sergio Fernández

There are no disapproving votes. We will proceed with releasing the
candidate as staged.

Thanks!
Dan, on behalf of the Apache Beam community.


Re: [VOTE] Release Apache Beam, version 0.2.0-incubating

2016-08-07 Thread Dan Halperin
Thanks everyone for participating! At this time, more than 5 days since the
initial email and with 6 IPMC votes, I would like to declare the vote
closed. I will summarize in the following RESULT thread.

Thanks,
Dan


On Sat, Aug 6, 2016 at 10:57 AM, P. Taylor Goetz  wrote:

> FYI. Had the same address issue.
>
>
> Begin forwarded message:
>
> > From: "P. Taylor Goetz" 
> > Date: August 5, 2016 at 10:45:21 AM EDT
> > To: gene...@apache.incubator.org
> > Cc: dev@beam.incubator.apache.org
> > Subject: Re: [VOTE] Release Apache Beam, version 0.2.0-incubating
> >
> > +1 (binding)
> >
> > - built from source
> > - “incubating” in file name
> > - NOTICE and LICENSE look good
> > - license headers present
> > - no wayward binaries
> > - signatures check out
> >
> > -Taylor
> >
> >> On Aug 1, 2016, at 12:46 PM, Dan Halperin wrote:
> >>
> >> Hey folks!
> >>
> >> Here's the vote for the second release of Apache Beam: version
> >> 0.2.0-incubating.
> >>
> >> The complete staging area is available for your review, which includes:
> >> * the official Apache source release to be deployed to dist.apache.org [1],
> >> and
> >> * all artifacts to be deployed to the Maven Central Repository [2].
> >>
> >> This corresponds to the tag "v0.2.0-incubating-RC2" in source control [3].
> >>
> >> New for this release:
> >> * Release notes are available in JIRA [4].
> >> * We made sure to address all the issues that the Apache Incubator PMC
> >> raised in the previous release [5].
> >>
> >> The Apache Beam community has unanimously approved this release: [6], [7].
> >>
> >> Please vote as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> Thanks,
> >> Dan
> >>
> >> As customary, the vote will be open for at least 72 hours. It is adopted by
> >> a majority approval with at least three PMC affirmative votes. If approved,
> >> we will proceed with the release.
> >>
> >> [1] https://dist.apache.org/repos/dist/dev/incubator/beam/0.2.0-incubating/RC2/
> >> [2] https://repository.apache.org/content/repositories/orgapachebeam-1004/
> >> [3] https://github.com/apache/incubator-beam/tree/v0.2.0-incubating-RC2
> >> [4] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12335766
> >> [5] Thread after http://mail-archives.apache.org/mod_mbox/incubator-general/201606.mbox/%3CCAMdX748VZg5-p%3D5x63se-iBZU0e32n20aRyVsDPWhWZaoq7SoA%40mail.gmail.com%3E
> >> [6] http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201607.mbox/%3CCAA8k_FJeyg%2BGWUBMeSPFQhnaPN3V4MrenJtrDbiyXyKJkzH7ZA%40mail.gmail.com%3E
> >> [7] http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201607.mbox/browser
> >
>
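
The closing rule quoted above (open for at least 72 hours, majority approval, at least three affirmative PMC votes) can be sketched as a small check. This is only an illustration: the `Vote` type and `vote_passes` function are hypothetical names, "majority approval" is assumed here to mean more binding +1 than binding -1 votes, and the open/close timestamps are approximations taken from the dates in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 to approve, -1 to disapprove
    binding: bool  # True when cast by a PMC member


def vote_passes(votes, opened, closed,
                min_open=timedelta(hours=72), min_binding=3):
    """Return True if the vote stayed open long enough and carries.

    Assumption: only binding votes are counted, and "majority" means
    strictly more binding +1 votes than binding -1 votes.
    """
    if closed - opened < min_open:
        return False
    binding = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding if v.value > 0)
    disapprovals = sum(1 for v in binding if v.value < 0)
    return approvals >= min_binding and approvals > disapprovals


# The six binding +1 votes listed in the RESULT message.
ipmc_votes = [
    Vote(name, +1, True)
    for name in [
        "Jean-Baptiste Onofré", "John D. Ament", "Justin Mclean",
        "P. Taylor Goetz", "Seetharam Venkatesh", "Sergio Fernández",
    ]
]

# Approximate dates only: vote opened Aug 1, closed Aug 7, 2016.
print(vote_passes(ipmc_votes, datetime(2016, 8, 1), datetime(2016, 8, 7)))
```

With six binding approvals, no disapprovals, and more than five days elapsed, the check agrees with the result announced in this thread.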


Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-07 Thread Jean-Baptiste Onofré

Good point Ben.

I would say a "discussion" Jira can "evolve" into an "implementation" Jira
(just by changing the component).


WDYT ?

Regards
JB

On 08/08/2016 06:50 AM, Ben Chambers wrote:

Would we use the same Jira to track the series of PRs implementing the
proposal (if accepted) or would it be discussion only (possibly linked to
the implementation tasks)?

On Sun, Aug 7, 2016, 9:48 PM Frances Perry  wrote:


I'm a huge fan of keeping all the details related to a topic in a relevant
jira issue.

On Sun, Aug 7, 2016 at 9:31 PM, Jean-Baptiste Onofré 
wrote:


Hi guys,

we have now several technical discussions, sent on the mailing list with
link to document for details.

I think it's not easy for people to follow the different discussions, and
to look for the e-mail containing the document links.

Of course, it's required to have the discussion on the mailing list (per
Apache rules). However, maybe it could be helpful to have a place to find
open discussions, with the link to the mailing list discussion thread, and
to the detailed document.
It could be on the website (but maybe not easy to maintain and publish),
or on Jira (one Jira per discussion), or a wiki.

WDYT ?

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com







--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-07 Thread Jean-Baptiste Onofré

Me too ;)

What I'm doing in other Apache projects (like Karaf) is a Jira component 
named "discussion" where I describe the discussion and attach the 
related documents.


So, with a quick query, we can find all pending discussion, etc.

My $0.01

Regards
JB

On 08/08/2016 06:47 AM, Frances Perry wrote:

I'm a huge fan of keeping all the details related to a topic in a relevant
jira issue.

On Sun, Aug 7, 2016 at 9:31 PM, Jean-Baptiste Onofré 
wrote:


Hi guys,

we have now several technical discussions, sent on the mailing list with
link to document for details.

I think it's not easy for people to follow the different discussions, and
to look for the e-mail containing the document links.

Of course, it's required to have the discussion on the mailing list (per
Apache rules). However, maybe it could be helpful to have a place to find
open discussions, with the link to the mailing list discussion thread, and
to the detailed document.
It could be on the website (but maybe not easy to maintain and publish),
or on Jira (one Jira per discussion), or a wiki.

WDYT ?

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-07 Thread Ben Chambers
Would we use the same Jira to track the series of PRs implementing the
proposal (if accepted) or would it be discussion only (possibly linked to
the implementation tasks)?

On Sun, Aug 7, 2016, 9:48 PM Frances Perry  wrote:

> I'm a huge fan of keeping all the details related to a topic in a relevant
> jira issue.
>
> On Sun, Aug 7, 2016 at 9:31 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi guys,
> >
> > we have now several technical discussions, sent on the mailing list with
> > link to document for details.
> >
> > I think it's not easy for people to follow the different discussions, and
> > to look for the e-mail containing the document links.
> >
> > Of course, it's required to have the discussion on the mailing list (per
> > Apache rules). However, maybe it could be helpful to have a place to find
> > open discussions, with the link to the mailing list discussion thread, and
> > to the detailed document.
> > It could be on the website (but maybe not easy to maintain and publish),
> > or on Jira (one Jira per discussion), or a wiki.
> >
> > WDYT ?
> >
> > Regards
> > JB
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-07 Thread Frances Perry
I'm a huge fan of keeping all the details related to a topic in a relevant
jira issue.

On Sun, Aug 7, 2016 at 9:31 PM, Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> we have now several technical discussions, sent on the mailing list with
> link to document for details.
>
> I think it's not easy for people to follow the different discussions, and
> to look for the e-mail containing the document links.
>
> Of course, it's required to have the discussion on the mailing list (per
> Apache rules). However, maybe it could be helpful to have a place to find
> open discussions, with the link to the mailing list discussion thread, and
> to the detailed document.
> It could be on the website (but maybe not easy to maintain and publish),
> or on Jira (one Jira per discussion), or a wiki.
>
> WDYT ?
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-07 Thread Jean-Baptiste Onofré

Hi guys,

we now have several technical discussions, sent to the mailing list with
a link to a document for details.


I think it's not easy for people to follow the different discussions, 
and to look for the e-mail containing the document links.


Of course, it's required to have the discussion on the mailing list (per 
Apache rules). However, maybe it could be helpful to have a place to 
find open discussions, with the link to the mailing list discussion 
thread, and to the detailed document.
It could be on the website (but maybe not easy to maintain and publish), 
or on Jira (one Jira per discussion), or a wiki.


WDYT ?

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Proposal: Dynamic PipelineOptions

2016-08-07 Thread Jean-Baptiste Onofré

+1

Thanks Sam, it sounds interesting.

Regards
JB

On 07/29/2016 09:14 PM, Sam McVeety wrote:

During the graph construction phase, the given SDK generates an initial
execution graph for the program.  At execution time, this graph is
executed, either locally or by a service.  Currently, Beam only supports
parameterization at graph construction time.  Both Flink and Spark supply
functionality that allows a pre-compiled job to be run without SDK
interaction with updated runtime parameters.

In its current incarnation, Dataflow can read values of PipelineOptions at
job submission time, but this requires the presence of an SDK to properly
encode these values into the job.  We would like to build a common layer
into the Beam model so that these dynamic options can be properly provided
to jobs.

Please see
https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
for the high-level model, and
https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
for the specific API proposal.

Cheers,
Sam



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
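
The distinction Sam draws above, between parameters fixed at graph construction time and parameters supplied at execution time, can be illustrated with a small standalone sketch. It does not use or reproduce any Beam API (the actual proposal lives in the linked documents); `RuntimeValue`, `build_graph`, and `execute` are hypothetical names, and the "graph" is just a list of deferred steps.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class RuntimeValue(Generic[T]):
    """Hypothetical placeholder for an option bound only at execution time."""

    def __init__(self, name: str, parse: Callable[[str], T]):
        self.name = name
        self._parse = parse

    def get(self, runtime_options: Dict[str, str]) -> T:
        # Resolve the value lazily, from whatever was supplied at run time.
        if self.name not in runtime_options:
            raise KeyError(f"option {self.name!r} was not supplied at run time")
        return self._parse(runtime_options[self.name])


def build_graph(input_path: RuntimeValue[str], limit: RuntimeValue[int]):
    """Construction time: steps capture the placeholders, not concrete values."""

    def read(opts):
        return f"read {input_path.get(opts)}"

    def take(opts):
        return f"take first {limit.get(opts)} records"

    return [read, take]


def execute(graph, runtime_options: Dict[str, str]):
    """Execution time: each step resolves its placeholders against the options."""
    return [step(runtime_options) for step in graph]


# The graph is built once and then run twice with different parameters.
graph = build_graph(RuntimeValue("input", str), RuntimeValue("limit", int))
print(execute(graph, {"input": "/data/a.txt", "limit": "10"}))
print(execute(graph, {"input": "/data/b.txt", "limit": "5"}))
```

Because each placeholder carries its own parser, the value arrives at the step already typed, which is one way to keep late-bound parameters from degrading into untyped strings.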


Re: Proposal: Dynamic PipelineOptions

2016-08-07 Thread Amit Sela
+1 sounds like a good idea.

Spark's driver actually takes all dynamic parameters starting with "spark."
and propagates them into SparkConf which is propagated onto the Executors
and is available via the environment's SparkEnv.
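
As a rough illustration of the prefix-based propagation described above, the following sketch filters submitted parameters by the "spark." prefix. It is not Spark's implementation: `collect_prefixed` is a hypothetical helper, and a plain dict stands in for SparkConf.

```python
def collect_prefixed(params, prefix="spark."):
    """Return only the parameters whose key starts with the given prefix."""
    return {key: value for key, value in params.items() if key.startswith(prefix)}


# Parameters as they might be supplied at submission time (illustrative values).
submitted = {
    "spark.executor.memory": "2g",
    "spark.app.name": "demo",
    "inputFile": "/tmp/in.txt",  # no "spark." prefix, so it is not propagated
}

conf = collect_prefixed(submitted)  # stand-in for the propagated configuration
print(sorted(conf))
```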

I'm wondering, does this mean that PipelineOption will be available to the
PTransform, or only the ValueSupplier (yes, (4) for me too please) ?

Thanks,
Amit

On Fri, Aug 5, 2016 at 5:41 PM Aljoscha Krettek  wrote:

> +1
>
> It's true that Flink provides a way to pass dynamic parameters to operator
> instances. That's not used in any of the built-in sources and operators,
> however. They are instantiated with their parameters when the graph is
> constructed. So what you are suggesting for Beam would actually provide
> more functionality than what we currently have in Flink. :-)
>
> Out of the options I think (4) would be the best. (1) and (2) are not type
> safe, correct? and (3) seems very boilerplate-y.
>
> Cheers,
> Aljoscha
>
> On Thu, 4 Aug 2016 at 21:53 Frances Perry  wrote:
>
> > +Amit, Aljoscha, Manu
> >
> > Any comments from folks on the Flink, Spark, or Gearpump runners?
> >
> > On Tue, Aug 2, 2016 at 11:10 AM, Robert Bradshaw <
> > rober...@google.com.invalid> wrote:
> >
> > > Being able to "late-bind" parameters like input paths to a
> > > pre-constructed program would be a very useful feature, and I think is
> > > worth adding to Beam.
> > >
> > > Of the four API proposals, I have a strong preference for (4).
> > > Further, it seems that these need not be bound to the PipelineOptions
> > > object itself (i.e. a named RuntimeValueSupplier could be constructed
> > > off of a pipeline object), which the Python API makes less heavy use
> > > of (encouraging the user to use familiar, standard libraries for
> > > argument parsing), though of course such integration is useful to
> > > provide for convenience.
> > >
> > > - Robert
> > >
> > > > On Fri, Jul 29, 2016 at 12:14 PM, Sam McVeety wrote:
> > > > > During the graph construction phase, the given SDK generates an initial
> > > > > execution graph for the program.  At execution time, this graph is
> > > > > executed, either locally or by a service.  Currently, Beam only supports
> > > > > parameterization at graph construction time.  Both Flink and Spark supply
> > > > > functionality that allows a pre-compiled job to be run without SDK
> > > > > interaction with updated runtime parameters.
> > > > >
> > > > > In its current incarnation, Dataflow can read values of PipelineOptions at
> > > > > job submission time, but this requires the presence of an SDK to properly
> > > > > encode these values into the job.  We would like to build a common layer
> > > > > into the Beam model so that these dynamic options can be properly provided
> > > > > to jobs.
> > > > >
> > > > > Please see
> > > > > https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> > > > > for the high-level model, and
> > > > > https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> > > > > for the specific API proposal.
> > > >
> > > > Cheers,
> > > > Sam
> > >
> >
>